[Wien] Re: Strange problem: Different Energies, same machine

Peter Blaha pblaha at theochem.tuwien.ac.at
Wed Mar 15 13:08:00 CET 2006


This looks very much like the NFS bug reported previously by L. Marks
(we have also experienced it in our group).
For some reason, ifort 9.0 seems to have problems writing the output
files in such a way that they are properly communicated to the
NFS server/client.
It seems to occur only with ifort 9 in combination with certain Linux
kernel versions (this has probably been fixed very recently, but I'm not
sure about that).

We have verified this, e.g., by using

ssh node1 "cd xxx; head -1 xxx.vsp"

This may report iteration 15, while the same command on node2
may still report the old iteration number 1.
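The same check can be run over several nodes at once. The following is only a minimal Python sketch, assuming passwordless ssh to each node; `node1`, `node2`, `xxx` and `xxx.vsp` are the placeholders from the command above, and `remote_head`/`compare_views` are hypothetical helper names. It simply groups the nodes by the first line of the file they see.

```python
import shlex
import subprocess
from typing import Callable, Dict, Iterable, List


def remote_head(node: str, workdir: str, filename: str) -> str:
    """Return the first line of workdir/filename as seen from `node` over ssh."""
    # Same idea as: ssh node1 "cd xxx; head -1 xxx.vsp", but '&&' stops if cd fails.
    remote_cmd = f"cd {shlex.quote(workdir)} && head -1 {shlex.quote(filename)}"
    result = subprocess.run(
        ["ssh", node, remote_cmd], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def compare_views(
    nodes: Iterable[str],
    workdir: str,
    filename: str,
    reader: Callable[[str, str, str], str] = remote_head,
) -> Dict[str, List[str]]:
    """Group nodes by the first line they see.

    More than one group means the nodes disagree about the file's contents,
    e.g. one node still sees an older iteration.
    """
    views: Dict[str, List[str]] = {}
    for node in nodes:
        views.setdefault(reader(node, workdir, filename), []).append(node)
    return views
```

For example, `compare_views(["node1", "node2"], "xxx", "xxx.vsp")` returning two groups instead of one would correspond to the inconsistent view described above. The `reader` parameter is there so the comparison logic can be exercised without a cluster.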

In the latest WIEN2k_06 version I've added a few commands to work around
this Linux/ifort problem (e.g. rm case.vsp,...), but unfortunately I have
not yet been able to make it completely safe. Internally, we have therefore
adopted a strategy of using special servers (mostly an old Linux one) to
avoid these problems.
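The exact commands added to WIEN2k_06 are not listed here (only rm case.vsp,...), so the following is just a minimal Python sketch of the general remove-then-rewrite idea, not WIEN2k code. `rewrite_file` is a hypothetical name, and the explicit flush/fsync is an extra assumption, not something the WIEN2k change is known to do. Like the workaround itself, it is not guaranteed to be safe: other NFS clients may still serve cached data.

```python
import os


def rewrite_file(path: str, data: str) -> None:
    """Hypothetical helper: delete the old file, then write and fsync the new one.

    Deleting first means the new content goes into a fresh file instead of
    overwriting the old one in place. flush() + os.fsync() ask the OS to push
    the data to storage (on an NFS client, to the server) before returning.
    """
    try:
        os.remove(path)  # analogous to "rm case.vsp"
    except FileNotFoundError:
        pass  # nothing to remove yet, e.g. on the first write
    with open(path, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```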

If anybody knows a definite fix (e.g. upgrading all machines to kernel
XXX), I'd appreciate a report. (I don't want to upgrade all my machines
just because it MAY solve the problem.)

PS: It is normal that on a modern Intel CPU, MKL 8 and ifort 9 can lead
to a large speedup.

> I recompiled WIEN2k_04 using Intel MKL 8.0.1 and ifort 9.0 on a Linux cluster (Intel EM64T Xeon) with
> 16 nodes (2 processors per node) after increasing NMATMAX. The recompilation succeeded; I followed exactly
> the instructions by Gerhard H. Fecher given in the FAQ, using siteconfig_lapw.
> After recompilation I tested it. In serial mode (calculation on a single node), I get the same
> energies for WIEN before and after recompilation.
>  
> But if I run the recompiled WIEN with k-point parallelization (using .machines), the energies I get
> are completely different from the original ones, except for the very first energy (ITERATION 1).
> Kindly look at the energies below. I did not change anything in lapw1para etc. before or after recompilation.
>  
> I must state, however, that the computing time is reduced by about 50% after recompilation (why?)
>  
> Serial-mode energies after recompilation (same as the energies before recompilation):
> 
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.060785
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.056999
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.022748
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.023829
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.021702
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.010373
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.002657
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.002598
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.001408
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.001345
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.002692
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.003448
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.003165
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.003322
> 
> Using 7 nodes in parallel after recompilation (energies differ from serial mode):
>
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.060785 <-------- equals the head node energy (ITER = 1)
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.082041
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.215193
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.392857
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.372439
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.396461
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.391014
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.754255
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59384.883938
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59385.084485
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59385.079495
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59385.163579
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59385.465532
> :ENE  : ********** TOTAL ENERGY IN Ry =       -59385.482198
> 
> Compiler options used:
> 
> current:FOPT:-FR -mp1 -w -prec_div -pad -ip -DINTEL_VML
> current:LDFLAGS:-L/opt/intel/fce/9.0/lib -L/opt/intel/cmkl/8.0.1/lib/em64t -lsvml
> current:R_LIBS:-lmkl_lapack -lmkl_em64t -lguide -lguide_stats -lpthread
> 
> 
> Can anyone help?
> 
> Thanks
> 
> Ray Atta-Fynn
> 
>  
> 
> 
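To quantify the symptom in the quoted data, this short Python snippet (values copied from the two energy lists above) computes the per-iteration difference between the parallel and serial runs. The first iteration agrees exactly; after that the parallel run drifts steadily lower, ending about 1.48 Ry below the serial result. Since the same build reproduces the old energies in serial mode, this pattern is consistent with (though not proof of) nodes working with stale files rather than a numerical problem in the new build.

```python
# Total energies (Ry) per SCF iteration, copied from the quoted message.
serial = [
    -59384.060785, -59384.056999, -59384.022748, -59384.023829,
    -59384.021702, -59384.010373, -59384.002657, -59384.002598,
    -59384.001408, -59384.001345, -59384.002692, -59384.003448,
    -59384.003165, -59384.003322,
]
parallel = [
    -59384.060785, -59384.082041, -59384.215193, -59384.392857,
    -59384.372439, -59384.396461, -59384.391014, -59384.754255,
    -59384.883938, -59385.084485, -59385.079495, -59385.163579,
    -59385.465532, -59385.482198,
]

# Difference parallel - serial for each iteration (negative = parallel run lower).
diffs = [p - s for s, p in zip(serial, parallel)]

for iteration, d in enumerate(diffs, start=1):
    print(f"iteration {iteration:2d}: parallel - serial = {d:+.6f} Ry")
```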


                                      P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671             FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------

