[Wien] Re: Strange problem: Different Energies, same machine
Peter Blaha
pblaha at theochem.tuwien.ac.at
Wed Mar 15 13:08:00 CET 2006
This looks very much like the NFS bug reported previously by L. Marks
(we have also experienced it in our group).
For some reason ifort 9.0 seems to have problems writing its
output files in such a way that they are properly communicated between
the NFS server and its clients.
It seems to occur only with ifort 9 in combination with certain versions of
the Linux kernel (the problems have probably been fixed very recently, but I'm not
sure about that).
We have verified this, e.g., by using
ssh node1 "cd xxx; head -1 xxx.vsp"
which may report iteration 15, while the same command on
node2 may still report the old iteration number 1.
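Such a check can be repeated over several nodes at once. The following is only a minimal sketch: the function names check_nodes and all_agree are made up for illustration, and it assumes passwordless ssh to every node and the same directory path on all of them.

```shell
#!/bin/sh
# Hypothetical helpers (not part of WIEN2k): compare the first line of a
# file as seen from several NFS clients.

# Print "node<TAB>first line" for every node given.
# Usage: check_nodes DIR FILE NODE...
check_nodes() {
    dir=$1; file=$2; shift 2
    for node in "$@"; do
        # Same command as above, run once per node; -n keeps ssh off stdin.
        line=$(ssh -n "$node" "cd '$dir' && head -1 '$file'")
        printf '%s\t%s\n' "$node" "$line"
    done
}

# Read "node<TAB>line" records on stdin; succeed only if all lines agree.
all_agree() {
    awk '
        { sub(/^[^\t]*\t/, "") }     # strip the node name and the tab
        NR == 1 { ref = $0 }         # first record is the reference
        $0 != ref { bad = 1 }        # any deviation marks a mismatch
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, check_nodes xxx xxx.vsp node1 node2 | all_agree || echo "nodes disagree" would flag the situation described above, where one node still sees the old iteration number.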
In the latest WIEN2k_06 version I've added a few commands to work around
this Linux/ifort problem (e.g. by rm case.vsp,...), but unfortunately so
far I could not make it completely safe, and we adopted an internal
strategy of using special servers (mostly an old Linux one) to avoid these
problems.
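The rm idea can be sketched as a small wrapper. This is not the code in WIEN2k_06: run_fresh is a hypothetical name, and the assumption that deleting the stale file first helps the other NFS clients notice the new one is exactly the part that is not guaranteed to work.

```shell
#!/bin/sh
# Hypothetical wrapper, NOT the actual WIEN2k_06 implementation.
# Usage: run_fresh OUTFILE COMMAND [ARGS...]
run_fresh() {
    out=$1; shift
    rm -f "$out"        # drop the stale copy before the program runs
    "$@" || return 1    # run the program that is expected to write OUTFILE
    [ -s "$out" ]       # succeed only if a non-empty OUTFILE now exists
}
```

The final test only confirms that the file exists on the machine that wrote it; whether the other nodes see it still has to be checked from those nodes.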
If anybody knows a definite strategy (e.g. upgrade all machines to kernel
XXX), I'd appreciate a report. (I don't want to upgrade all my machines
just because it MAY solve the problems.)
PS: It is normal that on a modern Intel CPU mkl8 and ifort9 may lead
to a large speedup.
> I recompiled WIEN2k_04 using intel mkl 8.0.1 and ifort 9.0 on a linux cluster (Intel EM64T Xeon) with
> 16 nodes (2 processors per node) after I increased NMATMAX. I successfully recompiled by exactly
> following the instructions by Gerhard H. Fecher given in FAQ using siteconfig_lapw.
> After recompilation I tested it. In serial mode (calculation on a single node), I get the same
> energies for WIEN before and after compilation.
>
> But if I run the recompiled WIEN for parallel k-points using (.machines), the energies that I get
> are completely different from the original except the very first energy (ITERATION 1).
> Kindly look at the energies below. I did not change anything in lapw1para etc. before or after recompilation.
>
> I must state however that the computing time is reduced by about 50% after recompilation (why?)
>
> serial mode energies after recompilations (Same as energies before recompilation)
>
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.060785
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.056999
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.022748
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.023829
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.021702
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.010373
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.002657
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.002598
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.001408
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.001345
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.002692
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.003448
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.003165
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.003322
>
>
>
>
>
> Using 7 nodes in parallel after recompilation (energies are different from serial mode)
>
>
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.060785 <-------- equals to head node energy (ITER = 1)
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.082041
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.215193
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.392857
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.372439
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.396461
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.391014
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.754255
> :ENE : ********** TOTAL ENERGY IN Ry = -59384.883938
> :ENE : ********** TOTAL ENERGY IN Ry = -59385.084485
> :ENE : ********** TOTAL ENERGY IN Ry = -59385.079495
> :ENE : ********** TOTAL ENERGY IN Ry = -59385.163579
> :ENE : ********** TOTAL ENERGY IN Ry = -59385.465532
> :ENE : ********** TOTAL ENERGY IN Ry = -59385.482198
>
>
>
> COMPILER OPTIONS used are:
>
> current:FOPT:-FR -mp1 -w -prec_div -pad -ip -DINTEL_VML
> current:LDFLAGS:-L/opt/intel/fce/9.0/lib -L/opt/intel/cmkl/8.0.1/lib/em64t -lsvml
> current:R_LIBS:-lmkl_lapack -lmkl_em64t -lguide -lguide_stats -lpthread
>
>
> Can anyone help?
>
> Thanks
>
> Ray Atta-Fynn
>
>
>
>
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671 FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------