[Wien] Installation with MPI and GNU compilers

Pavel Ondračka pavel.ondracka at email.cz
Thu May 3 11:04:17 CEST 2018


On Wed, 02.05.2018 at 22:29 +0100, Rui Costa wrote:
> Changing "#if defined (INTEL_VML)" to "#if defined
> (INTEL_VML_HAMILT)" in SRC_lapw1/hamilt.F really improved Hamilt but
> seems like DIAG is a little slower. In my pc (Intel(R) Core(TM) i7-
> 2630QM CPU @ 2.00GHz, 4 cores, 8 Gb RAM) the benchmark tests went
> from:

Good to hear this :-) I'm not sure about the apparent slowdown of the
DIAG part; IMO, changes to hamilt.F should have zero influence on DIAG.
It might be some background task messing with your cache hits, etc.,
but the difference does seem fairly consistent.
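
For reference, the change you describe can be applied as a one-liner
(just a sketch; it assumes the guard is spelled exactly as quoted above,
and it keeps a backup of the original file):

    cd $WIENROOT                      # your WIEN2k installation tree
    sed -i.bak \
      's/#if defined (INTEL_VML)/#if defined (INTEL_VML_HAMILT)/' \
      SRC_lapw1/hamilt.F
    # afterwards recompile lapw1, e.g. via siteconfig_lapw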
> 
> > Simulation                    Total (CPU/Wall)  Hamilt (CPU/Wall)  HNS (CPU/Wall)  DIAG (CPU/Wall)
> > Serial     1 kpt, 1 thread    82/82             41/41              6/6             35/35
> >            1 kpt, 2 threads   88/66             41/41              7/4             40/21
> >            1 kpt, 4 threads   112/61            41/41              9/3             64/17
> > kparallel  2 kpts, 1 th.      83/83             42/42              6/6             35/35
> >            2 kpts, 2 th.      117/82            44/44              8/4             65/34
> >            4 kpts, 1 th.      126/126           49/49              9/9             68/68
> > MPI        1 kpt, 1 mpi       1078/1080         618/620            77/77           383/383
> >            1 kpt, 2 mpi       1014/1112         392/394            104/104         518/618  <- PC stopped for a few minutes
> >            1 kpt, 4 mpi       699/701           210/211            87/88           402/403
> 
> to
> 
> > Simulation                    Total (CPU/Wall)  Hamilt (CPU/Wall)  HNS (CPU/Wall)  DIAG (CPU/Wall)
> > Serial     1 kpt, 1 thread    50/50             8/8                6/6             36/36
> >            1 kpt, 2 threads   59/35             8/8                8/4             43/23
> >            1 kpt, 4 threads   89/30             8/8                10/3            71/19
> > kparallel  2 kpts, 1 th.      56/56             9/9                6/6             41/41
> >            2 kpts, 2 th.      86/50             9/9                9/5             68/36
> >            4 kpts, 1 th.      126/126           10/10              10/10           73/73
> > MPI        1 kpt, 1 mpi       540/541           78/79              77/77           385/385
> >            1 kpt, 2 mpi       695/699           81/83              96/96           518/520  <- ran this simulation twice; don't understand why it is slower than the above
> >            1 kpt, 4 mpi       509/511           45/46              81/81           383/384
> 
> Now my only problem seems to be the "-it" flag not working.
> 
> Best regards,
> Rui Costa.
> 

Regarding the -it problem: as a temporary solution, until a proper fix
is provided, the -noHinv flag to run_lapw should help (i.e., hopefully
skip the problematic part of the code) while still retaining the
benefits of -it. Some people use it together with -it all the time (see
for example this thread:
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15385.html ).
IIRC there might be some trade-off between the speed of one cycle and
the speed of the overall convergence, but I never ran any benchmarks.
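
Concretely, the workaround is just:

    # keep iterative diagonalization, but skip the H^-1 handling
    # that appears to trigger the problem:
    run_lapw -it -noHinv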

BTW, it would definitely be interesting to run a few cases while
experimenting with the -it, -noHinv, -vec2pratt, and -fd x flags, to
get some hard data on the differences in cycle speed and overall
convergence speed. Has anyone run similar tests?
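
A rough sketch of how such a comparison could look (hypothetical: it
assumes a starting density saved beforehand with "save_lapw start" so
that every run begins from the same point, and the -ec value is just an
example):

    for flags in "-it" "-it -noHinv" "-it -noHinv -vec2pratt"; do
        restore_lapw start              # reset to the common starting point
        echo "== run_lapw $flags =="
        time run_lapw -ec 0.0001 $flags
        grep :ITE *.scf | tail -1       # how many SCF cycles were needed
    done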

Best regards
Pavel


