[Wien] Installation with MPI and GNU compilers
Pavel Ondračka
pavel.ondracka at email.cz
Thu May 3 11:04:17 CEST 2018
Rui Costa wrote on Wed, 02 May 2018 at 22:29 +0100:
> Changing "#if defined (INTEL_VML)" to "#if defined
> (INTEL_VML_HAMILT)" in SRC_lapw1/hamilt.F really improved Hamilt, but
> DIAG seems to be a little slower. On my PC (Intel(R) Core(TM)
> i7-2630QM CPU @ 2.00GHz, 4 cores, 8 GB RAM) the benchmark results
> went from:
Good to hear this :-) I am not sure about the apparent slowdown of the
DIAG part. IMO changes to the hamilt.F file should have zero influence
on the DIAG part. It might be some background task messing with your
cache hits, etc., although the slowdown does look fairly consistent.
>
> > Simulation                  Total       Hamilt      HNS         DIAG
> >                             (CPU/Wall)  (CPU/Wall)  (CPU/Wall)  (CPU/Wall)
> >
> > Serial     1 kpt, 1 thread  82/82       41/41       6/6         35/35
> >            1 kpt, 2 thread  88/66       41/41       7/4         40/21
> >            1 kpt, 4 thread  112/61      41/41       9/3         64/17
> > kparallel  2 kpts, 1 th.    83/83       42/42       6/6         35/35
> >            2 kpts, 2 th.    117/82      44/44       8/4         65/34
> >            4 kpts, 1 th.    126/126     49/49       9/9         68/68
> > MPI        1 kpt, 1 mpi     1078/1080   618/620     77/77       383/383
> >            1 kpt, 2 mpi     1014/1112   392/394     104/104     518/618 (*)
> >            1 kpt, 4 mpi     699/701     210/211     87/88       402/403
> >
> > (*) PC stopped for a few minutes
>
> to
>
> > Simulation                  Total       Hamilt      HNS         DIAG
> >                             (CPU/Wall)  (CPU/Wall)  (CPU/Wall)  (CPU/Wall)
> >
> > Serial     1 kpt, 1 thread  50/50       8/8         6/6         36/36
> >            1 kpt, 2 thread  59/35       8/8         8/4         43/23
> >            1 kpt, 4 thread  89/30       8/8         10/3        71/19
> > kparallel  2 kpts, 1 th.    56/56       9/9         6/6         41/41
> >            2 kpts, 2 th.    86/50       9/9         9/5         68/36
> >            4 kpts, 1 th.    126/126     10/10       10/10       73/73
> > MPI        1 kpt, 1 mpi     540/541     78/79       77/77       385/385
> >            1 kpt, 2 mpi     695/699     81/83       96/96       518/520 (*)
> >            1 kpt, 4 mpi     509/511     45/46       81/81       383/384
> >
> > (*) ran this simulation twice, don't understand why it is slower
> >     than the run above
>
> Now my only problem seems to be "-it" flag not working.
>
> Best regards,
> Rui Costa.
>
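As a quick sanity check on the quoted numbers, the wall-time ratios can be worked out with a few lines of Python. This is only a sketch and not part of WIEN2k: the dictionary simply copies the wall times (in the units of the tables) of the serial "1 kpt" rows from the two tables above, and the names wall_times, speedup and summarize are made up for illustration.

```python
# Wall times copied from the serial "1 kpt" rows of the two quoted tables,
# stored as (before, after) the INTEL_VML_HAMILT change.
# All names here are illustrative, not WIEN2k identifiers.
wall_times = {
    "1 thread": {"Total": (82, 50), "Hamilt": (41, 8), "DIAG": (35, 36)},
    "2 thread": {"Total": (66, 35), "Hamilt": (41, 8), "DIAG": (21, 23)},
    "4 thread": {"Total": (61, 30), "Hamilt": (41, 8), "DIAG": (17, 19)},
}


def speedup(before, after):
    """Return before/after; a value above 1 means the run got faster."""
    return before / after


def summarize(times):
    """Map each run to its per-part speedup, rounded to two decimals."""
    return {
        run: {part: round(speedup(*pair), 2) for part, pair in parts.items()}
        for run, parts in times.items()
    }


if __name__ == "__main__":
    for run, parts in summarize(wall_times).items():
        print(run, parts)
```

For these rows the Hamilt wall time drops by roughly a factor of five, while the DIAG wall time grows by about 3 to 12 percent, which matches the impression described in the quoted message.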
Regarding the -it problem: as a temporary workaround until a proper fix
is provided, the -noHinv flag to run_lapw should help (i.e. hopefully
skip the problematic part of the code) while still retaining the
benefits of -it. Some people use it together with -it all the time (see
for example this thread:
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15385.html ).
IIRC there might be some trade-off between the speed of one cycle and
the speed of the overall convergence, but I never ran any benchmarks.
BTW, it would definitely be interesting to run a few cases while
experimenting with the -it, -noHinv, -vec2pratt and -fd x flags to get
some hard data on the differences in cycle speed and overall
convergence speed. Has anyone run any similar tests?
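
One way to organise such a comparison would be to enumerate the flag combinations first. The Python sketch below only builds the run_lapw command lines and does not run anything; the function name flag_matrix is made up for illustration, and the value 5 used for "-fd x" is an arbitrary placeholder, not a recommended setting. Whether every combination is meaningful in a given WIEN2k version is not something this sketch checks.

```python
from itertools import product


def flag_matrix(fd_values=(None, 5)):
    """Return run_lapw argument lists for every combination of the flags.

    fd_values: values to try for "-fd x"; None means the flag is omitted.
    The default 5 is an arbitrary placeholder for x.
    """
    commands = []
    for no_hinv, vec2pratt, fd in product((False, True), (False, True), fd_values):
        cmd = ["run_lapw", "-it"]  # -it is kept on in every variant
        if no_hinv:
            cmd.append("-noHinv")
        if vec2pratt:
            cmd.append("-vec2pratt")
        if fd is not None:
            cmd += ["-fd", str(fd)]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in flag_matrix():
        print(" ".join(cmd))
```

Each printed line could then be run in its own copy of the case directory, recording both the time per cycle and the number of cycles needed to converge.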
Best regards
Pavel