[Wien] Installation with MPI and GNU compilers
Pavel Ondračka
pavel.ondracka at email.cz
Thu Apr 5 12:18:39 CEST 2018
Laurence Marks wrote on Wed, 04 Apr 2018 at 16:01 +0000:
> I confess to being rather doubtful that gfortran+... is comparable to
> ifort+... for Intel cpu, it might be for AMD. While the mkl vector
> libraries are useful in a few codes such as aim, they are minor for
> the main lapw[0-2].
Well, here is some quick benchmark data then (serial benchmark, single core):
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (haswell)
Wien2k 17.1
-------------
gfortran 7.3.1 + OPENBLAS 0.2.20 + glibc 2.26 (with the custom patch to
use libmvec):
Time for al,bl (hamilt, cpu/wall) : 0.2 0.2
Time for legendre (hamilt, cpu/wall) : 0.1 0.2
Time for phase (hamilt, cpu/wall) : 1.2 1.2
Time for us (hamilt, cpu/wall) : 1.2 1.2
Time for overlaps (hamilt, cpu/wall) : 2.6 2.8
Time for distrib (hamilt, cpu/wall) : 0.1 0.1
Time sum iouter (hamilt, cpu/wall) : 5.5 5.8
number of local orbitals, nlo (hamilt) 304
allocate YL 2.5 MB dimensions 15 3481 3
allocate phsc 0.1 MB dimensions 3481
Time for los (hamilt, cpu/wall) : 0.4 0.3
Time for alm (hns) : 0.1
Time for vector (hns) : 0.3
Time for vector2 (hns) : 0.3
Time for VxV (hns) : 2.1
Wall Time for VxV (hns) : 0.1
245 Eigenvalues computed
Seclr4(Cholesky complete (CPU)) : 1.380 40754.14 Mflops
Seclr4(Transform to eig.problem (CPU)) : 4.470 37745.44 Mflops
Seclr4(Compute eigenvalues (CPU)) : 12.750 17643.13 Mflops
Seclr4(Backtransform (CPU)) : 0.290 10237.08 Mflops
TIME HAMILT (CPU) = 5.8, HNS = 2.5, HORB = 0.0, DIAG = 18.9
TIME HAMILT (WALL) = 6.1, HNS = 2.5, HORB = 0.0, DIAG = 19.0
real 0m28.610s
user 0m27.817s
sys 0m0.394s
-----------
Ifort 17.0.0 + MKL 2017.0:
Time for al,bl (hamilt, cpu/wall) : 0.2 0.2
Time for legendre (hamilt, cpu/wall) : 0.1 0.2
Time for phase (hamilt, cpu/wall) : 1.2 1.3
Time for us (hamilt, cpu/wall) : 1.0 1.0
Time for overlaps (hamilt, cpu/wall) : 2.6 2.8
Time for distrib (hamilt, cpu/wall) : 0.1 0.1
Time sum iouter (hamilt, cpu/wall) : 5.4 5.6
number of local orbitals, nlo (hamilt) 304
allocate YL 2.5 MB dimensions 15 3481 3
allocate phsc 0.1 MB dimensions 3481
Time for los (hamilt, cpu/wall) : 0.2 0.2
Time for alm (hns) : 0.0
Time for vector (hns) : 0.4
Time for vector2 (hns) : 0.4
Time for VxV (hns) : 2.1
Wall Time for VxV (hns) : 0.1
245 Eigenvalues computed
Seclr4(Cholesky complete (CPU)) : 1.110 50667.31 Mflops
Seclr4(Transform to eig.problem (CPU)) : 3.580 47129.09 Mflops
Seclr4(Compute eigenvalues (CPU)) : 11.320 19873.04 Mflops
Seclr4(Backtransform (CPU)) : 0.250 11875.01 Mflops
TIME HAMILT (CPU) = 5.7, HNS = 2.6, HORB = 0.0, DIAG = 16.3
TIME HAMILT (WALL) = 5.9, HNS = 2.6, HORB = 0.0, DIAG = 16.3
real 0m25.587s
user 0m24.857s
sys 0m0.321s
-------------
So I apologize for the statement in my last email, which was too
ambitious. In this particular case the open-source stack is indeed ~12%
slower (28.6 vs 25.6 seconds). Most of the difference is in the DIAG
part (which I believe is where OpenBLAS comes into play). However, on
some other (older) Intel CPUs the DIAG part can be even faster with
OpenBLAS; see the already mentioned email by Prof. Blaha,
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15106.html
where he tested on an i7-3930K (Sandy Bridge). For those older CPUs I
would therefore expect the performance to be really comparable (with
the small patch to utilize libmvec in order to speed up the HAMILT part).
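For anyone who wants to recheck the percentages, here is a minimal
Python sketch. The numbers are copied from the "real" and
"TIME HAMILT (WALL)" lines of the two logs; the dictionary layout and
the helper name slowdown_percent are only illustrative choices, not
anything from WIEN2k.

```python
# Wall-clock timings in seconds, copied from the two logs in this mail.
gnu = {"real": 28.610, "HAMILT": 6.1, "HNS": 2.5, "DIAG": 19.0}
intel = {"real": 25.587, "HAMILT": 5.9, "HNS": 2.6, "DIAG": 16.3}


def slowdown_percent(slow, fast):
    """Relative slowdown of `slow` with respect to `fast`, in percent."""
    return 100.0 * (slow - fast) / fast


# Per-stage comparison: gfortran/OpenBLAS time, ifort/MKL time, difference.
for key in gnu:
    diff = slowdown_percent(gnu[key], intel[key])
    print(f"{key:7s} {gnu[key]:7.3f} {intel[key]:7.3f} {diff:+6.1f}%")

# Fraction of the total wall-clock gap that comes from the DIAG stage.
diag_share = (gnu["DIAG"] - intel["DIAG"]) / (gnu["real"] - intel["real"])
print(f"DIAG share of the total gap: {diag_share:.2f}")
```

With these inputs the overall slowdown comes out at about 11.8%, and
DIAG accounts for roughly 89% of the wall-clock gap. Note that the stage
timings are printed with only one decimal, so the per-stage percentages
are rough.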
In general, open-source support for new hardware is usually slow to
materialize, hence the relative performance on older CPUs is better.
This is especially true for OpenBLAS, where the optimizations for new
CPUs and instruction sets are not provided by Intel (in contrast to gcc,
gfortran and glibc, where Intel engineers contribute directly), while
MKL and ifort have good support from day 1.
I do agree that it is better to advise users to use MKL+ifort, since
when they have it properly installed, siteconfig is almost always
able to detect and build everything out of the box with the default
config. This is unfortunately not the case with the open-source
libraries, where the detection does not work most of the time due to
distro differences and the unfortunate fact that the majority of the
needed libraries do not provide any good means for autodetection (e.g.
proper package config files), hence the user must edit the compiler
flags by hand. I just believe that the "ifort is always much faster
than gfortran" dogma is no longer always true.
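To illustrate what autodetection can look like when a library does ship
a package config file, here is a small Python sketch that asks
pkg-config for the link flags. The module name "openblas" is an
assumption (distros name it differently and some ship no .pc file at
all), the helper names are hypothetical, and this is not what siteconfig
does.

```python
import shlex
import shutil
import subprocess


def split_link_flags(flags):
    """Split a `pkg-config --libs` string into (library dirs, libraries, other)."""
    dirs, libs, other = [], [], []
    for tok in shlex.split(flags):
        if tok.startswith("-L"):
            dirs.append(tok[2:])
        elif tok.startswith("-l"):
            libs.append(tok[2:])
        else:
            other.append(tok)
    return dirs, libs, other


def query_pkg_config(module):
    """Return the --libs output for `module`, or None if it cannot be detected."""
    if shutil.which("pkg-config") is None:
        return None  # pkg-config itself is not installed
    result = subprocess.run(
        ["pkg-config", "--libs", module], capture_output=True, text=True
    )
    if result.returncode != 0:
        return None  # no .pc file found for this module name
    return result.stdout.strip()


# "openblas" is an assumed module name; adjust it for your distro.
flags = query_pkg_config("openblas")
if flags is None:
    print("openblas not detected via pkg-config; flags must be set by hand")
else:
    print(split_link_flags(flags))
```

When the query fails, one is back to editing the compiler and linker
flags by hand, which is exactly the situation described above.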
Best regards
Pavel