<div dir="ltr">I was able to install WIEN2k with gfortran+MKL. Apparently the MKL libraries are free [<a href="https://software.intel.com/en-us/performance-libraries">https://software.intel.com/en-us/performance-libraries</a>], but the compilers are not.<div><br></div><div>While running the benchmark tests we noticed a huge difference in the Hamilt part between this build and an ifort+MKL compilation; as Pavel said, this comes from the VML functions. This does not happen in DIAG, because DIAG is handled by MKL while Hamilt is WIEN2k's own code. I then tried to compile against these VML functions but couldn't, because that requires an ifcore.mod file which I think ships with the Intel compilers; at least it is not in the free MKL package.</div><div><br></div><div>Do you have any recommendations about compilation options that could better optimize WIEN2k?</div><div><br></div><div>The ones I used are the following:</div><div><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div><font face="monospace, monospace"> ***********************************************************************</font></div></div><div><div><font face="monospace, monospace"> * Specify compiler and linker options *</font></div></div><div><div><font face="monospace, monospace"> ***********************************************************************</font></div></div><div><div><font face="monospace, monospace"><br></font></div></div><div><div><font face="monospace, monospace"><br></font></div></div><div><div><font face="monospace, monospace"> Recommended options for system linuxgfortran are:</font></div></div><div><div><font face="monospace, monospace"> Compiler options: -ffree-form -O2 -ffree-line-length-none</font></div></div><div><div><font face="monospace, monospace"> Linker Flags: $(FOPT) -L../SRC_lib</font></div></div><div><div><font face="monospace, monospace"> Preprocessor flags: '-DParallel'</font></div></div><div><div><font face="monospace, monospace"> R_LIB 
(LAPACK+BLAS): -lopenblas -llapack -lpthread</font></div></div><div><div><font face="monospace, monospace"><br></font></div></div><div><div><font face="monospace, monospace"> Current settings:</font></div></div><div><div><font face="monospace, monospace"> O Compiler options: -ffree-form -O2 -ftree-vectorize -ffree-line-length-none -fopenmp -m64 -I$(MKLROOT)/include -I/opt/openmpi/include</font></div></div><div><div><font face="monospace, monospace"> L Linker Flags: $(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -L/opt/openmpi/lib -L/opt/fftw3/lib -pthread</font></div></div><div><div><font face="monospace, monospace"> P Preprocessor flags '-DParallel'</font></div></div><div><div><font face="monospace, monospace"> R R_LIBS (LAPACK+BLAS): /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_blas95_lp64.a /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_lapack95_lp64.a -Wl,--no-as-needed -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl</font></div></div><div><div><font face="monospace, monospace"> X LIBX options: -DLIBXC -I/opt/etsf/include</font></div></div><div><div><font face="monospace, monospace"> LIBXC-LIBS: -L/opt/etsf/lib -lxcf03 -lxc</font></div></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace"><div> ***********************************************************************</div></font></div><div><font face="monospace, monospace"><div> * Specify parallel options and library settings *</div></font></div><div><font face="monospace, monospace"><div> ***********************************************************************</div></font></div><div><font face="monospace, monospace"><div><br></div></font></div><div><font face="monospace, monospace"><div> Your current parallel settings (options and libraries) are:</div></font></div><div><font face="monospace, monospace"><div> </div></font></div><div><font face="monospace, monospace"><div> C Parallel 
Compiler: mpifort</div></font></div><div><font face="monospace, monospace"><div> FP Parallel Compiler Options: -ffree-form -O2 -ftree-vectorize -ffree-line-length-none -fopenmp -m64 -I$(MKLROOT)/include -I/opt/openmpi/include</div></font></div><div><font face="monospace, monospace"><div> MP MPIRUN command: mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_</div></font></div><div><font face="monospace, monospace"><div> </div></font></div><div><font face="monospace, monospace"><div> Additional setting for SLURM batch systems (is set to 1 otherwise):</div></font></div><div><font face="monospace, monospace"><div> </div></font></div><div><font face="monospace, monospace"><div> CN Number of Cores: 1</div></font></div><div><font face="monospace, monospace"><div><br></div></font></div><div><font face="monospace, monospace"><div> Libraries:</div></font></div><div><font face="monospace, monospace"><div> </div></font></div><div><font face="monospace, monospace"><div> F FFTW options: -DFFTW3 -I/opt/fftw3/include</div></font></div><div><font face="monospace, monospace"><div> FFTW-LIBS: -L/opt/fftw3/lib -lfftw3</div></font></div><div><font face="monospace, monospace"><div> FFTW-PLIBS: -lfftw3_mpi</div></font></div><div><font face="monospace, monospace"><div> Sp SCALAPACK: -L/opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64 </div></font></div><div><font face="monospace, monospace"><div> -lmkl_scalapack_lp64 </div></font></div><div><font face="monospace, monospace"><div> -L/opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64 -lmkl_blacs_openmpi_lp64</div></font></div><div><font face="monospace, monospace"><div> E ELPA options:</div></font></div><div><font face="monospace, monospace"><div> ELPA-LIBS:</div></font></div><div><font face="monospace, monospace"><div><br></div></font></div><div><font face="monospace, monospace"><div> Since you use gfortran you might need to specify additional libraries.</div></font></div><div><font face="monospace, 
monospace"><div> You have to make sure that all necessary libraries are present (e.g. MPI, ...)</div></font></div><div><font face="monospace, monospace"><div> and can be found by the linker (specify, if necessary, -L/Path_to_library )!</div></font></div><div><font face="monospace, monospace"><div><br></div></font></div><div><font face="monospace, monospace"><div> RP Parallel-Libs for gfortran: </div></font></div></blockquote><div><font face="monospace, monospace"><br></font></div><div><br></div><div>Additionally, whenever I try to run a simulation with "-it" flag the simulations fail in the second cycle with a "Fortran runtime error". In this example I am doing TiC from the UG and executing the command "run_lapw -it":</div><div><br></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><div><font face="monospace, monospace">hup: Command not found.</font></div></div><div><div><font face="monospace, monospace">STOP LAPW0 END</font></div></div><div><div><font face="monospace, monospace">foreach: No match.</font></div></div><div><div><font face="monospace, monospace">Note: The following floating-point exceptions are signalling: IEEE_DENORMAL</font></div></div><div><div><font face="monospace, monospace">STOP LAPW1 END</font></div></div><div><div><font face="monospace, monospace">STOP LAPW2 END</font></div></div><div><div><font face="monospace, monospace">STOP CORE END</font></div></div><div><div><font face="monospace, monospace">STOP MIXER END</font></div></div><div><div><font face="monospace, monospace">ec cc and fc_conv 0 1 1</font></div></div><div><div><font face="monospace, monospace">in cycle 2 ETEST: 0 CTEST: 0</font></div></div><div><div><font face="monospace, monospace">hup: Command not found.</font></div></div><div><div><font face="monospace, monospace">STOP LAPW0 END</font></div></div><div><div><font face="monospace, monospace">At line 140 of file jacdavblock_tmp_.F (unit = 200, file = 
'./TiC.storeHinv_proc_0')</font></div></div><div><div><font face="monospace, monospace">Fortran runtime error: Sequential READ or WRITE not allowed after EOF marker, possibly use REWIND or BACKSPACE</font></div></div><div><div><font face="monospace, monospace"><br></font></div></div><div><div><font face="monospace, monospace">> stop error</font></div></div></blockquote><div><br></div><div>The "TiC.storeHinv_proc_0" file is empty and I can't find the file "<font face="monospace, monospace">jacdavblock_tmp_.F</font>". What could be the problem?</div><div><br></div><div>Best regards,</div><div>Rui Costa.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 5 April 2018 at 11:18, Pavel Ondračka <span dir="ltr"><<a href="mailto:pavel.ondracka@email.cz" target="_blank">pavel.ondracka@email.cz</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Laurence Marks wrote on Wed, 04 Apr 2018 at 16:01 +0000:<br>
<span class="">> I confess to being rather doubtful that gfortran+... is comparable to<br>
> ifort+... for Intel cpu, it might be for AMD. While the mkl vector<br>
> libraries are useful in a few codes such as aim, they are minor for<br>
> the main lapw[0-2].<br>
<br>
</span>Well, some fast benchmark data then (serial benchmark single core):<br>
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (haswell)<br>
Wien2k 17.1<br>
<br>
-------------<br>
<br>
gfortran 7.3.1 + OPENBLAS 0.2.20 + glibc 2.26 (with the custom patch to<br>
use libmvec):<br>
<br>
Time for al,bl (hamilt, cpu/wall) : 0.2 0.2<br>
Time for legendre (hamilt, cpu/wall) : 0.1 0.2<br>
Time for phase (hamilt, cpu/wall) : 1.2 1.2<br>
Time for us (hamilt, cpu/wall) : 1.2 1.2<br>
Time for overlaps (hamilt, cpu/wall) : 2.6 2.8<br>
Time for distrib (hamilt, cpu/wall) : 0.1 0.1<br>
Time sum iouter (hamilt, cpu/wall) : 5.5 5.8<br>
number of local orbitals, nlo (hamilt) 304<br>
allocate YL 2.5<br>
MB dimensions 15 3481 3<br>
allocate phsc 0.1 MB dimensions 3481<br>
Time for los (hamilt, cpu/wall) : 0.4 0.3<br>
Time for alm (hns) : 0.1<br>
Time for vector (hns) : 0.3<br>
Time for vector2 (hns) : 0.3<br>
Time for VxV (hns) : 2.1<br>
Wall Time for VxV (hns) : 0.1<br>
245 Eigenvalues computed <br>
Seclr4(Cholesky complete (CPU)) : 1.380 40754.14 Mflops<br>
Seclr4(Transform to eig.problem (CPU)) : 4.470 37745.44 Mflops<br>
Seclr4(Compute eigenvalues (CPU)) : 12.750 17643.13 Mflops<br>
Seclr4(Backtransform (CPU)) : 0.290 10237.08 Mflops<br>
TIME HAMILT (CPU) = 5.8, HNS = 2.5, HORB = 0.0,<br>
DIAG = 18.9<br>
TIME HAMILT (WALL) = 6.1, HNS = 2.5, HORB = 0.0,<br>
DIAG = 19.0<br>
<br>
real 0m28.610s<br>
user 0m27.817s<br>
sys 0m0.394s<br>
<br>
-----------<br>
<br>
Ifort 17.0.0 + MKL 2017.0:<br>
<br>
Time for al,bl (hamilt, cpu/wall) : 0.2 0.2<br>
Time for legendre (hamilt, cpu/wall) : 0.1 0.2<br>
Time for phase (hamilt, cpu/wall) : 1.2 1.3<br>
Time for us (hamilt, cpu/wall) : 1.0 1.0<br>
Time for overlaps (hamilt, cpu/wall) : 2.6 2.8<br>
Time for distrib (hamilt, cpu/wall) : 0.1 0.1<br>
Time sum iouter (hamilt, cpu/wall) : 5.4 5.6<br>
number of local orbitals, nlo (hamilt) 304<br>
allocate YL 2.5<br>
MB dimensions 15 3481 3<br>
allocate phsc 0.1 MB dimensions 3481<br>
Time for los (hamilt, cpu/wall) : 0.2 0.2<br>
Time for alm (hns) : 0.0<br>
Time for vector (hns) : 0.4<br>
Time for vector2 (hns) : 0.4<br>
Time for VxV (hns) : 2.1<br>
Wall Time for VxV (hns) : 0.1<br>
245 Eigenvalues computed <br>
Seclr4(Cholesky complete (CPU)) : 1.110 50667.31 Mflops<br>
Seclr4(Transform to eig.problem (CPU)) : 3.580 47129.09 Mflops<br>
Seclr4(Compute eigenvalues (CPU)) : 11.320 19873.04 Mflops<br>
Seclr4(Backtransform (CPU)) : 0.250 11875.01 Mflops<br>
TIME HAMILT (CPU) = 5.7, HNS = 2.6, HORB = 0.0,<br>
DIAG = 16.3<br>
TIME HAMILT (WALL) = 5.9, HNS = 2.6, HORB = 0.0,<br>
DIAG = 16.3<br>
<br>
real 0m25.587s<br>
user 0m24.857s<br>
sys 0m0.321s<br>
-------------<br>
<br>
So I apologize for my statement in the last email, which was too<br>
ambitious. Indeed, in this particular case the opensource stack is ~12%<br>
slower (25.6 vs 28.6 seconds). Most of this is in the DIAG part (which<br>
I believe is where OpenBLAS comes into play). However, on some other<br>
(older) Intel CPUs the DIAG part can be even faster with OpenBLAS; see<br>
the already mentioned email by prof. Blaha <a href="https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15106.html" rel="noreferrer" target="_blank">https://www.mail-archive.com/<wbr>wien@zeus.theochem.tuwien.ac.<wbr>at/msg15106.html</a><br>
where he tested on an i7-3930K (sandybridge), hence for those older<br>
CPUs I would expect the performance to be really comparable (with the<br>
small patch to utilize libmvec in order to speed up the HAMILT part).<br>
<br>
In general, opensource support for new hardware is usually slow to<br>
materialize, hence the performance on older CPUs is better. This is<br>
especially true for OpenBLAS, where the optimizations for new CPUs and<br>
instruction sets are not provided by Intel (contrary to gcc, gfortran<br>
and glibc, where Intel engineers contribute directly), while MKL and<br>
ifort have good support from day one.<br>
<br>
I do agree that it is better to advise users to use MKL+ifort, since<br>
when they have it properly installed, siteconfig is almost always able<br>
to detect and build everything out of the box with the default config.<br>
This is unfortunately not the case with the opensource libraries, where<br>
the detection does not work most of the time due to distro differences<br>
and the unfortunate fact that the majority of the needed libraries do<br>
not provide any good means for autodetection (e.g. proper pkg-config<br>
files), hence the user must edit the compiler flags by hand. I just<br>
believe that the "ifort is always much faster than gfortran" dogma is<br>
no longer always true.<br>
<br>
Best regards<br>
Pavel<br>
<span class="">______________________________<wbr>_________________<br>
Wien mailing list<br>
<a href="mailto:Wien@zeus.theochem.tuwien.ac.at">Wien@zeus.theochem.tuwien.ac.<wbr>at</a><br>
</span><a href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien" rel="noreferrer" target="_blank">http://zeus.theochem.tuwien.<wbr>ac.at/mailman/listinfo/wien</a><br>
SEARCH the MAILING-LIST at: <a href="http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html" rel="noreferrer" target="_blank">http://www.mail-archive.com/<wbr>wien@zeus.theochem.tuwien.ac.<wbr>at/index.html</a><br>
</blockquote></div><br></div>