[Wien] Installation with MPI and GNU compilers
Pavel Ondračka
pavel.ondracka at email.cz
Wed May 2 23:04:54 CEST 2018
---------- Original e-mail ----------
From: Fecher, Gerhard <fecher at uni-mainz.de>
To: Pavel Ondračka <pavel.ondracka at email.cz>,
wien at zeus.theochem.tuwien.ac.at <wien at zeus.theochem.tuwien.ac.at>
Date: 2. 5. 2018 16:08:06
Subject: AW: [Wien] Installation with MPI and GNU compilers
"Dear Pavel,
maybe it's better to ask Laurence; it seems he wrote the VML parts.
I haven't looked into the code in recent years; what I found on a quick
look is:
The only place where INTEL_VML is still used seems to be Hamilt.f of
LAPW1.
I found that it is commented out in all other places where it was once used.
If you don't use INTEL_VML, the Intel ifort will vectorize the loops in
vectf.f of LAPW1 (see the code in Hamilt.f that calls it)
(as I mentioned, maybe one has to link libsvml explicitly)"
BTW is svml part of the MKL or do you need the ifort for that?
"
For example
-O2 -xHost -qopt-report=1 -qopt-report-phase=vec
will show you which loops were vectorized"
Indeed, if I add -O2 and -xHost to the default Wien2k flags (with ifort
and MKL), removing -DINTEL_VML causes no performance hit.
"I could not see that the svml has a reduced accuracy, however, you can set
the performance/accuracy level in the VML.
What you can do is set a threshold for the loop size (similar to
unroll); this might need some short study of the manual. "
Interesting, I will try to run some tests for the speed and accuracy of some
basic trigonometric functions for ifort vs gfortran and standard glibc vs
libmvec vs VML vs svml.
"
I could not see that in W2kinit.F a threshold for the loops (size of the
arrays) was set,
only the precision was set there for the INTEL_VML script, however,
I guess that Laurence used it where only large arrays appeared.
NB: I enjoy more questions about how to increase the speed or how to improve
the code. "
Well, I do believe that the code is well optimized when you have ifort
+ MKL; however, the other options are somewhat worse.
Since you can nowadays get the MKL library for free (but not ifort),
there is the combination of gfortran + MKL, which does not have any default
config and is slow, as Rui reported at the beginning of the thread. I'm
quite sure this combination can be made almost as fast as ifort + MKL
(either by fixing the INTEL_VML define to resolve the missing ifcore
problem, or possibly by using the -mveclibabi=svml gfortran switch or some
other trick). I'm not sure how many people have this setup though.
The most problematic is the gfortran + OpenBLAS combination, where I was not
able to force gfortran to use the vectorized (SIMD) math. It works with C code
(which is why my approach to making lapw1 fast includes porting vectf.f
to C) but not with Fortran. There may be some way to make this
work, but I have had no luck so far. libmvec has a public interface, so it
might be possible to call it directly, similarly to the VML; however, it would
introduce a lot of #ifdef LIBMVEC into the code, which I guess is not a good
idea. I would like to have this working better out of the box, so I'll keep
looking for a solution that would not require extensive changes to the
code or the siteconfig script. Dunno if the authors are accepting patches
anyway...
Best regards
Pavel
"
Ciao
Gerhard
DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
"I think the problem, to be quite honest with you,
is that you have never actually known what the question is."
====================================
Dr. Gerhard H. Fecher
Institut of Inorganic and Analytical Chemistry
Johannes Gutenberg - University
55099 Mainz
and
Max Planck Institute for Chemical Physics of Solids
01187 Dresden
________________________________________
From: Pavel Ondračka [pavel.ondracka at email.cz]
Sent: Wednesday, 2 May 2018 12:05
To: Fecher, Gerhard
Subject: Re: [Wien] Installation with MPI and GNU compilers
I'm replying privately since this might be getting too technical for
the list and in fact not interesting for the majority of users...
Fecher, Gerhard wrote on Wed 02. 05. 2018 at 09:00 +0000:
> I never checked that: does the -DINTEL_VML switch correspond to the
> VML library routines of MKL
> or to the
> SVML library routines of the compiler
lapw1 calls the VML library directly, for example the vdcos and vdsin
functions, but I have not checked the rest of Wien2k.
> this makes a difference, the svml routines are automatically invoked
> by the INTEL compiler if one uses -O2 optimization or higher.
> (check also the usage of the switches -vec, -no-vec, -vec-report)
>
> The VML routines of the MKL make only sense for appropriate sizes of
> the vectors, otherwise, they may even slow down the program (how much
> might also depend on threads etc.).
The common usage of the VML in Wien2k is to call the VML functions with
a _large_ array as an argument. So if I understand it correctly the
vectorization is done inside the VML and the VML chooses the best
intrinsic. Since the arrays are large, there is a speedup in all cases.
BTW, are you sure the -O2 switch alone will give you the svml
intrinsics? IMO the svml intrinsics have different accuracy (they might not be
strictly IEEE compliant compared to the scalar variants), so I would
expect that you need to state explicitly, with some additional flag, that
you are OK with this (e.g. for GCC you need the -ffast-math switch to
get the vectorized SSE/AVX trigonometric functions from libmvec).
> A note (for the INTEL Fortran):
> I vaguely remember that the -DINTEL_VML switch did not bring any
> better performance, at that time one needed to give the -lsvml (with
> path to the compiler libs) explicitely.
>
> Ciao
> Gerhard
>
Best regards
Pavel
"