[Wien] Installation with MPI and GNU compilers

Wed May 2 23:17:45 CEST 2018

When you say "as fast", do you mean for single-core machines or multicore
with threads and/or MPI? Almost everything slow in Wien2k is
LAPACK/ScaLAPACK/ELPA. For most parts of the code with 30-200 atom problems
ifort is good, but not as critical as the libraries and the network.

On Wed, May 2, 2018, 16:05 Pavel Ondračka <pavel.ondracka at email.cz> wrote:

>
> ---------- Original e-mail ----------
> From: Fecher, Gerhard <fecher at uni-mainz.de>
> To: Pavel Ondračka <pavel.ondracka at email.cz>,
> wien at zeus.theochem.tuwien.ac.at <wien at zeus.theochem.tuwien.ac.at>
> Date: 2. 5. 2018 16:08:06
> Subject: AW: [Wien] Installation with MPI and GNU compilers
>
> Dear Pavel,
> it may be better to ask Laurence; it seems he wrote the VML parts.
>
> I haven't looked into the code in the last few years; what I found on a
> quick look is:
>
> The only place where INTEL_VML is still used seems to be Hamilt.f of
> LAPW1. I found that it is commented out in all other places where it was
> once used.
>
> If you don't use INTEL_VML, the Intel ifort will vectorize the loops in
> vectf.f of LAPW1 (see the code in Hamilt.f that calls it)
> (as I mentioned, one may have to link libsvml explicitly).
>
>
>
> BTW, is svml part of MKL, or do you need ifort for that?
>
>
> For example
> -O2 -xHost -qopt-report=1 -qopt-report-phase=vec
> will show you which loops were vectorized
>
>
> Indeed, if I add -O2 and -xHost to the default Wien2k flags (with
> ifort and MKL), there is no performance hit if I remove -DINTEL_VML.
>
>
> I could not see that svml has reduced accuracy; however, you can set
> the performance/accuracy level in the VML.
> What you can do is set a threshold for the loop size (similar to
> unroll); this might need a short study of the manual.
>
>
> Interesting, I will try to run some tests for the speed and accuracy of
> some basic trigonometric functions for ifort vs gfortran and standard glibc
> vs libmvec vs VML vs svml.
>
>
> I could not see that a threshold for the loops (size of the arrays) was
> set in W2kinit.F;
> only the precision was set there for the INTEL_VML script. However,
> I guess that Laurence used it only where large arrays appeared.
>
> NB: I enjoy more questions about how to increase the speed or how to
> improve the code.
>
>
> Well, I do believe that the code is well optimized when you have
> ifort + MKL; however, the other options are somewhat worse.
>
>
> Since you can nowadays get the MKL library for free (but not ifort),
> there is the combination of gfortran + MKL, which does not have any default
> config and is slow, as was reported by Rui at the beginning of the thread.
> I'm quite sure this combination can be made almost as fast as ifort + MKL
> (either by somehow fixing the INTEL_VML define to fix the missing ifcore
> problem, or possibly by using the -mveclibabi=svml gfortran switch or some
> other trick). I'm not sure how many people have this setup though.
>
>
> The most problematic is the gfortran + OpenBLAS combination, where I was
> not able to force gfortran to use the vectorized (SIMD) math. It works with
> C code (which is why my approach to making lapw1 fast includes porting
> vectf.f to C) but not with Fortran. It is possible there is some way to
> make this work, but I have had no luck so far. libmvec has a public
> interface, so it might be possible to call it directly, similarly to the
> VML; however, it would introduce a lot of #ifdef LIBMVEC into the code,
> which I guess is not a good idea. I would like to have this working better
> out of the box, so I'll keep looking for a solution that does not require
> extensive changes in the code or the siteconfig script. I don't know if the
> authors are accepting patches anyway...
>
>
> Best regards
>
> Pavel
>
>
>
>
> Ciao
> Gerhard
>
> DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
> "I think the problem, to be quite honest with you,
> is that you have never actually known what the question is."
>
> ====================================
> Dr. Gerhard H. Fecher
> Institut of Inorganic and Analytical Chemistry
> Johannes Gutenberg - University
> 55099 Mainz
> and
> Max Planck Institute for Chemical Physics of Solids
> 01187 Dresden
> ________________________________________
> From: Pavel Ondračka [pavel.ondracka at email.cz]
> Sent: Wednesday, 2 May 2018 12:05
> To: Fecher, Gerhard
> Subject: Re: [Wien] Installation with MPI and GNU compilers
>
> I'm replying privately since this might be getting too technical for
> the list and is in fact not interesting for the majority of users...
>
> Fecher, Gerhard wrote on Wed, 02. 05. 2018 at 09:00 +0000:
> > I never checked that: does the -DINTEL_VML switch correspond to the
> > VML library routines of MKL
> > or to the
> > SVML library routines of the compiler
>
> lapw1 calls the VML library directly, for example the vdcos and vdsin
> functions, but I have not checked the rest of Wien2k.
>
> > this makes a difference, the svml routines are automatically invoked
> > by the INTEL compiler if one uses -O2 optimization or higher.
> > (check also the usage of the switches -vec, -no-vec, -vec-report)
> >
> > The VML routines of the MKL only make sense for appropriate sizes of
> > the vectors; otherwise, they may even slow down the program (how much
> > might also depend on threads etc.).
>
> The common usage of the VML in Wien2k is to call the VML functions with
> a _large_ array as an argument. So if I understand it correctly the
> vectorization is done inside the VML and the VML chooses the best
> intrinsic. Since the arrays are large, there is a speedup in all cases.
>
> BTW, are you sure the -O2 switch alone will give you the svml
> intrinsics? IMO the svml intrinsics have different accuracy (they might
> not be strictly IEEE compliant compared to the scalar variants), so I
> would expect that you need to state explicitly, with some additional flag,
> that you are OK with this (e.g. for GCC you need the -ffast-math switch to
> get the vectorized SSE/AVX trigonometric functions from libmvec).
>
> > A note (for the INTEL Fortran):
> > I vaguely remember that the -DINTEL_VML switch did not bring any
> > better performance; at that time one needed to give -lsvml (with the
> > path to the compiler libs) explicitly.
> >
> > Ciao
> > Gerhard
> >
> Best regards
> Pavel
>
>