[Wien] Installation with MPI and GNU compilers

Wed May 2 16:08:00 CEST 2018

Dear Pavel,
it may be better to ask Laurence; it seems he wrote the VML parts.

I haven't looked into the code in recent years; what I found at a quick glance is:

The only place where INTEL_VML still seems to be used is Hamilt.f of LAPW1.
I found that it is commented out in all other places where it was once used.

If you don't use INTEL_VML, the Intel ifort will vectorize the loops in vectf.f of LAPW1 (see the code in Hamilt.f that calls it)
(as I mentioned, one may have to link libsvml explicitly).

For example,
-O2 -xHost -qopt-report=1 -qopt-report-phase=vec
will show you which loops were vectorized.

I could not see that the SVML has reduced accuracy; however, you can set the performance/accuracy level in the VML.
What you can do is set a threshold for the loop size (similar to unroll); this might need a short study of the manual.

I could not see that a threshold for the loops (size of the arrays) was set in W2kinit.F;
only the precision was set there for the INTEL_VML case. However,
I guess that Laurence used it only where large arrays appeared.

NB: I enjoy more questions about how to increase the speed or how to improve the code.

Ciao
Gerhard

DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
"I think the problem, to be quite honest with you,
is that you have never actually known what the question is."

====================================
Dr. Gerhard H. Fecher
Institut of Inorganic and Analytical Chemistry
Johannes Gutenberg - University
55099 Mainz
and
Max Planck Institute for Chemical Physics of Solids
01187 Dresden
________________________________________
From: Pavel Ondračka [pavel.ondracka at email.cz]
Sent: Wednesday, 2 May 2018 12:05
To: Fecher, Gerhard
Subject: Re: [Wien] Installation with MPI and GNU compilers

I'm replying privately since this might be getting too technical for
the list and in fact not interesting for the majority of users...

Fecher, Gerhard wrote on Wed, 02. 05. 2018 at 09:00 +0000:
> I never checked that: does the -DINTEL_VML switch correspond to the
> VML library routines of MKL
> or to the
> SVML library routines of the compiler?

lapw1 calls the VML library directly, for example the vdcos and vdsin
functions, but I have not checked the rest of Wien2k.

> this makes a difference, the svml routines are automatically invoked
> by the INTEL compiler if one uses -O2 optimization or higher.
> (check also the usage of the switches -vec, -no-vec, -vec-report)
>
> The VML routines of the MKL only make sense for appropriate sizes of
> the vectors; otherwise, they may even slow down the program (how much
> might also depend on threads etc.).

The common usage of the VML in Wien2k is to call the VML functions with
a _large_ array as an argument. So, if I understand it correctly, the
vectorization is done inside the VML and the VML chooses the best
intrinsic. Since the arrays are large, there is a speedup in all cases.

BTW, are you sure the -O2 switch alone will give you the svml
intrinsics? IMO the svml intrinsics have different accuracy (they might
not be strictly IEEE compliant as compared to the scalar variants), so I
would expect you need to state explicitly with some additional flag that
you are OK with this (e.g. for GCC you need the -ffast-math switch to
get the vectorized SSE/AVX trigonometric functions from libmvec).

> A note (for the INTEL Fortran):
> I vaguely remember that the -DINTEL_VML switch did not bring any
> better performance; at that time one needed to give -lsvml (with the
> path to the compiler libs) explicitly.
>
> Ciao
> Gerhard
>
Best regards
Pavel