[Wien] WIEN2k and gfortran II

Peter Blaha pblaha at theochem.tuwien.ac.at
Mon Dec 12 11:04:35 CET 2016


Inspired by the recent posts about gfortran and openblas, I made some 
timing tests myself.

I was using the "Test-Case" (serial benchmark) from our website (a 
complex case with NMAT=3481).

I tested it on an Intel I7-3939 (6 core) processor with either ifort+mkl 
(2016.3.210) or gfortran+openblas.
I was using 1, 2, 4 or 6 cores (set via OMP_NUM_THREADS) of one PC:

1 core:
Intel:    TIME HAMILT (WALL) =     5.2, HNS =     4.2, DIAG =    25.8
gfortran: TIME HAMILT (WALL) =    36.3, HNS =     4.0, DIAG =    25.0

2 cores:
Intel:    TIME HAMILT (WALL) =     5.3, HNS =     2.5, DIAG =    14.4
gfortran: TIME HAMILT (WALL) =    36.3, HNS =     2.4, DIAG =    13.4

4 cores:
Intel:    TIME HAMILT (WALL) =     5.3, HNS =     1.7, DIAG =     7.7
gfortran: TIME HAMILT (WALL) =    36.6, HNS =     1.7, DIAG =     7.9

6 cores:
Intel:    TIME HAMILT (WALL) =     5.3, HNS =     1.5, DIAG =     6.4
gfortran: TIME HAMILT (WALL) =    36.4, HNS =     2.0, DIAG =     7.4

So obviously, openblas is really VERY good and basically of the same 
quality as the MKL (if not faster!!).

But: Setting up the eigenvalue problem (HAMILT) involves the 
calculation of many cosines (exponentials), and with ifort we can use the 
"vector-cosines" from the mkl. This makes ifort 7 times faster in this 
part!
This can also be seen from the partial timings of hamilt in 
case.output1, where phase and us are significantly faster:

ifort
Time for al,bl    (hamilt, cpu/wall) :          0.3         0.3
Time for legendre (hamilt, cpu/wall) :          0.1         0.1
Time for phase    (hamilt, cpu/wall) :          1.1         1.3
Time for us       (hamilt, cpu/wall) :          1.2         1.2
Time for overlaps (hamilt, cpu/wall) :          2.0         1.9
Time for distrib  (hamilt, cpu/wall) :          0.1         0.0
gfortran
Time for al,bl    (hamilt, cpu/wall) :          0.2         0.3
Time for legendre (hamilt, cpu/wall) :          0.2         0.2
Time for phase    (hamilt, cpu/wall) :         25.9        25.3
Time for us       (hamilt, cpu/wall) :          6.3         6.8
Time for overlaps (hamilt, cpu/wall) :          2.8         3.0
Time for distrib  (hamilt, cpu/wall) :          0.0         0.0
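
To see where the extra HAMILT time goes, one can subtract the two sets of wall times part by part. The following is only a small Python sketch with the numbers copied from the listing above; the names ifort_wall, gfortran_wall and extra_time are made up for this illustration and are not part of WIEN2k:

```python
# Wall times in seconds, copied from the partial hamilt timings above.
ifort_wall = {"al,bl": 0.3, "legendre": 0.1, "phase": 1.3,
              "us": 1.2, "overlaps": 1.9, "distrib": 0.0}
gfortran_wall = {"al,bl": 0.3, "legendre": 0.2, "phase": 25.3,
                 "us": 6.8, "overlaps": 3.0, "distrib": 0.0}


def extra_time(part):
    """Additional wall time (s) gfortran needs for one part of hamilt."""
    return gfortran_wall[part] - ifort_wall[part]


total_extra = sum(extra_time(part) for part in ifort_wall)

for part in ifort_wall:
    print(f"{part:9s} +{extra_time(part):5.1f} s")

share = (extra_time("phase") + extra_time("us")) / total_extra
print(f"phase + us account for {share:.0%} of the extra time")
```

With these numbers, phase and us together make up nearly all of the roughly 31 s difference; the other parts hardly matter.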

This limits gfortran significantly, making it about a factor of two 
slower in these tests (or a factor of 3 when using 4 cores).
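
These factors can be checked by adding up the three wall times (HAMILT + HNS + DIAG) for each run. A minimal Python sketch, with the numbers taken from the tables above and the names timings and slowdown chosen just for this example:

```python
# Wall times in seconds as (HAMILT, HNS, DIAG), copied from the tables above.
timings = {
    1: {"ifort": (5.2, 4.2, 25.8), "gfortran": (36.3, 4.0, 25.0)},
    2: {"ifort": (5.3, 2.5, 14.4), "gfortran": (36.3, 2.4, 13.4)},
    4: {"ifort": (5.3, 1.7, 7.7), "gfortran": (36.6, 1.7, 7.9)},
    6: {"ifort": (5.3, 1.5, 6.4), "gfortran": (36.4, 2.0, 7.4)},
}


def slowdown(cores):
    """Total gfortran wall time divided by total ifort wall time."""
    run = timings[cores]
    return sum(run["gfortran"]) / sum(run["ifort"])


for cores in sorted(timings):
    print(f"{cores} core(s): gfortran/ifort = {slowdown(cores):.1f}")
```

This gives ratios of about 1.9, 2.3, 3.1 and 3.5 for 1, 2, 4 and 6 cores. The ratio grows with the core count because HNS and DIAG scale with the number of threads while the HAMILT time stays constant.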

Anyway, openblas is really good, and if somebody knows how to 
"vectorize" the cos, sin (exp) calls in gfortran, this would be very 
valuable.

Peter Blaha

On 12/08/2016 01:51 PM, John Rundgren wrote:
> Dear Arthur,
>
> "Linker Flags" and "R_LIB" are found by consulting google on
> "xianyi-openblas user manual".
>
> The "include" flag is necessary, otherwise there is a conflict with
> /usr/link/ld.
>
> Xianyi recommends -lopenblas and adds -lpthread -lgfortran with
> motivations understood by wise Linuxers. They have not done any harm.
>
> Could you improve calculation time ...? In a previous wien-bounces you
> find a test where gfortran+openblas is fully competitive with intel+mkl.
> A try is worthwhile.
>
> Best regards / John
>
>
> John Rundgren
> Department of Theoretical Physics, KTH Royal Institute of Technology
>
>

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------

