[Wien] Wien2k on AVX512 CPUs

Pavel Ondračka pavel.ondracka at email.cz
Wed Feb 27 11:55:39 CET 2019


On Wed, 2019-02-27 at 04:23 -0600, Laurence Marks wrote:
> Agreed. For the update see:
> 
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17832.html


Yeah, sorry, I should have been more specific...

I'm running with the seclr4.F posted in the aforementioned thread;
otherwise it would not compile because of the interface change. The
Wien2k part seems to be working OK, since everything works fine if I
compile ELPA with AVX2 only.

I'm aware of the ELPA problems on mixed AVX2/AVX512 nodes; however, at
the moment I have homogeneous nodes, so this should not be a problem.
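
A minimal sketch of how node homogeneity could be verified, assuming
Linux nodes that expose /proc/cpuinfo. The function names are made up
for illustration; avx512f and avx2 are the flag names the kernel
reports on x86.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def simd_level(flags):
    """Classify the widest instruction set of interest here."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "other"


def local_simd_level(path="/proc/cpuinfo"):
    """Read the local cpuinfo file (Linux only) and classify this node."""
    with open(path) as handle:
        return simd_level(cpu_flags(handle.read()))
```

Running local_simd_level() on every node of a job and comparing the
answers would show whether the set really is homogeneous.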

I used the instructions posted here:
https://gitlab.mpcdf.mpg.de/elpa/elpa/wikis/installation-examples-for-different-architecures
(they suggest using mpiifort for FC and mpicc with gcc for CC). I also
tried reducing the optimization level, as I don't in general trust
ifort to do the right thing at -O3, but as long as I pass
--enable-AVX512 nothing works...

Laurence, would you be so kind as to dig up your ELPA config (and
compiler versions)? I'll try that, and if it does not work, I'll go
bother the ELPA guys.

I also need to figure out the serial slowdown: I see speedups similar
to Prof. Blaha's in HAMILT and HNS, but DIAG is reproducibly slower
with AVX512 (this is with MKL 19.0.1.144; I will try to experiment
with different MKL versions).
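
A small sketch of how such a comparison could be scripted, assuming
only that MKL honours the MKL_ENABLE_INSTRUCTIONS environment variable
as used in this thread. The helper name run_with_isa is hypothetical.

```python
import os
import subprocess


def run_with_isa(command, isa):
    """Run `command` with MKL_ENABLE_INSTRUCTIONS set to `isa`; return its stdout."""
    # Copy the current environment and override only the MKL switch.
    env = dict(os.environ, MKL_ENABLE_INSTRUCTIONS=isa)
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout
```

For a serial test one might call run_with_isa(["x", "lapw1"], "AVX2")
and then again with "AVX512" from the case directory; the "x lapw1"
command line is an assumption about the local setup, and the TIME
lines would still have to be collected from the lapw1 output
afterwards.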

Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz:
MKL_ENABLE_INSTRUCTIONS=AVX512
       TIME HAMILT (CPU)  =     2.9, HNS =     2.2, HORB =     0.0, DIAG =    18.4
       TIME HAMILT (WALL) =     3.2, HNS =     2.2, HORB =     0.0, DIAG =    18.4

MKL_ENABLE_INSTRUCTIONS=AVX2
       TIME HAMILT (CPU)  =     3.6, HNS =     2.6, HORB =     0.0, DIAG =    17.0
       TIME HAMILT (WALL) =     3.9, HNS =     2.7, HORB =     0.0, DIAG =    17.1
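
To put numbers on the differences, here is a rough sketch that parses
TIME lines like the ones above and computes the relative change per
section. The regular expression and the function names are
illustrative only, not anything from Wien2k.

```python
import re

# Matches e.g. "HAMILT (CPU)  =     2.9" or "DIAG =    18.4" (also across a line wrap).
TIMING = re.compile(r"(HAMILT|HNS|HORB|DIAG)\s*(?:\((CPU|WALL)\))?\s*=\s*([0-9.]+)")

AVX512_OUTPUT = """
TIME HAMILT (CPU)  =     2.9, HNS =     2.2, HORB =     0.0, DIAG =    18.4
TIME HAMILT (WALL) =     3.2, HNS =     2.2, HORB =     0.0, DIAG =    18.4
"""

AVX2_OUTPUT = """
TIME HAMILT (CPU)  =     3.6, HNS =     2.6, HORB =     0.0, DIAG =    17.0
TIME HAMILT (WALL) =     3.9, HNS =     2.7, HORB =     0.0, DIAG =    17.1
"""


def parse_timings(text):
    """Return {(clock, section): seconds} from 'TIME HAMILT (CPU) = ...' lines."""
    timings = {}
    clock = None
    for name, kind, value in TIMING.findall(text):
        if kind:  # "(CPU)" or "(WALL)" follows HAMILT and applies to the whole line
            clock = kind
        timings[(clock, name)] = float(value)
    return timings


def relative_change(new, old):
    """Percentage change of `new` versus `old`, skipping sections with zero time."""
    return {
        key: 100.0 * (new[key] - old[key]) / old[key]
        for key in new
        if key in old and old[key] > 0
    }


change = relative_change(parse_timings(AVX512_OUTPUT), parse_timings(AVX2_OUTPUT))
for (clock, section), percent in sorted(change.items()):
    print(f"{section:6s} {clock:4s} {percent:+6.1f} %")
```

For the CPU times above this gives roughly -19% for HAMILT and +8% for
DIAG when going from AVX2 to AVX512.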

Best regards and thanks for the advice
Pavel

> ____
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu
> 
> On Wed, Feb 27, 2019, 03:24 Peter Blaha <pblaha at theochem.tuwien.ac.at>
> wrote:
> > That's exactly what I said:
> > 
> > The current WIEN2k_18 release cannot be used with ELPA versions more
> > recent than 2017.
> > And I don't think that ELPA-2015 had AVX512 support.
> > 
> > On 2/27/19 10:14 AM, Laurence Marks wrote:
> > > N.B., there was an seclr4 update posted some time ago, I think by Thomas
> > > Ruh. This may be needed, and may not be in the current Wien2k release on
> > > the web page.
> > > 
> > > The next release will do a better job I suspect.
> > > 
> > > On Wed, Feb 27, 2019, 03:07 Laurence Marks <L-marks at northwestern.edu> wrote:
> > > 
> > >     I think Peter may have misspoken about the latest elpa. I believe it
> > >     will run OK if you compile it (--enable-AVX512 etc) so the highest
> > >     kernel is equal to the lowest instruction set you use. You may also
> > >     get it to work by using their environment variables. With the
> > >     current Wien2k you cannot exploit elpa optimally if you have a
> > >     heterogeneous set of nodes.
> > >
> > >     I would say 30% faster comparing a 6130 to an E5-2650. However, ifort
> > >     compiler switches can make a big difference, as can the mpi version.
> > >
> > >     N.B., I can dig up my elpa compiler options later if needed. I use
> > >     ifort/icc/mpiifort/mpiicc.
> > > 
> > >     On Wed, Feb 27, 2019, 02:50 Peter Blaha
> > >     <pblaha at theochem.tuwien.ac.at> wrote:
> > > 
> > >         We have an Intel I7-7820X CPU @ 3.60GHz with 8 cores and avx512.
> > >
> > >         The testcase with OMP_NUM_THREADS=1 runs a bit faster with avx512 than
> > >         with avx2, but it is a rather small effect (at least when working with
> > >         this MKL_ENABLE_INSTRUCTIONS variable):
> > >         ----------------------avx512
> > >                  TIME HAMILT (CPU)  =     5.1, HNS =     2.1, HORB =     0.0, DIAG =    15.3
> > >                  TIME HAMILT (WALL) =     5.4, HNS =     2.1, HORB =     0.0, DIAG =    15.3
> > >         ----------------------avx2
> > >                  TIME HAMILT (CPU)  =     5.8, HNS =     2.5, HORB =     0.0, DIAG =    16.3
> > >                  TIME HAMILT (WALL) =     6.1, HNS =     2.5, HORB =     0.0, DIAG =    16.3
> > >
> > >         However, when using OMP_NUM_THREADS=8, this difference is further
> > >         reduced (probably because the run becomes memory bound?)
> > >         -----------------------avx512
> > >                  TIME HAMILT (CPU)  =    19.9, HNS =     7.7, HORB =     0.0, DIAG =    24.2
> > >                  TIME HAMILT (WALL) =     2.6, HNS =     1.0, HORB =     0.0, DIAG =     3.2
> > >         ------------------------avx2
> > >                  TIME HAMILT (CPU)  =    20.0, HNS =     7.4, HORB =     0.0, DIAG =    27.0
> > >                  TIME HAMILT (WALL) =     2.6, HNS =     1.0, HORB =     0.0, DIAG =     3.5
> > >         -------------------------------------------------------------------------
> > > 
> > >         Yes, we have the latest ELPA (elpa-2018.11.001) installed. It seems to
> > >         run without problems and is overall significantly better than the old
> > >         ELPA, but it requires a change in the user interface. The next release
> > >         of WIEN2k will support two ELPA interfaces: ELPA15 (which is in
> > >         WIEN2k_18) and a new interface for ELPA versions later than 2017
> > >         (somewhat like the FFTW2 and FFTW3 versions).
> > >
> > >         So in essence: with the present code one cannot use ELPA versions from
> > >         2017 or later.
> > > 
> > >         On 2/27/19 7:34 AM, Pavel Ondračka wrote:
> > >          > Dear mailing list,
> > >          >
> > >          > just out of curiosity, does anyone have any experience running Wien2k
> > >          > on an AVX512-capable machine (e.g. the KNL accelerators or recent
> > >          > Intel skylake-avx512 CPUs)?
> > >          >
> > >          > Recently my cluster was updated to these skylake-avx512 machines;
> > >          > however, I'm unable to get any better performance for Wien2k. In
> > >          > particular, MKL seems to suck: for example, in single-core
> > >          > performance (with the serial test_case) the eigenvalue problem is
> > >          > actually faster when I forbid the use of AVX512 instructions:
> > >          >
> > >          > running with MKL_VERBOSE=1 MKL_ENABLE_INSTRUCTIONS=AVX2
> > >          > MKL_VERBOSE ZHETRD(L,3481,0x2b74d8567cc0,3481,0x2b74d82121c0,0x2b74d8218e88,0x2b74ef769b00,0x2b74ef777490,452530,0) 10.21s CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
> > >          >
> > >          > with MKL_ENABLE_INSTRUCTIONS=AVX512
> > >          > MKL_VERBOSE ZHETRD(L,3481,0x2b5397c96cc0,3481,0x2b53979411c0,0x2b5397947e88,0x2b53aee98b00,0x2b53aeea6490,452530,0) 12.31s CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
> > >          >
> > >          > This is somewhat compensated by speedups in the hamilt part (the VML
> > >          > stuff and various ?GEMMs actually seem to be slightly faster), but
> > >          > overall the performance is mostly the same with and without the
> > >          > AVX512 stuff. OpenBLAS is maybe 15% slower, so not an option either...
> > >          >
> > >          > Moreover, for the MPI version I'm not able to get a correctly working
> > >          > ELPA compiled with AVX512 support (I went for the latest
> > >          > elpa-2018.11.001 version); it just returns bogus results and diverges
> > >          > after a few iterations. If someone has this working, I'd be really
> > >          > grateful for a working configure line, and for advice on which ELPA
> > >          > and which compiler version this was.
> > >          >
> > >          > Unfortunately I was not able to get any support from the cluster
> > >          > admins beyond "We see a 30% per-core performance increase on
> > >          > average", so I am asking here whether anyone has experience with
> > >          > such machines.
> > >          >
> > >          > Any advice would be appreciated.
> > >          > Best regards
> > >          > Pavel
> > >          >
> > >          > _______________________________________________
> > >          > Wien mailing list
> > >          > Wien at zeus.theochem.tuwien.ac.at
> > >          > http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> > >          > SEARCH the MAILING-LIST at:
> > >          > http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
> > >          >
> > > 
> > >         -- 
> > > 
> > >                                                 P.Blaha
> > >         --------------------------------------------------------------------------
> > >         Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
> > >         Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
> > >         Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
> > >         WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
> > >         --------------------------------------------------------------------------


