[Wien] Wien2k 19.1 with linux+gfortran benchmarks

Pavel Ondračka pavel.ondracka at email.cz
Thu Dec 12 14:06:42 CET 2019


I concur.

In general, for the serial test case on a modern CPU (with AVX2
instructions), the runtime should be around or below 30 seconds on a
single thread.

However, as this is an almost 10-year-old mobile CPU with only AVX
instructions, a total runtime of slightly above 1 minute is expected.

Regarding the scaling: even when not memory bound, I can get down to
around 35% of the serial runtime with OpenBLAS (MKL scales slightly
better). Small speedups could probably be gained with some work on the
HNS section (it is the worst-scaling part that we have more or less
under control), but for the DIAG part we just depend on BLAS/LAPACK
to scale properly.
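
To make the numbers easier to compare, here is a minimal Python sketch
that recomputes the speedup factors from the WALL lines of the two lapw1
runs quoted below. The dictionaries, the function names and the
"efficiency" measure (speedup divided by thread count) are my own
illustrative choices, not anything WIEN2k prints or provides.

```python
# Wall-clock times in seconds, copied from the two lapw1 runs quoted below.
wall_1_thread = {"HAMILT": 12.3, "HNS": 8.8, "DIAG": 48.2}
wall_4_threads = {"HAMILT": 3.7, "HNS": 3.8, "DIAG": 23.2}
threads = 4


def speedup(serial, parallel):
    """Return how many times faster the parallel run is."""
    return serial / parallel


def efficiency(serial, parallel, n_threads):
    """Return the speedup as a fraction of the ideal n_threads-fold speedup."""
    return speedup(serial, parallel) / n_threads


def report():
    """Build one text line per lapw1 section plus one for the three-section sum."""
    lines = []
    for part in wall_1_thread:
        s = speedup(wall_1_thread[part], wall_4_threads[part])
        e = efficiency(wall_1_thread[part], wall_4_threads[part], threads)
        lines.append(f"{part:7s} speedup {s:4.1f}  efficiency {e:4.0%}")
    total_1 = sum(wall_1_thread.values())
    total_4 = sum(wall_4_threads.values())
    lines.append(f"{'SUM':7s} speedup {speedup(total_1, total_4):4.1f}  "
                 f"runtime ratio {total_4 / total_1:4.0%}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

For this laptop the summed wall time with 4 threads comes out at roughly
44% of the single-thread value, so somewhat worse than the ~35% I
mentioned, which fits the memory-bound explanation.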

If you have multiple k-points and your total memory permits it, it's
best to use k-point parallelization and use OpenMP just for lapw0 and
mixer...

Pavel

On Thu, 2019-12-12 at 13:42 +0100, Peter Blaha wrote:
> It is perfectly ok for your hardware.
> 
> The CPU time is not so important for you; what counts is the WALL
> time (this is the time it really takes until the run finishes).
> 
> You can see that HAMILT parallelizes fairly well (3.7 vs. 12.3 s /
> speedup factor 3.3), HNS is not so good (3.8 vs. 8.8 s / factor 2.3)
> and DIAG is worse (23.2 vs. 48.2 s / factor 2.1).
> 
> Part of the reason that you can never see a factor of 4 is the slow
> memory access: when 4 cores do some calculations, they sometimes have
> to wait for data from memory.
> 
> On machines with more cores and a better memory bus, you will get
> different speed-ups, but basically no machine can use all cores with
> 100% efficiency because of this limited memory access.
> 
> 
> On 12/12/19 1:07 PM, Hemza wrote:
> > Hi everybody:
> > I just finished updating my wien2k installation to 19.1 with OpenMP
> > support (linux (4.19.88), gfortran (9.2.0), openblas-lapack-openmp
> > (0.3.7), fftw3 (3.3.8), libxc (4.3.4)), and patches from
> > "https://github.com/gsabo/WIEN2k-Patches".
> > I intend to use it for relatively small cases (less than 25
> > atoms/unit cell). I ran 'x lapw1' on the test_case.
> > With OMP_NUM_THREADS=4 in bashrc:
> > --------------------------
> > $ x lapw1
> > STOP  LAPW1 END
> > 113.876u 2.097s 0:31.36 369.7%  0+0k 424+37840io 2pf+0w
> > $ grep HORB *output1*
> > test_case.output1:       TIME HAMILT (CPU)  =    13.5, HNS =    12.6, HORB =     0.0, DIAG =    87.3, SYNC =     0.0
> > test_case.output1:       TIME HAMILT (WALL) =     3.7, HNS =     3.8, HORB =     0.0, DIAG =    23.2, SYNC =     0.0
> > --------------------------
> > 
> > and with OMP_NUM_THREADS=1, I got:
> > -------------------------------------
> > $ x lapw1
> > STOP  LAPW1 END
> > 69.380u 0.339s 1:09.88 99.7%    0+0k 352+37848io 2pf+0w
> > $ grep HORB *output1*
> > test_case.output1:       TIME HAMILT (CPU)  =    12.0, HNS =     8.8, HORB =     0.0, DIAG =    48.1, SYNC =     0.0
> > test_case.output1:       TIME HAMILT (WALL) =    12.3, HNS =     8.8, HORB =     0.0, DIAG =    48.2, SYNC =     0.0
> > ------------------------------------
> > I do not feel I really understand the output, and I do not know if
> > these timings are good, so I am eager to read your comments!
> > 
> > My machine ('inxi -dm' output)
> > ------------------------
> > System:    Host: dojo Kernel: 4.19.88-1-lts x86_64 bits: 64 Desktop: i3 4.17.1 Distro: Artix rolling
> > Machine:   Type: Laptop System: ASUSTeK product: K53SD v: 1.0 serial: <root required>
> >             Mobo: ASUSTeK model: K53SD v: 1.0 serial: <root required> BIOS: American Megatrends v: K53SD.202
> >             date: 11/02/2011
> > Battery:   ID-1: BAT0 charge: 33.8 Wh condition: 33.8/59.4 Wh (57%)
> > Memory:    RAM: total: 7.57 GiB used: 4.84 GiB (63.9%)
> >             RAM Report: permissions: Unable to run dmidecode. Are you root?
> > CPU:       Quad Core: Intel Core i7-2670QM type: MT MCP speed: 849 MHz min/max: 800/3100 MHz
> > Graphics:  Device-1: Intel 2nd Generation Core Processor Family Integrated Graphics driver: i915 v: kernel
> >             Device-2: NVIDIA GF119M [GeForce 610M] driver: nouveau v: kernel
> >             Display: x11 server: X.org 1.20.6 driver: intel,nouveau unloaded: fbdev,modesetting,vesa
> >             resolution: <xdpyinfo missing>
> >             Message: Unable to show advanced data. Required tool glxinfo missing.
> > Network:   Device-1: Intel Centrino Wireless-N 100 driver: iwlwifi
> >             Device-2: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet driver: atl1c
> > Drives:    Local Storage: total: 2.05 TiB used: 1.45 TiB (70.8%)
> > Info:      Processes: 300 Uptime: 1d 1h 46m Shell: bash inxi: 3.0.26
> > -------------------------
> > 
> > regards
> > 
> > _______________________________________________
> > Wien mailing list
> > Wien at zeus.theochem.tuwien.ac.at
> > http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> > SEARCH the MAILING-LIST at:  
> > http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
> > 


