[Wien] Parallel execution on new Intel CPUs

pluto pluto at physics.ucdavis.edu
Wed Feb 22 11:43:47 CET 2023


Dear Prof. Blaha, Prof. Marks, dear All,

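Below are some benchmark results. For the serial benchmark, 8 OMP threads
seems to be optimal: the lapw1 wall time drops from 12.8 s with 1 thread
to 4.5 s with 8 threads, and goes back up to about 5.5 s with 12, 16 or
24 threads. This probably has something to do with having 8 fast and
8 slow cores.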
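
One way to check the fast/slow-core explanation would be to bind the 8 OpenMP threads to the P-cores only and rerun. The following is only a sketch: OMP_PLACES and OMP_PROC_BIND are standard OpenMP environment variables, but the CPU numbering (logical CPUs 0-15 being the P-core hardware threads, 16-23 the E-cores) is an assumption that has to be verified on the actual machine, and whether the threaded libraries used by lapw1 honor the binding is not tested here.

```shell
# ASSUMPTION: logical CPUs 0-15 are the two hardware threads of each of the
# 8 P-cores (0/1 = first P-core, 2/3 = second, ...) and 16-23 are the E-cores.
# Check the real layout first, for example with:
#   lscpu --extended=CPU,CORE,MAXMHZ
export OMP_NUM_THREADS=8
# One place per P-core: use only the first hardware thread of each core.
export OMP_PLACES="{0},{2},{4},{6},{8},{10},{12},{14}"
# Keep each thread on its assigned place instead of letting it migrate.
export OMP_PROC_BIND=close
# Then, in the test_case directory, run the benchmark as before:
#   x lapw1
```

If the 8-thread time stays the same with this binding, the scheduler was probably already putting the threads on the P-cores.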

Hardware:
13th Gen Intel(R) Core(TM) i7-13700K
64 GB of RAM DDR4-3600
2 TB drive Samsung NVMe
ASUS Z690-P D4 mainboard

I also looked at the mpi-benchmark case, but I don't have MPI installed,
so I only ran it with a single OMP thread (output at the end). I think
this test makes no sense without MPI.

Let me know if I should add something to this.

Best,
Lukasz



bash-5.1$ pwd
(...)/WIEN2k_benchmark/Serial/test_case

bash-5.1$ export OMP_NUM_THREADS=1
bash-5.1$ echo $OMP_NUM_THREADS
1
bash-5.1$ x lapw1
  LAPW1 END
12.567u 0.216s 0:12.82 99.6%	0+0k 464+37840io 2pf+0w

bash-5.1$ export OMP_NUM_THREADS=2
bash-5.1$ echo $OMP_NUM_THREADS
2
bash-5.1$ x lapw1
  LAPW1 END
14.844u 0.248s 0:07.65 197.1%	0+0k 0+37840io 2pf+0w


bash-5.1$ export OMP_NUM_THREADS=4
bash-5.1$ echo $OMP_NUM_THREADS
4
bash-5.1$ x lapw1
  LAPW1 END
21.091u 0.372s 0:05.51 389.4%	0+0k 0+37840io 10pf+0w

bash-5.1$ export OMP_NUM_THREADS=6
bash-5.1$ echo $OMP_NUM_THREADS
6
bash-5.1$ x lapw1
  LAPW1 END
27.765u 0.490s 0:04.87 580.0%	0+0k 0+37824io 19pf+0w

bash-5.1$ export OMP_NUM_THREADS=8
bash-5.1$ echo $OMP_NUM_THREADS
8
bash-5.1$ x lapw1
  LAPW1 END
34.099u 0.605s 0:04.51 769.1%	0+0k 0+37824io 27pf+0w
bash-5.1$ x lapw1
  LAPW1 END
34.087u 0.616s 0:04.51 769.1%	0+0k 0+37824io 33pf+0w
bash-5.1$ x lapw1
  LAPW1 END
34.119u 0.629s 0:04.52 768.3%	0+0k 0+37824io 26pf+0w
bash-5.1$ x lapw1
  LAPW1 END
34.234u 0.579s 0:04.53 768.2%	0+0k 0+37824io 26pf+0w

bash-5.1$ export OMP_NUM_THREADS=12
bash-5.1$ echo $OMP_NUM_THREADS
12
bash-5.1$ x lapw1
  LAPW1 END
61.638u 2.193s 0:05.54 1151.9%	0+0k 0+37840io 44pf+0w

bash-5.1$ export OMP_NUM_THREADS=16
bash-5.1$ echo $OMP_NUM_THREADS
16
bash-5.1$ x lapw1
  LAPW1 END
82.629u 2.636s 0:05.55 1536.0%	0+0k 0+37840io 63pf+0w

bash-5.1$ export OMP_NUM_THREADS=24
bash-5.1$ echo $OMP_NUM_THREADS
24
bash-5.1$ x lapw1
  LAPW1 END
86.794u 3.724s 0:05.48 1651.6%	0+0k 0+37840io 57pf+0w
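
To make the scaling easier to read, the wall times above can be turned into speedups relative to the 1-thread run. This is a minimal sketch: the helper name speedup_table is made up for illustration, the first input line is assumed to be the 1-thread baseline, and "efficiency" is simply defined here as speedup divided by the number of threads.

```shell
# Read "threads wall_seconds" pairs on stdin; the first line is the baseline.
speedup_table() {
    awk 'NR == 1 { t1 = $2 }
         { printf "%2d threads  %6.2f s  speedup %.2f  efficiency %3.0f%%\n", $1, $2, t1 / $2, 100 * t1 / ($1 * $2) }'
}

# Wall-clock seconds copied from the lapw1 timings above
# (for 8 threads, the first of the four runs).
speedup_table <<'EOF'
1 12.82
2 7.65
4 5.51
6 4.87
8 4.51
12 5.54
16 5.55
24 5.48
EOF
```

With these numbers the speedup is about 2.8 at 8 threads and about 2.3 at 12, 16 and 24 threads.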




bash-5.1$ pwd
(...)/WIEN2k_benchmark/mpi-benchmark
bash-5.1$ export OMP_NUM_THREADS=1
bash-5.1$ echo $OMP_NUM_THREADS
1
bash-5.1$ x lapw1
  LAPW1 END
117.827u 0.921s 1:58.88 99.8%	0+0k 432+162616io 2pf+0w




On 2023-02-15 01:11, Laurence Marks wrote:
> Two things:
> 
> 1) The CPU you have looks interesting. Can you please run and post the
> benchmark from the Wien2k page for different omp (and mpi would be
> good). It would be good to know what the "Hybrid Core" architecture
> does with Wien2k. For mpi elpa is much better -- it can also be better
> for non-mpi.
> 
> 2) It is established lore in the DFT community that increasing the
> "smearing" assists convergence. However, not all lore is true. I am
> aware of zero evidence for this with the current Wien2k mixer, so I
> suggest sticking with room temperature rather than 1500K. More
> important is a well-posed problem. For more see
> http://www.numis.northwestern.edu/Presentations/DFT_Mixing_For_Dummies.pdf
> 
> On Tue, Feb 14, 2023 at 5:18 PM pluto via Wien
> <wien at zeus.theochem.tuwien.ac.at> wrote:
> 
>> Dear Prof. Blaha,
>> 
>> Thank you for comments.
>> 
>> At the moment I have 56 k-points in a big slab of one of the ternary
>> magnetic 2D materials. Perhaps I can reduce k-points, something to
>> test. Also now I see that my 56 k-points are compatible with
>> 1:localhost lines :-)
>> 
>> Also, for now it does not want to converge after 40 iterations with
>> TEMP 0.002, for a while I was trying TEMP 0.004, and now I am trying
>> TEMP 0.01. Maybe I should start with a smaller slab...
>> 
>> Some info you asked for:
>> 
>> The i7-13700K CPU has 8 P-cores (fast) and 8 E-cores (slow), so 16
>> total physical cores. Each P-core has 2 threads, so there are total
>> of 24 threads. Many other new Intel CPUs are the same. I don't think
>> there is an easy way to enforce certain task on a certain core, and
>> probably it makes no sense, because the CPU for sure has thermal
>> control over different cores etc.
> 
>  --
> 
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> https://scholar.google.com/citations?user=zmHhI9gAAAAJ&hl=en [1]
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought", Albert Szent-Györgyi
> 
> Links:
> ------
> [1] http://www.numis.northwestern.edu

