[Wien] Parallel execution on new Intel CPUs
pluto
pluto at physics.ucdavis.edu
Sun Feb 12 20:47:39 CET 2023
Dear All,
I am now using a machine with an i7-13700K. This CPU has 8 performance
cores (P-cores) and 8 efficiency cores (E-cores). In addition, each P-core
has 2 threads, so there are 24 threads altogether. It is hard to find
reliable information online, but a P-core is probably about 2x faster
than an E-core:
https://www.anandtech.com/show/17047/the-intel-12th-gen-core-i912900k-review-hybrid-performance-brings-hybrid-complexity/10
This will of course depend on what is being calculated...
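To make the numbers above concrete, here is a rough Python sketch of the arithmetic. It takes the 2x P-core/E-core ratio as a pure assumption, ignores hyperthreading and memory bandwidth, and the function names are only illustrative. It compares an equal split of work over 16 one-per-core jobs with an ideal split proportional to core speed.

```python
# Rough model of the i7-13700K layout described above.
P_CORES = 8
E_CORES = 8
THREADS_PER_P_CORE = 2
P_TO_E_SPEED = 2.0  # ASSUMPTION: a P-core is about 2x faster than an E-core

# 8 P-cores with 2 threads each, plus 8 single-threaded E-cores.
hardware_threads = P_CORES * THREADS_PER_P_CORE + E_CORES

# Total throughput expressed in units of one P-core.
p_core_equivalents = P_CORES + E_CORES / P_TO_E_SPEED


def equal_split_time(work):
    """Wall time if the work is split equally over all 16 cores.

    Speeds are in E-core units, so the slowest job runs at speed 1.0
    and the wall time is set by the E-cores.
    """
    per_job = work / (P_CORES + E_CORES)
    return per_job / 1.0


def ideal_time(work):
    """Wall time if the work could be split in proportion to core speed."""
    return work / (P_CORES * P_TO_E_SPEED + E_CORES * 1.0)


if __name__ == "__main__":
    print("hardware threads:", hardware_threads)
    print("P-core equivalents:", p_core_equivalents)
    print("equal split / ideal:", equal_split_time(48.0) / ideal_time(48.0))
```

Under these assumptions an equal split is limited by the E-cores, which is one reason the best .machines layout may differ from simply "one job per core".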
Do you have suggestions on how to optimize the .machines file for the
parallel execution of an scf cycle?
On my machine, using a large OMP_NUM_THREADS leads to oscillations in CPU
usage (for a large slab, maybe 40% of the time is spent on a single thread),
suggesting that relying on many OMP threads is not the optimal strategy.
Some examples of strategies:
One strategy would be to repeat the line
1:localhost
24 times, to have all the threads busy, and set OMP_NUM_THREADS=1.
Another would be to repeat the line
1:localhost
8 times and set OMP_NUM_THREADS=2; this would mean 16 threads in total,
matching the 16 physical cores.
Or perhaps one should rather "overload" the CPU, e.g. by repeating
1:localhost 16 times with OMP_NUM_THREADS=2?
Over time I will try to benchmark some of the different options, but
perhaps there is some logic to how one should think about this.
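As a way of comparing the strategies listed above, the following Python sketch builds the repeated 1:localhost lines and counts the resulting software threads against the 24 hardware threads. The helper names are hypothetical, and the "load factor" is only a bookkeeping number (software threads divided by hardware threads), not a performance prediction.

```python
HW_THREADS = 8 * 2 + 8  # 8 P-cores with 2 threads each, plus 8 E-cores


def machines_text(n_lines, host="localhost"):
    """Return n_lines copies of the k-point-parallel line '1:<host>'."""
    return "\n".join(f"1:{host}" for _ in range(n_lines)) + "\n"


def total_threads(n_lines, omp_threads):
    """Software threads in use: parallel jobs times OMP threads per job."""
    return n_lines * omp_threads


# (number of 1:localhost lines, OMP_NUM_THREADS) for each strategy above.
STRATEGIES = {
    "24 lines, OMP=1": (24, 1),
    "8 lines, OMP=2": (8, 2),
    "16 lines, OMP=2": (16, 2),
}


def summarize(strategies=STRATEGIES, hw_threads=HW_THREADS):
    """Return (name, software threads, load factor) for each strategy."""
    rows = []
    for name, (n_lines, omp) in strategies.items():
        busy = total_threads(n_lines, omp)
        rows.append((name, busy, busy / hw_threads))
    return rows


if __name__ == "__main__":
    for name, busy, load in summarize():
        print(f"{name}: {busy} software threads, load factor {load:.2f}")
    # Example: text one could write to .machines for the second strategy.
    print(machines_text(8), end="")
```

Only the third strategy exceeds the number of hardware threads, so it is the only one that actually "overloads" the CPU in this counting.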
In addition, I have a comment on the .machines file. It seems that for
FM+SOC (runsp -so) calculations the
omp_global
setting in .machines is ignored. The
omp_lapw1
omp_lapw2
settings seem to work fine. So I tried to set OMP for lapwso
separately, by including a line like:
omp_lapwso:2
but this gives an error when executing the parallel scf cycle.
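For reference, a .machines file using only the per-program settings that do work for me could be generated as in the Python sketch below. It assumes that omp_lapw1 and omp_lapw2 take the same name:value form as the omp_lapwso:2 line above, and that the order of lines in the file does not matter; the helper name is hypothetical.

```python
def render_machines(n_lines, omp_settings, host="localhost"):
    """Build .machines text: OMP settings first, then the k-point lines.

    ASSUMPTION: each OMP setting is written as 'name:value', following
    the form of the omp_lapwso:2 line quoted in the message.
    """
    lines = [f"{key}:{value}" for key, value in omp_settings.items()]
    lines += [f"1:{host}"] * n_lines
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 8 parallel jobs, 2 OMP threads each for lapw1 and lapw2;
    # no omp_lapwso line, since that one triggers the error.
    text = render_machines(8, {"omp_lapw1": 2, "omp_lapw2": 2})
    print(text, end="")
```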
Best,
Lukasz