[Wien] Parallel execution on new Intel CPUs

Laurence Marks laurence.marks at gmail.com
Sun Feb 12 20:59:41 CET 2023


Don't use Intel hyper-threading. Unless something drastic has changed, it
gets in the way.

Beyond that there is no single answer. For small problems k-point parallel is
better, perhaps with 2 OMP threads. For medium problems (10-25 unique atoms)
MPI with/without OMP is better. For a large slab (50+ unique atoms) MPI is
needed, but you may run out of memory.
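
To make that concrete, the two flavors of .machines look roughly like
this (hosts and counts are placeholders, not a recommendation):

k-point parallel, e.g. 4 jobs with 2 OMP threads each:

  omp_global:2
  1:localhost
  1:localhost
  1:localhost
  1:localhost

mpi parallel, e.g. one 8-process job (needs the mpi+ScaLAPACK build):

  1:localhost:8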

Recommendation: install MPI & experiment.

---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
"Research is to see what everybody else has seen, and to think what nobody
else has thought" Albert Szent-Györgyi

On Sun, Feb 12, 2023, 13:47 pluto via Wien <wien at zeus.theochem.tuwien.ac.at>
wrote:

> Dear All,
>
> I am now using a machine with an i7-13700K. This CPU has 8 performance
> cores (P-cores) and 8 efficient cores (E-cores). In addition, each P-core
> runs 2 threads, so there are 24 threads altogether. It is hard to find
> reliable info online, but a P-core is probably approx. 2x faster
> than an E-core:
>
> https://www.anandtech.com/show/17047/the-intel-12th-gen-core-i912900k-review-hybrid-performance-brings-hybrid-complexity/10
> This will of course depend on what is being calculated...
>
> Do you have suggestions on how to optimize the .machines file for the
> parallel execution of an scf cycle?
>
> On my machine, using OMP_NUM_THREADS leads to oscillations in CPU usage
> (for a large slab maybe 40% of the time is spent on a single thread),
> suggesting that large OMP is not the optimal strategy.
>
> Some examples of strategies:
>
> One strategy would be to repeat the line
> 1:localhost
> 24 times, to keep all the threads busy, and set OMP_NUM_THREADS=1.
>
> Another would be to repeat the line
> 1:localhost
> 8 times and set OMP_NUM_THREADS=2; this would mean using all 16 physical
> cores.
>
> Or perhaps one should rather "overload" the CPU, e.g. by repeating
> 1:localhost 16 times with OMP_NUM_THREADS=2?
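>
> For concreteness, the second strategy's .machines would just be eight
> identical lines (and the "overload" variant the same line repeated 16
> times):
>
>   1:localhost
>   1:localhost
>   1:localhost
>   1:localhost
>   1:localhost
>   1:localhost
>   1:localhost
>   1:localhost
>
> with OMP_NUM_THREADS=2 exported in the shell before starting the scf
> cycle.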
>
> Over time I will try to benchmark some of the different options, but
> perhaps there is some logic to how one should think about this.
>
> In addition I have a comment on the .machines file. It seems that for
> FM+SOC (runsp -so) calculations the
>
> omp_global
>
> setting in .machines is ignored. The
>
> omp_lapw1
> omp_lapw2
>
> settings seem to work fine. So I tried to set OMP for lapwso
> separately, by including a line like
>
> omp_lapwso:2
>
> but this gives an error when executing the parallel scf.
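>
> For reference, a sketch of the variant that does work here (the omp
> values are just examples):
>
>   1:localhost
>   1:localhost
>   omp_lapw1:2
>   omp_lapw2:2
>
> i.e. per-program omp_* lines for lapw1/lapw2 only; omp_lapwso was my
> own guess at a keyword, and the parallel scripts apparently do not
> accept it.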
>
> Best,
> Lukasz