<div dir="auto"><div>Don't use Intel Hyper-Threading. Unless something drastic has changed, it gets in the way.<div dir="auto"><br></div><div dir="auto">Beyond that there is no single answer. For small problems k-point parallelization is better, perhaps with 2 threads. For medium problems (10-25 unique atoms) MPI with or without OpenMP is better. For a large slab (50+ unique atoms) MPI is needed, but you may run out of memory.</div><div dir="auto"><br></div><div dir="auto">Recommendation: install MPI and experiment.<br><br><div data-smartmail="gmail_signature" dir="auto">---<br>Professor Laurence Marks<br>Department of Materials Science and Engineering<br>Northwestern University<br><a href="http://www.numis.northwestern.edu">www.numis.northwestern.edu</a><br>"Research is to see what everybody else has seen, and to think what nobody else has thought" Albert Szent-Györgyi</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Feb 12, 2023, 13:47 pluto via Wien &lt;<a href="mailto:wien@zeus.theochem.tuwien.ac.at">wien@zeus.theochem.tuwien.ac.at</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear All,<br>

<br>

I am now using a machine with an i7-13700K. This CPU has 8 performance <br>

cores (P-cores) and 8 efficient cores (E-cores). In addition each P-core <br>

has 2 threads, so there are 24 threads altogether. It is hard to find <br>

some reasonable info online, but probably a P-core is approx. 2x faster <br>

than an E-core:<br>

<a href="https://www.anandtech.com/show/17047/the-intel-12th-gen-core-i912900k-review-hybrid-performance-brings-hybrid-complexity/10" rel="noreferrer noreferrer" target="_blank">https://www.anandtech.com/show/17047/the-intel-12th-gen-core-i912900k-review-hybrid-performance-brings-hybrid-complexity/10</a><br>

This will of course depend on what is being calculated...<br>

<br>

Do you have suggestions on how to optimize the .machines file for the <br>

parallel execution of an scf cycle?<br>

<br>

On my machine, using OMP_NUM_THREADS leads to oscillations in CPU usage <br>

(for a large slab maybe 40% of time is spent on a single thread), <br>

suggesting that large OMP is not the optimal strategy.<br>

<br>

Some examples of strategies:<br>

<br>

One strategy would be to repeat the line<br>

1:localhost<br>

24 times, to have all the threads busy, and set OMP_NUM_THREADS=1.<br>

<br>

Another would be to set the line<br>

1:localhost<br>

8 times and set OMP_NUM_THREADS=2; this would mean using all 16 physical <br>

cores.<br>

<br>

Or perhaps it is better to "overload" the CPU, e.g. by repeating <br>

1:localhost 16 times with OMP_NUM_THREADS=2?<br>
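<br>
To make the thread arithmetic behind these three options explicit, here is a minimal Python sketch. It is only an illustration: the Strategy class and its method names are hypothetical helpers, not part of WIEN2k, and the core counts are simply the i7-13700K figures quoted above. It builds the repeated 1:localhost lines and compares jobs times OMP_NUM_THREADS with the available cores and threads.<br>

```python
from dataclasses import dataclass

# Figures taken from the message above for the i7-13700K (assumed, not probed).
PHYSICAL_CORES = 16      # 8 P-cores + 8 E-cores
HARDWARE_THREADS = 24    # P-cores have 2 threads each: 8*2 + 8


@dataclass(frozen=True)
class Strategy:
    """One k-point-parallel layout (hypothetical helper, not part of WIEN2k)."""
    k_jobs: int        # how many times the "1:localhost" line is repeated
    omp_threads: int   # value one would export as OMP_NUM_THREADS

    @property
    def total_threads(self) -> int:
        # Each k-point job can run omp_threads OpenMP threads.
        return self.k_jobs * self.omp_threads

    def machines_text(self, host: str = "localhost") -> str:
        # Only the repeated k-point lines; other .machines settings are omitted.
        return "".join(f"1:{host}\n" for _ in range(self.k_jobs))

    def load(self) -> str:
        # Compare requested threads with what the CPU offers.
        if self.total_threads > HARDWARE_THREADS:
            return "more threads than hardware threads"
        if self.total_threads > PHYSICAL_CORES:
            return "more threads than physical cores"
        return "within physical core count"


# The three options described in the message.
STRATEGIES = {
    "24 jobs x 1 thread": Strategy(24, 1),
    "8 jobs x 2 threads": Strategy(8, 2),
    "16 jobs x 2 threads": Strategy(16, 2),
}

if __name__ == "__main__":
    for name, s in STRATEGIES.items():
        print(f"{name}: {s.total_threads} threads, {s.load()}")
```

Calling machines_text() on one of the entries returns the lines to paste into .machines; OMP_NUM_THREADS still has to be set separately in the shell. The sketch only counts threads and says nothing about which layout is actually faster, which still has to be measured.<br>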

<br>

Over time I will try to benchmark some of the different options, but <br>

perhaps there is some logic of how one should think about this.<br>

<br>

In addition, I have a comment on the .machines file. It seems that for the <br>

FM+SOC (runsp -so) calculations the<br>

<br>

omp_global<br>

<br>

setting in .machines is ignored. The<br>

<br>

omp_lapw1<br>

omp_lapw2<br>

<br>

settings seem to work fine. So, I tried to set OMP for lapwso <br>

separately, by including a line like:<br>

<br>

omp_lapwso:2<br>

<br>

but this gives an error when executing parallel scf.<br>

<br>

Best,<br>

Lukasz<br>


</blockquote></div></div></div>