[Wien] .machines for several nodes

Laurence Marks laurence.marks at gmail.com
Thu Oct 15 17:15:30 CEST 2020


Let me expand on why you should not call mpirun yourself, unless you are
doing something "special".

Wien2k uses the .machines file to set up how it uses mpi and (in the most
recent versions) omp. As discussed by Peter, in most cases mpi works best
with a close-to-square decomposition of the matrices, often on a power-of-2
number of cores. OMP is good for having 2-4 cores collaborate, not more.
Depending upon your architecture, OMP may be better or worse than mpi. (On
my nodes mpi is always best; I know that on some of Peter's OMP is better.)
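
As a sketch only (the host name, the 16-core node and the omp_global line
are made-up examples; the omp keywords exist only in recent versions, so
check the users guide for your release), a .machines file combining the two
could look like:

# two k-parallel jobs, each a 2x2=4-core mpi grid, with 2 omp threads each
1:n001:4
1:n001:4
omp_global:2
granularity:1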

The code internally sets the number of threads to use (for omp), and it
calls mpirun or its equivalent itself, depending upon what you have in
parallel_options. While many codes/programs are structured so they are
launched under mpi via "mpirun MyCode", Wien2k is not. The danger is that
you will end up with multiple copies of run_lapw running, which is not what
you want.
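
For reference, the mpirun call is built from the template in
$WIENROOT/parallel_options, which siteconfig writes for you. On a typical
installation it contains something roughly like the lines below; the exact
command and placeholders depend on your mpi and batch system, so treat this
only as a sketch:

# parallel_options (csh syntax, generated by siteconfig)
setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
# _NP_, _HOSTS_ and _EXEC_ are filled in by the *para scripts at run time
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"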

There might be special cases where you would want to use "mpirun run_lapw"
to remotely start a single instance, but until you know how to use Wien2k,
do not go for anything this complicated; it is likely to create problems.
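
To make it concrete, your batch script (or interactive shell) should only
call the driver itself, for example:

run_lapw -p        # correct: reads .machines and calls mpirun internally
# mpirun -np 24 run_lapw -p   <-- wrong: starts one run_lapw per mpi rank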


On Thu, Oct 15, 2020 at 5:01 AM Laurence Marks <laurence.marks at gmail.com>
wrote:

> As an addendum to what Peter said, "mpirun run_lapw" is totally wrong.
> Remove the mpirun.
>
> _____
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu
>
> On Thu, Oct 15, 2020, 03:35 Peter Blaha <pblaha at theochem.tuwien.ac.at>
> wrote:
>
>> Well, 99% cpu efficiency does not mean that you run efficiently; my
>> estimate is that you run at least 2 times slower than what is possible.
>>
>> Anyway, please save the dayfile and compare the wall times of the
>> different parts with those of a different setup.
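>>
>> Something along these lines (assuming the standard case.dayfile name; the
>> timing lines differ a bit between versions, so adjust the grep pattern if
>> needed):
>>
>> cp case.dayfile dayfile.current    # keep this run for comparison
>> grep -i wall case.dayfile          # wall-clock times of the parallel parts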
>>
>> At least now we know that you have 24 cores/node. So the lapw0/dstart
>> lines are perfectly ok.
>>
>> However, you run lapw1 on only 3 mpi cores. This is "maximally
>> inefficient": it divides your matrix into a 3x1 grid, but the
>> decomposition should be as close to an even (square) one as possible, so
>> 4x4=16 or 8x8=64 cores is optimal. With your 24 cores and 96 atoms/cell
>> I'd probably go for 12 mpi cores and 2 k-parallel jobs per node:
>>
>> 1:x073:12
>> 1:x082:12
>> 1:x073:12
>> 1:x082:12
>>
>> Maybe one can even overload the nodes a bit by using 16 instead of 12
>> cores, but this could be dangerous on some machines because your admins
>> might have enforced cpu-binding, .... You can even change the .machines
>> file (12-->16) "by hand" while your job is running (and maybe change it
>> back once you have seen whether the timing is better or worse).
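>>
>> If you prefer a one-liner over an editor, something like this would do
>> it, assuming the core count is the last field of the 1:host:N lines:
>>
>> sed -i 's/:12$/:16/' .machines     # 12 --> 16 cores per mpi job
>> # and later, to change it back:  sed -i 's/:16$/:12/' .machines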
>>
>> In any case, compare the timings in the dayfile in order to find the
>> optimal setup.
>

-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
Corrosion in 4D: www.numis.northwestern.edu/MURI
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi