[Wien] .machines for several nodes
Ruh, Thomas
thomas.ruh at tuwien.ac.at
Mon Oct 12 09:29:37 CEST 2020
Hi,
your .machines is wrong.
The nodes for lapw1 are prefaced not with "lapw1:" but only with "1:". lapw2 needs no line of its own, as it takes the same nodes as lapw1 before it.
So an example for your usecase would be:
#
dstart:g008:4 g021:4 g025:4 g028:4
lapw0:g008:4 g021:4 g025:4 g028:4
1:g008:4 g021:4 g025:4 g028:4
granularity:1
extrafine:1
The line starting with "1:" has to be repeated (with different nodes, of course) x times if you want to run x k-points in parallel (you can find more details about this in the user's guide, pages 84-91).
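For instance, a sketch for four k-point-parallel groups on your four nodes (reusing the hostnames above; each group would then work through its share of the k-points on 4 cores):
#
dstart:g008:4 g021:4 g025:4 g028:4
lapw0:g008:4 g021:4 g025:4 g028:4
1:g008:4
1:g021:4
1:g025:4
1:g028:4
granularity:1
extrafine:1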
Regards,
Thomas
PS: As a sidenote: Both dstart and lapw0 parallelize over atoms, so 16 cores might not be the best choice for your example.
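For instance (again only a sketch, reusing the hostnames above), with just a handful of atoms you could give dstart and lapw0 a single node and keep the "1:" lines unchanged:
dstart:g008:4
lapw0:g008:4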
________________________________
From: Wien <wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of Christian Søndergaard Pedersen <chrsop at dtu.dk>
Sent: Monday, 12 October 2020 09:06
To: wien at zeus.theochem.tuwien.ac.at
Subject: [Wien] .machines for several nodes
Hello everybody
I am new to WIEN2k and am struggling with parallelizing calculations on our HPC cluster beyond what can be achieved using OMP. In particular, I want to execute run_lapw and/or runsp_lapw on four identical nodes (16 cores each), parallelizing over k-points (unless there's a more efficient scheme). To achieve this, I tried to mimic the example from the User Guide (without the extra Alpha node), but my .machines file does not work the way I intended. This is what I have:
#
dstart:g008:4 g021:4 g025:4 g028:4
lapw0:g008:4 g021:4 g025:4 g028:4
lapw1:g008:4 g021:4 g025:4 g028:4
lapw2:g008:4 g021:4 g025:4 g028:4
granularity:1
extrafine:1
The node names gxxx are read from SLURM_JOB_NODELIST in the submit script, and a couple of regular expressions generate the above lines. Afterwards, my job script does the following:
srun hostname -s > slurm.hosts
run_lapw -p
which results in a job that idles for the entire walltime and finishes with a CPU efficiency of 0.00%. I would appreciate any help in figuring out where I've gone wrong.
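For reference, a minimal sketch of what the generation step in my submit script amounts to (hypothetical shell code, not the exact script; it assumes scontrol is available to expand SLURM_JOB_NODELIST and hard-codes 4 cores per node):
#!/bin/bash
# Build .machines from the SLURM node list: one "1:" line per node,
# 4 cores each, plus shared dstart/lapw0 lines over all nodes.
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")   # e.g. g008 g021 g025 g028
line=$(for n in $nodes; do printf '%s:4 ' "$n"; done)    # "g008:4 g021:4 ..."
{
  echo '#'
  echo "dstart:${line}"
  echo "lapw0:${line}"
  for n in $nodes; do echo "1:${n}:4"; done              # one line per k-point group
  echo 'granularity:1'
  echo 'extrafine:1'
} > .machines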
Best regards
Christian