[Wien] Parallel calculation in more than 2 nodes
MA Weiliang
weiliang.MA at etu.univ-amu.fr
Tue Jul 21 12:48:06 CEST 2020
Dear WIEN2K users,
The cluster we use is a shared-memory system with 16 CPUs per node. The calculation was meant to be distributed over 2 nodes with 32 CPUs in total, but according to the attached top output all MPI processes were actually running on the first node; there were no processes on the second node. As you can see, the CPU usage per process is around 50%. It seems the calculation was not distributed over the 2 nodes; instead, the 32 processes were all packed onto the first node (16 CPUs), each with half the computing power.
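As a quick placement check (a minimal sketch, not taken from the attached logs; it assumes the Intel MPI mpirun from our toolchain and a hand-written host file with the two node names), each rank can simply report its hostname:

# hosts.txt contains the two lines "lame26:16" and "lame28:16"
# each of the 32 ranks prints the node it runs on; the counts show
# whether the job is spread over both nodes or packed onto one
mpirun -np 32 -machinefile hosts.txt hostname | sort | uniq -c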
Do you have any idea what causes this problem? The .machines file, WIEN2k info, dayfile, and job output are attached below. Thank you!
Best,
Weiliang
#========================================#
# output of top
#----------------------------------------#
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
43504 mc 20 0 614m 262m 27m R 50.2 0.3 21:45.54 lapw1c_mpi
43507 mc 20 0 611m 259m 26m R 50.2 0.3 21:50.76 lapw1c_mpi
43514 mc 20 0 614m 255m 22m R 50.2 0.3 21:51.37 lapw1c_mpi
...
32 lines in total
...
43508 mc 20 0 615m 260m 23m R 49.5 0.3 21:43.73 lapw1c_mpi
43513 mc 20 0 616m 257m 22m R 49.5 0.3 21:51.32 lapw1c_mpi
43565 mc 20 0 562m 265m 24m R 49.5 0.3 21:43.29 lapw1c_mpi
#========================================#
# .machines file
#----------------------------------------#
1:lame26:16
1:lame28:16
lapw0: lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28
dstart: lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28
nlvdw: lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28
lapw2_vector_split:2
granularity:1
extrafine:1
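(For comparison, a compact per-node form of the lapw0 line, as described in the WIEN2k usersguide, should be equivalent to spelling out every hostname, and similarly for dstart and nlvdw; the expanded form above is what the attached file actually contains:

lapw0: lame26:16 lame28:16
)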
#========================================#
# wien2k info
#----------------------------------------#
wien2k version: 18.2
compiler: ifort, icc, mpiifort (Intel 2017 compilers)
parallel options file: # setenv WIEN_MPIRUN "srun -K1 _EXEC_"
Because of compatibility issues we do not use srun: the WIEN_MPIRUN line in the parallel options file is commented out, and mpirun is called directly.
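For reference, an mpirun-based line in parallel_options would conventionally look like the sketch below (following the WIEN2k siteconfig convention, where _NP_, _HOSTS_ and _EXEC_ are placeholders replaced at run time by the number of processes, the generated machine file and the executable; the exact line in our setup is an assumption, since that file is not attached):

setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"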
#========================================#
# dayfile
#----------------------------------------#
cycle 7 (Mon Jul 20 20:56:01 CEST 2020) (194/93 to go)
> lapw0 -p (20:56:01) starting parallel lapw0 at Mon Jul 20 20:56:01 CEST 2020
-------- .machine0 : 32 processors
0.087u 0.176s 0:17.87 1.3% 0+0k 0+112io 0pf+0w
> lapw1 -p -c (20:56:19) starting parallel lapw1 at Mon Jul 20 20:56:19 CEST 2020
-> starting parallel LAPW1 jobs at Mon Jul 20 20:56:20 CEST 2020
running LAPW1 in parallel mode (using .machines)
2 number_of_parallel_jobs
lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26(16) 0.022u 0.049s 56:37.88 0.0%
0+0k 0+8io 0pf+0w
lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28(16) 0.031u 0.038s 56:00.24 0.0%
0+0k 0+8io 0pf+0w
Summary of lapw1para:
lame26 k=0 user=0 wallclock=0
lame28 k=0 user=0 wallclock=0
18.849u 18.501s 56:40.85 1.0% 0+0k 0+1032io 0pf+0w
> lapwso -p -c (21:53:00) running LAPWSO in parallel mode
lame26 0.026u 0.044s 2:20:06.55 0.0% 0+0k 0+8io 0pf+0w
lame28 0.027u 0.043s 2:18:40.89 0.0% 0+0k 0+8io 0pf+0w
Summary of lapwsopara:
lame26 user=0.026 wallclock=140
lame28 user=0.027 wallclock=138
0.235u 2.621s 2:20:13.57 0.0% 0+0k 0+864io 0pf+0w
> lapw2 -p -c -so (00:13:14) running LAPW2 in parallel mode
lame26 0.023u 0.044s 4:58.20 0.0% 0+0k 0+8io 0pf+0w
lame28 0.024u 0.044s 5:02.58 0.0% 0+0k 0+8io 0pf+0w
Summary of lapw2para:
lame26 user=0.023 wallclock=298.2
lame28 user=0.024 wallclock=302.58
5.836u 1.057s 5:11.94 2.2% 0+0k 0+166184io 0pf+0w
> lcore (00:18:26) 1.576u 0.042s 0:02.06 78.1% 0+0k 0+12888io 0pf+0w
> mixer (00:18:30) 6.472u 0.687s 0:07.97 89.7% 0+0k 0+308832io 0pf+0w
:ENERGY convergence: 0 0.000005 .0001215250000000
:CHARGE convergence: 0 0.00005 .0002538
ec cc and fc_conv 0 0 1
#========================================#
# job output
#----------------------------------------#
in cycle 3 ETEST: .5230513600000000 CTEST: .0049036
LAPW0 END
[1] Done mpirun -np 32 /home/mcs/work/wma/Package/wien2k.18m/lapw0_mpi lapw0.def >> .time00
LAPW1 END
[1] - Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
LAPW1 END
[2] Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
LAPWSO END
LAPWSO END
[2] Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .timeso_$loop
[1] + Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .timeso_$loop
LAPW2 - FERMI; weights written
LAPW2 END
LAPW2 END
[2] Done ( cd $PWD; $t $ttt $vector_split; rm -f .lock_$lockfile[$p] ) >> .time2_$loop
[1] + Done ( cd $PWD; $t $ttt $vector_split; rm -f .lock_$lockfile[$p] ) >> .time2_$loop
SUMPARA END
CORE END
MIXER END
ec cc and fc_conv 0 0 1