[Wien] Parallel calculation in more than 2 nodes

MA Weiliang weiliang.MA at etu.univ-amu.fr
Tue Jul 21 12:48:06 CEST 2020


Dear WIEN2K users,

The cluster we use has shared-memory nodes with 16 CPUs each. The calculation was submitted to 2 nodes with 32 CPUs in total, but according to the attached top output all MPI processes were actually running on the first node; there were no processes on the second node. As you can see, the CPU usage of each process is around 50%. It seems the calculation was not distributed over 2 nodes, but instead the first node (16 CPUs) was split into 32 processes, each with half the computing power.
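For reference, a quick way to check where the MPI ranks actually land (just a sketch, assuming Intel MPI's mpirun; hosts.txt is a made-up file name listing both nodes) is something like:

  # hosts.txt lists the nodes, one per line: lame26:16 and lame28:16
  mpirun -np 32 -machinefile hosts.txt hostname | sort | uniq -c

If the counts show all 32 ranks on lame26, the ranks are indeed not being spread onto the second node.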

Do you have any ideas about this problem? The .machines file, WIEN2k info, dayfile, and job output are attached below. Thank you!

Best,
Weiliang


#========================================#
#  output of top
#----------------------------------------#
  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
43504 mc        20   0  614m 262m  27m R 50.2  0.3  21:45.54 lapw1c_mpi
43507 mc        20   0  611m 259m  26m R 50.2  0.3  21:50.76 lapw1c_mpi
43514 mc        20   0  614m 255m  22m R 50.2  0.3  21:51.37 lapw1c_mpi
...
32 lines in total
...                                                                           
43508 mc        20   0  615m 260m  23m R 49.5  0.3  21:43.73 lapw1c_mpi
43513 mc        20   0  616m 257m  22m R 49.5  0.3  21:51.32 lapw1c_mpi
43565 mc        20   0  562m 265m  24m R 49.5  0.3  21:43.29 lapw1c_mpi 


#========================================#
# .machines file
#----------------------------------------#
1:lame26:16
1:lame28:16
lapw0: lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28
dstart: lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28
nlvdw: lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28
lapw2_vector_split:2
granularity:1
extrafine:1
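For reference, my understanding is that each "1:host:16" line above requests one k-parallel lapw1 job with 16 MPI ranks on that host, so this file should start one 16-rank mpirun on lame26 and a second one on lame28. A single 32-rank job spanning both nodes would instead be written on one line, roughly like this sketch:

1:lame26:16 lame28:16

I mention this only to show how I expect the current file to be interpreted.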


#========================================#
# wien2k info
#----------------------------------------#
wien2k version: 18.2
compiler: ifort, icc, mpiifort (Intel 2017 compilers)
parallel option file: # setenv WIEN_MPIRUN "srun -K1 _EXEC_"
Because of compatibility issues, we do not use srun: we commented out the WIEN_MPIRUN line in the parallel options file and call mpirun directly.
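For completeness, the mpirun-based setting I would have expected to need there is something like the following sketch (assuming the usual _NP_ and _HOSTS_ placeholders in addition to the _EXEC_ shown above; please correct me if the placeholders differ):

setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"

At the moment the line is commented out, and the job output below shows mpirun being invoked without any machine file (see the lapw0 line).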


#========================================#
# dayfile
#----------------------------------------#
    cycle 7     (Mon Jul 20 20:56:01 CEST 2020)         (194/93 to go)

>   lapw0  -p   (20:56:01) starting parallel lapw0 at Mon Jul 20 20:56:01 CEST 2020
-------- .machine0 : 32 processors
0.087u 0.176s 0:17.87 1.3%      0+0k 0+112io 0pf+0w
>   lapw1  -p   -c      (20:56:19) starting parallel lapw1 at Mon Jul 20 20:56:19 CEST 2020
->  starting parallel LAPW1 jobs at Mon Jul 20 20:56:20 CEST 2020
running LAPW1 in parallel mode (using .machines)
2 number_of_parallel_jobs
     lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26 lame26(16) 0.022u 0.049s 56:37.88 0.0%
    0+0k 0+8io 0pf+0w
     lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28 lame28(16) 0.031u 0.038s 56:00.24 0.0%
    0+0k 0+8io 0pf+0w
   Summary of lapw1para:
   lame26        k=0     user=0  wallclock=0
   lame28        k=0     user=0  wallclock=0
18.849u 18.501s 56:40.85 1.0%   0+0k 0+1032io 0pf+0w
>   lapwso -p -c        (21:53:00) running LAPWSO in parallel mode
      lame26 0.026u 0.044s 2:20:06.55 0.0% 0+0k 0+8io 0pf+0w
      lame28 0.027u 0.043s 2:18:40.89 0.0% 0+0k 0+8io 0pf+0w
   Summary of lapwsopara:
   lame26        user=0.026      wallclock=140
   lame28        user=0.027      wallclock=138
0.235u 2.621s 2:20:13.57 0.0%   0+0k 0+864io 0pf+0w
>   lapw2 -p    -c -so  (00:13:14) running LAPW2 in parallel mode
      lame26 0.023u 0.044s 4:58.20 0.0% 0+0k 0+8io 0pf+0w
      lame28 0.024u 0.044s 5:02.58 0.0% 0+0k 0+8io 0pf+0w
   Summary of lapw2para:
   lame26        user=0.023      wallclock=298.2
   lame28        user=0.024      wallclock=302.58
5.836u 1.057s 5:11.94 2.2%      0+0k 0+166184io 0pf+0w
>   lcore       (00:18:26) 1.576u 0.042s 0:02.06 78.1%  0+0k 0+12888io 0pf+0w
>   mixer       (00:18:30) 6.472u 0.687s 0:07.97 89.7%  0+0k 0+308832io 0pf+0w
:ENERGY convergence:  0 0.000005 .0001215250000000
:CHARGE convergence:  0 0.00005 .0002538
ec cc and fc_conv 0 0 1


#========================================#
# job output
#----------------------------------------#
in cycle 3    ETEST: .5230513600000000   CTEST: .0049036
 LAPW0 END
[1]    Done                          mpirun -np 32 /home/mcs/work/wma/Package/wien2k.18m/lapw0_mpi lapw0.def >> .time00
 LAPW1 END
[1]  - Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
 LAPW1 END
[2]    Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
LAPWSO END
LAPWSO END
[2]    Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .timeso_$loop
[1]  + Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .timeso_$loop
LAPW2 - FERMI; weights written
 LAPW2 END
 LAPW2 END
[2]    Done                          ( cd $PWD; $t $ttt $vector_split; rm -f .lock_$lockfile[$p] ) >> .time2_$loop
[1]  + Done                          ( cd $PWD; $t $ttt $vector_split; rm -f .lock_$lockfile[$p] ) >> .time2_$loop
 SUMPARA END
 CORE  END
 MIXER END
ec cc and fc_conv 0 0 1


