<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
Dear Wien users,
<div class=""><br class="">
</div>
<div class="">The wien2k 18.2 I used is compiled in a share memory cluster with intel compiler 2019, mkl 2019 and impi 2019. Because ‘srun' cannot get a correct parallel calculation in the system, I commented the line of "#setenv WIEN_MPIRUN "srun -K -N_nodes_
-n_NP_ -r_offset_ _PINNING_ _EXEC_” in the parallel_options file and used the second choice "mpirun='mpirun -np _NP_ _EXEC_”.</div>
<div class=""><br class="">
</div>
<div class="">Parallel jobs go well in scf cycles. But when I increase k points (about 5000) to calculate DOS, the lapw1 crashed with the cgroup out-of-memory handler halfway. That is very strange. With same parameters, job runs well with single core. </div>
<div class=""><br class="">
</div>
<div class="">The similar problem is encountered on nlvdw_mpi step. I also increase memory up to 50G for this less than 10 atoms cell, but it still didn’t work.</div>
<div class=""><br class="">
</div>
<div class=""><b class="">[Parallel job output:]</b></div>
<div class="">
<div class=""><u class="">starting parallel lapw1 at lun. mai 11 16:24:48 CEST 2020</u></div>
<div class=""><u class="">-> starting parallel LAPW1 jobs at lun. mai 11 16:24:48 CEST 2020</u></div>
<div class=""><u class="">running LAPW1 in parallel mode (using .machines)</u></div>
<div class=""><u class="">1 number_of_parallel_jobs</u></div>
<div class=""><u class="">[1] 12604</u></div>
<div class=""><u class="">[1] + Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop</u></div>
<div class=""><u class=""> lame25 lame25 lame25 lame25 lame25 lame25 lame25 lame25(5038) 4641.609u 123.862s 10:00.69 793.3% 0+0k 489064+2505080io 7642pf+0w</u></div>
<div class=""><u class=""> Summary of lapw1para:</u></div>
<div class=""><u class=""> lame25 k=0 user=0 wallclock=0</u></div>
<div class=""><u class="">** LAPW1 crashed!</u></div>
<div class=""><u class="">4643.674u 126.539s 10:03.50 790.4% 0+0k 490512+2507712io 7658pf+0w</u></div>
<div class=""><u class="">error: command /home/mcsete/work/wma/Package/wien2k.18n/lapw1para lapw1.def failed</u></div>
<div class=""><u class=""><b class="">slurmstepd: error: Detected 1 oom-kill event(s) in step 86112.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.</b></u></div>
</div>
<div class=""><br class="">
</div>
<div class=""><b class="">[Single mode output: ]</b></div>
<div class="">
<div class=""><u class=""> LAPW1 END</u></div>
<div class=""><u class="">11651.205u 178.664s 3:23:49.07 96.7% 0+0k 19808+22433688io 26pf+0w</u></div>
</div>
<div class=""><br class="">
</div>
<div class="">Do you have any ideas? Thank you in advance!</div>
<div class=""><br class="">
</div>
<div class="">Best regards,</div>
<div class="">Liang</div>
</body>
</html>