<div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif;color:#000000">I suggest that you talk to a sysadmin to get some clarification. In particular, see if this is just memory, or a combination of memory and file space. From what I can see it is probably memory, but there seems to be some flexibility in how it is configured.</div><div class="gmail_default" style="font-family:verdana,sans-serif;color:#000000"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif;color:#000000">One other possibility is a memory leak. What mpi are you using?</div><div class="gmail_default" style="font-family:verdana,sans-serif;color:#000000"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif;color:#000000">N.B., I would be a bit concerned that srun is not working for you. Talk to a sysadmin, you might be running outside/around your memory allocation.</div><div class="gmail_default" style="font-family:verdana,sans-serif;color:#000000"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif;color:#000000">Two relevant sources:</div><div class="gmail_default" style="font-family:verdana,sans-serif;color:#000000"><a href="https://community.pivotal.io/s/article/the-application-crashes-with-the-message-cgroup-out-of-memory?language=en_US">https://community.pivotal.io/s/article/the-application-crashes-with-the-message-cgroup-out-of-memory?language=en_US</a> <br></div><div class="gmail_default" style="font-family:verdana,sans-serif;color:#000000"><a href="https://bugs.schedmd.com/show_bug.cgi?id=2614">https://bugs.schedmd.com/show_bug.cgi?id=2614</a> <br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, May 11, 2020 at 10:55 AM MA Weiliang <<a href="mailto:weiliang.MA@etu.univ-amu.fr">weiliang.MA@etu.univ-amu.fr</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="overflow-wrap: break-word;">
Dear Wien users,

The WIEN2k 18.2 I use was compiled on a shared-memory cluster with the Intel compiler 2019, MKL 2019 and Intel MPI 2019. Because 'srun' does not give a correct parallel calculation on this system, I commented out the line setenv WIEN_MPIRUN "srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_" in the parallel_options file and used the second choice, mpirun='mpirun -np _NP_ _EXEC_'.
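
For concreteness, the change described above amounts to roughly the following two lines in parallel_options (a sketch reconstructed only from this description, in the csh syntax that file uses; the stock lines may differ between WIEN2k versions):

  #setenv WIEN_MPIRUN "srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_"
  setenv WIEN_MPIRUN "mpirun -np _NP_ _EXEC_"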

Parallel jobs go well in the SCF cycles. But when I increase the number of k-points (to about 5000) to calculate the DOS, lapw1 crashes halfway with the cgroup out-of-memory handler. That is very strange. With the same parameters, the job runs well on a single core.

A similar problem is encountered at the nlvdw_mpi step. I also increased the memory up to 50 GB for this cell of fewer than 10 atoms, but it still did not work.
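
Assuming the job is submitted through a Slurm batch script, the 50 GB request presumably corresponds to something along these lines (a sketch only; the distinction between --mem, which is per node, and --mem-per-cpu matters, because with cgroup enforcement that request becomes the hard limit the out-of-memory handler applies):

  #!/bin/bash
  #SBATCH --nodes=1
  #SBATCH --ntasks=8           # number of MPI ranks
  #SBATCH --mem=50G            # per-node limit; --mem-per-cpu scales with --ntasks instead
  x lapw1 -p                   # stand-in for whatever WIEN2k step is launched here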

[Parallel job output:]
starting parallel lapw1 at lun. mai 11 16:24:48 CEST 2020
-> starting parallel LAPW1 jobs at lun. mai 11 16:24:48 CEST 2020
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
[1] 12604
[1] + Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
 lame25 lame25 lame25 lame25 lame25 lame25 lame25 lame25(5038) 4641.609u 123.862s 10:00.69 793.3% 0+0k 489064+2505080io 7642pf+0w
 Summary of lapw1para:
 lame25 k=0 user=0 wallclock=0
** LAPW1 crashed!
4643.674u 126.539s 10:03.50 790.4% 0+0k 490512+2507712io 7658pf+0w
error: command /home/mcsete/work/wma/Package/wien2k.18n/lapw1para lapw1.def failed
slurmstepd: error: Detected 1 oom-kill event(s) in step 86112.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

[Single mode output:]
 LAPW1 END
11651.205u 178.664s 3:23:49.07 96.7% 0+0k 19808+22433688io 26pf+0w

Do you have any ideas? Thank you in advance!

Best regards,
Liang

_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html

-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
Corrosion in 4D: www.numis.northwestern.edu/MURI
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody else has thought"
Albert Szent-Gyorgi