<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<div id="divtagdefaultwrapper" dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, "EmojiFont", "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">
<p>Dear everybody</p>
<p><br>
</p>
<p>I am following up on this thread to report two separate errors from my attempts to parallelize a calculation properly. In the first case, the calculation utilized 0.00% of the available CPU resources. My .machines file looks like this:</p>
<p><br>
</p>
<div><span style="font-family:"Courier New",monospace">#</span><br>
<span style="font-family:"Courier New",monospace">dstart:g004:8 g010:8 g011:8 g040:8</span><br>
<span style="font-family:"Courier New",monospace">lapw0:g004:8 g010:8 g011:8 g040:8</span><br>
<span style="font-family:"Courier New",monospace">1:g004:16</span><br>
<span style="font-family:"Courier New",monospace">1:g010:16</span><br>
<span style="font-family:"Courier New",monospace">1:g011:16</span><br>
<span style="font-family:"Courier New",monospace">1:g040:16</span><br>
</div>
<div><br>
</div>
My submit script calls the following commands:
<p><br>
</p>
<p><span style="font-family:"Courier New",monospace">srun hostname -s > slurm.hosts</span></p>
<p><span style="font-family:"Courier New",monospace">run_lapw -p</span></p>
<p><span style="font-family:"Courier New",monospace">x qtl -p -telnes</span><br>
</p>
<p><br>
</p>
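<p>For reference, the relationship between slurm.hosts and a .machines file of the form shown above can be sketched in Python. This is only an illustration under my own assumptions: the function name build_machines and the parameter lapw0_per_host are hypothetical, the sketch assumes slurm.hosts holds one host name per Slurm task, and it is not a script that ships with WIEN2k.</p>
<pre>
```python
from collections import Counter


def build_machines(hosts, lapw0_per_host=8):
    """Return .machines text for a list of host names (one entry per task)."""
    counts = Counter(hosts)   # number of tasks placed on each node
    nodes = sorted(counts)    # unique node names in a stable order
    mpi = " ".join("{}:{}".format(n, lapw0_per_host) for n in nodes)
    lines = ["#", "dstart:" + mpi, "lapw0:" + mpi]
    # One k-point group per node, using every task on that node.
    lines += ["1:{}:{}".format(n, counts[n]) for n in nodes]
    return "\n".join(lines) + "\n"


# Hypothetical host list, standing in for the contents of slurm.hosts.
hosts = ["g004"] * 16 + ["g010"] * 16
print(build_machines(hosts), end="")
```
</pre>
<p>With 16 tasks on each of four nodes, this layout reproduces the structure of the .machines file listed above.</p>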
<p>Of course, the job didn't reach x qtl. The resulting case.dayfile is short, so I include all of it here:<br>
</p>
<div><br>
<span style="font-family:"Courier New",monospace">Calculating test-machines in /path/to/directory</span><br>
<span style="font-family:"Courier New",monospace">on node.host.name.dtu.dk with PID XXXXX</span><br>
<span style="font-family:"Courier New",monospace">using WIEN2k_19.1 (Release 25/6/2019) in /path/to/installation/directory/WIEN2k/19.1-intel-2019a</span><br>
<br>
<br>
<span style="font-family:"Courier New",monospace"> start (Mon Oct 12 19:04:06 CEST 2020) with lapw0 (40/99 to go)</span><br>
<br>
<span style="font-family:"Courier New",monospace"> cycle 1 (Mon Oct 12 19:04:06 CEST 2020) (40/99 to go)</span><br>
<br>
<span style="font-family:"Courier New",monospace">> lapw0 -p (19:04:06) starting parallel lapw0 at Mon Oct 12 19:04:06 CEST 2020</span><br>
<span style="font-family:"Courier New",monospace">-------- .machine0 : 32 processors</span><br>
<span style="font-family:"Courier New",monospace">[1] 16095</span></div>
<div><br>
</div>
<div><br>
</div>
<div>The .machine0 file displays the lines</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;">g004 [repeated for 8 lines]</span></div>
<div><span style="font-family: "Courier New", monospace;">g010 [repeated for 8 lines]</span></div>
<div><span style="font-family: "Courier New", monospace;">g011 [repeated for 8 lines]</span></div>
<div><span style="font-family: "Courier New", monospace;">g040 [repeated for 8 lines]</span></div>
<div><br>
</div>
<div>which tells me that the .machines file works as intended and that the cause of the problem lies elsewhere. This brings me to the second error, which occurred when I tried calling mpirun explicitly, like so:</div>
<div><br>
</div>
<div><span style="font-family:"Courier New",monospace">srun hostname -s > slurm.hosts</span></div>
<div><span style="font-family:"Courier New",monospace">mpirun run_lapw -p</span></div>
<div><span style="font-family:"Courier New",monospace">mpirun qtl -p -telnes</span><br>
</div>
<div><br>
</div>
from within the job script. This crashed the job right away. The lapw0.error file prints "Error in Parallel lapw0" and "check ERROR FILES!" a number of times. The case.clmsum file is present and looks correct, and the .machines file looks like the one from before (with different node numbers). However, the .machine0 file now looks like this:</div>
<div dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, "EmojiFont", "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">
<br>
</div>
<div dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, "EmojiFont", "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">
<div><span style="font-family: "Courier New", monospace;">g094</span><br>
<span style="font-family: "Courier New", monospace;">g094</span><br>
<span style="font-family: "Courier New", monospace;">g094</span><br>
<span style="font-family: "Courier New", monospace;">g081</span><br>
<span style="font-family: "Courier New", monospace;">g081</span><br>
<span style="font-family: "Courier New", monospace;">g08g094</span><br>
<span style="font-family: "Courier New", monospace;">g094</span><br>
<span style="font-family: "Courier New", monospace;">g094</span><br>
<span style="font-family: "Courier New", monospace;">g094</span><br>
<span style="font-family: "Courier New", monospace;">g094</span></div>
<span style="font-family: "Courier New", monospace;">[...]</span></div>
<div dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, "EmojiFont", "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">
<br>
</div>
<div dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, "EmojiFont", "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">
That is, line 6 is malformed: the node name is wrong and a line break is missing, so that what looks like a truncated name (g08) runs straight into the next entry (g094). The dayfile prints "> stop error" a total of sixteen times. I don't know whether the above .machine0 file is the culprit, but that seems the obvious conclusion.
Any help in this matter will be much appreciated.</div>
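<div dir="ltr">In case it helps anyone reproduce the check, here is a minimal Python sketch of how such malformed lines can be found automatically. The function name find_bad_lines and the pattern are my own assumptions: the pattern only encodes that node names on this cluster look like "g" followed by three digits, as in the listings above, and would need adjusting elsewhere.</div>
<pre>
```python
import re

# Assumption: node names are "g" followed by exactly three digits.
HOST_PATTERN = re.compile(r"g\d{3}")


def find_bad_lines(text, pattern=HOST_PATTERN):
    """Return (line_number, content) pairs for lines that are not one host name."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not pattern.fullmatch(line.strip()):
            bad.append((number, line))
    return bad


# The first lines of the broken .machine0 file quoted above.
sample = "g094\ng094\ng094\ng081\ng081\ng08g094\ng094\n"
print(find_bad_lines(sample))  # prints [(6, 'g08g094')]
```
</pre>
<div dir="ltr">Running the same function on the contents of the .machine0 file from the first attempt should return an empty list, since every line there is a single node name.</div>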
<div dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, "EmojiFont", "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">
<br>
</div>
<div dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, "EmojiFont", "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">
Best regards</div>
<div dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, "EmojiFont", "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">
Christian<br>
</div>
</div>
</body>
</html>