<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<p>Regarding [1], I did expect that you would have to submit the
commands within your job script via the SLURM workload manager on
your system with something like [5,6]<br>
</p>
<p><br>
</p>
<p> sbatch my_job_script.job<br>
</p>
<p><br>
</p>
<p>    or by whatever method you have to use on your system,
    where the commands at [7] are placed in the job file, such as:<br>
</p>
<p><br>
</p>
<p> my_job_script.job</p>
<p> -------------------------------------<br>
</p>
<p> #!/bin/bash<br>
</p>
<p> # ...</p>
<p> run_lapw -p<br>
x qtl -p -telnes<br>
x telnes3<br>
</p>
<p> -------------------------------------</p>
<p><br>
</p>
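<p>    For reference, a slightly more complete sketch of such a
    SLURM job file, loosely following the examples at [5,6], is
    below. The job name, node and task counts, time limit, partition
    name, and SCRATCH path are only placeholders that would have to
    be adjusted for your cluster:<br>
</p>
<p><br>
</p>
<p>    my_job_script.job</p>
<p>    -------------------------------------<br>
</p>
<p>    #!/bin/bash<br>
    #SBATCH --job-name=telnes_test<br>
    #SBATCH --nodes=1<br>
    #SBATCH --ntasks-per-node=8<br>
    #SBATCH --time=01:00:00<br>
    #SBATCH --partition=standard<br>
    <br>
    # scratch location (placeholder, adjust for your system)<br>
    export SCRATCH=/scratch/$USER<br>
    <br>
    # a .machines file matching the nodes assigned by SLURM would<br>
    # also need to be generated here (see the examples at [5,6])<br>
    <br>
    run_lapw -p<br>
    x qtl -p -telnes<br>
    x telnes3<br>
</p>
<p>    -------------------------------------</p>
<p><br>
</p>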
<p>    In my case, I don't have SLURM, so I'm unable to do any
    testing in that environment. Maybe someone else on the mailing
    list with a SLURM system can check whether they encounter the
    same problem that you are having.<br>
</p>
<p><br>
</p>
<p> [5]
<a class="moz-txt-link-freetext" href="https://www.hpc2n.umu.se/documentation/batchsystem/basic-submit-example-scripts">https://www.hpc2n.umu.se/documentation/batchsystem/basic-submit-example-scripts</a><br>
</p>
<p> [6] <a class="moz-txt-link-freetext" href="https://doku.lrz.de/display/PUBLIC/WIEN2k">https://doku.lrz.de/display/PUBLIC/WIEN2k</a><br>
</p>
<p> [7]
<a class="moz-txt-link-freetext" href="https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg20597.html">https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg20597.html</a><br>
</p>
<p><br>
</p>
<p>Regarding [2], it is good to read that mpi parallel with "x qtl
    -p -telnes" works fine on your system with Vanadium Dioxide (VO2).
    If you have control over which nodes the calculation runs on, does
    the VO2 case run fine on your 1st node (e.g., x073 [8]) with
    multiple cores of a single CPU, and does it also run fine on the
    2nd node (e.g., x082) with multiple cores of a single CPU? I have
    read at [9] that some scheduling managers assign the nodes
    automatically on the fly, such that the user might in some cases
    have no control over which nodes the job runs on. If you are able
    to control it, does the VO2 case run fine with mpi parallel using
    1 processor core on node 1 and 1 processor core on node 2 (a
    sketch of such a .machines file is given after the links below)?
    That test may help to narrow down the problem. <br>
</p>
<p><br>
</p>
<p> [8]
<a class="moz-txt-link-freetext" href="https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg20617.html">https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg20617.html</a><br>
</p>
<p> [9] <a class="moz-txt-link-freetext" href="http://susi.theochem.tuwien.ac.at/reg_user/faq/pbs.html">http://susi.theochem.tuwien.ac.at/reg_user/faq/pbs.html</a></p>
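<p><br>
</p>
<p>    For that last test, a sketch of what the .machines file might
    look like is below. This is only a guess on my part: the node
    names x073 and x082 are taken from [8] and would of course have
    to match the nodes that SLURM actually assigns to your job:<br>
</p>
<p><br>
</p>
<p>    .machines</p>
<p>    -------------------------------------<br>
</p>
<p>    # one mpi job using 1 core on each of the two nodes<br>
    1:x073:1 x082:1<br>
    granularity:1<br>
    extrafine:1<br>
</p>
<p>    -------------------------------------</p>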
<p><br>
</p>
<p>Regarding [3], the output you posted looks as expected, so
    nothing is wrong with that.</p>
<p><br>
</p>
<p>    In the past, I posted to the mailing list some things that I
    found helpful for troubleshooting parallel issues; you would have
    to search the archive to find all of them, but I believe a couple
    of them may have been at the following two links:<br>
</p>
<p><br>
</p>
<p> [10]
<a class="moz-txt-link-freetext" href="https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17973.html">https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17973.html</a></p>
<p> [11]
<a class="moz-txt-link-freetext" href="http://zeus.theochem.tuwien.ac.at/pipermail/wien/2018-April/027944.html">http://zeus.theochem.tuwien.ac.at/pipermail/wien/2018-April/027944.html</a></p>
<p><br>
</p>
<p>Lastly, I have now tried a WIEN2k 19.2 calculation using mpi
    parallel on my system with the struct file at
    <a class="moz-txt-link-freetext" href="https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg20645.html">https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg20645.html</a>.<br>
</p>
<p><br>
</p>
<p> It looks like it ran fine when it was set to run on two of the
four processors on my system:<br>
</p>
<p><br>
</p>
<p>username@computername:~/wiendata/diamond$ ls ~/wiendata/scratch<br>
username@computername:~/wiendata/diamond$ ls<br>
diamond.struct<br>
username@computername:~/wiendata/diamond$ init_lapw -b<br>
...<br>
username@computername:~/wiendata/diamond$ cat
$WIENROOT/parallel_options<br>
setenv TASKSET "no"<br>
if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1<br>
if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 1<br>
setenv WIEN_GRANULARITY 1<br>
setenv DELAY 0.1<br>
setenv SLEEPY 1<br>
username@computername:~/wiendata/diamond$ cat .machines<br>
1:localhost:2<br>
granularity:1<br>
extrafine:1<br>
username@computername:~/wiendata/diamond$ run_lapw -p<br>
...<br>
in cycle 11 ETEST: .0001457550000000 CTEST: .0033029<br>
hup: Command not found.<br>
STOP LAPW0 END<br>
STOP LAPW1 END<br>
<br>
real 0m6.744s<br>
user 0m12.679s<br>
sys 0m0.511s<br>
STOP LAPW2 - FERMI; weights written<br>
STOP LAPW2 END<br>
<br>
real 0m1.123s<br>
user 0m1.785s<br>
sys 0m0.190s<br>
STOP SUMPARA END<br>
STOP CORE END<br>
STOP MIXER END<br>
ec cc and fc_conv 1 1 1<br>
<br>
> stop<br>
username@computername:~/wiendata/diamond$ cp
$WIENROOT/SRC_templates/case.innes diamond.innes<br>
username@computername:~/wiendata/diamond$ x qtl -p -telnes<br>
running QTL in parallel mode<br>
calculating QTL's from parallel vectors<br>
STOP QTL END<br>
6.5u 0.0s 0:06.77 98.3% 0+0k 928+8080io 4pf+0w<br>
username@computername:~/wiendata/diamond$ cat diamond.inq<br>
0 2.20000000000000000000<br>
1<br>
1 99 1 0<br>
4 0 1 2 3<br>
username@computername:~/wiendata/diamond$ x telnes3<br>
STOP TELNES3 DONE<br>
3.2u 0.0s 0:03.39 98.8% 0+0k 984+96io 3pf+0w<br>
username@computername:~/wiendata/diamond$ ls -l ~/wiendata/scratch<br>
total 624<br>
-rw-rw-r-- 1 username username 0 Oct 24 15:40 diamond.vector<br>
-rw-rw-r-- 1 username username 637094 Oct 24 15:43
diamond.vector_1<br>
-rw-rw-r-- 1 username username 0 Oct 24 15:44
diamond.vectordn<br>
-rw-rw-r-- 1 username username 0 Oct 24 15:44
diamond.vectordn_1<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 10/24/2020 2:30 PM, Christian
Søndergaard Pedersen wrote:<br>
</div>
<blockquote type="cite"
cite="mid:aa947190de19456db9eca58775388ed2@dtu.dk">
<div id="divtagdefaultwrapper"
style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;"
dir="ltr">
<div id="divtagdefaultwrapper" dir="ltr" style="font-size: 12pt;
color: rgb(0, 0, 0); font-family: Calibri, Helvetica,
sans-serif, "EmojiFont", "Apple Color
Emoji", "Segoe UI Emoji", NotoColorEmoji,
"Segoe UI Symbol", "Android Emoji",
EmojiSymbols;">
<p>Hello Gavin</p>
<p><br>
</p>
<p>Thanks for your reply, and apologies for my tardiness. </p>
<p><br>
</p>
<p>[1] All my calculations are run in MPI-parallel on our HPC
cluster. I cannot execute any 'x lapw[0,1,2] -p' command in
the terminal (on the cluster login node); this results in
'pbsssh: command not found'. However, submitting via the
SLURM workload manager works fine. In all my submit scripts,
I specify 'setenv SCRATCH /scratch/$USER', which is the
proper location of scratch storage on our HPC cluster.</p>
<p><br>
</p>
<p>[2] Without having tried your example for diamond, I can
report that 'run_lapw -p' followed by 'x qtl -p -telnes'
works without problems for a single cell of Vanadium
dioxide. However, for other systems I get the error I
specified. The other systems (1) are larger, and (2) use two
CPUs instead of a single CPU (the .machines files are modified
suitably).</p>
<p>Checking the qtl.def file for the calculation that _did_
work, I can see that the line specifying
<span>'/scratch/chrsop/VO2.vectordn'</span> is _also_
present here, so this is not to blame. This leaves me
baffled as to what the error can be - as far as I can tell,
I am trying to perform the exact same calculation for
different systems. I thought maybe insufficient scratch
storage could be to blame, but this would most likely show
up in the 'run_lapw' cycles (I believe).</p>
<p><br>
</p>
<p>[3] I am posting here the difference between qtlpara and
lapw2para:</p>
<div><span style="font-family:"Courier
New",monospace"> </span><span
style="font-family:"Courier New",monospace">$
grep "single" $WIENROOT/qtlpara_lapw</span><br>
<span style="font-family:"Courier New",monospace">
testinput .processes single</span><br>
<span style="font-family:"Courier New",monospace">
$ grep "single" $WIENROOT/lapw2para_lapw</span><br>
<span style="font-family:"Courier New",monospace">
testinput .processes single</span><br>
<span style="font-family:"Courier New",monospace">
single:</span><br>
<span style="font-family:"Courier New",monospace">
echo "running in single mode"</span></div>
<div><br>
</div>
<div>... if this is wrong, I kindly request advice on how to
fix it, so I can pass it on to our software maintenance guy.
If there's anything else I can try please let me know.</div>
<div><br>
</div>
<div>Best regards<br>
Christian<br>
</div>
</div>
</div>
</blockquote>
</body>
</html>