<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">The .machines file looks fine to me,
but one of the others might see something that I didn't notice
(besides the WIEN2k command not being there at the bottom of the
job file, which was likely missed in the copy and paste).<br>
<br>
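Judging from your description (a spin-polarized SCF run in k-point
parallel mode), the missing command at the end of the job file might
look like the line below. This is only a sketch: runsp_lapw is the
spin-polarized SCF script, -p tells it to run in parallel according
to .machines, and any convergence switches are left out.<br>
<br>
<pre>
```shell
# Hypothetical last line of the job file, after .machines is written.
# runsp_lapw = spin-polarized SCF cycle; -p = run lapw1/lapw2 in
# parallel as described by the .machines file.
runsp_lapw -p
```
</pre>
<br>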
The main problem seems to be the "bash: lapw1: command not found"
error, unless something happened earlier that is not shown. Tracking
down parallel error messages is more complicated. Unlike a serial
calculation, which can write its standard output and error to the
terminal on a desktop, a parallel calculation on a cluster with a
queue system can put them in a standard output file (-o), a standard
error file (-e), or a combined output/error file (-j), with
user-specified name(s) [1,2]. They can also be written to hidden dot
files such as .time* or .stdout*, as mentioned before [3,4,5].<br>
<br>
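As a sketch (Torque/PBS syntax assumed; the job and file names are
hypothetical), the directives below give the output and error files
fixed names, and the small helper function searches the hidden dot
files in the case directory for the same "command not found" message:<br>
<br>
<pre>
```shell
#!/bin/bash
#PBS -N TiC-parallel
#PBS -o TiC-parallel.out
#PBS -e TiC-parallel.err
# To merge stderr into the -o file instead, drop the -e line and use
# this directive without the extra leading '#':
##PBS -j oe

# Hypothetical helper: print every "command not found" line found in
# the hidden .time* and .stdout* files of a case directory
# (default: the current directory).
find_cmd_errors() {
  dir="${1:-.}"
  grep -H "command not found" "$dir"/.time* "$dir"/.stdout* 2>/dev/null
}
```
</pre>
<br>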
The "lapw1: command not found" error might be because $WIENROOT
didn't get added to the PATH on one of the nodes [
<a class="moz-txt-link-freetext" href="http://www.supercluster.org/pipermail/torqueusers/2010-March/010143.html">http://www.supercluster.org/pipermail/torqueusers/2010-March/010143.html</a>
]. Did you check whether the WIEN2k directory is in the PATH, for
example in PBS_O_PATH as shown by qstat -f jobid [
<a class="moz-txt-link-freetext" href="http://stackoverflow.com/questions/21248406/sleep-command-not-found-in-torque-pbs-but-works-in-shell">http://stackoverflow.com/questions/21248406/sleep-command-not-found-in-torque-pbs-but-works-in-shell</a>
]?<br>
<br>
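One way to check this from inside the job itself is a test like the
one below, placed just before the WIEN2k command (check_lapw1 is a
hypothetical helper, not part of WIEN2k). It only shows the
environment of the node that runs the job script, not that of the
other nodes:<br>
<br>
<pre>
```shell
# Hypothetical helper: report whether lapw1 can be found through PATH
# on the node that runs this script; returns 1 if it cannot.
check_lapw1() {
  if command -v lapw1 >/dev/null 2>/dev/null; then
    echo "lapw1 found: $(command -v lapw1)"
  else
    echo "lapw1 NOT in PATH on $(hostname); WIENROOT=${WIENROOT:-unset}"
    return 1
  fi
}
```
</pre>
Calling check_lapw1 right after "module load wien2k" would show
whether the module actually put lapw1 on the PATH of the job.<br>
<br>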
Did you try to ssh into each node listed in .machines and check
whether lapw1 is visible there? For example,<br>
<br>
ssh n024<br>
ls -l $WIENROOT/lapw1<br>
<br>
ssh n225<br>
ls -l $WIENROOT/lapw1<br>
<br>
...<br>
<br>
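With several nodes, a loop saves typing. The sketch below uses
hypothetical helper names and assumes passwordless ssh to the compute
nodes. It reads the distinct hosts from the 1: lines of .machines and
runs the same check on each through a non-interactive ssh shell,
which should be closer to the environment lapw1 gets when it is
started remotely over ssh than an interactive login is:<br>
<br>
<pre>
```shell
# Hypothetical helper: print the distinct host names from the "1:"
# lines of a .machines file (default: ./.machines).
machines_hosts() {
  grep '^1:' "${1:-.machines}" | cut -d: -f2 | awk '{print $1}' | sort -u
}

# Hypothetical helper: on every host, ask a non-interactive remote
# shell where lapw1 is. The single quotes make $WIENROOT expand on
# the remote node, not locally.
check_all_nodes() {
  for host in $(machines_hosts "$1"); do
    echo "== $host"
    ssh "$host" 'ls -l "$WIENROOT/lapw1"; command -v lapw1 || echo "lapw1 not in PATH"'
  done
}
```
</pre>
Running check_all_nodes in the case directory would print one block
per node; n043 is listed twice in your .machines but is checked only
once.<br>
<br>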
The commands above are only guesses about your system's
configuration. The administrator or helpdesk for your cluster knows
your system and should be able to help you much better with
resolving the "command not found" error.<br>
<br>
[1] <a class="moz-txt-link-freetext" href="http://beige.ucs.indiana.edu/I590/node39.html">http://beige.ucs.indiana.edu/I590/node39.html</a><br>
[2]
<a class="moz-txt-link-freetext" href="https://wikis.nyu.edu/display/NYUHPC/Tutorial+-+Submitting+a+job+using+qsub">https://wikis.nyu.edu/display/NYUHPC/Tutorial+-+Submitting+a+job+using+qsub</a><br>
[3]
<a class="moz-txt-link-freetext" href="http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13598.html">http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13598.html</a><br>
[4]
<a class="moz-txt-link-freetext" href="http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg14148.html">http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg14148.html</a><br>
[5]
<a class="moz-txt-link-freetext" href="http://zeus.theochem.tuwien.ac.at/pipermail/wien/2017-March/026109.html">http://zeus.theochem.tuwien.ac.at/pipermail/wien/2017-March/026109.html</a><br>
<br>
On 3/13/2017 1:25 PM, shaymlal dayananda wrote:<br>
</div>
<blockquote
cite="mid:428444301.4638172.1489433102681@mail.yahoo.com"
type="cite">
<div style="color:#000; background-color:#fff; font-family:times
new roman, new york, times, serif;font-size:16px">
<div id="yui_3_16_0_ym19_1_1489431743651_4226">Dear developers
and users</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_4377"><br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_20790">I was
trying to do a volume optimization and an SCF calculation with
spin polarization in parallel mode, but both jobs crash and I
get the following error file. However, both cases run correctly
when parallel mode is removed.<br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_6001">............................................................................<br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_6010">'LAPW2'
- can't open unit:
30 <br
id="yui_3_16_0_ym19_1_1489431743651_6045">
'LAPW2' - filename:
case.energyup_1 <br
id="yui_3_16_0_ym19_1_1489431743651_6009">
** testerror: Error in Parallel LAPW2</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_7617">.................................................................................<br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_7618">Also, in
STDOUT I see the following errors:<br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_12516"><br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_12534">.......................................................................<br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_10909">bash:
lapw1: command not found</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_12559">...<br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_12560">....</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_12561">.....<br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_12517">FERMI
- Error<br id="yui_3_16_0_ym19_1_1489431743651_12524">
grep: *scf1dn*: No such file or directory<br
id="yui_3_16_0_ym19_1_1489431743651_12525">
0.381u 0.507s 1:12.66 1.2% 0+0k 128+1736io 1pf+0w<br
id="yui_3_16_0_ym19_1_1489431743651_12526">
Test-TiC-VOl-parallel.scf1dn_1: No such file or directory.</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_12543">.............................................................................</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_12580"><br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_12581"><br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_12582">I
have copied my .machines file and the job file below. I think
they are not correct, and I am not sure whether I need separate
lines for lapw2 and lapwsp. Any help in correcting this is
highly appreciated. <br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_15855"><br>
</div>
<div dir="ltr">".machines" file</div>
<div dir="ltr">.............................</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_17459">#<br
id="yui_3_16_0_ym19_1_1489431743651_17448">
lapw0:n024 n225 n220 n218 n045 n044 n043 n043 <br
id="yui_3_16_0_ym19_1_1489431743651_17449">
1:n024<br id="yui_3_16_0_ym19_1_1489431743651_17450">
1:n225<br id="yui_3_16_0_ym19_1_1489431743651_17451">
1:n220<br id="yui_3_16_0_ym19_1_1489431743651_17452">
1:n218<br id="yui_3_16_0_ym19_1_1489431743651_17453">
1:n045<br id="yui_3_16_0_ym19_1_1489431743651_17454">
1:n044<br id="yui_3_16_0_ym19_1_1489431743651_17455">
1:n043<br id="yui_3_16_0_ym19_1_1489431743651_17456">
1:n043<br id="yui_3_16_0_ym19_1_1489431743651_17457">
granularity:1<br id="yui_3_16_0_ym19_1_1489431743651_17458">
extrafine:1</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_17460"><br>
</div>
<div dir="ltr">......................................................</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_17468"><br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_17469">The
job file is copied below.</div>
<div dir="ltr"><br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_19061"><br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_19182">#
example for 8 nodes<br
id="yui_3_16_0_ym19_1_1489431743651_19143">
#PBS -l procs=8<br id="yui_3_16_0_ym19_1_1489431743651_19144">
#PBS -l pmem=2048mb<br
id="yui_3_16_0_ym19_1_1489431743651_19145">
#PBS -l walltime=4:00:00 <br
id="yui_3_16_0_ym19_1_1489431743651_19146">
<br id="yui_3_16_0_ym19_1_1489431743651_19147">
module load wien2k<br
id="yui_3_16_0_ym19_1_1489431743651_19148">
<br id="yui_3_16_0_ym19_1_1489431743651_19149">
# change into your working directory<br
id="yui_3_16_0_ym19_1_1489431743651_19150">
cd $PBS_O_WORKDIR<br
id="yui_3_16_0_ym19_1_1489431743651_19151">
#start creating .machines<br
id="yui_3_16_0_ym19_1_1489431743651_19153">
cat $PBS_NODEFILE |cut -c1-6 >.machines_current<br
id="yui_3_16_0_ym19_1_1489431743651_19154">
aa=`cat .machines_current | wc -l`<br
id="yui_3_16_0_ym19_1_1489431743651_19155">
echo '#' > .machines<br
id="yui_3_16_0_ym19_1_1489431743651_19156">
<br id="yui_3_16_0_ym19_1_1489431743651_19157">
# example for an MPI parallel lapw0 <br
id="yui_3_16_0_ym19_1_1489431743651_19158">
echo -n 'lapw0:' >> .machines<br
id="yui_3_16_0_ym19_1_1489431743651_19159">
i=1<br id="yui_3_16_0_ym19_1_1489431743651_19160">
while [ $i -lt $aa ]<br
id="yui_3_16_0_ym19_1_1489431743651_19161">
do<br id="yui_3_16_0_ym19_1_1489431743651_19162">
echo -n `cat $PBS_NODEFILE |head -$i | tail -1` ' '
>>.machines<br
id="yui_3_16_0_ym19_1_1489431743651_19163">
i=$((i+1))<br id="yui_3_16_0_ym19_1_1489431743651_19164">
done<br id="yui_3_16_0_ym19_1_1489431743651_19165">
echo `cat $PBS_NODEFILE |head -$i|tail -1` ' '
>>.machines<br
id="yui_3_16_0_ym19_1_1489431743651_19166">
<br id="yui_3_16_0_ym19_1_1489431743651_19167">
#example for k-point parallel lapw1/2<br
id="yui_3_16_0_ym19_1_1489431743651_19168">
i=1<br id="yui_3_16_0_ym19_1_1489431743651_19169">
while [ $i -le $aa ]<br
id="yui_3_16_0_ym19_1_1489431743651_19170">
do<br id="yui_3_16_0_ym19_1_1489431743651_19171">
echo -n '1:' >>.machines<br
id="yui_3_16_0_ym19_1_1489431743651_19172">
head -$i .machines_current |tail -1 >> .machines<br
id="yui_3_16_0_ym19_1_1489431743651_19173">
i=$((i+1))<br id="yui_3_16_0_ym19_1_1489431743651_19174">
done<br id="yui_3_16_0_ym19_1_1489431743651_19175">
<br id="yui_3_16_0_ym19_1_1489431743651_19176">
echo 'granularity:1' >>.machines<br
id="yui_3_16_0_ym19_1_1489431743651_19177">
echo 'extrafine:1' >>.machines<br
id="yui_3_16_0_ym19_1_1489431743651_19178">
<br id="yui_3_16_0_ym19_1_1489431743651_19179">
#define here your WIEN2k command<br
id="yui_3_16_0_ym19_1_1489431743651_19180">
<br id="yui_3_16_0_ym19_1_1489431743651_19181">
<br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_14266">....................................................................</div>
<div dir="ltr"><br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_20780"><br>
</div>
<div dir="ltr" id="yui_3_16_0_ym19_1_1489431743651_20781">Thank
you</div>
<div dir="ltr"><br>
</div>
<div dir="ltr">Chami<br>
</div>
</div>
</blockquote>
</body>
</html>