<div dir="ltr"><div class="gmail_default" style="font-size:large">Additional information (maybe this is the main cause of the lapw1 crash):</div><div class="gmail_default" style="font-size:large">bc works only on the head node. node11 and the other client nodes do not have bc installed.</div><div class="gmail_default" style="font-size:large">If bc is the only issue, is it possible to modify the job file so that it uses bc on the head node only?</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">Thank you</div><div class="gmail_default" style="font-size:large">Bhamu</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Nov 15, 2020 at 12:25 PM Dr. K. C. Bhamu <<a href="mailto:kcbhamu85@gmail.com">kcbhamu85@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:large">Dear Gavin and Prof. 
Marks</div><div class="gmail_default" style="font-size:large">Thank you for your inputs.</div><div class="gmail_default" style="font-size:large">qsub MyJobFile.job creates the .machines file.</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">With the job file given below, I could create the proper .machine files (as many as the number of cores on the node, plus the .machines file), but <span style="font-size:large;background-color:rgb(255,255,255);font-family:Arial,Helvetica,sans-serif;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;color:rgb(34,34,34);float:none;display:inline">lapw1 always </span>crashes.</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><u><b>case.dayfile is</b></u></div><div class="gmail_default" style="font-size:large"><br>Calculating pbe in /home/kcbhamu/work/test/pbe<br>on node11 with PID 9241<br>using WIEN2k_19.1 (Release 25/6/2019) in /home/kcbhamu/soft/w2k192<br><br><br> start (Sun Nov 15 15:42:05 KST 2020) with lapw0 (40/99 to go)<br><br> cycle 1 (Sun Nov 15 15:42:05 KST 2020) (40/99 to go)<br><br>> lapw0 -p (15:42:05) starting parallel lapw0 at Sun Nov 15 15:42:05 KST 2020<br>-------- .machine0 : processors<br>running lapw0 in single mode<br>7.281u 0.272s 0:07.64 98.8% 0+0k 1000+1216io 0pf+0w<br>> lapw1 -p (15:42:13) starting parallel lapw1 at Sun Nov 15 15:42:13 KST 2020<br>-> starting parallel LAPW1 jobs at Sun Nov 15 15:42:13 KST 2020<br>running LAPW1 in parallel mode (using .machines)<br>16 number_of_parallel_jobs<br>0.200u 0.369s 0:00.59 94.9% 0+0k 208+456io 0pf+0w<br>error: command /home/kcbhamu/soft/w2k192/lapw1para lapw1.def failed<br><br>> stop error<br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><b><u>The job.eout file shows the error below:</u></b></div><div 
class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">I am getting the error below:</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">bc: Command not found.<br> LAPW0 END<br>bc: Command not found.<br>number_per_job: Subscript out of range.<br>grep: *scf1*: No such file or directory<br>grep: lapw2*.error: No such file or directory<br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><b><u>The .machines file is given below<br></u></b></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>granularity:1<br>extrafine:1<br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><b><u>parallel_options file</u></b></div><div class="gmail_default" style="font-size:large">setenv TASKSET "no"<br>if ( ! $?USE_REMOTE ) setenv USE_REMOTE 0<br>if ( ! 
$?MPI_REMOTE ) setenv MPI_REMOTE 0<br>setenv WIEN_GRANULARITY 1<br>setenv DELAY 0.1<br>setenv SLEEPY 1<br>setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"<br>setenv CORES_PER_NODE 16<br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><b><u>job file</u></b></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">#!/bin/sh<br>#SBATCH -J test<br>#SBATCH -p 52core # This is the name of the partition.<br>#SBATCH -N 1<br>#SBATCH -n 16<br>#SBATCH -o %x.o%j<br>#SBATCH -e %x.e%j<br>#export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so <br><br>export OMP_NUM_THREADS=16 # I have also checked with 1, 2, 4, and 8.<br><br># Use , as list separator<br>IFS=','<br># Convert string to array<br>hcpus=($SLURM_JOB_CPUS_PER_NODE)<br>unset IFS<br><br>declare -a conv<br><br># Expand compressed slurm array<br>for cpu in ${hcpus[@]}; do<br> if [[ $cpu =~ (.*)\((.*)x\) ]]; then<br> # found compressed value<br> value=${BASH_REMATCH[1]}<br> factor=${BASH_REMATCH[2]}<br> for j in $(seq 1 $factor); do<br> conv=( ${conv[*]} $value )<br> done<br> else<br> conv=( ${conv[*]} $cpu )<br> fi<br>done<br><br># Build .machines file<br>rm -f .machines<br><br>nhost=0<br><br>echo ${conv[@]};<br><br>IFS=','<br>for node in $SLURM_NODELIST<br>do <br> declare -i cpuspernode=${conv[$nhost]};<br> for ((i=0; i<${cpuspernode}; i++)) <br> do<br> echo 1:$node >> .machines<br> done<br> let nhost+=1<br>done <br><br>echo 'granularity:1' >>.machines<br>echo 'extrafine:1' >>.machines<br><br><br>run_lapw -p<br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">Thank you very much</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">Regards</div><div class="gmail_default" 
style="font-size:large">Bhamu</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Nov 13, 2020 at 7:04 PM Gavin Abo <<a href="mailto:gsabo@crimson.ua.edu" target="_blank">gsabo@crimson.ua.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>As [1] shows, different cluster
systems use different commands for job submission.</p>
<p>Your post does not clearly show how the job was
submitted. For example, did you use something similar to the command
at [2]:</p>
<p>$ sbatch MyJobScript.sh<br>
</p>
<p><b>What command creates your .machines file?</b><br>
</p>
<p>In your MyJobScript.sh below, I'm not seeing any lines that
create a .machines file.<br>
</p>
<font color="#808080">MyJobScript.sh<br>
--------------------------------------------------------------------------------------------------------<br>
#!/bin/sh<br>
#SBATCH -J test #job name<br>
#SBATCH -p 44core #partition name<br>
#SBATCH -N 1 #node<br>
#SBATCH -n 18 #core<br>
#SBATCH -o %x.o%j<br>
#SBATCH -e %x.e%j<br>
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so #Do not change
here!!<br>
srun ~/soft/qe66/bin/pw.x < case.in > case.out<br>
--------------------------------------------------------------------------------------------------------
</font>
<p><font color="#808080">The available job files in the FAQs are not
working. They give me only<br>
.machine0, .machines, and .machines_current files,
in which .machines has # and the other two are empty.</font><br>
</p>
<p>In the Slurm documentation at [3], it looks like there is a
variable that helps create, on the fly, the list of nodes that
needs to be written to the .machines file:</p>
<p>SLURM_JOB_NODELIST (and SLURM_NODELIST for backwards
compatibility)<br>
</p>
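<p>As a minimal sketch of how that variable could be turned into a
.machines file (an assumption on my part, not a script tested on your
cluster): scontrol show hostnames expands the compressed node list to one
host per line, and a small helper, hypothetically named build_machines
here, writes one 1:host line per core. Taking the per-node core count from
SLURM_CPUS_ON_NODE is also an assumption; it only fits when every node in
the job provides the same number of cores.</p>

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN2k or Slurm): reads one hostname per
# line on stdin and prints k-point-parallel .machines lines, with the
# "1:host" line repeated once per core.
build_machines() {
    per_node=$1
    while IFS= read -r host; do
        # Skip empty lines in the host list.
        [ -n "$host" ] || continue
        i=0
        while [ "$i" -lt "$per_node" ]; do
            echo "1:$host"
            i=$((i + 1))
        done
    done
    echo 'granularity:1'
    echo 'extrafine:1'
}

# Only runs inside a Slurm job on a node where scontrol is available.
# Assumption: every node offers SLURM_CPUS_ON_NODE cores to this job.
if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
    scontrol show hostnames "$SLURM_JOB_NODELIST" \
        | build_machines "${SLURM_CPUS_ON_NODE:-1}" > .machines
fi
```

<p>Outside of a job the helper can be checked by hand, for example with
printf 'node11\n' | build_machines 16, before run_lapw -p is added at the
end of the job script.</p>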
<p>I'm not seeing this variable in your MyJobScript.sh, unlike the
other job scripts found on the Internet, for example [4-7].<br>
</p>
[1] <a href="https://slurm.schedmd.com/rosetta.pdf" target="_blank">https://slurm.schedmd.com/rosetta.pdf</a><br>
[2] <a href="https://hpc-uit.readthedocs.io/en/latest/jobs/examples.html" target="_blank">https://hpc-uit.readthedocs.io/en/latest/jobs/examples.html</a><br>
[3] <a href="https://slurm.schedmd.com/sbatch.html" target="_blank">https://slurm.schedmd.com/sbatch.html</a><br>
[4] <a href="https://itp.uni-frankfurt.de/wiki-it/index.php/Wien2k" target="_blank">https://itp.uni-frankfurt.de/wiki-it/index.php/Wien2k</a><br>
[5]
<a href="https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15511.html" target="_blank">https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15511.html</a><br>
[6]
<a href="https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg07097.html" target="_blank">https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg07097.html</a><br>
[7] <a href="https://www.nsc.liu.se/software/installed/tetralith/wien2k/" target="_blank">https://www.nsc.liu.se/software/installed/tetralith/wien2k/</a>
<div><br>
</div>
<div>On 11/13/2020 3:37 AM, Laurence Marks
wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">
<div>N.B., example mid-term questions:
<div dir="auto">1. What SBATCH command will give you 3 nodes?</div>
<div dir="auto">2. What command creates your .machines file?</div>
<div dir="auto">3. What are your fastest and slowest nodes?</div>
<div dir="auto">4. Which nodes have the best communications?</div>
<div dir="auto"><br>
</div>
<div dir="auto">N.B., please don't post your answers -- just
understand!<br>
<br>
<div dir="auto">_____<br>
Professor Laurence Marks<br>
"Research is to see what everybody else has seen, and to
think what nobody else has thought", Albert Szent-Gyorgi<br>
<a href="http://www.numis.northwestern.edu" target="_blank">www.numis.northwestern.edu</a></div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Nov 13, 2020,
04:21 Laurence Marks <<a href="mailto:laurence.marks@gmail.com" target="_blank">laurence.marks@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="auto">Much of what you are requesting is
problem/cluster specific, so there is no magic answer --
it will vary. Suggestions:
<div dir="auto">1) Read the UG sections on .machines and
parallel operation.</div>
<div dir="auto">2) Read the man page for your cluster
job command (srun)</div>
<div dir="auto">3) Reread the UG sections.</div>
<div dir="auto">4) Read the example scripts, and
understand (look up) all the commands so you know what
they are doing.</div>
<div dir="auto"><br>
</div>
<div dir="auto">It is really not that complicated. If
you cannot master this by yourself, I will wonder
whether you are in the right profession.<br>
<br>
<div dir="auto">_____<br>
Professor Laurence Marks<br>
"Research is to see what everybody else has seen,
and to think what nobody else has thought", Albert
Szent-Gyorgi<br>
<a href="http://www.numis.northwestern.edu" rel="noreferrer" target="_blank">www.numis.northwestern.edu</a></div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Nov 13, 2020,
03:24 Dr. K. C. Bhamu <<a href="mailto:kcbhamu85@gmail.com" rel="noreferrer" target="_blank">kcbhamu85@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div style="font-size:large">Dear
All</div>
<div style="font-size:large"><br>
</div>
<div style="font-size:large">I
need your extensive help.</div>
<div style="font-size:large">I
have tried to provide full details that can help
you understand my requirement. In case I have
missed something, please let me know.</div>
<div style="font-size:large"><br>
</div>
<div style="font-size:large">I
am looking for a job file for our cluster. The
available job files in the FAQs are not working. They
give me only</div>
<div style="font-size:large">.machine0,
.machines, and .machines_current
files, in which .machines has # and the other
two are empty.<br>
</div>
<div style="font-size:large"><br>
</div>
<div style="font-size:large">The
script that works fine for Quantum Espresso
on the 44core partition is below:</div>
<div style="font-size:large">#!/bin/sh<br>
#SBATCH -J test #job name<br>
#SBATCH -p 44core #partition name<br>
#SBATCH -N 1 #node<br>
#SBATCH -n 18 #core<br>
#SBATCH -o %x.o%j<br>
#SBATCH -e %x.e%j<br>
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so #Do
not change here!!<br>
srun ~/soft/qe66/bin/pw.x < case.in > case.out<br>
</div>
<div style="font-size:large"><br>
</div>
<div style="font-size:large">I
have compiled Wien2k_19.2 on a CentOS cluster with a
queuing system, whose head node runs the kernel
Linux 3.10.0-1127.19.1.el7.x86_64.</div>
<div style="font-size:large"><br>
</div>
<div style="font-size:large">I
used compilers_and_libraries_2020.2.254,
fftw-3.3.8, and libxc-4.34 for the installation.</div>
<div style="font-size:large"><br>
</div>
<div style="font-size:large">The
details of the nodes that I can use are as follows
(I can log in to these nodes with my user
password):</div>
<div style="font-size:large"><pre>
NODELIST  NODES  PARTITION  STATE      CPUS  S:C:T   MEMORY  TMP_DISK  WEIGHT  AVAIL_FE  REASON
elpidos   1      master     idle       4     4:1:1   15787   0         1       (null)    none
node01    1      72core     allocated  72    72:1:1  515683  0         1       (null)    none
node02    1      72core     allocated  72    72:1:1  257651  0         1       (null)    none
node03    1      72core     allocated  72    72:1:1  257651  0         1       (null)    none
node09    1      44core     mixed      44    44:1:1  128650  0         1       (null)    none
node10    1      44core     mixed      44    44:1:1  128649  0         1       (null)    none
node11    1      52core*    allocated  52    52:1:1  191932  0         1       (null)    none
node12    1      52core*    allocated  52    52:1:1  191932  0         1       (null)    none
</pre></div>
<div style="font-size:large"><br>
</div>
<div style="font-size:large">The
other nodes run a mixture of kernels, as listed below.</div>
<div style="font-size:large"><br>
</div>
<div style="font-size:large">
OS=Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue
Feb 4 23:02:59 UTC 2020 <br>
OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue
Aug 25 17:23:54 UTC 2020 <br>
OS=Linux 3.10.0-514.el7.x86_64 #1 SMP Tue Nov
22 16:42:41 UTC 2016 </div>
<div style="font-size:large">
OS=Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue
May 14 21:24:32 UTC 2019 <br>
</div>
<div style="font-size:large"><br>
</div>
<div style="font-size:large">Your
extensive help will improve my research
productivity.</div>
<div style="font-size:large"><br>
</div>
<div style="font-size:large">Thank
you very much.</div>
<div style="font-size:large">Regards</div>
<div style="font-size:large">Bhamu</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
_______________________________________________<br>
Wien mailing list<br>
<a href="mailto:Wien@zeus.theochem.tuwien.ac.at" target="_blank">Wien@zeus.theochem.tuwien.ac.at</a><br>
<a href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien" rel="noreferrer" target="_blank">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a><br>
SEARCH the MAILING-LIST at: <a href="http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html" rel="noreferrer" target="_blank">http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html</a><br>
</blockquote></div></div>
</blockquote></div>