<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:large">Dear Gavin and Prof. Marks,</div><div class="gmail_default" style="font-size:large">Thank you for your input.</div><div class="gmail_default" style="font-size:large">Submitting the job with qsub MyJobFIle.job creates the .machines file.</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">With the job file given below, the proper .machine files are created (as many as the cores requested on the node and the entries in the .machines file), but lapw1 always crashes.</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><u><b>The case.dayfile is:</b></u></div><div class="gmail_default" style="font-size:large"><br>Calculating pbe in /home/kcbhamu/work/test/pbe<br>on node11 with PID 9241<br>using WIEN2k_19.1 (Release 25/6/2019) in /home/kcbhamu/soft/w2k192<br><br><br>    start       (Sun Nov 15 15:42:05 KST 2020) with lapw0 (40/99 to go)<br><br>    cycle 1        (Sun Nov 15 15:42:05 KST 2020)  (40/99 to go)<br><br>>   lapw0   -p    (15:42:05) starting parallel lapw0 at Sun Nov 15 15:42:05 KST 2020<br>-------- .machine0 : processors<br>running lapw0 in single mode<br>7.281u 0.272s 0:07.64 98.8%  0+0k 1000+1216io 0pf+0w<br>>   lapw1  -p           (15:42:13) starting parallel lapw1 at Sun Nov 15 15:42:13 KST 2020<br>->  starting parallel LAPW1 jobs at Sun Nov 15 15:42:13 KST 2020<br>running LAPW1 in parallel mode (using .machines)<br>16 number_of_parallel_jobs<br>0.200u 0.369s 0:00.59 94.9%     0+0k 208+456io 0pf+0w<br>error: command   /home/kcbhamu/soft/w2k192/lapw1para lapw1.def   failed<br><br>>   stop error<br></div><div 
class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><b><u>The job.eout file shows the following errors:</u></b></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">bc: Command not found.<br> LAPW0 END<br>bc: Command not found.<br>number_per_job: Subscript out of range.<br>grep: *scf1*: No such file or directory<br>grep: lapw2*.error: No such file or directory<br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><b><u>The .machines file is given below:<br></u></b></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>1:node11<br>granularity:1<br>extrafine:1<br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><b><u>parallel_options file</u></b></div><div class="gmail_default" style="font-size:large">setenv TASKSET "no"<br>if ( ! $?USE_REMOTE ) setenv USE_REMOTE 0<br>if ( ! 
$?MPI_REMOTE ) setenv MPI_REMOTE 0<br>setenv WIEN_GRANULARITY 1<br>setenv DELAY 0.1<br>setenv SLEEPY 1<br>setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"<br>setenv CORES_PER_NODE 16<br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><b><u>job file</u></b></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">#!/bin/bash<br>#SBATCH -J test<br>#SBATCH -p 52core    # This is the name of the partition.<br>#SBATCH -N 1<br>#SBATCH -n 16<br>#SBATCH -o %x.o%j<br>#SBATCH -e %x.e%j<br>#export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so <br><br>export OMP_NUM_THREADS=16     # I have also checked with 1, 2, 4, and 8.<br><br># Use , as list separator<br>IFS=','<br># Convert string to array<br>hcpus=($SLURM_JOB_CPUS_PER_NODE)<br>unset IFS<br><br>declare -a conv<br><br># Expand compressed slurm array<br>for cpu in ${hcpus[@]}; do<br>    # Slurm compresses repeated counts as COUNT(xN), e.g. 16(x2)<br>    if [[ $cpu =~ (.*)\(x(.*)\) ]]; then<br>        # found compressed value<br>        value=${BASH_REMATCH[1]}<br>        factor=${BASH_REMATCH[2]}<br>        for j in $(seq 1 $factor); do<br>            conv=( ${conv[*]} $value )<br>        done<br>    else<br>        conv=( ${conv[*]} $cpu )<br>    fi<br>done<br><br># Build .machines file<br>rm -f .machines<br><br>nhost=0<br><br>echo ${conv[@]};<br><br>IFS=','<br>for node in $SLURM_NODELIST<br>do <br>    declare -i cpuspernode=${conv[$nhost]};<br>    for ((i=0; i&lt;${cpuspernode}; i++))   <br>    do<br>        echo 1:$node >> .machines<br>    done<br>    let nhost+=1<br>done <br># Restore the default field separator<br>unset IFS<br><br>echo 'granularity:1' >>.machines<br>echo 'extrafine:1' >>.machines<br><br><br>run_lapw -p<br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">Thank you very much.</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" 
style="font-size:large">Regards</div><div class="gmail_default" style="font-size:large">Bhamu</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Nov 13, 2020 at 7:04 PM Gavin Abo <<a href="mailto:gsabo@crimson.ua.edu">gsabo@crimson.ua.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p>If you have a look at [1], it can be seen that different cluster
      systems have different commands for job submission.</p>
    <p>Your post did not clearly show how the job was submitted.
      For example, did you use something similar to the command
      at [2]?</p>
    <p>$ sbatch MyJobScript.sh<br>
    </p>
    <p><b>What command creates your .machines file?</b><br>
    </p>
    <p>In your MyJobScript.sh below, I'm not seeing any lines that
      create a .machines file.<br>
    </p>
    <font color="#808080">MyJobScript.sh<br>
--------------------------------------------------------------------------------------------------------<br>
      #!/bin/sh<br>
      #SBATCH -J test #job name<br>
      #SBATCH -p 44core #partition name<br>
      #SBATCH -N 1 #node<br>
      #SBATCH -n 18 #core<br>
      #SBATCH -o %x.o%j<br>
      #SBATCH -e %x.e%j<br>
      export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so #Do not change
      here!!<br>
      srun ~/soft/qe66/bin/pw.x  &lt; case.in &gt; case.out<br>
--------------------------------------------------------------------------------------------------------
    </font>
    <p><font color="#808080">The job files available in the FAQs do
        not work. They produce only the<br>
        .machine0, .machines, and .machines_current files, where
        .machines contains only # and the other two are empty.</font><br>
    </p>
    <p>In the Slurm documentation at [3], there appears to be a
      variable that helps create, on the fly, the list of nodes that
      would need to be written to the .machines file:</p>
    <p>SLURM_JOB_NODELIST (and SLURM_NODELIST for backwards
      compatibility)<br>
    </p>
    <p>I'm not seeing this variable in your MyJobScript.sh, unlike in
      other job scripts found on the Internet, for example [4-7].<br>
    </p>
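    <p>As a minimal sketch only (an illustration, not copied from
      [4-7]), the node list variable could be turned into a .machines
      file as shown here. The helper name write_machines is
      hypothetical, and the sketch assumes one k-point job per
      requested task on each node, with the job submitted using
      --ntasks-per-node so that SLURM_NTASKS_PER_NODE is set:</p>

```shell
#!/bin/bash
# Sketch: build a WIEN2k .machines file for k-point parallel runs under Slurm.
# write_machines is a hypothetical helper name, not part of WIEN2k or Slurm.

# Usage: write_machines JOBS_PER_NODE HOST [HOST ...]
# Prints one "1:HOST" line per k-point job, then the granularity lines.
write_machines() {
    local jobs_per_node=$1
    shift
    local host i
    for host in "$@"; do
        for i in $(seq 1 "$jobs_per_node"); do
            echo "1:$host"
        done
    done
    echo 'granularity:1'
    echo 'extrafine:1'
}

# Only act inside a Slurm allocation, where these variables are set.
if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
    # scontrol expands compressed lists such as node[11-12] to one name per line.
    hosts=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
    # Assumption: SLURM_NTASKS_PER_NODE is set (job submitted with
    # --ntasks-per-node); otherwise fall back to one job per node.
    # $hosts is left unquoted on purpose so each host becomes one argument.
    write_machines "${SLURM_NTASKS_PER_NODE:-1}" $hosts > .machines
fi
```

    <p>Outside a Slurm allocation the script only defines the helper,
      so it can be checked by hand, for example with
      write_machines 2 node11 node12.</p>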
    [1] <a href="https://slurm.schedmd.com/rosetta.pdf" target="_blank">https://slurm.schedmd.com/rosetta.pdf</a><br>
    [2] <a href="https://hpc-uit.readthedocs.io/en/latest/jobs/examples.html" target="_blank">https://hpc-uit.readthedocs.io/en/latest/jobs/examples.html</a><br>
    [3] <a href="https://slurm.schedmd.com/sbatch.html" target="_blank">https://slurm.schedmd.com/sbatch.html</a><br>
    [4] <a href="https://itp.uni-frankfurt.de/wiki-it/index.php/Wien2k" target="_blank">https://itp.uni-frankfurt.de/wiki-it/index.php/Wien2k</a><br>
    [5]
    <a href="https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15511.html" target="_blank">https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15511.html</a><br>
    [6]
    <a href="https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg07097.html" target="_blank">https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg07097.html</a><br>
    [7] <a href="https://www.nsc.liu.se/software/installed/tetralith/wien2k/" target="_blank">https://www.nsc.liu.se/software/installed/tetralith/wien2k/</a>
    <div><br>
    </div>
    <div>On 11/13/2020 3:37 AM, Laurence Marks
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="auto">
        <div>N.B., example mid-term questions:
          <div dir="auto">1. What SBATCH command will give you 3 nodes?</div>
          <div dir="auto">2. What command creates your .machines file?</div>
          <div dir="auto">3. What are your fastest and slowest nodes?</div>
          <div dir="auto">4. Which nodes have the best communications?</div>
          <div dir="auto"><br>
          </div>
          <div dir="auto">N.B., please don't post your answers -- just
            understand!<br>
            <br>
            <div dir="auto">_____<br>
              Professor Laurence Marks<br>
              "Research is to see what everybody else has seen, and to
              think what nobody else has thought", Albert Szent-Gyorgi<br>
              <a href="http://www.numis.northwestern.edu" target="_blank">www.numis.northwestern.edu</a></div>
          </div>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Fri, Nov 13, 2020,
              04:21 Laurence Marks <<a href="mailto:laurence.marks@gmail.com" target="_blank">laurence.marks@gmail.com</a>>
              wrote:<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div dir="auto">Much of what you are requesting is
                problem/cluster specific, so there is no magic answer --
                it will vary. Suggestions:
                <div dir="auto">1) Read the UG sections on .machines and
                  parallel operation.</div>
                <div dir="auto">2) Read the man page for your cluster
                  job command (srun)</div>
                <div dir="auto">3) Reread the UG sections.</div>
                <div dir="auto">4) Read the example scripts, and
                  understand (lookup) all the commands so you know what
                  they are doing.</div>
                <div dir="auto"><br>
                </div>
                <div dir="auto">It is really not that complicated. If
                  you cannot master this by yourself, I will wonder
                  whether you are in the right profession.<br>
                  <br>
                  <div dir="auto">_____<br>
                    Professor Laurence Marks<br>
                    "Research is to see what everybody else has seen,
                    and to think what nobody else has thought", Albert
                    Szent-Gyorgi<br>
                    <a href="http://www.numis.northwestern.edu" rel="noreferrer" target="_blank">www.numis.northwestern.edu</a></div>
                </div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Fri, Nov 13, 2020,
                  03:24 Dr. K. C. Bhamu <<a href="mailto:kcbhamu85@gmail.com" rel="noreferrer" target="_blank">kcbhamu85@gmail.com</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                  <div dir="ltr">
                    <div style="font-size:large">Dear
                      All</div>
                    <div style="font-size:large"><br>
                    </div>
                    <div style="font-size:large">I
                      need your extensive help.</div>
                    <div style="font-size:large">I
                      have tried to provide full details that can help
                      you understand my requirement. In case I have
                      missed something, please let me know.</div>
                    <div style="font-size:large"><br>
                    </div>
                    <div style="font-size:large">I
                      am looking for a job file for our cluster. The
                      job files available in the FAQs do not work. They
                      produce only the .machine0, .machines, and
                      .machines_current files, where .machines contains
                      only # and the other two are empty.<br>
                    </div>
                    <div style="font-size:large"><br>
                    </div>
                    <div style="font-size:large">The
                      script below works fine for Quantum ESPRESSO
                      on the 44core partition:</div>
                    <div style="font-size:large">#!/bin/sh<br>
                      #SBATCH -J test #job name<br>
                      #SBATCH -p 44core #partition name<br>
                      #SBATCH -N 1 #node<br>
                      #SBATCH -n 18 #core<br>
                      #SBATCH -o %x.o%j<br>
                      #SBATCH -e %x.e%j<br>
                      export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so #Do
                      not change here!!<br>
                      srun ~/soft/qe66/bin/pw.x  &lt; case.in &gt; case.out<br>
                    </div>
                    <div style="font-size:large"><br>
                    </div>
                    <div style="font-size:large">I
                      have compiled WIEN2k_19.2 on a CentOS cluster with
                      a queuing system; the head node runs kernel
                      Linux 3.10.0-1127.19.1.el7.x86_64.</div>
                    <div style="font-size:large"><br>
                    </div>
                    <div style="font-size:large">I
                      used compilers_and_libraries_2020.2.254 ,
                      fftw-3.3.8 , libxc-4.34 for the installation.</div>
                    <div style="font-size:large"><br>
                    </div>
                    <div style="font-size:large">The
                      details of the nodes that I can use are as follows
                      (I can log in to these nodes with my user
                      password):</div>
                    <div style="font-size:large"><pre>
NODELIST  NODES  PARTITION  STATE      CPUS  S:C:T   MEMORY  TMP_DISK  WEIGHT  AVAIL_FE  REASON
elpidos   1      master     idle       4     4:1:1   15787   0         1       (null)    none
node01    1      72core     allocated  72    72:1:1  515683  0         1       (null)    none
node02    1      72core     allocated  72    72:1:1  257651  0         1       (null)    none
node03    1      72core     allocated  72    72:1:1  257651  0         1       (null)    none
node09    1      44core     mixed      44    44:1:1  128650  0         1       (null)    none
node10    1      44core     mixed      44    44:1:1  128649  0         1       (null)    none
node11    1      52core*    allocated  52    52:1:1  191932  0         1       (null)    none
node12    1      52core*    allocated  52    52:1:1  191932  0         1       (null)    none
</pre>
                    </div>
                    <div style="font-size:large"><br>
                    </div>
                    <div style="font-size:large">The
                      other nodes run a mixture of kernels, as listed below.</div>
                    <div style="font-size:large"><br>
                    </div>
                    <div style="font-size:large"> 
                       OS=Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue
                      Feb 4 23:02:59 UTC 2020 <br>
                         OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue
                      Aug 25 17:23:54 UTC 2020 <br>
                         OS=Linux 3.10.0-514.el7.x86_64 #1 SMP Tue Nov
                      22 16:42:41 UTC 2016 </div>
                    <div style="font-size:large"> 
                       OS=Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue
                      May 14 21:24:32 UTC 2019 <br>
                    </div>
                    <div style="font-size:large"><br>
                    </div>
                    <div style="font-size:large">Your
                      extensive help will improve my research
                      productivity.</div>
                    <div style="font-size:large"><br>
                    </div>
                    <div style="font-size:large">Thank
                      you very much.</div>
                    <div style="font-size:large">Regards</div>
                    <div style="font-size:large">Bhamu</div>
                  </div>
                </blockquote>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
  </div>

_______________________________________________<br>
Wien mailing list<br>
<a href="mailto:Wien@zeus.theochem.tuwien.ac.at" target="_blank">Wien@zeus.theochem.tuwien.ac.at</a><br>
<a href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien" rel="noreferrer" target="_blank">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a><br>
SEARCH the MAILING-LIST at:  <a href="http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html" rel="noreferrer" target="_blank">http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html</a><br>
</blockquote></div></div>