[Wien] Need extensive help for a job file for slurm job scheduler cluster

Gavin Abo gsabo at crimson.ua.edu
Fri Nov 13 14:34:38 CET 2020


If you have a look at [1], you can see that different cluster systems 
have different commands for job submission.

Your post does not clearly show how the job was submitted. For example, 
did you perhaps use something similar to the command at [2]:

$ sbatch MyJobScript.sh

*What command creates your .machines file?*

In your MyJobScript.sh below, I'm not seeing any lines that create a 
.machines file.

MyJobScript.sh
--------------------------------------------------------------------------------------------------------
#!/bin/sh
#SBATCH -J test #job name
#SBATCH -p 44core #partition name
#SBATCH -N 1 #node
#SBATCH -n 18 #core
#SBATCH -o %x.o%j
#SBATCH -e %x.e%j
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so #Do not change here!!
srun ~/soft/qe66/bin/pw.x  < case.in > case.out
-------------------------------------------------------------------------------------------------------- 


> The available job files on FAQs are not working. They give me
> .machine0, .machines, and .machines_current files only,
> wherein .machines has # and the other two are empty.

In the Slurm documentation at [3], it looks like there is a variable for 
creating a list of nodes on the fly, which would then need to be 
written to the .machines file:

SLURM_JOB_NODELIST (and SLURM_NODELIST for backwards compatibility)

I do not see this used in your MyJobScript.sh, unlike the job 
scripts found on the Internet, for example [4-7].
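
Along the lines of the scripts at [4-7], a minimal sketch of how a 
.machines file could be generated inside a Slurm job script is below. 
The node names node09/node10 and the two tasks per node are 
illustrative assumptions only; in a real job the host list comes from 
SLURM_JOB_NODELIST, expanded by "scontrol show hostnames".

```shell
#!/bin/sh
# Sketch only -- adapt partition, nodes, and task counts to your cluster.
# Inside a Slurm job, SLURM_JOB_NODELIST holds a compact hostlist such as
# "node[09-10]"; "scontrol show hostnames" expands it to one name per line.
# Outside Slurm (for illustration) we fall back to an assumed example list.
if [ -n "$SLURM_JOB_NODELIST" ] && command -v scontrol >/dev/null 2>&1; then
    hosts=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
else
    hosts="node09
node10"    # assumed example nodes
fi

# Build a k-point-parallel .machines file: one "1:host" line per task.
# Two tasks per node here is purely an illustrative assumption.
tasks_per_node=2
rm -f .machines
for h in $hosts; do
    i=0
    while [ "$i" -lt "$tasks_per_node" ]; do
        echo "1:$h" >> .machines
        i=$((i + 1))
    done
done
echo "granularity:1" >> .machines
cat .machines
```

run_lapw -p would then pick up the .machines file from the working 
directory.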

[1] https://slurm.schedmd.com/rosetta.pdf
[2] https://hpc-uit.readthedocs.io/en/latest/jobs/examples.html
[3] https://slurm.schedmd.com/sbatch.html
[4] https://itp.uni-frankfurt.de/wiki-it/index.php/Wien2k
[5] 
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15511.html
[6] 
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg07097.html
[7] https://www.nsc.liu.se/software/installed/tetralith/wien2k/

On 11/13/2020 3:37 AM, Laurence Marks wrote:
> N.B., example mid-term questions:
> 1. What SBATCH command will give you 3 nodes?
> 2. What command creates your .machines file?
> 3. What are your fastest and slowest nodes?
> 4. Which nodes have the best communications?
>
> N.B., please don't post your answers -- just understand!
>
> _____
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what 
> nobody else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu <http://www.numis.northwestern.edu>
>
> On Fri, Nov 13, 2020, 04:21 Laurence Marks <laurence.marks at gmail.com 
> <mailto:laurence.marks at gmail.com>> wrote:
>
>     Much of what you are requesting is problem/cluster specific, so
>     there is no magic answer -- it will vary. Suggestions:
>     1) Read the UG sections on .machines and parallel operation.
>     2) Read the man page for your cluster job command (srun)
>     3) Reread the UG sections.
>     4) Read the example scripts, and understand (lookup) all the
>     commands so you know what they are doing.
>
>     It is really not that complicated. If you cannot master this by
>     yourself, I will wonder whether you are in the right profession.
>
>     _____
>     Professor Laurence Marks
>     "Research is to see what everybody else has seen, and to think
>     what nobody else has thought", Albert Szent-Gyorgi
>     www.numis.northwestern.edu <http://www.numis.northwestern.edu>
>
>     On Fri, Nov 13, 2020, 03:24 Dr. K. C. Bhamu <kcbhamu85 at gmail.com
>     <mailto:kcbhamu85 at gmail.com>> wrote:
>
>         Dear All
>
>         I need your extensive help.
>         I have tried to provide full details that can help you
>         understand my requirement. In case I have missed something,
>         please let me know.
>
>         I am looking for a job file for our cluster. The
>         available job files in the FAQs are not working. They give me
>         .machine0, .machines, and .machines_current files
>         only, wherein .machines has # and the other two are empty.
>
>         The script that is working fine for Quantum Espresso for
>         44core partition is below
>         #!/bin/sh
>         #SBATCH -J test #job name
>         #SBATCH -p 44core #partition name
>         #SBATCH -N 1 #node
>         #SBATCH -n 18 #core
>         #SBATCH -o %x.o%j
>         #SBATCH -e %x.e%j
>         export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so #Do not change here!!
>         srun ~/soft/qe66/bin/pw.x < case.in > case.out
>
>         I have compiled Wien2k_19.2 on the CentOS queuing system, whose
>         head node runs kernel Linux 3.10.0-1127.19.1.el7.x86_64.
>
>         I used compilers_and_libraries_2020.2.254, fftw-3.3.8, and
>         libxc-4.34 for the installation.
>
>         The details of the nodes that I can use are as follows (I can
>         login into these nodes with my user password):
>         NODELIST  NODES  PARTITION  STATE      CPUS  S:C:T   MEMORY  TMP_DISK  WEIGHT  AVAIL_FE  REASON
>         elpidos   1      master     idle       4     4:1:1   15787   0         1       (null)    none
>         node01    1      72core     allocated  72    72:1:1  515683  0         1       (null)    none
>         node02    1      72core     allocated  72    72:1:1  257651  0         1       (null)    none
>         node03    1      72core     allocated  72    72:1:1  257651  0         1       (null)    none
>         node09    1      44core     mixed      44    44:1:1  128650  0         1       (null)    none
>         node10    1      44core     mixed      44    44:1:1  128649  0         1       (null)    none
>         node11    1      52core*    allocated  52    52:1:1  191932  0         1       (null)    none
>         node12    1      52core*    allocated  52    52:1:1  191932  0         1       (null)    none
>
>         The other nodes have a mixture of kernels, as below.
>
>         OS=Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020
>         OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020
>         OS=Linux 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016
>         OS=Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue May 14 21:24:32 UTC 2019
>
>         Your extensive help will improve my research productivity.
>
>         Thank you very much.
>         Regards
>         Bhamu
>

