[Wien] Need extensive help for a job file for slurm job scheduler cluster
Gavin Abo
gsabo at crimson.ua.edu
Fri Nov 13 14:34:38 CET 2020
If you have a look at [1], you can see that different cluster systems
use different commands for job submission.
Your post does not clearly show how the job was submitted; for
example, did you perhaps use something similar to that at [2]:
$ sbatch MyJobScript.sh
*What command creates your .machines file?*
In your MyJobScript.sh below, I don't see any lines that create a
.machines file (a minimal example of one is shown after your script).
MyJobScript.sh
--------------------------------------------------------------------------------------------------------
#!/bin/sh
#SBATCH -J test #job name
#SBATCH -p 44core #partition name
#SBATCH -N 1 #node
#SBATCH -n 18 #core
#SBATCH -o %x.o%j
#SBATCH -e %x.e%j
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so #Do not change here!!
srun ~/soft/qe66/bin/pw.x < case.in > case.out
--------------------------------------------------------------------------------------------------------
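For reference, WIEN2k's k-point parallel mode expects a .machines file
in the case directory. A minimal k-point parallel example, per the
WIEN2k user's guide, might look like the following (the host names are
taken from your sinfo listing purely for illustration; the hosts and
the number of lines depend on your actual allocation):
--------------------------------------------------------------------------------------------------------
# one "weight:host" line per k-point parallel lapw1/lapw2 process
1:node09
1:node09
1:node10
1:node10
granularity:1
extrafine:1
--------------------------------------------------------------------------------------------------------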
You wrote: "The available job files on the FAQs are not working. They
give me .machine0, .machines, and .machines_current files only,
wherein .machines has # and the other two are empty."
In the Slurm documentation at [3], it looks like there is a variable
that helps create a list of nodes on the fly, which would need to be
written to the .machines file:
SLURM_JOB_NODELIST (and SLURM_NODELIST for backwards compatibility)
I don't see it used in your MyJobScript.sh the way it is in other job
scripts found on the Internet, for example [4-7]; a rough sketch of
the idea follows.
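This is a sketch only, not a tested script: the job name, partition,
and task counts are placeholders copied from your QE script, it
assumes run_lapw is in your PATH and one k-point process per task, and
with -N 1 everything runs on a single node. Your site may need
different srun options.
--------------------------------------------------------------------------------------------------------
#!/bin/sh
#SBATCH -J wien2k_test   #job name (placeholder)
#SBATCH -p 44core        #partition name
#SBATCH -N 1             #nodes
#SBATCH -n 18            #tasks
#SBATCH -o %x.o%j
#SBATCH -e %x.e%j

# srun prints one hostname per allocated task; turn each into a
# "1:host" line so lapw1/lapw2 run k-point parallel over the tasks.
srun hostname | sort > hostlist
awk '{print "1:" $1}' hostlist > .machines
echo "granularity:1" >> .machines
echo "extrafine:1" >> .machines

run_lapw -p   # the -p switch makes WIEN2k read .machines
--------------------------------------------------------------------------------------------------------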
[1] https://slurm.schedmd.com/rosetta.pdf
[2] https://hpc-uit.readthedocs.io/en/latest/jobs/examples.html
[3] https://slurm.schedmd.com/sbatch.html
[4] https://itp.uni-frankfurt.de/wiki-it/index.php/Wien2k
[5] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15511.html
[6] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg07097.html
[7] https://www.nsc.liu.se/software/installed/tetralith/wien2k/
On 11/13/2020 3:37 AM, Laurence Marks wrote:
> N.B., example mid-term questions:
> 1. What SBATCH command will give you 3 nodes?
> 2. What command creates your .machines file?
> 3. What are your fastest and slowest nodes?
> 4. Which nodes have the best communications?
>
> N.B., please don't post your answers -- just understand!
>
> _____
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu
>
> On Fri, Nov 13, 2020, 04:21 Laurence Marks <laurence.marks at gmail.com> wrote:
>
> Much of what you are requesting is problem/cluster specific, so
> there is no magic answer -- it will vary. Suggestions:
> 1) Read the UG sections on .machines and parallel operation.
> 2) Read the man page for your cluster job command (srun)
> 3) Reread the UG sections.
> 4) Read the example scripts, and understand (lookup) all the
> commands so you know what they are doing.
>
> It is really not that complicated. If you cannot master this by
> yourself, I will wonder whether you are in the right profession.
>
> _____
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think
> what nobody else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu
>
> On Fri, Nov 13, 2020, 03:24 Dr. K. C. Bhamu <kcbhamu85 at gmail.com> wrote:
>
> Dear All
>
> I need your extensive help.
> I have tried to provide full details that can help you
> understand my requirement. In case I have missed something,
> please let me know.
>
> I am looking for a job file for our cluster. The
> available job files on the FAQs are not working. They give me
> .machine0, .machines, and .machines_current files
> only, wherein .machines has # and the other two are empty.
>
> The script that works fine for Quantum ESPRESSO on the
> 44core partition is below:
> #!/bin/sh
> #SBATCH -J test #job name
> #SBATCH -p 44core #partition name
> #SBATCH -N 1 #node
> #SBATCH -n 18 #core
> #SBATCH -o %x.o%j
> #SBATCH -e %x.e%j
> export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so #Do not change
> here!!
> srun ~/soft/qe66/bin/pw.x < case.in > case.out
>
> I have compiled Wien2k_19.2 on the CentOS cluster, whose
> head node runs kernel Linux
> 3.10.0-1127.19.1.el7.x86_64.
>
> I used compilers_and_libraries_2020.2.254, fftw-3.3.8, and
> libxc-4.34 for the installation.
>
> The details of the nodes that I can use are as follows (I can
> log in to these nodes with my user password):
> NODELIST  NODES  PARTITION  STATE      CPUS  S:C:T   MEMORY  TMP_DISK  WEIGHT  AVAIL_FE  REASON
> elpidos   1      master     idle       4     4:1:1   15787   0         1       (null)    none
> node01    1      72core     allocated  72    72:1:1  515683  0         1       (null)    none
> node02    1      72core     allocated  72    72:1:1  257651  0         1       (null)    none
> node03    1      72core     allocated  72    72:1:1  257651  0         1       (null)    none
> node09    1      44core     mixed      44    44:1:1  128650  0         1       (null)    none
> node10    1      44core     mixed      44    44:1:1  128649  0         1       (null)    none
> node11    1      52core*    allocated  52    52:1:1  191932  0         1       (null)    none
> node12    1      52core*    allocated  52    52:1:1  191932  0         1       (null)    none
>
> The other nodes run a mixture of kernels, as listed below.
>
> OS=Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4
> 23:02:59 UTC 2020
> OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25
> 17:23:54 UTC 2020
> OS=Linux 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41
> UTC 2016
> OS=Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue May 14
> 21:24:32 UTC 2019
>
> Your extensive help will improve my research productivity.
>
> Thank you very much.
> Regards
> Bhamu
>