[Wien] Need extensive help for a job file for slurm job scheduler cluster

Laurence Marks laurence.marks at gmail.com
Fri Nov 13 11:21:24 CET 2020


Much of what you are requesting is problem/cluster specific, so there is no
magic answer -- it will vary. Suggestions:
1) Read the UG sections on .machines and parallel operation.
2) Read the man page for your cluster job command (srun).
3) Reread the UG sections.
4) Read the example scripts, and understand (look up) all the commands so
you know what they are doing.
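
As a rough illustration of what 1) and 4) lead to, here is a minimal sketch
of generating a k-point-parallel .machines file for a single node. The
function name make_machines and the node name node09 are only examples, and
the granularity/extrafine settings are assumptions; check the UG for what
fits your case and cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN2k or Slurm): print a k-point-parallel
# .machines file with one "1:host" line per task, all on a single node.
make_machines() {
    host=$1      # node name the job runs on
    ntasks=$2    # number of parallel k-point jobs to start
    i=0
    while [ "$i" -lt "$ntasks" ]; do
        echo "1:$host"
        i=$((i + 1))
    done
    # Assumed load-balancing settings; adjust after reading the UG.
    echo "granularity:1"
    echo "extrafine:1"
}

# Inside a Slurm batch job one might write (assumption, untested here):
#   make_machines "$(hostname)" "$SLURM_NTASKS" > .machines
#   run_lapw -p
make_machines node09 4
```

Run as is, this prints four "1:node09" lines followed by the two settings
lines. In a real job the output would be redirected to .machines in the case
directory before calling run_lapw -p.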

It is really not that complicated. If you cannot master this by yourself, I
will wonder whether you are in the right profession.

_____
Professor Laurence Marks
"Research is to see what everybody else has seen, and to think what nobody
else has thought", Albert Szent-Györgyi
www.numis.northwestern.edu

On Fri, Nov 13, 2020, 03:24 Dr. K. C. Bhamu <kcbhamu85 at gmail.com> wrote:

> Dear All
>
> I need your extensive help.
> I have tried to provide full details that can help you understand my
> requirement. In case I have missed something, please let me know.
>
> I am looking for a job file for our cluster. The job files available in the
> FAQs do not work. They produce only the files
> .machine0          .machines          .machines_current
> where .machines contains only a # and the other two are empty.
>
> The script below works fine for Quantum Espresso on the 44core partition:
> #!/bin/sh
> #SBATCH -J test #job name
> #SBATCH -p 44core #partition name
> #SBATCH -N 1 #node
> #SBATCH -n 18 #core
> #SBATCH -o %x.o%j
> #SBATCH -e %x.e%j
> export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so #Do not change here!!
> srun ~/soft/qe66/bin/pw.x < case.in > case.out
>
> I have compiled Wien2k_19.2 on our CentOS cluster, whose head node runs
> kernel Linux 3.10.0-1127.19.1.el7.x86_64.
>
> I used compilers_and_libraries_2020.2.254, fftw-3.3.8, and libxc-4.34 for
> the installation.
>
> The details of the nodes that I can use are as follows (I can log in to
> these nodes with my user password):
> NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
> elpidos        1    master        idle 4       4:1:1  15787        0      1   (null) none
> node01         1    72core   allocated 72     72:1:1 515683        0      1   (null) none
> node02         1    72core   allocated 72     72:1:1 257651        0      1   (null) none
> node03         1    72core   allocated 72     72:1:1 257651        0      1   (null) none
> node09         1    44core       mixed 44     44:1:1 128650        0      1   (null) none
> node10         1    44core       mixed 44     44:1:1 128649        0      1   (null) none
> node11         1   52core*   allocated 52     52:1:1 191932        0      1   (null) none
> node12         1   52core*   allocated 52     52:1:1 191932        0      1   (null) none
>
> The other nodes run a mixture of kernel versions, as listed below.
>
>    OS=Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020
>    OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020
>    OS=Linux 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016
>    OS=Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue May 14 21:24:32 UTC 2019
>
> Your extensive help will improve my research productivity.
>
> Thank you very much.
> Regards
> Bhamu
>
> *Full details of the nodes are here:*
>
> NodeName=elpidos Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUTot=4 CPULoad=0.06
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.250 NodeHostName=elpidos Version=20.02.3
>    OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020
>    RealMemory=15787 AllocMem=0 FreeMem=5597 Sockets=4 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=master
>    BootTime=2020-10-13T14:25:13 SlurmdStartTime=2020-10-13T14:25:26
>    CfgTRES=cpu=4,mem=15787M,billing=4
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node01 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=72 CPUTot=72 CPULoad=72.00
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.1 NodeHostName=node01 Version=20.02.3
>    OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020
>    RealMemory=515683 AllocMem=0 FreeMem=363362 Sockets=72 Boards=1
>    State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=72core
>    BootTime=2020-10-13T20:44:04 SlurmdStartTime=2020-10-14T05:44:23
>    CfgTRES=cpu=72,mem=515683M,billing=72
>    AllocTRES=cpu=72
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node02 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=72 CPUTot=72 CPULoad=71.92
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.2 NodeHostName=node02 Version=20.02.3
>    OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020
>    RealMemory=257651 AllocMem=0 FreeMem=142057 Sockets=72 Boards=1
>    State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=72core
>    BootTime=2020-10-13T20:44:04 SlurmdStartTime=2020-10-14T05:44:17
>    CfgTRES=cpu=72,mem=257651M,billing=72
>    AllocTRES=cpu=72
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node03 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=72 CPUTot=72 CPULoad=71.96
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.3 NodeHostName=node03 Version=20.02.3
>    OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020
>    RealMemory=257651 AllocMem=0 FreeMem=168118 Sockets=72 Boards=1
>    State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=72core
>    BootTime=2020-10-13T20:44:33 SlurmdStartTime=2020-10-14T05:43:35
>    CfgTRES=cpu=72,mem=257651M,billing=72
>    AllocTRES=cpu=72
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node04 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUTot=20 CPULoad=0.01
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.4 NodeHostName=node04 Version=20.02.3
>    OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020
>    RealMemory=128664 AllocMem=0 FreeMem=126677 Sockets=20 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=20core
>    BootTime=2020-10-13T20:43:24 SlurmdStartTime=2020-10-14T05:42:43
>    CfgTRES=cpu=20,mem=128664M,billing=20
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node05 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUTot=4 CPULoad=0.01
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.5 NodeHostName=node05 Version=20.02.3
>    OS=Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020
>    RealMemory=64190 AllocMem=0 FreeMem=63350 Sockets=4 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=4core
>    BootTime=2020-10-29T11:34:18 SlurmdStartTime=2020-10-29T11:34:30
>    CfgTRES=cpu=4,mem=64190M,billing=4
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node06 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUTot=4 CPULoad=0.01
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.6 NodeHostName=node06 Version=20.02.3
>    OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020
>    RealMemory=64190 AllocMem=0 FreeMem=63084 Sockets=4 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=4core
>    BootTime=2020-10-19T11:07:32 SlurmdStartTime=2020-10-19T11:07:51
>    CfgTRES=cpu=4,mem=64190M,billing=4
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node07 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUTot=64 CPULoad=0.01
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.7 NodeHostName=node07 Version=20.02.3
>    OS=Linux 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016
>    RealMemory=80241 AllocMem=0 FreeMem=75316 Sockets=64 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=64core
>    BootTime=2020-10-13T20:52:40 SlurmdStartTime=2020-10-13T21:10:59
>    CfgTRES=cpu=64,mem=80241M,billing=64
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node08 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUTot=64 CPULoad=0.01
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.8 NodeHostName=node08 Version=20.02.3
>    OS=Linux 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016
>    RealMemory=47987 AllocMem=0 FreeMem=42188 Sockets=64 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=64core
>    BootTime=2020-10-13T20:51:08 SlurmdStartTime=2020-10-13T20:57:12
>    CfgTRES=cpu=64,mem=47987M,billing=64
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node09 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=36 CPUTot=44 CPULoad=35.99
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.9 NodeHostName=node09 Version=20.02.3
>    OS=Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue May 14 21:24:32 UTC 2019
>    RealMemory=128650 AllocMem=0 FreeMem=78059 Sockets=44 Boards=1
>    State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=44core
>    BootTime=2020-10-13T20:47:11 SlurmdStartTime=2020-10-13T20:47:29
>    CfgTRES=cpu=44,mem=128650M,billing=44
>    AllocTRES=cpu=36
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node10 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=18 CPUTot=44 CPULoad=18.01
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.10 NodeHostName=node10 Version=20.02.3
>    OS=Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue May 14 21:24:32 UTC 2019
>    RealMemory=128649 AllocMem=0 FreeMem=82279 Sockets=44 Boards=1
>    State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=44core
>    BootTime=2020-10-13T20:47:36 SlurmdStartTime=2020-10-13T20:48:00
>    CfgTRES=cpu=44,mem=128649M,billing=44
>    AllocTRES=cpu=18
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node11 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=52 CPUTot=52 CPULoad=52.01
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.11 NodeHostName=node11 Version=20.02.3
>    OS=Linux 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020
>    RealMemory=191932 AllocMem=0 FreeMem=147904 Sockets=52 Boards=1
>    State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=52core
>    BootTime=2020-10-13T20:47:02 SlurmdStartTime=2020-10-13T20:47:13
>    CfgTRES=cpu=52,mem=191932M,billing=52
>    AllocTRES=cpu=52
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node12 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=52 CPUTot=52 CPULoad=52.01
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.12 NodeHostName=node12 Version=20.02.3
>    OS=Linux 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020
>    RealMemory=191932 AllocMem=0 FreeMem=162998 Sockets=52 Boards=1
>    State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=52core
>    BootTime=2020-10-13T20:47:31 SlurmdStartTime=2020-10-13T20:47:42
>    CfgTRES=cpu=52,mem=191932M,billing=52
>    AllocTRES=cpu=52
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
>
> NodeName=node13 Arch=x86_64 CoresPerSocket=1
>    CPUAlloc=0 CPUTot=4 CPULoad=0.01
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=10.0.0.13 NodeHostName=node13 Version=20.02.3
>    OS=Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020
>    RealMemory=31836 AllocMem=0 FreeMem=31093 Sockets=4 Boards=1
>    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=4core
>    BootTime=2020-10-13T20:48:12 SlurmdStartTime=2020-10-13T20:48:20
>    CfgTRES=cpu=4,mem=31836M,billing=4
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>