[Wien] Parallel calculations in cluster - choices

Laurence Marks L-marks at northwestern.edu
Sun Aug 7 18:07:51 CEST 2011


On Sun, Aug 7, 2011 at 5:00 AM, Dr Qiwen  YAO <Yao.Qiwen at nims.go.jp> wrote:
> Dear Wien2k users,
> I was running a 3x3x1 supercell calculation for a 4-atom double perovskite compound as a spin-polarized calculation.
> The job was killed by the cluster because the walltime limit was exceeded - see the relevant error message below:

> =>> PBS: job killed: walltime 86419 exceeded limit 86400
> ----------
> Question 1.
> In a case like this, what is the best way for me to continue the previous calculation (or, if possible, just to re-run the same job as if nothing had happened, since the calculation itself does not crash)? If I restart the job, would Wien2k be able to pick up from the last good point and continue? I could not find anything similar to this in the email archive.

Yes, just add "-NI" to the runXYZ command (XYZ as appropriate).
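
A minimal sketch of a restart, assuming the same job script quoted
further down (the -NI flag skips the initialization/clean-up step, so
the current charge density and the case.broyd* mixing history are
reused):

# resubmit the same PBS script, only with the restart flag added
runsp_lapw -NI -ec 0.0001 -p

If the job was killed in the middle of an iteration and the mixer then
complains, a common fallback is to remove the case.broyd* files by
hand and restart from the current charge density.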

>
> Question 2. The choice of MPI versus k-point parallelization, for my future calculations.
> The cluster I am running Wien2k on is a 512-node cluster with PBS. Each node has 8 cores and about 2.85 GB of memory per core.

You need to benchmark the speed of your system first. Use the
benchmark from the Wien2k web page, and try different configurations.
Work out as well the speed of lapw1 (the slowest program) for
different problem sizes on your cluster (and different numbers of nodes).
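
As a rough sketch (the exact file and directory names of the benchmark
on the web page may differ), the serial benchmark is just a small
lapw1 case that you time on one core, and the same case can then be
repeated with an MPI entry in .machines:

# unpack the serial benchmark case from the Wien2k web page, then
cd test_case
time x lapw1 -c                # serial, one core

# for the MPI variant put e.g. "1:n001:8" into .machines
# (n001 is a placeholder hostname) and run
time x lapw1 -c -p             # lapw1 MPI-parallel over 8 cores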

Note, as well, that in general you want to use a comparable density of
k-points in reciprocal space. Hence for a 3x3x1 cell you need 1/9 the
number of k-points that you need for a 1x1x1 cell. In addition, for an
insulator you in general need fewer k-points than for a metal. Check
the number that you need for a 1x1x1 cell, then use the same density
for the 3x3x1 (not the same number).
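
A purely illustrative example of the scaling (the numbers are
hypothetical, not a recommendation): if the 1x1x1 cell was converged
with roughly 900 k-points in the full Brillouin zone, the same density
for the 3x3x1 supercell corresponds to about 900/9 = 100 k-points,
which is what you would then give to kgen:

x kgen
# when prompted for the number of k-points in the whole cell, enter
# ~100 for the 3x3x1 supercell (vs. ~900 for 1x1x1); the exact
# prompts depend on your WIEN2k version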

Last, you should see what is being produced by the script you have,
i.e. look at the .machines file (cat .machines). I have a small set of
commands that I have used to control what one gets in a more flexible
fashion. I will send it to Peter and ask that he put it on the
unsupported software page. (I can send it separately on request, but
not via the list.)
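
For orientation only: with the 8-core request in the script quoted
below, and assuming PBS happens to put all eight slots on a single
node (call it n001, a placeholder), that script should generate a
.machines file roughly like this:

#
lapw0:n001 n001 n001 n001 n001 n001 n001 n001
1:n001
1:n001
1:n001
1:n001
1:n001
1:n001
1:n001
1:n001
granularity:1
extrafine:1

i.e. an MPI-parallel lapw0 over eight cores plus eight single-core
k-point parallel lapw1/lapw2 jobs; a line of the form 1:n001:8 would
instead request one lapw1/lapw2 job running MPI-parallel on eight
cores.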

> I was running with 8-fold k-point parallelism and 8-fold MPI parallelism (suggested by the system engineer, as I do not have much idea of what the best choice is) with these options:
>
> #!/usr/bin/tcsh
> #QSUB2 queue qaM
> #QSUB2 core 8
> #QSUB2 mpi 8
> #QSUB2 smp 1
>
> These lines were provided by the system support staff. It seems we do not use a dynamic script to find the available nodes/cores at run time; instead we specify the number of cores/nodes we want in the script (we are using a point system to run our scripts) and then submit the job to the queue. I could specify up to maybe 24 nodes x 8 cores/node (#QSUB2 core 192 and #QSUB2 mpi 192) in the script - would that help speed up the calculation on the current system?
>
> The system engineer thought it might not help for Wien2k, as Wien2k is most efficient with k-point parallelism. So he suggested that I use 8-fold k-point parallelism and 8-fold MPI parallelism - even though he is not sure what the best choice is - and I have even less idea of what the best choice is on the current system.
>
> Below is the complete script for submitting my previous job:
> -------------------------
> #!/usr/bin/tcsh
> #QSUB2 queue qaM
> #QSUB2 core 8
> #QSUB2 mpi 8
> #QSUB2 smp 1
>
> cd ${PBS_O_WORKDIR}
>
> source /etc/profile.d/modules.csh
> module load intel11.1/sgimpt
>
> cat $PBS_NODEFILE > .machines_current
> set aa=`wc .machines_current`
> echo '#' > .machines
>
> # example for an MPI parallel lapw0
> echo -n 'lapw0:' >> .machines
> set i=1
> while ($i < $aa[1])
> echo -n `cat $PBS_NODEFILE |head -$i | tail -1` ' ' >> .machines
> @ i ++
> end
> echo  `cat $PBS_NODEFILE |head -$i|tail -1` ' ' >> .machines
>
> #example for k-point parallel lapw1/2
> set i=1
> while ($i <= $aa[1])
> echo -n '1:' >> .machines
> head -$i .machines_current |tail -1 >> .machines
> @ i ++
> end
> echo 'granularity:1' >> .machines
> echo 'extrafine:1' >> .machines
>
> runsp_lapw -ec 0.0001 -p
> ------------------------------
>
> Any comment or suggestion would be highly appreciated.
>
> Thank you very much.
> Qiwen
>
> **********************************************************
>
> Dr QiWen YAO
>
> JSPS Fellow
> Multifunctional Materials Group
> Optical and Electronic Materials Unit
> Environment and Energy Materials Research Division
>
> National Institute for Materials Science
>
> 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
> Phone: +81-29-851-3354, ext. no. 6482, Fax: +81-29-859-2501
>
> **********************************************************
>



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Research is to see what everybody else has seen, and to think what
nobody else has thought
Albert Szent-Gyorgi

