[Wien] Parallel calculations in cluster - choices

Dr Qiwen YAO Yao.Qiwen at nims.go.jp
Sun Aug 7 12:00:53 CEST 2011


Dear Wien2k users,
I was running a spin-polarized calculation on a 3x3x1 supercell of a 4-atom double perovskite compound.
The job was killed by the cluster because the walltime limit was exceeded - see the relevant end of the output below:
----------
...
 LAPW2 END
 SUMPARA END
 CORE  END
 CORE  END
 MIXER END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
=>> PBS: job killed: walltime 86419 exceeded limit 86400
----------
Question 1.
In a case like this, what is the best way to continue from the previous calculation? Since the calculation itself did not crash, is it possible simply to resubmit the same job as if nothing had happened - would Wien2k then pick up from the last completed iteration and continue? I could not find anything similar to this in the mailing list archive.
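To be concrete, what I would naively try is to resubmit the same script with only the run command changed as below (assuming the case directory still contains the densities from the last completed mixer step; the -NI switch is, if I understand it correctly, meant to keep the existing case.broyd* mixing history rather than starting fresh):

----------
cd ${PBS_O_WORKDIR}
# continue the SCF cycle from the existing case files instead of starting over;
# -NI (no initialization) should keep the case.broyd* history - is that the right switch here?
runsp_lapw -ec 0.0001 -p -NI
----------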

Question 2. The choice between mpi and k-point parallelization for my future calculations.
The cluster I am running wien2k on is a 512-node cluster with PBS. Each node has 8 cores and about 2.85 GB of memory per core.
I was running 8 k-point parallel and 8 mpi parallel (as suggested by the system engineer, since I do not have a good idea of what the best choice is), with these options:

#!/usr/bin/tcsh
#QSUB2 queue qaM
#QSUB2 core 8
#QSUB2 mpi 8
#QSUB2 smp 1

These lines were provided by the system support. It seems we do not use a dynamic script to find the available nodes/cores at run time; instead we specify the number of cores/nodes we want in the script (we are using a point system to run our jobs) and then submit the job to the queue. I could specify up to maybe 24 nodes x 8 cores/node (#QSUB2 core 192 and #QSUB2 mpi 192) in the script - would that speed up the calculation on the current system?
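Just to be explicit about what I mean, the header for such a request would presumably become the following (queue name and #QSUB2 syntax copied from what the system support gave me; I am not sure whether the smp setting should stay at 1):

----------
#!/usr/bin/tcsh
#QSUB2 queue qaM
#QSUB2 core 192
#QSUB2 mpi 192
#QSUB2 smp 1
----------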

The system engineer thought that might not help much with wien2k, since wien2k is most efficient with k-point parallelization. He therefore suggested that I use 8 k-point parallel and 8 mpi parallel - even though he is not sure what the best choice is, and I have even less idea of what is best on the current system.
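To check that I understand what a combined setup would look like: if I requested, for example, 8 nodes with 8 cores each, is the idea that .machines should contain 8 k-point groups, each running lapw1/lapw2 as an 8-core MPI job on one node, roughly like the sketch below? (node001 ... node008 are placeholder hostnames, and I am assuming the host:count notation is the right way to ask for 8 MPI processes on a node.)

----------
# 8 k-point groups, one 8-core MPI job per node
1:node001:8
1:node002:8
# ... one such line for each of the 8 nodes ...
1:node008:8
# lapw0 in MPI over all cores
lapw0:node001:8 node002:8 node003:8 node004:8 node005:8 node006:8 node007:8 node008:8
granularity:1
extrafine:1
----------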

Below is the complete script I used to submit the previous job:
-------------------------
#!/usr/bin/tcsh
#QSUB2 queue qaM
#QSUB2 core 8
#QSUB2 mpi 8
#QSUB2 smp 1

cd ${PBS_O_WORKDIR}

source /etc/profile.d/modules.csh
module load intel11.1/sgimpt

# write the list of allocated hosts (one line per core) to .machines_current
cat $PBS_NODEFILE > .machines_current
set aa=`wc .machines_current`        # $aa[1] = number of allocated cores
echo '#' > .machines

# example for an MPI parallel lapw0: one line listing all allocated hosts
echo -n 'lapw0:' >> .machines
set i=1
while ($i < $aa[1])
    echo -n `cat $PBS_NODEFILE | head -$i | tail -1` ' ' >> .machines
    @ i ++
end
echo `cat $PBS_NODEFILE | head -$i | tail -1` ' ' >> .machines

# example for k-point parallel lapw1/lapw2: one line (weight 1) per core
set i=1
while ($i <= $aa[1])
    echo -n '1:' >> .machines
    head -$i .machines_current | tail -1 >> .machines
    @ i ++
end
echo 'granularity:1' >> .machines
echo 'extrafine:1' >> .machines

runsp_lapw -ec 0.0001 -p  
------------------------------
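If such a grouped layout is indeed what is recommended, my naive idea would be to replace the per-core k-point loop above with something like the following (untested; it reuses $aa and .machines_current from the script above and simply lists each group's hosts after the '1:', in the same style as the lapw0 line):

-------------------------
# one .machines line per k-point group, each group spanning 8 core entries
set cores_per_group=8
set i=1
while ($i <= $aa[1])
    echo -n '1:' >> .machines
    set j=1
    while ($j <= $cores_per_group && $i <= $aa[1])
        echo -n `head -$i .machines_current | tail -1` ' ' >> .machines
        @ i ++
        @ j ++
    end
    echo ' ' >> .machines
end
-------------------------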

Any comment or suggestion would be highly appreciated.

Thank you very much.
Qiwen


**********************************************************

Dr QiWen YAO

JSPS Fellow
Multifunctional Materials Group
Optical and Electronic Materials Unit
Environment and Energy Materials Research Division

National Institute for Materials Science

1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
Phone: +81-29-851-3354, ext. no. 6482, Fax: +81-29-859-2501

**********************************************************


