[Wien] k-point parallel job in distributed file system

XU ZUO xzuo at nankai.edu.cn
Thu Aug 17 13:10:03 CEST 2006


Here is the PBS script from the WIEN2k FAQ. I modified it a little, and it works.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
#!/bin/tcsh -f

#PBS -l nodes=2:nonvasp
##PBS -l walltime=8:0:0
#PBS -N test
#PBS -j oe
#PBS -q qa

cd $PBS_O_WORKDIR

# setting up local SCRATCH
setenv SCRATCH  /tmp/$PBS_JOBID

# creating .machines
cat $PBS_NODEFILE > .machines_current
set aa=`wc .machines_current`
echo '#' > .machines

# run lapw1/2 using k-point parallel
set i=1
while ($i <= $aa[1])
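  # each "1:<node>" line in .machines defines one k-point parallel job
  # on that node with speed weight 1; rsh then creates the node-local
  # scratch directory on it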
  echo -n '1:' >> .machines
#  head -$i .machines_current |tail -1 >> .machines
  set mn = `head -$i .machines_current |tail -1`
  echo $mn >> .machines
  rsh $mn mkdir -p $SCRATCH
  @ i++
end
echo 'granularity:1' >>.machines
echo 'extrafine:1' >>.machines

# Wien2k command
runsp_lapw -p -i 200 -ec 0.000001 -NI
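
# (optional addition, not part of the FAQ script) remove the node-local
# scratch directories once the run is finished, using the same rsh access
# as above
foreach mn (`cat .machines_current`)
  rsh $mn rm -rf $SCRATCH
end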

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
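
For reference, with "nodes=2" the loop above generates a .machines file along
these lines (the node names are of course site-specific):

#
1:node01
1:node02
granularity:1
extrafine:1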

-----Original Message-----
From: wien-bounces at zeus.theochem.tuwien.ac.at
[mailto:wien-bounces at zeus.theochem.tuwien.ac.at] On Behalf Of Ravindran
Ponniah
Sent: Thursday, August 17, 2006 5:16 PM
To: A Mailing list for WIEN2k users
Cc: Torgeir Andersen Ruden
Subject: [Wien] k-point parallel job in distributed file system



Hello,

 	We are trying to run a k-point parallel WIEN2k job on a Linux cluster
with a distributed file system. Although we are able to do the k-point
parallel calculation, we have a problem with the common work space ($SCRATCH)
in which all input/output files are read and written. This means that, for
example, if we run a 10 k-point calculation on 10 nodes, all 10 nodes have to
communicate with the common working area through ssh to read and write files.
This slows down the calculation and loads the network.
So far we have done k-point parallel calculations on supercomputers with
shared memory, so we never had such a problem. Is it possible to do
k-point parallel calculations on a distributed file system without any common
working area?
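
To illustrate what we mean (the paths below are only examples): at the moment
every node effectively uses the same shared area,

   setenv SCRATCH /work/shared/$USER    # on the common file system

whereas we would like each node to work on its own local disk, e.g.

   setenv SCRATCH /scratch/local/$USER  # on the node itself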

I have received the following from the system expert here.

###
Hmm, I've been looking through the jungle of scripts that constitutes
WIEN2k, and it is clear to me that this way of parallelizing isn't meant for
distributed file systems (local disks on the nodes). Unless the WIEN2k people
have a solution, I don't think we will get around this without some major
reprogramming. At least it seems so to me, though I must admit that I don't
have a complete overview of the tasks involved.

Also, a quick Google search for the problem did not turn up a solution.
This scheme is very efficient for SMP-type machines, but is a bit ad hoc for
cluster-type computers.
On the bright side, the program does not seem to do a lot of disk
read/write in the long run, only 10-20 minute bursts of 10 MB/s.
####

Looking forward to your responses on how to do the computation more efficiently.

Best regards
Ravi
_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien



