[Wien] Parallel Options

Ghosh SUDDHASATTWA ssghosh at igcar.gov.in
Wed Sep 7 07:36:17 CEST 2011


Dear Wien2k users, 

We have compiled Wien2k_11.1 with the following parallel options:

setenv USE_REMOTE 1
setenv MPI_REMOTE 1
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
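
For readers unfamiliar with the WIEN_MPIRUN line: it is a template whose _NP_, _HOSTS_ and _EXEC_ placeholders are filled in at run time for MPI-parallel steps. The bash sketch below only illustrates that substitution; the function name expand_mpirun and the values passed to it are hypothetical, and this is not the code WIEN2k itself uses.

```shell
#!/bin/bash
# Illustration only: fill the _NP_, _HOSTS_ and _EXEC_ placeholders of a
# WIEN_MPIRUN-style template. The values passed below are hypothetical.
expand_mpirun() {
   local template=$1 np=$2 hosts=$3 prog=$4
   template=${template//_NP_/$np}        # number of MPI processes
   template=${template//_HOSTS_/$hosts}  # machine file for this run
   template=${template//_EXEC_/$prog}    # program and its arguments
   echo "$template"
}

WIEN_MPIRUN="mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
expand_mpirun "$WIEN_MPIRUN" 4 .machine1 "lapw1_mpi lapw1_1.def"
```

With the hypothetical values above this prints an mpirun command line with all three placeholders replaced. For a pure k-point parallel run, as in the scripts below, this template should not come into play.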

 

The k-point parallel start-up script is given by:

#!/bin/bash
#
# RJ: Startup for Wien2k-kpoint parallel conforming with Grid Engine
# parallel environment interface
#
# usage: start_kpoint.sh <pe_hostfile>
#
PeHostfile2Wien2kMachineFile()
{
   cat $1 | while read line; do
      # echo $line
      host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
      nslots=`echo $line|cut -f2 -d" "`
      # add here code to map regular hostnames into IB hostnames
      for ((i=0; i < $nslots; i=i+1)); do
          echo 1:i$host
      done
   done
   echo 'granularity:1'
   echo 'extrafine:1'
}

 

# useful to control parameters passed to us
echo $*

SLEEPTIME=5
RETRIES=10

me=`basename $0`

# test number of args
if [ $# -lt 1 ]; then
   echo "$me: got wrong number of arguments" >&2
   exit 1
fi

# get arguments
pe_hostfile=$1

# ensure pe_hostfile is readable
if [ ! -r $pe_hostfile ]; then
   echo "$me: can't read $pe_hostfile" >&2
   exit 1
fi

# create machine-file
# remove column with number of slots per queue
# mpi does not support them in this form
machines="$TMPDIR/machines.wien2k-kpoint"
pwdir=`pwd`
PeHostfile2Wien2kMachineFile $pe_hostfile >> $machines
cat $machines
hostname
#scp $machines nx0:$pwdir/machines
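
To make the effect of the start-up script concrete, here is a small sketch that applies the same conversion to a made-up Grid Engine pe_hostfile. The hostnames, the queue names and the 12 + 4 slot split are hypothetical, and the function is a rewrite using bash parameter expansion rather than the cut pipeline above; it is meant as an illustration, not as a replacement.

```shell
#!/bin/bash
# Same conversion logic as the start-up script above, applied to a made-up
# two-node allocation (hostnames and queue names are hypothetical).
pe_hostfile_to_machines() {
   while read -r host_fqdn nslots _rest; do
      host=${host_fqdn%%.*}            # strip the domain part
      for ((i = 0; i < nslots; i++)); do
         echo "1:i$host"               # "i" prefix as in the original script
      done
   done < "$1"
   echo 'granularity:1'
   echo 'extrafine:1'
}

# Hypothetical pe_hostfile: host, slots, queue, processor range
sample=$(mktemp)
cat > "$sample" <<'EOF'
node01.example.org 12 all.q@node01.example.org UNDEFINED
node02.example.org 4 all.q@node02.example.org UNDEFINED
EOF

# Summarise the generated machine file: one count per distinct line
pe_hostfile_to_machines "$sample" | sort | uniq -c
rm -f "$sample"
```

For this sample the generated file holds 12 entries for inode01 and 4 for inode02, followed by the granularity and extrafine lines. A 16-slot job on 12-processor nodes therefore produces a machine file that names a second host, which a 12-slot job does not.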

 

The SGE job script is given by:

#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#$ -pe kpoint 2-

# RJ: Script to run Wien2k-kpoint parallel job thru SGE
# use kpoint PE

#echo "Hostname: "
#hostname

#echo "No. of Slots"
#echo $NSLOTS

# machines.wien2k-kpoint would be created by
# start_kpoint.sh PE script at $TMPDIR
echo "Wien2k Machine file $TMPDIR/machines"
mf=`cat $TMPDIR/machines.wien2k-kpoint`
echo $mf

cp $TMPDIR/machines.wien2k-kpoint .machines

# RJ: command for kpoint parallel run
runsp_lapw -cc 0.0001 -ec 0.00001 -in1ef -i 200 -p

Now, we have 12 processors per node.

When we do 

qsub -pe kpoint 12 kpoint.sh

the script works.

But when we do 

qsub -pe kpoint 16 kpoint.sh

 

it doesn't.
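
One way to compare the two cases is to look at how the k-point entries in .machines are spread over hosts. The sketch below counts entries per host; the function name machines_hosts is hypothetical, and it assumes the "weight:host" line format written by the start-up script above.

```shell
#!/bin/bash
# Sketch: count k-point entries per host in a WIEN2k .machines file.
# Lines of the form "weight:host" are counted; keyword lines such as
# granularity and extrafine do not start with a digit and are skipped.
machines_hosts() {
   awk -F: '/^[0-9]+:/ { n[$2]++ } END { for (h in n) print h, n[h] }' "$1" | sort
}

# Report on the machine file in the current directory, if there is one
if [ -r .machines ]; then
   machines_hosts .machines
fi
```

If the 16-slot run lists more than one host here, the job spans nodes. With USE_REMOTE set to 1 the k-point jobs on other hosts are, as far as we understand, started through a remote shell, so it may be worth confirming that each listed name resolves and can be reached from the master node without a password (for example with ssh -o BatchMode=yes HOST true).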

 

Can anybody suggest what the problem is, and whether any changes to the job
script are required?

 

Thanks in advance 

Suddhasattwa 

 
