[Wien] Problem in running k-point parallel jobs
Torsten Andersen
thor at physik.uni-kl.de
Tue Aug 15 14:46:07 CEST 2006
Hello,
Well, this would be very inefficient! Forget MPI if you have enough
k-points to do k-point parallelization - the k-point parallel mode is
driven entirely by the .machines file and needs no mpiexec.
Somewhere in the queue system there has to be a list of the machines
you have been allocated (or your sysadmin has to create it for you).
With this list and a bit of scripting you can then create a suitable
.machines file.
On one of the clusters I use, the list of allocated CPUs is in
$TMPDIR/machines, and I build a .machines file in the submitted
job script like this:
<----
# Build the Wien2k ".machines" file - this should be rebuilt every time
# The allocated CPUs are listed in $TMPDIR/machines
if (-e .machines) rm -f .machines
echo "granularity:1" > .machines
echo "extrafine:1" >> .machines
# prepend the weight "1:" to every host name (works here because all hosts start with "aix")
sed 's/aix/1:aix/g' $TMPDIR/machines >> .machines
<----
where $TMPDIR/machines could look like this (example) at runtime -
it should be cleared at the end of the job, of course, and should not
exist before the job begins...
<----
aixhp7
aixhp7
aixhp8
aixhp8
aixhp1
aixhp1
aixhp1
aixhp9
<----
for an 8-CPU job. But it all depends on how your queue system is configured...
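For reference, with the example host list above, the script would produce a
.machines file roughly like this (one weight-1 entry per allocated CPU; the
exact result of course depends on the sed pattern matching your host names):
<----
granularity:1
extrafine:1
1:aixhp7
1:aixhp7
1:aixhp8
1:aixhp8
1:aixhp1
1:aixhp1
1:aixhp1
1:aixhp9
<----
If your cluster runs Grid Engine (the qstat output quoted below suggests it
might), a similar sketch - purely an assumption on my part, so check the
variable name with your sysadmin - could read the allocation from
$PE_HOSTFILE, which lists one "hostname slots ..." line per node:
<----
# hedged sketch, assuming Grid Engine exports $PE_HOSTFILE inside the job
if (-e .machines) rm -f .machines
echo "granularity:1" > .machines
echo "extrafine:1" >> .machines
# write one "1:hostname" line per allocated slot on each node
awk '{for (i = 0; i < $2; i++) print "1:" $1}' $PE_HOSTFILE >> .machines
<----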
Best regards,
Torsten Andersen.
Ravindran Ponniah wrote:
> Hello,
>
> I am trying to set up k-point parallel jobs on a Linux cluster
> here. If we ask for 8 CPUs (for an 8 k-point job), the queuing system correctly
> allots 8 CPUs, but the jobs run only on the master node (i.e. on 2 CPUs)
> and the remaining 6 CPUs are idle. We never had such a problem on shared-memory
> systems. I am enclosing herewith the message I have received from
> the system expert. Please tell me where we should look to solve this
> problem.
>
> Best regards
> Ravi
> ###### communication from system expert
> Yes, it was run in parallel, but only on one node. If you don't use mpiexec,
> your executables don't start on all nodes. So your 8 processes were running
> on one node (that is, 2 CPUs), while the other 6 processors were idle.
>
> Please look at the load of the nodes you are currently using on
> http://master.titan.uio.no/ganglia/:
>
> -bash-3.00$ qstat -g t | grep ravi
> 41798 0.25746 YBC5SO ravi r 08/14/2006 12:31:28 kjemi@compute-1-0.local  SLAVE
> 42404 0.25656 YBM6U  ravi r 08/15/2006 10:48:09 kjemi@compute-1-13.local SLAVE
> 41798 0.25746 YBC5SO ravi r 08/14/2006 12:31:28 kjemi@compute-1-15.local SLAVE
> 41798 0.25746 YBC5SO ravi r 08/14/2006 12:31:28 kjemi@compute-1-26.local SLAVE
> 41798 0.25746 YBC5SO ravi r 08/14/2006 12:31:28 kjemi@compute-1-33.local SLAVE
> 41798 0.25746 YBC5SO ravi r 08/14/2006 12:31:28 kjemi@compute-1-8.local  MASTER
> 42404 0.25656 YBM6U  ravi r 08/15/2006 10:48:09 kjemi@compute-2-0.local  SLAVE
> 42404 0.25656 YBM6U  ravi r 08/15/2006 10:48:09 kjemi@compute-2-11.local MASTER
>
> While your two master nodes, 1-8 and 2-11, have loads of about 8 and 5 (8 and
> 5 processes) respectively, your slave nodes have a load of 0. You can also see
> this by logging into a master and a slave node and running:
>
> ps -ef | grep ravi
>
> We need to figure out a way to invoke mpiexec somewhere in order for this to
> run properly in parallel (at least beyond using 2 CPUs).
>
> best
> Torgeir
>
>
> On Tue, 15 Aug 2006, Ravindran Ponniah wrote:
>
>
>>On Tue, 15 Aug 2006, Torgeir Andersen Ruden wrote:
>>
>>
>>>It doesn't seem that you invoke mpiexec anywhere. You need to do this in
>>>order for parallel execution on clusters to work. Which part is supposed to be parallel?
>>
>>In the WIEN2k code there are two ways jobs can be parallelized: one is
>>k-point parallelization and the other is called fine-grained parallelization.
>>We are using k-point parallelization. It splits the k-points according to
>>the number of CPUs used and runs them on different nodes. See our dayfile:
>>
>>###
>>LAPW0 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPW1 END
>>LAPWSO END
>>LAPWSO END
>>LAPWSO END
>>LAPWSO END
>>LAPWSO END
>>LAPWSO END
>>LAPWSO END
>>LAPWSO END
>>LAPW2 - FERMI; weighs written
>>LAPW2 END
>>LAPW2 END
>>LAPW2 END
>>LAPW2 END
>>LAPW2 END
>>LAPW2 END
>>LAPW2 END
>>LAPW2 END
>>SUMPARA END
>>SUMPARA END
>>LAPW2 - FERMI; weighs written
>>###########
>>
>>We used 8 CPUs for the above calculation, and hence lapw1, lapw2, and lapwso
>>ran on 8 CPUs. So, even though we have not executed mpiexec, the job was
>>running in parallel.
>
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>
--
Dr. Torsten Andersen TA-web: http://deep.at/myspace/
AG Hübner, Department of Physics, Kaiserslautern University
http://cmt.physik.uni-kl.de http://www.physik.uni-kl.de/