[Wien] Doubt in mpi running of Wien2K
Marcos Veríssimo Alves
marcos.verissimo.alves at gmail.com
Wed Aug 4 17:01:10 CEST 2010
Hi all,
Setting up the .machines file of Wien2K for a parallel run using MPI is not
very clear to me. I have searched the list without reaching a conclusion, so
I am asking for your help. I'll state my problem as concisely and precisely
as I can.
I am still having problems running Wien2K in parallel over k-points (that
is, using ssh/rsh) because our cluster's AFS seems to be really unstable. So
I am going to try to compile Wien2K using mvapich, since part of the cluster
is interconnected with InfiniBand.
Now, the InfiniBand part of the cluster is composed of 16 identical machines
(let's call them machine1...machine16) with 4 CPUs each. I would like to run
Wien2K in parallel over k-points, but using mvapich instead of ssh. The
machines are assigned by a queuing system, and I have already written a
script which reads the machines file produced by the queuing system and
determines which machines were assigned and how many processors of each
machine participate in the calculation. The number of k-points I have is not
a multiple of the number of CPUs assigned, so I'd like to assign one k-point
per processor, and the remaining k-points could either be done fine-grained
or assigned individually.
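To make the script's job concrete, here is a minimal sketch of the counting step in Python. It is not my actual script: the function names `count_slots` and `read_nodefile` are hypothetical, and it assumes the queuing system writes one hostname per assigned processor, one per line, with the hostname as the first field.

```python
from collections import OrderedDict


def count_slots(lines):
    """Return an ordered mapping host -> number of assigned processors.

    Assumes one line per assigned processor, hostname in the first field.
    """
    slots = OrderedDict()
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        host = fields[0]
        slots[host] = slots.get(host, 0) + 1
    return slots


def read_nodefile(path):
    """Read the machines file written by the queuing system (path is an assumption)."""
    with open(path) as handle:
        return count_slots(handle)
```

The mapping keeps the hosts in the order in which the queuing system listed them, which is convenient when the result is later written out host by host.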
To be more precise, suppose I have 32 k-points and the maximum number of
processors I got was 9 (because all the others were busy with other users'
processes). Supposing that the file with the machines assigned by the
queuing system was:
machine1 (machine1: one processor)
machine2
machine2
machine2 (machine2: three processors)
machine3 (machine3: one processor)
machine4
machine4 (machine4: two processors)
machine5
machine5 (machine5: two processors)
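For reference, the bookkeeping for this example can be checked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are made up for illustration, and it says nothing about how Wien2K itself distributes k-points.

```python
from collections import Counter

# One entry per processor assigned by the queuing system (the list above).
hosts = ["machine1",
         "machine2", "machine2", "machine2",
         "machine3",
         "machine4", "machine4",
         "machine5", "machine5"]

cpus_per_host = Counter(hosts)          # processors assigned on each machine
n_cpus = sum(cpus_per_host.values())    # total processors assigned
n_kpoints = 32

# Whole k-points each processor can take, and how many are left over.
per_cpu, leftover = divmod(n_kpoints, n_cpus)

print(dict(cpus_per_host))
print(n_cpus, per_cpu, leftover)
```

With 32 k-points on 9 processors, an even split gives 3 k-points per processor with 5 left over, which is the remainder I am unsure how to handle.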
My question is: if all processors have the same speed, would the following
.machines file be valid for running processes **only with mpi** (no sending
processes over ssh whatsoever)?
#
# Hypothetical
granularity:1
extrafine:1
1:machine1:4 machine2:12 machine4:2 machine3:3 machine4:6 machine5: 6
I am sorry to ask a question which must be extremely basic, but I could not
find an answer in the list archives, and I find the example in the manual
very confusing. Thank you for any advice you can give me in this respect.
Best regards,
Marcos