[Wien] Parallel Wien2k using Intel MPI?

Laurence Marks L-marks at northwestern.edu
Sun Nov 14 16:53:06 CET 2010


I don't think this has much to do with Wien2k; it is an issue with
how you are setting up your MPI. From the looks of it you are using
MPICH2, whereas most of the scripts in Wien2k are set up to use
MPICH1, which is rather simpler. For MPICH2 you have to set up the
mpd daemon and its configuration files, which is very different from
the simpler hostfile mechanism of MPICH1.
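
For reference, a minimal by-hand MPD bring-up looks something like the
sketch below (untested on your cluster; the hostnames are just the
first two nodes from your .machines file, and the secret word is a
placeholder you must choose yourself):

# mpd refuses to start unless ~/.mpd.conf exists and is mode 600
# (MPICH2 reads "secretword=...", Intel MPI's mpd reads "MPD_SECRETWORD=...")
echo "secretword=CHANGE_ME" > ~/.mpd.conf
chmod 600 ~/.mpd.conf

# one hostname per line; mpdboot starts one daemon per listed host
printf "cn002\ncn004\n" > ~/mpd.hosts

mpdboot -n 2 -f ~/mpd.hosts -r ssh   # start the ring over ssh
mpdtrace                             # check that every node joined
# ... run your mpirun/mpiexec jobs here ...
mpdallexit                           # shut the ring down afterwards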

(I personally have never got Wien2k running smoothly with MPICH2, but
have not tried very hard. If anyone has a detailed description, it
would make a useful post.)

You can find information about the remaining steps MPICH2 needs on
the web, e.g.

http://developer.amd.com/documentation/articles/pages/HPCHighPerformanceLinpack.aspx

and via Google searches on "WARNING: Unable to read mpd.hosts or list
of hosts isn't provided".
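
Looking at your parallel_options below, the launch line itself is
probably also part of the problem: the mpiexec behind Intel MPI's
mpirun does not know -hostfile (hence the "invalid local arg" error);
the MPD-based launcher's flag is -machinefile. A sketch of what I
would try, assuming -machinefile is forwarded to mpiexec, and using
the _HOSTS_ placeholder that Wien2k expands to the per-job .machineN
file:

setenv WIEN_MPIRUN "mpirun -r ssh -machinefile _HOSTS_ -np _NP_ _EXEC_"

This is untested with your Torque/MOAB setup, so treat it as a
starting point rather than a recipe.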

On Sun, Nov 14, 2010 at 3:19 AM, Stefan Becuwe <stefan.becuwe at ua.ac.be> wrote:
>
> Hello,
>
> Our problem is more or less related to Wei Xie's postings from two weeks
> ago. We can't get Wien2k 10.1 running with the MPI setup. Serial runs and
> ssh-based parallel runs do work. Since his solution does not seem to work
> for us, I'll describe our problem/setup.
>
> FYI: the Intel MPI setup works fine for lots of other programs on our
> cluster, so I guess it must be a problem specific to the Intel
> MPI-Wien2k(-Torque/MOAB) combination.
>
> Software environment:
>
> icc/ifort: 11.1.073
> impi:      4.0.0.028
> imkl:      10.2.6.038
> FFTW:      2.1.5
> Torque/MOAB
>
>
> $ cat parallel_options
> setenv USE_REMOTE 1
> setenv MPI_REMOTE 1
> setenv WIEN_GRANULARITY 1
> setenv WIEN_MPIRUN "mpirun -r ssh -np _NP_ _EXEC_"
>
>
> Call:
>
> clean_lapw -s
> run_lapw -p -ec 0.00001 -i 1000
>
>
> $ cat .machines
> lapw0: cn002:8 cn004:8 cn016:8 cn018:8
> 1: cn002:8
> 1: cn004:8
> 1: cn016:8
> 1: cn018:8
> granularity:1
> extrafine:1
>
>
> Also, the appropriate .machine1, .machine2, etc are generated.
>
>
> $ cat TiC.dayfile
> [...]
>>
>>  lapw0 -p    (09:59:34) starting parallel lapw0 at Sun Nov 14 09:59:34 CET 2010
>
> -------- .machine0 : 32 processors
> 0.428u 0.255s 0:05.12 13.0%     0+0k 0+0io 0pf+0w
>>
>>  lapw1  -p   (09:59:39) starting parallel lapw1 at Sun Nov 14 09:59:39 CET 2010
>
> ->  starting parallel LAPW1 jobs at Sun Nov 14 09:59:39 CET 2010
> running LAPW1 in parallel mode (using .machines)
> 4 number_of_parallel_jobs
>     cn002 cn002 cn002 cn002 cn002 cn002 cn002 cn002(1) WARNING: Unable to
> read mpd.hosts or list of hosts isn't provided. MPI job will be run on the
> current machine only.
> rank 5 in job 1  cn002_55855   caused collective abort of all ranks
>  exit status of rank 5: killed by signal 9
> rank 4 in job 1  cn002_55855   caused collective abort of all ranks
>  exit status of rank 4: killed by signal 9
> rank 3 in job 1  cn002_55855   caused collective abort of all ranks
>  exit status of rank 3: killed by signal 9
> [...]
>
>
> Specifying -hostfile in the WIEN_MPIRUN variable results in the following
> error:
>
> invalid "local" arg: -hostfile
>
>
> Thanks in advance for helping us run Wien2k in an MPI setup ;-)
>
> Regards
>
>
> Stefan Becuwe
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.

