[Wien] Parallel Wien2k using Intel MPI?

Laurence Marks L-marks at northwestern.edu
Sun Nov 14 17:02:02 CET 2010


One addendum: Torque/MOAB probably sets up some default files for you,
in many cases under the assumption that all you are doing is running a
single MPI task across all the nodes you requested. You might be able
to get away with something like changing to

setenv WIEN_MPIRUN "mpirun _EXEC_"

and a machines file such as
lapw0: cn002:8 cn004:8 cn016:8 cn018:8
1: cn002:8 cn004:8 cn016:8 cn018:8
granularity:1
extrafine:1

so in effect you are running one MPI job on all the nodes with the MOAB
defaults. (You might need -np _NP_ in the WIEN_MPIRUN; you have to
experiment and read your mpirun instructions, e.g. "mpirun --help" or
"man mpirun".) However, this is not very efficient.

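For example, something along these lines may be what is needed (untested
here; check the exact option names with "mpirun --help" first):

setenv WIEN_MPIRUN "mpirun -np _NP_ _EXEC_"

and, if the hosts still have to be passed explicitly, a commonly used form is

setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"

where _NP_ and _HOSTS_ are the placeholders the lapw scripts replace with
the process count and the corresponding .machineN file.
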
On Sun, Nov 14, 2010 at 9:53 AM, Laurence Marks
<L-marks at northwestern.edu> wrote:
> I don't think that this has much to do with Wien2k; it is an issue
> with how you are setting up your MPI. From the looks of it you are
> using MPICH2, whereas most of the scripts in Wien2k are set up to use
> MPICH1, which is rather simpler. For MPICH2 you have to set up the mpd
> daemon and its configuration files, which is very different from the
> simpler hostfile structure of MPICH1.
>
> (I personally have never got Wien2k running smoothly with MPICH2, but
> have not tried too hard. If anyone has a detailed description, this
> would be a useful post.)
>
> You can find some information about the other steps you need for
> MPICH2 on the web, e.g.
>
> http://developer.amd.com/documentation/articles/pages/HPCHighPerformanceLinpack.aspx
>
> and Google searches for "WARNING: Unable to read mpd.hosts or list of
> hosts isn't provided".
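>
> As a rough sketch (not tested on your system, so check the details
> against your Intel MPI documentation): mpdboot, mpdtrace and mpdallexit
> are the standard MPD tools that ship with MPICH2 and, I believe, with
> Intel MPI 4.0, and the hostnames below are simply the ones from your
> .machines file:
>
>   $ cat mpd.hosts
>   cn002
>   cn004
>   cn016
>   cn018
>   $ mpdboot -n 4 -f mpd.hosts -r ssh   # start one mpd per node over ssh
>   $ mpdtrace                           # should list all four nodes
>   ... run the mpirun/mpiexec jobs here ...
>   $ mpdallexit                         # shut the mpd ring down afterwards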
>
> On Sun, Nov 14, 2010 at 3:19 AM, Stefan Becuwe <stefan.becuwe at ua.ac.be> wrote:
>>
>> Hello,
>>
>> Our problem is more or less related to Wei Xie's postings from two weeks ago.
>> We can't get Wien2k 10.1 running with the MPI setup.  Serial versions and
>> ssh-based parallel versions do work.  Since his solution does not seem to
>> work for us, I'll describe our problem and setup.
>>
>> FYI: the Intel MPI setup does work for lots of other programs on our
>> cluster, so I guess it must be a problem specific to the Intel MPI +
>> Wien2k (+ Torque/MOAB) combination.
>>
>> Software environment:
>>
>> icc/ifort: 11.1.073
>> impi:      4.0.0.028
>> imkl:      10.2.6.038
>> FFTW:      2.1.5
>> Torque/MOAB
>>
>>
>> $ cat parallel_options
>> setenv USE_REMOTE 1
>> setenv MPI_REMOTE 1
>> setenv WIEN_GRANULARITY 1
>> setenv WIEN_MPIRUN "mpirun -r ssh -np _NP_ _EXEC_"
>>
>>
>> Call:
>>
>> clean_lapw -s
>> run_lapw -p -ec 0.00001 -i 1000
>>
>>
>> $ cat .machines
>> lapw0: cn002:8 cn004:8 cn016:8 cn018:8
>> 1: cn002:8
>> 1: cn004:8
>> 1: cn016:8
>> 1: cn018:8
>> granularity:1
>> extrafine:1
>>
>>
>> Also, the appropriate .machine1, .machine2, etc. files are generated.
>>
>>
>> $ cat TiC.dayfile
>> [...]
>> >   lapw0 -p    (09:59:34) starting parallel lapw0 at Sun Nov 14 09:59:34 CET 2010
>> -------- .machine0 : 32 processors
>> 0.428u 0.255s 0:05.12 13.0%     0+0k 0+0io 0pf+0w
>> >   lapw1 -p    (09:59:39) starting parallel lapw1 at Sun Nov 14 09:59:39 CET 2010
>>
>> ->  starting parallel LAPW1 jobs at Sun Nov 14 09:59:39 CET 2010
>> running LAPW1 in parallel mode (using .machines)
>> 4 number_of_parallel_jobs
>>     cn002 cn002 cn002 cn002 cn002 cn002 cn002 cn002(1) WARNING: Unable to
>> read mpd.hosts or list of hosts isn't provided. MPI job will be run on the
>> current machine only.
>> rank 5 in job 1  cn002_55855   caused collective abort of all ranks
>>  exit status of rank 5: killed by signal 9
>> rank 4 in job 1  cn002_55855   caused collective abort of all ranks
>>  exit status of rank 4: killed by signal 9
>> rank 3 in job 1  cn002_55855   caused collective abort of all ranks
>>  exit status of rank 3: killed by signal 9
>> [...]
>>
>>
>> Specifying -hostfile in the WIEN_MPIRUN variable results in the following
>> error:
>>
>> invalid "local" arg: -hostfile
>>
>>
>> Thanks in advance for helping us run Wien2k in an MPI setup ;-)
>>
>> Regards
>>
>>
>> Stefan Becuwe



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.

