[Wien] MPI Problem

Peter Blaha pblaha at theochem.tuwien.ac.at
Mon Jan 23 09:08:06 CET 2012


Read the UG about mpi-parallelization.

It is not supposed to give you any speedup for a small case like "TiC". It is useful ONLY for
larger cases.

Using 5 mpi processes is particularly bad. One should divide the
matrices into 2x2, 4x4 or (for your 32-core machines) 4x8 process grids,
but not into 1x5, 1x7, 1x13, ..., 1x31, ...
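
As a rough sketch (assuming a single 32-core node reachable as "localhost"; adapt the
hostname and core counts to your setup), a .machines file for an mpi-parallel run
could look like

granularity:1
1:localhost:16
1:localhost:16

Each "1:localhost:16" line starts one lapw1_mpi job on 16 cores (a 4x4 process grid),
and with two such lines the k-points are split into two groups, filling the 32 cores.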

On 23.01.2012 06:03, Paul Fons wrote:
> Thank you very much for your suggestion. I actually managed to figure this out myself an hour or so ago. At the same time (usually not a good idea) I also compiled the mkl
> interface for fftw2 rather than using the standalone version I had compiled myself earlier. Thus the RP library line looks like
>
> -lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_intelmpi_lp64 /opt/intel/mkl/lib/intel64/libfftw2x_cdft_DOUBLE.a $(R_LIBS)
>
> Everything compiles.
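>
> (For completeness, I built the interface library from the wrapper sources that ship with MKL, roughly along these lines; the exact make target and option names vary between MKL releases, so this is only a sketch and the makefile in that directory is the real reference:
>
> cd $MKLROOT/interfaces/fftw2x_cdft
> make libintel64 PRECISION=MKL_DOUBLE interface=lp64
>
> which produces libfftw2x_cdft_DOUBLE.a.)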
>
> In the past ten minutes I have been running the TiC example under mpi, and I must say the mpi version seems trickier to use than I first thought, as each mpi process seems to use
> multiple threads and it is easy to oversaturate the number of cores one has. I have been testing the mpi version on a single node (32 cores) of a 500-core cluster to get a feel
> for things.
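>
> (My guess is that the thing to do is to cap the threading so that mpi processes times threads per process does not exceed the physical core count, e.g. by setting
>
> export OMP_NUM_THREADS=1
> export MKL_NUM_THREADS=1
>
> in the shell before starting the parallel run; that is just my assumption, though, and I have not verified it yet.)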
>
> Running the TiC calculation (with 4000 k-points) in serial mode (where the mkl routines use multiple threads) took about 75 seconds.
>
> Running the sample calculation in mpi-parallel mode (with 5 mpi processes, each of which spawns multiple threads, for an average cpu load of 28 on a 32-core machine) takes longer,
> specifically 300 seconds.
>
> Carrying out the same calculation with "1:localhost" on six separate lines (i.e. lapw1 runs six processes locally and then aggregates the results without using mpi) finishes quickest of all, in
> about 60 seconds.
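>
> (In other words, the .machines file for that last run was essentially
>
> granularity:1
> 1:localhost
> 1:localhost
> ... (six such lines)
>
> i.e. plain k-point parallelism with no mpi involved.)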
>
>
> I don't really understand what is going on. Obviously I don't really need the mpi (or any parallel) calculation for a small problem like the TiC example, but I would like to use
> it for larger supercells. Is there some sort of guiding principle that will help me decide which approach is best? I assume that parallelization by running multiple jobs is
> limited to a single machine, so it won't scale very well.
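>
> (If running multiple jobs across machines is in fact possible over ssh, I imagine the .machines file would simply list the remote hostnames instead of localhost, something like
>
> granularity:1
> 1:node01
> 1:node02
> 1:node03
>
> where node01, node02, ... stand in for our actual cluster hostnames; I have not tried this yet, though.)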
>
> Thanks for any advice.
>
> Paul
>
> On Jan 23, 2012, at 8:08 AM, Laurence Marks wrote:
>
>> A guess: you are using the wrong version of blacs. You need a
>> -lmkl_blacs_intelmpi_XX
>> where "XX" is the one for your system. I have seen this give the same error.
>>
>> Use http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/
>>
>> For reference, with openmpi it is _openmpi_ instead of _intelmpi_, and
>> similarly for sgi.
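>>
>> For instance, the BLACS libraries that ship with MKL include (the exact
>> set depends on the MKL release):
>>
>> -lmkl_blacs_intelmpi_lp64   (Intel MPI, MPICH2)
>> -lmkl_blacs_openmpi_lp64    (Open MPI)
>> -lmkl_blacs_sgimpt_lp64     (SGI MPT)
>>
>> plus the corresponding _ilp64 variants for 64-bit-integer builds.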
>>
>> 2012/1/22 Paul Fons <paul-fons at aist.go.jp>:
>>>
>>> Hi,
>>> I have Wien2K running on a cluster of linux boxes each with 32 cores and
>>> connected by 10Gb ethernet. I have compiled Wien2K with the 3.174 version of
>>> the Intel compiler (I learned the hard way that bugs in the newer versions of the Intel
>>> compiler lead to crashes in Wien2K). I have also installed Intel's MPI.
>>> First, the single process Wien2K, let's say for the TiC case, works fine.
>>> It also works fine when I use a .machines file like
>>>
>>> granularity:1
>>> localhost:1
>>> localhost:1
>>> … (24 times).
>>>
>>> This file leads to parallel execution without error. I can vary the number
>>> of processes by changing the number of localhost:1 lines in the file, and
>>> everything still works fine. When I try to use mpi with a single process, it
>>> works as well.
>>>
>>> 1:localhost:1
>>>
>>> starting parallel lapw1 at Mon Jan 23 06:49:16 JST 2012
>>>
>>> -> starting parallel LAPW1 jobs at Mon Jan 23 06:49:16 JST 2012
>>> running LAPW1 in parallel mode (using .machines)
>>> 1 number_of_parallel_jobs
>>> [1] 22417
>>> LAPW1 END
>>> [1] + Done ( cd $PWD; $t $exe ${def}_$loop.def; rm
>>> -f .lock_$lockfile[$p] ) >> .time1_$loop
>>> localhost(111) 179.004u 4.635s 0:32.73 561.0%0+0k 0+26392io 0pf+0w
>>> Summary of lapw1para:
>>> localhostk=111user=179.004wallclock=32.73
>>> 179.167u 4.791s 0:35.61 516.5%0+0k 0+26624io 0pf+0w
>>>
>>>
>>> Changing the .machines file to use more than one mpi process (the same form of
>>> error occurs for more than 2), for example
>>>
>>> 1:localhost:2
>>>
>>> leads to a run-time error in the MPI subsystem:
>>>
>>> starting parallel lapw1 at Mon Jan 23 06:51:04 JST 2012
>>> -> starting parallel LAPW1 jobs at Mon Jan 23 06:51:04 JST 2012
>>> running LAPW1 in parallel mode (using .machines)
>>> 1 number_of_parallel_jobs
>>> [1] 22673
>>> Fatal error in MPI_Comm_size: Invalid communicator, error stack:
>>> MPI_Comm_size(123): MPI_Comm_size(comm=0x5b, size=0x7ed20c) failed
>>> MPI_Comm_size(76).: Invalid communicator
>>> Fatal error in MPI_Comm_size: Invalid communicator, error stack:
>>> MPI_Comm_size(123): MPI_Comm_size(comm=0x5b, size=0x7ed20c) failed
>>> MPI_Comm_size(76).: Invalid communicator
>>> [1] + Done ( cd $PWD; $t $ttt; rm -f
>>> .lock_$lockfile[$p] ) >> .time1_$loop
>>> localhost localhost(111) APPLICATION TERMINATED WITH THE EXIT STRING:
>>> Hangup (signal 1)
>>> 0.037u 0.036s 0:00.06 100.0%0+0k 0+0io 0pf+0w
>>> TiC.scf1_1: No such file or directory.
>>> Summary of lapw1para:
>>> localhostk=0user=111wallclock=0
>>> 0.105u 0.168s 0:03.21 8.0%0+0k 0+216io 0pf+0w
>>>
>>>
>>> I have properly sourced the appropriate runtime environment for the Intel
>>> system. For example, compiling (with mpiifort) and running the Fortran 90 mpi test
>>> program from Intel produces:
>>>
>>>
>>>
>>> mpirun -np 32 /home/paulfons/mpitest/testf90
>>> Hello world: rank 0 of 32 running on asccmp177
>>>
>>> ... and similarly for the remaining 31 ranks.
>>>
>>>
>>> Does anyone have any suggestions as to what to try next? I am not sure how
>>> to debug things from here. I have about 512 nodes that I can use for larger
>>> calculations, but they can only be accessed via mpi (the ssh setup works fine
>>> as well, by the way). It would be great to figure out what is wrong.
>>>
>>> Thanks.
>>>
>>>
>>> Dr. Paul Fons
>>> Functional Nano-phase-change Research Team
>>> Team Leader
>>> Nanodevice Innovation Research Center (NIRC)
>>> National Institute for Advanced Industrial Science & Technology
>>> METI
>>>
>>> AIST Central 4, Higashi 1-1-1
>>> Tsukuba, Ibaraki JAPAN 305-8568
>>>
>>> tel. +81-298-61-5636
>>> fax. +81-298-61-2939
>>>
>>> email: paul-fons at aist.go.jp
>>>
>>> The following lines are in a Japanese font
>>>
>>> 〒305-8562 茨城県つくば市つくば中央東 1-1-1
>>> 産業技術総合研究所
>>> ナノ電子デバイス研究センター
>>> 相変化新規機能デバイス研究チーム チームリーダー
>>> ポール・フォンス
>>>
>>>
>>
>>
>>
>> --
>> Professor Laurence Marks
>> Department of Materials Science and Engineering
>> Northwestern University
>> www.numis.northwestern.edu  1-847-491-3996
>> "Research is to see what everybody else has seen, and to think what
>> nobody else has thought"
>> Albert Szent-Gyorgi
>
> Dr. Paul Fons
> Functional Nano-phase-change Research Team
> Team Leader
> Nanodevice Innovation Research Center (NIRC)
> National Institute for Advanced Industrial Science & Technology
> METI
>
> AIST Central 4, Higashi 1-1-1
> Tsukuba, Ibaraki JAPAN 305-8568
>
> tel. +81-298-61-5636
> fax. +81-298-61-2939
>
> email: paul-fons at aist.go.jp
>
> The following lines are in a Japanese font
>
> 〒305-8562 茨城県つくば市つくば中央東 1-1-1
> 産業技術総合研究所
> ナノ電子デバイス研究センター
> 相変化新規機能デバイス研究チーム チームリーダー
> ポール・フォンス
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------


