[Wien] MPI problem for LAPW2

Laurence Marks L-marks at northwestern.edu
Wed Sep 30 18:17:53 CEST 2009


It sounds like you are memory (RAM) limited. If you use
lapw2_vector_split:N then the arrays used by lapw2 are split into N
parts. If M is the total size of the arrays, the overall memory
requirement is still M, but each of the N processes only has to hold
roughly M/N of it. If you have 21 atoms (total) I am surprised that
you have this problem; perhaps you need more memory. (If it is 21
unique atoms then it is possible, but still surprising.) Have a look
at /proc/meminfo, and if you are running ganglia, check your memory
records. You need something like 2 GB per core, and more may be
better for newer systems; with 1 GB or less per core you can easily
run into this sort of problem.
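
As a rough, hypothetical illustration of the arithmetic (the numbers
are made up): if the lapw2 arrays total about 6 GB, then with
lapw2_vector_split:4 each of the 4 processes in an MPI job only has
to hold roughly 6/4 = 1.5 GB, which fits within 2 GB per core,
whereas an unsplit allocation of that size can easily exceed what is
available to a single process. (grep MemTotal /proc/meminfo on each
node will show how much RAM you actually have.) Reusing the node name
from your earlier .machines file, such a setup would look something
like

lapw0:compute-0-2:4
1:compute-0-2:4
granularity:1
extrafine:1
lapw2_vector_split:4

(presumably the split value should divide the number of MPI processes
per k-point, which is also why Peter suggested an even number of
processors when using lapw2_vector_split:2).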

2009/9/30 Duy Le <ttduyle at gmail.com>:
> Thank you for all your inputs.
> I am running a test on a system of 21 atoms, with a spin-polarized
> calculation, with 2 k-points, without inversion symmetry. Of course this
> test is only with a small system, so there should be no problem with the
> matrix size. I provided the .machines file in my previous email.
> Good news: the problem has been solved. By using
> lapw2_vector_split:$NCPUS_per_MPI_JOB
> I am able to finish the benchmark test with 1, 2, 4, 8 and 16 CPUs (on the
> same nodes) by fully MPI or by hybrid k-parallel & MPI.
>
> I am really not sure that what I did is correct
> (lapw2_vector_split:$NCPUS_per_MPI_JOB).
> Could anyone explain this for me? I am pretty new to Wien2k.
> Thank you.
> On Wed, Sep 30, 2009 at 3:12 AM, Peter Blaha <pblaha at theochem.tuwien.ac.at>
> wrote:
>>
>> Very unusual, I cannot believe that 3 or 7 nodes run efficiently (lapw1)
>> or are necessary.
>> Maybe memory is an issue and you should try to set
>>
>> lapw2_vector_split:2
>>
>> (with an even number of processors!)
>>
>>> I can run MPI with lapw0, lapw1, and lapw2. However, lapw2 runs without
>>> problems only with certain numbers of PROCESSORS PER MPI JOB (in both
>>> cases: fully MPI and/or hybrid k-parallel + MPI). Those numbers are 3
>>> and 7. If I try to run with other numbers of PROCESSORS PER MPI JOB, it
>>> gives me a message like the one below. This problem doesn't occur with
>>> lapw0 and lapw1. If any of you could give me a suggestion for fixing
>>> this problem, it would be appreciated.
>>>
>>> [compute-0-2.local:08162] *** An error occurred in MPI_Comm_split
>>> [compute-0-2.local:08162] *** on communicator MPI_COMM_WORLD
>>> [compute-0-2.local:08162] *** MPI_ERR_ARG: invalid argument of some other
>>> kind
>>> [compute-0-2.local:08162] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> forrtl: error (78): process killed (SIGTERM)
>>> Image              PC                Routine            Line        Source
>>> libpthread.so.0    000000383440DE80  Unknown            Unknown     Unknown
>>> ........... etc....
>>>
>>>
>>> Reference:
>>> OPTIONS file:
>>> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -align -DINTEL_VML -traceback
>>> current:FPOPT:$(FOPT)
>>> current:LDFLAGS:$(FOPT) -L/share/apps/fftw-3.2.1/lib/ -lfftw3 -L/share/apps/intel/mkl/10.0.011/lib/em64t -i-static -openmp
>>> current:DPARALLEL:'-DParallel'
>>> current:R_LIBS:-lmkl_lapack -lmkl_core -lmkl_em64t -lguide -lpthread
>>> current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group -lpthread -lmkl_em64t -L/share/apps/intel/fce/10.1.008/lib -limf
>>> current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
>>>
>>> OpenMPI 1.2.6
>>> Intel compiler 10
>>>
>>> .machines
>>> lapw0:compute-0-2:4
>>> 1:compute-0-2:4
>>> granularity:1
>>> extrafine:1
>>> lapw2_vector_split:1
>>>
>>> --------------------------------------------------
>>> Duy Le
>>> PhD Student
>>> Department of Physics
>>> University of Central Florida.
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>
>> --
>>
>>                                      P.Blaha
>> --------------------------------------------------------------------------
>> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
>> Phone: +43-1-58801-15671             FAX: +43-1-58801-15698
>> Email: blaha at theochem.tuwien.ac.at    WWW:
>> http://info.tuwien.ac.at/theochem/
>> --------------------------------------------------------------------------
>
>
>
> --
> --------------------------------------------------
> Duy Le
> PhD Student
> Department of Physics
> University of Central Florida.
>
>
>



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.

