[Wien] Problems with LAPW1 running in PARALLEL

Laurence Marks L-marks at northwestern.edu
Sun Feb 8 21:07:19 CET 2009


Three comments:
1) Don't use threads and MPI together. They fight with each other and
lead to chaos in many cases.
2) Don't use MPI with many nodes for small problems in lapw1. You
should only be using MPI extensively for large problems. The k-point
parallelization is much more efficient than trying to run one task
with MPI on many nodes, because in the MPI case you end up wasting a
lot of time on communication.

For 61 k-points and 8 CPUs I would split this as 7,7,7,8,8,8,8,8
k-points, although depending upon the cell size it may in fact be
faster to use fewer than 8 CPUs.

3) I would consider MVAPICH; not all implementations of MPI are the same.
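
The 61 k-point split suggested above can be checked with a few lines of
Python. This is only a sketch of the arithmetic; the function name
split_kpoints is made up for illustration and is not part of WIEN2k,
which does its own distribution based on the .machines file.

```python
def split_kpoints(n_kpoints, n_jobs):
    """Divide n_kpoints as evenly as possible across n_jobs.

    Hypothetical helper for illustration only; not a WIEN2k routine.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    base, extra = divmod(n_kpoints, n_jobs)
    # (n_jobs - extra) jobs get `base` k-points, the remaining `extra`
    # jobs get one more, so the shares differ by at most one.
    return [base] * (n_jobs - extra) + [base + 1] * extra


counts = split_kpoints(61, 8)
print(counts)       # [7, 7, 7, 8, 8, 8, 8, 8]
print(sum(counts))  # 61
```

The largest share (8 k-points here) sets the wall time of the lapw1
step, so the same function can be used to compare other job counts
before deciding how many CPUs to use.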

On Sun, Feb 8, 2009 at 1:28 PM,  <ronunez at gauss.mat.uson.mx> wrote:
>
> I have the following problem using WIEN2k 8.3 with MPI...
>
> I have compiled WIEN2k 8.3 on a Cray XD1 (Opteron cpus),
> with the pgf90 compiler (v 6.23), ACML libs, GOTO libs 1.26, and MPICH 1.2.6,
> without errors...
>
> I ran the mpi-benchmark without problems, using 8 cpus and 1 thread in the
> GOTO libs, because with 2 threads the GOTO libs sometimes freeze...
>
> Next, I tried to run an SCF cycle of In2O3, using 61
> k-points in the IBZ and 8 cpus. The mpi version of lapw0 runs without
> problems, but the mpi lapw1 starts without problems and after some time
> crashes!
>
> I found the following message in In2O3.dayfile:
>
> **  LAPW1 crashed!
> 0.288u 0.596s 2:56.59 0.4%      0+0k 0+0io 3pf+0w
> error: command   /mnt/san/home/uson/r_ng732/WIEN2k/lapw1para lapw1.def   failed
>
> and with the option -xf in lapw1para I get on stdout:
>
> [5] Fatal Error: message truncated. ask 13312 got 21632 at line 454 in file
> /usr/src/packages/BUILD/mpich-1.2.6-51-pgi623/mpid/rai/raifma.c
> Process 0 lost connection: exiting
> Process 1 lost connection: exiting
> Process 2 lost connection: exiting
> Process 3 lost connection: exiting
> Process 4 lost connection: exiting
> Process 6 lost connection: exiting
> Process 7 lost connection: exiting
>
> If I run the same case but with 1 k-point in the IBZ, the SCF cycle
> finishes without problems... Therefore the problem happens when
> more than 1 k-point in the IBZ is used.
>
> Thanks in advance for your help and comments...
>
> Roberto Nuñez-Gonzalez
> Departamento de Matematicas
> Universidad de Sonora
> Mexico



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering to study the structure of matter.

