[Wien] Problems with LAPW1 running in PARALLEL

Roberto Nunez Gonzalez ronunez at gauss.mat.uson.mx
Tue Feb 10 17:07:39 CET 2009


Thanks for your comments, Laurence...

1) About MPI and threads: I have problems with threads in the
GOTO libs even without MPI, and therefore I set
GOTO_NUM_THREADS=1.

2) I agree with your comments on this point... But from
the computational point of view, and considering the case
of In2O3 with a matrix size of ~6400, the hybrid parallelization
should work. I think there is a problem either in the
MPI libs or in the WIEN2k parallel scripts.

I tested another setup: 3 k-points using the hybrid method
on 3 nodes, each one with 4 CPUs. The calculation on each
node (one k-point per node) is done with the MPI programs
using 4 CPUs. Again, the SCF cycle stops with an error in
lapw1, but apparently two of the jobs finish without
problems and the third ends with an error.

Finally, the only MPI implementation available on this
system is MPICH...

Thanks again for your comments...

Roberto Nuñez-Gonzalez
Departamento de Matematicas
Universidad de Sonora
Mexico

>L. Marks wrote:

>Three comments:
>1) Don't combine threads and MPI. They fight with each other and lead to
>chaos in many cases.
>2) Don't use MPI with many nodes for small problems in lapw1. You
>should only be using MPI extensively for large problems. The k-point
>parallelization is much more efficient than trying to run one task
>with MPI on many nodes. The reason is that in the second case you end
>up wasting a lot of time on communication.
>
>For 61 k-points and 8 CPUs I would split them as 7,7,7,8,8,8,8,8
>k-points, although depending upon the cell size it may in fact be
>faster to use fewer than 8 CPUs.
>
>3) I would consider MVAPICH; not all implementations of MPI are the same.
>
>
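
The split suggested above (61 k-points over 8 CPUs as 7,7,7,8,8,8,8,8) is simply the most even division possible: every CPU gets either floor(61/8) = 7 or 8 k-points. A minimal Python sketch of that arithmetic follows. The function name `split_kpoints` is hypothetical, and the sketch does not model how lapw1para actually assigns k-points, which depends on the contents of the `.machines` file.

```python
def split_kpoints(n_kpoints: int, n_workers: int) -> list[int]:
    """Split n_kpoints as evenly as possible over n_workers (hypothetical helper).

    Each worker gets either floor(n/w) or floor(n/w) + 1 k-points; the
    smaller chunks are listed first to match the 7,7,7,8,... ordering.
    """
    base, extra = divmod(n_kpoints, n_workers)
    # (n_workers - extra) workers get `base`, the remaining `extra` get base + 1
    return [base] * (n_workers - extra) + [base + 1] * extra


if __name__ == "__main__":
    chunks = split_kpoints(61, 8)
    print(chunks, sum(chunks))  # [7, 7, 7, 8, 8, 8, 8, 8] 61
```

Since divmod(61, 8) = (7, 5), three CPUs get 7 k-points and five get 8, which adds up to 61.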

----- Original message -----
From: <ronunez at gauss.mat.uson.mx>
Date: Sunday, February 8, 2009 11:28 am
Subject: Problems with LAPW1 running in PARALLEL
To: wien at zeus.theochem.tuwien.ac.at

> 
> I have the following problem using WIEN2k 8.3 with MPI...
> 
> I have compiled WIEN2k 8.3 on a Cray XD1 (Opteron CPUs)
> with the pgf90 compiler (v 6.23), ACML libs, GOTO libs 1.26, and MPICH
> 1.2.6, without errors...
> 
> I ran the MPI benchmark without problems, using 8 CPUs and 1 thread
> in the GOTO libs, because with 2 threads the GOTO libs sometimes
> freeze...
> 
> Next, I tried to run an SCF cycle of In2O3, using 61
> k-points in the IBZ and 8 CPUs. The MPI version of lapw0 ran without
> problems, but the MPI lapw1 started without problems and then crashed
> after some time!
> 
> I found the following message in In2O3.dayfile:
> 
> **  LAPW1 crashed!
> 0.288u 0.596s 2:56.59 0.4%      0+0k 0+0io 3pf+0w
> error: command   /mnt/san/home/uson/r_ng732/WIEN2k/lapw1para lapw1.def   failed
> 
> and with the option -xf in lapw1para I get on stdout:
> 
> [5] Fatal Error: message truncated. ask 13312 got 21632 at line 454 in file /usr/src/packages/BUILD/mpich-1.2.6-51-pgi623/mpid/rai/raifma.c
> Process 0 lost connection: exiting
> Process 1 lost connection: exiting
> Process 2 lost connection: exiting
> Process 3 lost connection: exiting
> Process 4 lost connection: exiting
> Process 6 lost connection: exiting
> Process 7 lost connection: exiting
> 
> If I run the same case with only 1 k-point in the IBZ, the SCF cycle
> ends without problems... Therefore the problem happens when more
> than 1 k-point in the IBZ is used.
> 
> Thanks in advance for your help and comments...
> 
> Roberto Nuñez-Gonzalez
> Departamento de Matematicas
> Universidad de Sonora
> Mexico
> 

