[Wien] lapw0_mpi (MPI_Bcast) bug

Pawel Lesniak lesniak at ifmpan.poznan.pl
Wed Feb 3 09:01:57 CET 2010


On 02.02.2010 20:21, Laurence Marks wrote:
> There is a bug in some MPI implementations that can lead to a SIGSEGV
> in lapw0_mpi for large problems. The symptom is a SIGSEGV at the
> MPI_Bcast call at about line 1574:
> #ifdef Parallel
>   if (.not.coul) allocate(potk(nkk))
>   call MPI_Bcast(potk, NKK, MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)
> #endif
>
> A patch (which works) is not to do the MPI_Bcast all at once:
>
> #ifdef Parallel
>        if (.not.coul) allocate(potk(nkk))
>        ibbuff=32768  ! Could be optimized
>        i2=NKK/ibbuff
>        it1=1
>        do i=1,i2
>          call MPI_Bcast(potk(it1), 1024, MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)
>          it1=it1+ibbuff
>        enddo
>        if(it1 .lt. nkk)then
>          it2=nkk-it1+1
>          call MPI_Bcast(potk(it1), it2, MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)
>        endif
>        call MPI_BARRIER(MPI_COMM_WORLD,ierr)
> #endif
>

Are you sure that

  call MPI_Bcast(potk(it1), 1024, MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)

is correct? Why a count of 1024 when you step through the array in chunks of ibbuff?
I believe the correct version is:

  call MPI_Bcast(potk(it1), ibbuff, MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)
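
For reference, here is a minimal self-contained sketch of the whole patch with
that count corrected, as I read the intent (variable names as in the snippet
above; ibbuff = 32768 is the value Laurence suggested). As far as I can tell
the remainder test should also be .le. rather than .lt., otherwise a remainder
of exactly one element is silently skipped:

#ifdef Parallel
      if (.not.coul) allocate(potk(nkk))
      ibbuff = 32768                ! chunk size in elements; could be tuned
      i2  = nkk/ibbuff              ! number of full chunks
      it1 = 1                       ! start index of the current chunk
      do i = 1, i2
        ! broadcast one full chunk of ibbuff elements
        call MPI_Bcast(potk(it1), ibbuff, MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)
        it1 = it1 + ibbuff
      enddo
      if (it1 .le. nkk) then        ! .le., not .lt.: a 1-element remainder must still be sent
        it2 = nkk - it1 + 1         ! elements left over after the full chunks
        call MPI_Bcast(potk(it1), it2, MPI_DOUBLE_COMPLEX, 0, MPI_COMM_WORLD, ierr)
      endif
      call MPI_BARRIER(MPI_COMM_WORLD, ierr)
#endif

With ibbuff = 32768 each broadcast moves 512 KiB (32768 elements of 16 bytes
each), which should stay well below whatever internal buffer limit triggers
the SIGSEGV in the affected MPI implementations.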


Pawel Lesniak


