[Wien] Segfault in lapw1_mpi (SL_INIT)

mbraga mbraga at fe.up.pt
Tue Jul 3 15:17:32 CEST 2012


On 03.07.2012 14:10, Laurence Marks wrote:
> This is an issue with your openmpi, either a simple one or a nasty
> one. Suggestions:
>
> a. Check that you are using libmkl_blacs_openmpi_lp64 or similar, the
> "blacs_openmpi" is what matters. This is probably the reason, and just
> changing this will fix everything.
>
> b. Run "ompi_info" which is in the openmpi directory and look for
> compatibility issues.
>
> c. Recompile openmpi, and I suggest using 1.4.4. Unfortunately there
> are some bugs in the 1.3.X versions of openmpi and I never got them to
> work, but I did get 1.4.4 to work.
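Suggestions (a) and (b) can be checked from the shell. The following is a minimal sketch, assuming a POSIX shell, that lapw1_mpi is linked against shared MKL libraries (so `ldd` lists them; with static linking, inspect the link line used to build lapw1_mpi instead), and that `ompi_info` reports its version on a line of the form `Open MPI: X.Y.Z`. The function names `blacs_flavour`, `ompi_version` and `is_ompi_13x` are hypothetical helpers, not part of WIEN2k, MKL or Open MPI.

```shell
#!/bin/sh
# Hypothetical helpers for suggestions (a) and (b); names are illustrative only.

# blacs_flavour: reads a link line or `ldd` output on stdin and reports which
# MKL BLACS library it references. With Open MPI this should print "openmpi".
blacs_flavour() {
  libs=$(cat)
  case "$libs" in
    *mkl_blacs_openmpi*)  echo "openmpi" ;;
    *mkl_blacs_intelmpi*) echo "intelmpi" ;;
    *mkl_blacs*)          echo "other" ;;   # some other MKL BLACS flavour
    *)                    echo "none" ;;    # no MKL BLACS library found
  esac
}

# ompi_version: extracts the version number from `ompi_info` output on stdin,
# assuming the usual "Open MPI: X.Y.Z" line (other "Open MPI ..." lines are skipped).
ompi_version() {
  awk -F': ' '/^ *Open MPI:/ { print $2; exit }'
}

# is_ompi_13x: succeeds if the given version string belongs to the 1.3.x series.
is_ompi_13x() {
  case "$1" in
    1.3|1.3.*) return 0 ;;
    *)         return 1 ;;
  esac
}

# Usage on a real system:
#   ldd ./lapw1_mpi | blacs_flavour
#   v=$(ompi_info | ompi_version); is_ompi_13x "$v" && echo "1.3.x: consider 1.4.4"
```

If `blacs_flavour` reports anything other than `openmpi` while Open MPI is the MPI in use, that mismatch is the first thing to fix, per suggestion (a).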
>
> On Tue, Jul 3, 2012 at 3:25 AM, Elias Assmann
> <elias.assmann at gmail.com> wrote:
>> Hello,
>>
>> When I execute lapw1_mpi, it dies on me immediately:
>>
>>         $ ./lapw1_mpi
>>         w2k_dispatch_signal(): received: Segmentation fault
>>          Child id           0 SIGSEGV, contact developers
>>         --------------------------------------------------------------------------
>>         MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>         with errorcode 6.
>>
>>         NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>         You may or may not see output from other processes, depending on
>>         exactly when Open MPI kills them.
>>         --------------------------------------------------------------------------
>>
>> It turns out that the offending line is the first call to SL_INIT in
>> INIT_PARALLEL (SRC_lapw1/modules.F),
>>
>>                 SUBROUTINE INIT_PARALLEL
>>                   IMPLICIT NONE
>>         #ifdef Parallel
>>                   include 'mpif.h'
>>                   INTEGER :: IERR,i,j
>>                   call MPI_INIT(IERR)
>>                   call MPI_COMM_SIZE( MPI_COMM_WORLD, NPE, IERR)
>>                   call MPI_COMM_RANK( MPI_COMM_WORLD, MYID, IERR)
>>                   CALL BARRIER
>> ->                CALL SL_INIT(ICTXTALL, 1, NPE)
>>
>> which is called eventually via GTFNAM at the top of the main program
>> LAPW1.
>>
>> I used ifort version 11.1 (specifically, I tried two revisions: 046
>> and 072) and the corresponding MKL libraries (including ScaLAPACK).
>> The MPI version is openmpi-1.3.2-icc, in case that matters.  Neither
>> lapw0_mpi nor lapw2_mpi has this problem (then again, they do not
>> seem to use SL_INIT).
>>
>> Any pointers on how I should proceed?
>>
>> Thanks,
>>
>>         Elias
>>
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien

-- 
Helena Braga
Engineering Physics Department
Engineering Faculty, Universidade do Porto
R. Dr. Roberto Frias, s/n
4200-465 Porto
Portugal
phone: +351 225081869
email: mbraga at fe.up.pt
URL 1: http://paginas.fe.up.pt/~mbraga/
URL 2: https://sigarra.up.pt/feup/funcionarios_geral.formview?p_codigo=320005
Our book chapter: http://www.intechopen.com/books/neutron-diffraction/hydrides-of-cu-and-mg-intermetallic-systems-characterization-catalytic-function-and-applications

