[Wien] Segfault in lapw1_mpi (SL_INIT)

Laurence Marks L-marks at northwestern.edu
Tue Jul 3 15:10:26 CEST 2012


This is a problem with your openmpi setup, either a simple one or a
nasty one. Suggestions:

a. Check that you are linking libmkl_blacs_openmpi_lp64 or similar; the
"blacs_openmpi" part is what matters. This is probably the reason, and
changing it alone will likely fix everything.

b. Run "ompi_info", which is in the openmpi directory, and look for
compatibility issues, for example in the compilers it reports openmpi
was built with.

c. Recompile openmpi; I suggest using 1.4.4. Unfortunately there are
some bugs in the 1.3.X versions of openmpi. I never got those versions
to work, but I did get 1.4.4 to work.
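
To make suggestion (a) concrete, here is a minimal sketch of a shell
check for which BLACS variant a link line names. The function name
blacs_variant and the sample link line are assumptions for illustration
only; they are not taken from WIEN2k or from the original post, so
substitute the parallel link options from your own build.

```shell
# Hypothetical helper: report which MKL BLACS variant a link line names.
# The first matching pattern wins, so the specific variants come first.
blacs_variant() {
    case "$1" in
        *blacs_openmpi*)  echo "openmpi" ;;
        *blacs_intelmpi*) echo "intelmpi" ;;
        *blacs*)          echo "other" ;;
        *)                echo "none" ;;
    esac
}

# Sample link line (an assumption, not the poster's actual options).
LINK_LINE="-lmkl_scalapack_lp64 -lmkl_blacs_lp64 -lmkl_intel_lp64"
blacs_variant "$LINK_LINE"

# If the binary is dynamically linked, the libraries it resolves at run
# time can also be listed directly:
#   ldd ./lapw1_mpi | grep -i blacs
```

With the sample line this prints "other": a BLACS library is linked, but
not the openmpi one, which is the mismatch described in (a). A
statically linked binary shows nothing in ldd, so the link line is the
more reliable place to look.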

On Tue, Jul 3, 2012 at 3:25 AM, Elias Assmann <elias.assmann at gmail.com> wrote:
> Hello,
>
> When I execute lapw1_mpi, it dies on me immediately:
>
>         $ ./lapw1_mpi
>         w2k_dispatch_signal(): received: Segmentation fault
>          Child id           0 SIGSEGV, contact developers
>         --------------------------------------------------------------------------
>         MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>         with errorcode 6.
>
>         NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>         You may or may not see output from other processes, depending on
>         exactly when Open MPI kills them.
>         --------------------------------------------------------------------------
>
> It turns out that the offending line is the first call to SL_INIT in
> INIT_PARALLEL (SRC_lapw1/modules.F),
>
>                 SUBROUTINE INIT_PARALLEL
>                   IMPLICIT NONE
>         #ifdef Parallel
>                   include 'mpif.h'
>                   INTEGER :: IERR,i,j
>                   call MPI_INIT(IERR)
>                   call MPI_COMM_SIZE( MPI_COMM_WORLD, NPE, IERR)
>                   call MPI_COMM_RANK( MPI_COMM_WORLD, MYID, IERR)
>                   CALL BARRIER
> ->                CALL SL_INIT(ICTXTALL, 1, NPE)
>
> which is called eventually via GTFNAM at the top of the main program
> LAPW1.
>
> I used ifort version 11.1 (specifically, I tried two revisions: 046
> and 072) and the corresponding MKL libraries (including ScaLAPACK).
> The MPI version is openmpi-1.3.2-icc, in case that matters.  Neither
> lapw0_mpi nor lapw2_mpi have this problem (then again, they do not
> seem to use SL_INIT).
>
> Any pointers on how I should proceed?
>
> Thanks,
>
>         Elias
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien



-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
Albert Szent-Gyorgyi

