[Wien] Fwd: MPI segmentation fault

Md. Fhokrul Islam fislam at hotmail.com
Sat Jan 30 19:51:59 CET 2010







Hi Marks,

    I followed your suggestions and am now using openmpi 1.4.1 compiled with icc.
I also compiled fftw with icc instead of gcc and recompiled Wien2k with the following
mpirun option in parallel_options:

current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_ -x LD_LIBRARY_PATH
 
The segmentation fault is gone, but the job still crashes at lapw1 with a different error
message. I have pasted case.dayfile and case.error below, along with the ompi_info output
and the stack size limits. I am not sure where to look for a solution, so please let me
know if you have any suggestions regarding this MPI problem.
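
To make the template above concrete, here is a minimal sketch of how such a line
is presumably expanded before it is run. It assumes plain text substitution of the
_NP_, _HOSTS_ and _EXEC_ placeholders; the real Wien2k scripts may do this
differently. The helper name expand_mpirun, the machine file name .machine1 and
the executable name lapw1c_mpi are illustrative assumptions, not taken from the
run above. The mpirun synopsis is "mpirun [options] <program> [<args>]", so the
position of -x relative to _EXEC_ may be worth checking.

```shell
#!/bin/sh
# Hypothetical helper: expand an MPIRUN template by plain text substitution.
# This is an illustration only, not the actual Wien2k implementation.
expand_mpirun() {
    template=$1   # template containing _NP_, _HOSTS_ and _EXEC_
    np=$2         # number of MPI processes
    hosts=$3      # machine file (assumed name)
    exec_cmd=$4   # program plus its arguments (assumed names)
    printf '%s\n' "$template" |
        sed -e "s|_NP_|$np|" -e "s|_HOSTS_|$hosts|" -e "s|_EXEC_|$exec_cmd|"
}

# Template as currently set: -x ends up after the program and its arguments.
expand_mpirun 'mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_ -x LD_LIBRARY_PATH' \
    4 .machine1 'lapw1c_mpi uplapw1.def'

# Variant with -x placed among mpirun's own options, before the program.
expand_mpirun 'mpirun -np _NP_ -machinefile _HOSTS_ -x LD_LIBRARY_PATH _EXEC_' \
    4 .machine1 'lapw1c_mpi uplapw1.def'
```

Printing the expanded command this way only shows what mpirun would be asked to
run; it does not by itself explain the crash reported below.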

Thanks,
Fhokrul 

case.dayfile:

    cycle 1     (Sat Jan 30 16:49:55 CET 2010)  (200/99 to go)

>   lapw0 -p    (16:49:55) starting parallel lapw0 at Sat Jan 30 16:49:56 CET 2010
-------- .machine0 : 4 processors
1863.235u 21.743s 8:21.32 376.0%        0+0k 0+0io 1068pf+0w
>   lapw1  -c -up -p    (16:58:17) starting parallel lapw1 at Sat Jan 30 16:58:18 CET 2010
->  starting parallel LAPW1 jobs at Sat Jan 30 16:58:18 CET 2010
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
     mn117.mpi mn117.mpi mn117.mpi mn117.mpi(1) 1263.782u 28.214s 36:47.58 58.5%        0+0k 0+0io 49300pf+0w
**  LAPW1 crashed!
1266.358u 37.286s 36:53.31 58.8%        0+0k 0+0io 49425pf+0w
error: command   /disk/global/home/eishfh/Wien2k_09_2/lapw1cpara -up -c uplapw1.def   failed

Error file:

 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 8837 on node mn117.local exited on signal 9 (Killed).


[eishfh at milleotto s110]$ ompi_info
                 Package: Open MPI root at milleotto.local Distribution
                Open MPI: 1.4.1
                  Prefix: /sw/pkg/openmpi/1.4.1/intel/11.1
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: milleotto.local
           Configured by: root
           Configured on: Sat Jan 16 19:40:36 CET 2010
          Configure host: milleotto.local
              Built host: milleotto.local
 Fortran90 bindings size: small
              C compiler: icc
     C compiler absolute: /sw/pkg/intel/11.1.064//bin/intel64/icc
            C++ compiler: icpc
   C++ compiler absolute: /sw/pkg/intel/11.1.064//bin/intel64/icpc
      Fortran77 compiler: ifort
  Fortran77 compiler abs: /sw/pkg/intel/11.1.064//bin/intel64/ifort
      Fortran90 compiler: ifort
  Fortran90 compiler abs: /sw/pkg/intel/11.1.064//bin/intel64/ifort


stacksize:

[eishfh at milleotto s110]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 73728
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 73728
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited





> 
> In essence, you have a mess and you are going to have to talk to your
> sysadmin (hikmpn) to get things sorted out. Issues:
> 
> a) You have openmpi-1.3.3. This works for small problems, fails for
> large ones. This needs to be updated to 1.4.0 or 1.4.1 (the older
> versions of openmpi have bugs).
> b) The openmpi was compiled with ifort 10.1 but you are using 11.1.064
> for Wien2k -- could lead to problems.
> c) The openmpi was compiled with gcc and ifort 10.1, not icc and ifort
> which could lead to problems.
> d) The fftw library you are using was compiled with gcc not icc, this
> could lead to problems.
> e) Some of the shared libraries are in your LD_LIBRARY_PATH, you will
> need to add -x LD_LIBRARY_PATH to how mpirun is called (in
> $WIENROOT/parallel_options) -- look at man mpirun.
> f) I still don't know what the stack limits are on your machine --
> this can lead to severe problems in lapw0_mpi
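
Regarding item f), the limits printed in a login shell are not necessarily the
ones seen by processes started on the compute nodes. A minimal sketch of a check
that could be run on each node is given here; the function name
report_stack_limit is a hypothetical helper, and the idea of starting the script
through the batch system or mpirun is an assumption about the local setup.

```shell
#!/bin/sh
# Hypothetical helper: print the soft stack limit of the current shell,
# prefixed with the node name so that output from several nodes can be
# told apart.
report_stack_limit() {
    limit=$(ulimit -s)          # soft stack limit: "unlimited" or a number in kB
    node=$(uname -n)            # node name as reported by the kernel
    if [ "$limit" = "unlimited" ]; then
        echo "$node: stack size unlimited"
    else
        echo "$node: stack size $limit kB"
    fi
}

report_stack_limit
```

If the value reported from a compute node differs from the "unlimited" shown by
ulimit -a on the login node, that difference would be the first thing to raise
with the sysadmin.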

 		 	   		  