[Wien] Fwd: MPI segmentation fault
Md. Fhokrul Islam
fislam at hotmail.com
Sat Jan 30 22:17:33 CET 2010
Hi Marks,
In addition to what I sent in my previous email, I would like to mention that
if I use 8 processors instead of 4, I get the segmentation fault already at lapw0.
Thanks,
Fhokrul
From: fislam at hotmail.com
To: wien at zeus.theochem.tuwien.ac.at
Date: Sat, 30 Jan 2010 18:51:59 +0000
Subject: Re: [Wien] Fwd: MPI segmentation fault
Hi Marks,
I have followed your suggestions: I am now using openmpi 1.4.1 compiled with icc,
I have also compiled fftw with cc instead of gcc, and I have recompiled Wien2k with the
mpirun option in parallel_options:
current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_ -x LD_LIBRARY_PATH
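One thing I am not sure about: man mpirun suggests that options placed after the
executable are passed to the program rather than to mpirun, so perhaps
-x LD_LIBRARY_PATH should come before _EXEC_. A sketch of what I believe the line
should look like, assuming the setenv WIEN_MPIRUN form used in parallel_options
(_NP_, _HOSTS_ and _EXEC_ are Wien2k's own placeholders):

    # $WIENROOT/parallel_options -- sketch; -x must precede the executable so that
    # mpirun (not the program) receives it and exports LD_LIBRARY_PATH to all ranks
    setenv WIEN_MPIRUN "mpirun -x LD_LIBRARY_PATH -np _NP_ -machinefile _HOSTS_ _EXEC_"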
Although I no longer get a segmentation fault, the job still crashes at lapw1 with a
different error message. I have pasted case.dayfile and case.error below, along with the
ompi_info and stacksize output. I am not even sure where to look for the solution. Please
let me know if you have any suggestions regarding this MPI problem.
Thanks,
Fhokrul
case.dayfile:
cycle 1 (Sat Jan 30 16:49:55 CET 2010) (200/99 to go)
> lapw0 -p (16:49:55) starting parallel lapw0 at Sat Jan 30 16:49:56 CET 2010
-------- .machine0 : 4 processors
1863.235u 21.743s 8:21.32 376.0% 0+0k 0+0io 1068pf+0w
> lapw1 -c -up -p (16:58:17) starting parallel lapw1 at Sat Jan 30 16:58:18 CET 2010
-> starting parallel LAPW1 jobs at Sat Jan 30 16:58:18 CET 2010
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
mn117.mpi mn117.mpi mn117.mpi mn117.mpi(1) 1263.782u 28.214s 36:47.58 58.5% 0+0k 0+0io 49300pf+0w
** LAPW1 crashed!
1266.358u 37.286s 36:53.31 58.8% 0+0k 0+0io 49425pf+0w
error: command /disk/global/home/eishfh/Wien2k_09_2/lapw1cpara -up -c uplapw1.def failed
Error file:
LAPW0 END
LAPW0 END
LAPW0 END
LAPW0 END
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 8837 on node mn117.local exited on signal 9 (Killed).
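Since signal 9 means the process was killed from outside -- often by the kernel's
out-of-memory killer when an MPI rank exhausts RAM -- one check I could run on the
node (assuming Linux and that I can log in to mn117 shortly after the crash) is:

    # 'oom-killer' lines in the kernel log name the process that was killed
    dmesg | grep -i -E 'oom|killed process'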
[eishfh at milleotto s110]$ ompi_info
                 Package: Open MPI root at milleotto.local Distribution
                Open MPI: 1.4.1
                  Prefix: /sw/pkg/openmpi/1.4.1/intel/11.1
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: milleotto.local
           Configured by: root
           Configured on: Sat Jan 16 19:40:36 CET 2010
              Built host: milleotto.local
 Fortran90 bindings size: small
              C compiler: icc
     C compiler absolute: /sw/pkg/intel/11.1.064//bin/intel64/icc
            C++ compiler: icpc
   C++ compiler absolute: /sw/pkg/intel/11.1.064//bin/intel64/icpc
      Fortran77 compiler: ifort
  Fortran77 compiler abs: /sw/pkg/intel/11.1.064//bin/intel64/ifort
      Fortran90 compiler: ifort
  Fortran90 compiler abs: /sw/pkg/intel/11.1.064//bin/intel64/ifort
stacksize:
[eishfh at milleotto s110]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 73728
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 73728
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
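One thing that stands out in the limits above: 'max locked memory' is only 32 kB.
If the nodes communicate over InfiniBand, OpenMPI needs to pin (lock) large memory
regions, and such a low memlock limit is a known cause of crashes. The usual fix,
which the sysadmin would have to apply and which assumes PAM limits are in use, is:

    # /etc/security/limits.conf -- sketch; takes effect at the next login, and any
    # resource-manager daemons may need a restart to pick it up
    *    soft    memlock    unlimited
    *    hard    memlock    unlimited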
>
> In essence, you have a mess and you are going to have to talk to your
> sysadmin (hikmpn) to get things sorted out. Issues:
>
> a) You have openmpi-1.3.3. This works for small problems, fails for
> large ones. This needs to be updated to 1.4.0 or 1.4.1 (the older
> versions of openmpi have bugs).
> b) The openmpi was compiled with ifort 10.1 but you are using 11.1.064
> for Wien2k, which could lead to problems.
> c) The openmpi was compiled with gcc and ifort 10.1, not icc and ifort,
> which could lead to problems.
> d) The fftw library you are using was compiled with gcc, not icc; this
> could lead to problems.
> e) Some of the shared libraries are only found via your LD_LIBRARY_PATH; you will
> need to add -x LD_LIBRARY_PATH to how mpirun is called (in
> $WIENROOT/parallel_options) -- look at man mpirun.
> f) I still don't know what the stack limits are on your machine --
> this can lead to severe problems in lapw0_mpi (see the check sketched below).
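Regarding point f): the limits shown by ulimit in my login shell are not necessarily
what the MPI processes inherit on the compute nodes. One way I could verify the limit
actually seen by each remote rank (assuming mpirun accepts the same machinefile the
job uses; .machines below stands in for it) is:

    # print the stack limit as seen by each remote MPI process
    mpirun -np 4 -machinefile .machines bash -c 'ulimit -s'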