[Wien] MPI run problem
alonofrio at comcast.net
Sat May 11 23:18:17 CEST 2013
Hello again,
I commented out the line "call W2kinit", and now I get a more descriptive message, but I am still lost about it. I am not sure whether it is failing to find some libraries or whether the environment variables are not being propagated to all nodes.
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine   Line      Source
libmpi.so.1        00002B746414FF7A  Unknown   Unknown   Unknown
lapw1c_mpi         00000000004E9192  Unknown   Unknown   Unknown
libmkl_scalapack_  00002B746330B231  Unknown   Unknown   Unknown
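One quick way to test the environment-propagation hypothesis is to run a trivial command through the same mpirun invocation Wien2k uses (a rough sketch; "hosts.txt" stands in for whatever file _HOSTS_ expands to):

mpirun -x LD_LIBRARY_PATH -x PATH -np 4 -hostfile hosts.txt printenv LD_LIBRARY_PATH
mpirun -x LD_LIBRARY_PATH -x PATH -np 4 -hostfile hosts.txt ldd $WIENROOT/lapw1c_mpi | grep "not found"

If every rank prints the same LD_LIBRARY_PATH and ldd reports nothing as "not found", missing libraries on the remote nodes are unlikely to be the cause.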
Any ideas? I'm so sorry for all the questions.
David Guzman
On May 11, 2013, at 4:46 PM, Laurence Marks <L-marks at northwestern.edu> wrote:
The addition of the signal trapping in Wien2k (W2kinit in lapw[0-2].F
and others) has pluses and a minus. The pluses are that the weekly
emails on the list about ulimit-associated crashes have stopped, and
also (perhaps not so obvious) that mpi tasks die more gracefully.
Unfortunately, it can also make it less than clear what is wrong with an mpi job.
I suggest (and others should do the same as needed) that you comment
out the "call W2kinit" in lapw1, recompile just lapw1, then try again
-- hopefully you will get a more human-readable message.
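For reference, a minimal sketch of that procedure, assuming a standard $WIENROOT installation (the sed one-liner is only a convenience; editing lapw1.F by hand works just as well):

cd $WIENROOT/SRC_lapw1
# comment out the signal-trap initialization
sed -i 's/call W2kinit/! call W2kinit/' lapw1.F
cd $WIENROOT
# recompile only lapw1 (sequential and mpi versions), e.g. via siteconfig_lapw
./siteconfig_lapw    # choose the option to (re)compile individual programs and pick lapw1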
IMPORTANT: check hansen-b004 to ensure that you do not have any zombie
processes still running; depending upon what version of ssh you are
running, you may have them hanging around.
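A quick check along those lines (a sketch; the process names are taken from the error output above):

ssh hansen-b004 "ps -ef | grep -E 'lapw1c?_mpi|mpirun' | grep -v grep"
# and, if stale processes show up:
ssh hansen-b004 "pkill lapw1c_mpi"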
On Sat, May 11, 2013 at 3:31 PM, alonofrio at comcast.net
<alonofrio at comcast.net> wrote:
Thanks Professor Marks,
I corrected my compiler options; I had a mistake with the openmpi blacs
library. However, I still get errors when trying to run. It always stops when
it starts lapw1.
Now it is giving me this error:
w2k_dispatch_signal(): received: Segmentation fault
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 0.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
and when I look at the case.dayfile I see this:
hansen-b004 hansen-b004 hansen-b004 hansen-b004(5)
Child id 0 SIGSEGV, contact developers
Child id 1 SIGSEGV, contact developers
Child id 2 SIGSEGV, contact developers
Child id 3 SIGSEGV, contact developers
0.085u 0.365s 0:01.49 29.5% 0+0k 0+0io 57pf+0w
Thanks for your help. Any comments are much appreciated.
Alex Onofrio
Departamento de Fisica
Universidad de Los Andes
Bogota, Colombia
On May 11, 2013, at 1:43 PM, Laurence Marks <L-marks at northwestern.edu>
wrote:
You need to use the openmpi blacs. Please check the intel compilation
assistance webpage (previously posted, so check the list).
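Concretely, with Open MPI that usually means linking MKL's Open MPI BLACS rather than the Intel MPI one: in R_LIBS, $(MKLROOT)/lib/em64t/libmkl_blacs_intelmpi_lp64.a would become $(MKLROOT)/lib/em64t/libmkl_blacs_openmpi_lp64.a, and RP_LIBS would look roughly like this (a sketch based on the OPTIONS file quoted below; check the exact library names against the installed MKL):

current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_openmpi_lp64 -L/apps/rhel6/fftw-3.3.1/openmpi-1.4.4_intel-12.0.084/lib -lfftw3_mpi -lfftw3 $(R_LIBS)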
On May 11, 2013 12:10 PM, "alonofrio at comcast.net" <alonofrio at comcast.net>
wrote:
Wien2k users,
I am trying to get the MPI capabilities of Wien running, but I have run into
some complications.
The whole compilation process goes through with no errors, but when I try to
run the code through run_lapw it stops at the beginning of the lapw1 program
with the following error:
w2k_dispatch_signal(): received: Segmentation fault
MPI_ABORT was invoked on rank 7 in communicator MPI_COMM_WORLD
with errorcode 28607820.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 7 with PID 60685 on
node carter-a355.rcac.purdue.edu exiting without calling "finalize". This
may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
This repeats the same number of times as the number of processors
submitted as mpi jobs.
Here are my compilation options as shown in the OPTIONS file:
current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/em64t -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_blas95_lp64 -lmkl_lapack95_lp64 $(MKLROOT)/lib/em64t/libmkl_scalapack_lp64.a -Wl,--start-group $(MKLROOT)/lib/em64t/libmkl_cdft_core.a $(MKLROOT)/lib/em64t/libmkl_intel_lp64.a $(MKLROOT)/lib/em64t/libmkl_intel_thread.a $(MKLROOT)/lib/em64t/libmkl_core.a $(MKLROOT)/lib/em64t/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -openmp -lpthread
current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_lp64 -L/apps/rhel6/fftw-3.3.1/openmpi-1.4.4_intel-12.0.084/lib -lfftw3_mpi -lfftw3 $(R_LIBS)
current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
and these are the options in the parallel_options file:
setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -x LD_LIBRARY_PATH -x PATH -np _NP_ -hostfile _HOSTS_ _EXEC_"
I compiled the code with intel 12.0.084, openmpi 1.4.4 (compiled with
intel 12.0.084), and fftw 3.3.1 (compiled with intel 12.0.084 and openmpi
1.4.4).
I am trying to run the code on the university cluster, which has InfiniBand
and Intel Xeon E5 processors.
I hope this information is enough for any of you to point me to the
problem.
Thanks so much for your time
Alex Onofrio
Departamento de Fisica
Universidad de Los Andes
Bogota, Colombia
--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
Albert Szent-Györgyi
_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html