[Wien] Problems with mpi for Wien12.1
Paul Fons
paul-fons at aist.go.jp
Fri Aug 24 02:24:40 CEST 2012
Greetings all,
I have compiled Wien2K 12.1 under openSUSE 11.4 (and openSUSE 12.1) with the latest Intel compilers and hit identical MPI launch problems in both cases; I am hoping for some suggestions as to where to look to fix things. Note that the serial and k-point parallel versions of the code run fine (I have optimized GaAs a lot in my troubleshooting!).
Environment.
I am using the latest Intel ifort, icc, and Intel MPI (impi) libraries for Linux.
matstud at pyxis:~/Wien2K> ifort --version
ifort (IFORT) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
matstud at pyxis:~/Wien2K> mpirun --version
Intel(R) MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824
Copyright (C) 2003-2011, Intel Corporation. All rights reserved.
matstud at pyxis:~/Wien2K> icc --version
icc (ICC) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
My OPTIONS file settings from siteconfig_lapw:
current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback
current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread
current:RP_LIBS:-L$(MKLROOT)/lib/intel64 $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/ -lfftw3_mpi -lfftw3 $(R_LIBS)
current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
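For concreteness, when lapw0para launches the MPI binary it substitutes _NP_, _HOSTS_ and _EXEC_ into that MPIRUN template, so (as far as I can tell from the dayfile output further down) the command actually issued should look something like:
mpirun -np 12 -machinefile .machine0 /home/matstud/Wien2K/lapw0_mpi lapw0.def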
The code compiles and links without error. It runs fine in serial mode and in k-point parallel mode, e.g.
.machines with
1:localhost
1:localhost
1:localhost
granularity:1
extrafine:1
This runs fine. When I attempt an MPI run with 12 processes (on a 12-core machine), I crash and burn (see below) with a SIGSEGV error and instructions to contact the developers.
The linking options were derived from Intel's MKL link advisor (the version on the Intel site). I should add that the mpi-bench in fftw3 works fine using the Intel MPI, as do commands like hostname and even abinit, so it would appear that the Intel MPI environment itself is fine. I have wasted a lot of time trying to figure out how to fix this before writing to the list, but at this point I feel like a monkey at a keyboard attempting to duplicate Shakespeare -- if you know what I mean. Thanks in advance for any heads-up that you can offer.
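For reference, an even simpler standalone check of the Intel MPI setup than mpi-bench would be a trivial MPI program along these lines (just a sketch, nothing WIEN2k-specific; the file name hello_mpi.c is only for illustration):

/* hello_mpi.c -- minimal check that mpirun can start and finalize N ranks */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                 /* start the MPI runtime        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes    */
    printf("rank %d of %d reporting\n", rank, size);
    MPI_Finalize();                         /* shut the runtime down again  */
    return 0;
}

Compiling it with mpiicc hello_mpi.c -o hello_mpi and running mpirun -np 12 ./hello_mpi should print one line per rank if the basic MPI environment is healthy.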
.machines
lapw0:localhost:12
1:localhost:12
granularity:1
extrafine:1
> stop error
error: command /home/matstud/Wien2K/lapw0para -c lapw0.def failed
0.029u 0.046s 0:00.93 6.4% 0+0k 0+176io 0pf+0w
Child id 2 SIGSEGV, contact developers
Child id 8 SIGSEGV, contact developers
Child id 7 SIGSEGV, contact developers
Child id 11 SIGSEGV, contact developers
Child id 10 SIGSEGV, contact developers
Child id 9 SIGSEGV, contact developers
Child id 6 SIGSEGV, contact developers
Child id 5 SIGSEGV, contact developers
Child id 4 SIGSEGV, contact developers
Child id 3 SIGSEGV, contact developers
Child id 1 SIGSEGV, contact developers
Child id 0 SIGSEGV, contact developers
-------- .machine0 : 12 processors
> lapw0 -p (09:04:45) starting parallel lapw0 at Fri Aug 24 09:04:45 JST 2012
cycle 1 (Fri Aug 24 09:04:45 JST 2012) (40/99 to go)
start (Fri Aug 24 09:04:45 JST 2012) with lapw0 (40/99 to go)
using WIEN2k_12.1 (Release 22/7/2012) in /home/matstud/Wien2K
on pyxis with PID 15375
Calculating GaAs in /usr/local/share/Wien2K/Fons/GaAs