[Wien] Problems with mpi for Wien12.1
Paul Fons
paul-fons at aist.go.jp
Fri Aug 24 15:22:37 CEST 2012
Dear Prof. Blaha,
Thank you for your earlier email. Running the command manually gives the following output (for a GaAs structure that works fine in serial or k-point parallel mode). I am still not sure what to try next. Any suggestions?
matstud at ursa:~/WienDisk/Fons/GaAs> mpirun -np 4 ${WIENROOT}/lapw0_mpi lapw0.def
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
Child id 0 SIGSEGV, contact developers
Child id 1 SIGSEGV, contact developers
Child id 3 SIGSEGV, contact developers
Child id 2 SIGSEGV, contact developers
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
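To get more than the generic signal-handler message out of the crash, my next thought is to enable core dumps and look at a backtrace. The sketch below uses tcsh syntax, and the core file name is only a placeholder:

   # allow core files to be written when lapw0_mpi segfaults (tcsh syntax)
   limit coredumpsize unlimited
   mpirun -np 4 ${WIENROOT}/lapw0_mpi lapw0.def
   # inspect one of the resulting core files; "core.12345" is illustrative
   gdb ${WIENROOT}/lapw0_mpi core.12345
   (gdb) bt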
The MPI compilation options from siteconfig are as follows (the settings come from the Intel MKL link advisor, plus the fftw3 library):
Current settings:
RP RP_LIB(SCALAPACK+PBLAS): -L$(MKLROOT)/lib/intel64 $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/ -lfftw3_mpi -lfftw3 $(R_LIBS)
FP FPOPT(par.comp.options): -I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback
MP MPIRUN commando : mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
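Since a BLACS library that does not match the MPI implementation is a classic cause of immediate segmentation faults in lapw0_mpi, one quick sanity check is to confirm which MPI, MKL, and fftw3 libraries the binary actually resolves at run time (the grep pattern here is just illustrative):

   # list the dynamic libraries lapw0_mpi was linked against
   ldd ${WIENROOT}/lapw0_mpi | grep -i -e mpi -e mkl -e fftw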
The file parallel_options now reads
setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
I changed MPI_REMOTE to 0 as suggested (I was not sure this applied to the Intel MPI environment, as the siteconfig prompt only mentioned mpich2).
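If it helps narrow things down, Intel MPI can also be made to print verbose startup information via its standard I_MPI_DEBUG environment variable; a minimal run mirroring the WIEN_MPIRUN line above would be (tcsh syntax):

   # level 5 prints node mapping, process pinning, and fabric information
   setenv I_MPI_DEBUG 5
   mpirun -np 4 ${WIENROOT}/lapw0_mpi lapw0.def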
As I mentioned, the mpirun command itself seems to work fine. For example, with 24 processes the fftw3 benchmark program gives:
mpirun -np 24 ./mpi-bench 1024x1024
Problem: 1024x1024, setup: 126.32 ms, time: 15.98 ms, ``mflops'': 6562.2
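One more thing worth ruling out on my side: ifort-compiled binaries are known to segfault when the shell stack limit is small, so it seems worth confirming the limit is unlimited in the shell that launches mpirun (tcsh syntax below; under bash the equivalent is "ulimit -s unlimited"):

   # check and raise the stack size limit before launching the MPI job
   limit stacksize
   limit stacksize unlimited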
On Aug 24, 2012, at 3:05 PM, Peter Blaha wrote:
> Hard to say.
>
> What is in $WIENROOT/parallel_options ?
> MPI_REMOTE should be 0 !
>
> Otherwise run lapw0_mpi by "hand":
>
> mpirun -np 4 $WIENROOT/lapw0_mpi lapw0.def (or including -machinefile .machine0)
>
>
> On 24.08.2012 02:24, Paul Fons wrote:
>> Greetings all,
>> I have compiled Wien2K 12.1 under OpenSuse 11.4 (and OpenSuse 12.1)
>> with the latest Intel compilers and hit identical MPI launch problems on
>> both, so I am hoping for suggestions as to where to look. Note
>> that the serial and k-point parallel versions of the code run fine (I
>> have optimized GaAs a lot in my troubleshooting!).
>>
>> Environment.
>>
>> I am using the latest Intel ifort, icc, and Intel MPI libraries for Linux.
>>
>> matstud at pyxis:~/Wien2K> ifort --version
>> ifort (IFORT) 12.1.5 20120612
>> Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
>>
>> matstud at pyxis:~/Wien2K> mpirun --version
>> Intel(R) MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824
>> Copyright (C) 2003-2011, Intel Corporation. All rights reserved.
>>
>> matstud at pyxis:~/Wien2K> icc --version
>> icc (ICC) 12.1.5 20120612
>> Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
>>
>>
>> My OPTIONS file settings from siteconfig_lapw:
>>
>> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
>> current:FPOPT:-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -FR
>> -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback
>> current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
>> current:DPARALLEL:'-DParallel'
>> current:R_LIBS:-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread
>> -lmkl_core -openmp -lpthread
>> current:RP_LIBS:-L$(MKLROOT)/lib/intel64
>> $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a
>> $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64
>> -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
>> -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/
>> -lfftw3_mpi -lfftw3 $(R_LIBS)
>> current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
>>
>>
>>
>>
>> The code compiles and links without error. It runs fine in serial mode
>> and in k-point parallel mode, e.g.
>>
>> .machines with
>>
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> granularity:1
>> extrafine:1
>>
>> This runs fine. When I attempt to run an MPI job with 12 processes
>> (on a 12-core machine), I crash and burn (see below) with a SIGSEGV error
>> and instructions to contact the developers.
>>
>> The linking options were derived from Intel's MKL link advisor (the
>> version on the Intel site). I should add that the mpi-bench in fftw3
>> works fine using the Intel MPI, as do commands like hostname and even
>> abinit, so it would appear that the Intel MPI environment itself is
>> fine. I have spent a lot of time trying to figure this out before
>> writing to the list, but at this point I feel like a monkey at a
>> keyboard attempting to duplicate Shakespeare -- if you know what I mean.
>> Thanks in advance for any heads up you can offer.
>>
>>
>>
>> .machines
>>
>> lapw0:localhost:12
>> 1:localhost:12
>> granularity:1
>> extrafine:1
>>
>>> stop error
>>
>> error: command /home/matstud/Wien2K/lapw0para -c lapw0.def failed
>> 0.029u 0.046s 0:00.93 6.4% 0+0k 0+176io 0pf+0w
>> Child id 2 SIGSEGV, contact developers
>> Child id 8 SIGSEGV, contact developers
>> Child id 7 SIGSEGV, contact developers
>> Child id 11 SIGSEGV, contact developers
>> Child id 10 SIGSEGV, contact developers
>> Child id 9 SIGSEGV, contact developers
>> Child id 6 SIGSEGV, contact developers
>> Child id 5 SIGSEGV, contact developers
>> Child id 4 SIGSEGV, contact developers
>> Child id 3 SIGSEGV, contact developers
>> Child id 1 SIGSEGV, contact developers
>> Child id 0 SIGSEGV, contact developers
>> -------- .machine0 : 12 processors
>>> lapw0 -p (09:04:45) starting parallel lapw0 at Fri Aug 24 09:04:45 JST 2012
>>
>> cycle 1 (Fri Aug 24 09:04:45 JST 2012) (40/99 to go)
>>
>> start (Fri Aug 24 09:04:45 JST 2012) with lapw0 (40/99 to go)
>>
>>
>> using WIEN2k_12.1 (Release 22/7/2012) in /home/matstud/Wien2K
>> on pyxis with PID 15375
>> Calculating GaAs in /usr/local/share/Wien2K/Fons/GaAs
>>
>>
>>
>>
>>
>
> --
> Peter Blaha
> Inst.Materials Chemistry
> TU Vienna
> Getreidemarkt 9
> A-1060 Vienna
> Austria
> +43-1-5880115671
Dr. Paul Fons
Senior Research Scientist
Functional Nano-phase-change Research Team
Nanoelectronics Research Institute
National Institute for Advanced Industrial Science & Technology
METI
AIST Central 4, Higashi 1-1-1
Tsukuba, Ibaraki JAPAN 305-8568
tel. +81-298-61-5636
fax. +81-298-61-2939
email: paul-fons at aist.go.jp
The lines below give the same affiliation in Japanese:
305-8562 Tsukuba Central East 1-1-1, Tsukuba, Ibaraki
National Institute of Advanced Industrial Science and Technology (AIST)
Nanoelectronics Research Institute
Phase-Change Novel Functional Device Research Team
Senior Research Scientist
Paul Fons