[Wien] Lapw1 mpi run problem (gfortran+openmpi)
Deyerling, André
andre.deyerling at tum.de
Fri Apr 24 11:05:04 CEST 2020
Dear WIEN2k users,
I ran into the following problem when running WIEN2k in parallel with MPI. The WIEN2k version is 19.1, and the patches provided by Gavin Abo are installed. ELPA, FFTW3 and ScaLAPACK are used and were compiled with gcc/gfortran (mpicc/mpif90). The compilation of WIEN2k shows no errors.
k-point parallelization works fine. WIEN2k is installed on an NFS share on a small self-built cluster (right now only 4 nodes, but there will be more once everything runs).
The problem looks like an Open MPI issue; however, simple example mpif90 programs work fine when run in parallel. Something goes wrong with lapw1para.
----------------------------------------------------------------------------------------------------------------------------------
run_lapw -p
STOP LAPW0 END
[1] Done /usr/lib64/openmpi/bin/mpirun -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machine0 /home/mpiuser/WIEN2k-19.1/lapw0_mpi lapw0.def >> .time00
[node0:1423512:0:1423512] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
[node0:1423513:0:1423513] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
0 /usr/lib64/libucs.so.0(+0x1b25f) [0x1462b91ad25f]
1 /usr/lib64/libucs.so.0(+0x1b42a) [0x1462b91ad42a]
2 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x4482df]
3 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x40d1c5]
4 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x42dd6e]
5 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x404ded]
6 /usr/lib64/libc.so.6(__libc_start_main+0xf3) [0x1462ba7bb1a3]
7 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x404e1e]
===================
0 /usr/lib64/libucs.so.0(+0x1b25f) [0x14b734f3725f]
1 /usr/lib64/libucs.so.0(+0x1b42a) [0x14b734f3742a]
2 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x4482df]
3 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x40d1c5]
4 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x42dd6e]
5 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x404ded]
6 /usr/lib64/libc.so.6(__libc_start_main+0xf3) [0x14b7365451a3]
7 /home/mpiuser/WIEN2k-19.1/lapw1_mpi() [0x404e1e]
===================
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun noticed that process rank 1 with PID 0 on node node0 exited on signal 11 (Segmentation fault).
[1] + Done ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
--------------------------------------------------------------------------------------------------------------------------------
Dayfile of the case:
Calculating Testsession in /home/mpiuser/WIEN2k/Testsession
on node0 with PID 1423240
using WIEN2k_19.1 (Release 25/6/2019) in /home/mpiuser/WIEN2k-19.1
start (Mon 20 Apr 2020 01:52:09 PM CEST) with lapw0 (40/99 to go)
cycle 1 (Mon 20 Apr 2020 01:52:09 PM CEST) (40/99 to go)
> lapw0 -p (13:52:09) starting parallel lapw0 at Mon 20 Apr 2020 01:52:09 PM CEST
-------- .machine0 : 2 processors
1.028u 0.157s 0:02.41 48.5% 0+0k 0+496io 0pf+0w
> lapw1 -p (13:52:11) starting parallel lapw1 at Mon 20 Apr 2020 01:52:11 PM CEST
-> starting parallel LAPW1 jobs at Mon 20 Apr 2020 01:52:11 PM CEST
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
node0 node1(72) 0.100u 0.089s 0:01.03 17.4% 0+0k 0+8io 0pf+0w
Summary of lapw1para:
node0 k=0 user=72 wallclock=5.34
** LAPW1 crashed!
0.178u 0.148s 0:02.21 14.0% 0+0k 0+136io 0pf+0w
error: command /home/mpiuser/WIEN2k-19.1/lapw1para lapw1.def failed
> stop error
Parallel_Options:
setenv TASKSET "no"
if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv DELAY 0.1
setenv SLEEPY 1
setenv WIEN_MPIRUN "/usr/lib64/openmpi/bin/mpirun -x LD_LIBRARY_PATH -x PATH -np _NP_ -machinefile _HOSTS_ _EXEC_"
setenv CORES_PER_NODE 1
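To make the role of the placeholders in WIEN_MPIRUN explicit, here is a minimal Python sketch of how the template appears to be expanded, judging only from the lapw0 command line in the run output above. The helper name expand_mpirun is hypothetical and is not part of WIEN2k; the real substitution is done by the WIEN2k parallel scripts.

```python
# Sketch only: mimic the placeholder substitution in WIEN_MPIRUN.
# The template string is copied from the Parallel_Options shown above.
WIEN_MPIRUN = (
    "/usr/lib64/openmpi/bin/mpirun -x LD_LIBRARY_PATH -x PATH "
    "-np _NP_ -machinefile _HOSTS_ _EXEC_"
)


def expand_mpirun(template, nprocs, hostfile, executable):
    """Hypothetical helper: fill in _NP_, _HOSTS_ and _EXEC_."""
    return (
        template.replace("_NP_", str(nprocs))
        .replace("_HOSTS_", hostfile)
        .replace("_EXEC_", executable)
    )


# Values taken from the lapw0 step in the run output above.
lapw0_cmd = expand_mpirun(
    WIEN_MPIRUN,
    2,
    ".machine0",
    "/home/mpiuser/WIEN2k-19.1/lapw0_mpi lapw0.def",
)
print(lapw0_cmd)
```

With these values the result matches the mpirun line reported for lapw0, which suggests that the template itself is expanded as intended and that the crash happens inside lapw1_mpi rather than in the launch command.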
.machines file:
1:node0:1 node1:1
lapw0:node0:1 node1:1
granularity:1
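As a cross-check of this file, the following Python sketch counts the MPI processes requested per line. It is an illustration, not WIEN2k code: the function parse_machines is hypothetical, and it only handles the line types that occur in the file above (job lines whose key is a number or lapw0, with host:cores entries, and simple key:value options).

```python
# Sketch only: read a .machines file like the one above and count
# the MPI processes requested on each job line.
MACHINES = """\
1:node0:1 node1:1
lapw0:node0:1 node1:1
granularity:1
"""


def parse_machines(text):
    """Hypothetical parser: return (jobs, options).

    jobs is a list of (key, [(host, ncores), ...]) tuples,
    options is a dict of the remaining key:value lines.
    """
    jobs, options = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, rest = line.partition(":")
        if key.isdigit() or key == "lapw0":
            hosts = []
            for entry in rest.split():
                host, _, ncores = entry.partition(":")
                # Assumption: a host without an explicit count means 1 core.
                hosts.append((host, int(ncores) if ncores else 1))
            jobs.append((key, hosts))
        else:
            options[key] = rest
    return jobs, options


jobs, options = parse_machines(MACHINES)
totals = {key: sum(n for _, n in hosts) for key, hosts in jobs}
print(totals, options)
```

Both job lines request two processes, one on node0 and one on node1, which is consistent with the ".machine0 : 2 processors" entry in the dayfile.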
Help would be greatly appreciated.
Best Regards
André Deyerling