[Wien] Error in mpi+k point parallelization across multiple nodes
lung Fermin
ferminlung at gmail.com
Mon May 4 08:03:51 CEST 2015
I have checked that case.vsp/vns are up to date, so I guess lapw0_mpi runs
properly.
I compiled the source code with ifort; the compiler and linking options are
as follows:
current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -Dmkl_scalapack -traceback
current:FFTW_OPT:-DFFTW3 -I/usr/local/include
current:FFTW_LIBS:-lfftw3_mpi -lfftw3 -L/usr/local/lib
current:LDFLAGS:$(FOPT) -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_intelmpi_lp64 $(R_LIBS)
current:MPIRUN:/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_
current:MKL_TARGET_ARCH:intel64
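For context, the _NP_, _HOSTS_ and _EXEC_ placeholders in MPIRUN are filled in
by the WIEN2k parallel scripts from the .machines file in the case directory.
A minimal sketch of a .machines file for one MPI job spanning the two 16-core
nodes that appear in the output quoted below (host names are only illustrative):

  granularity:1
  1:z1-17:16 z1-18:16
  lapw0:z1-17:16 z1-18:16
  extrafine:1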
Is it ok to use -lmkl_blacs_intelmpi_lp64?
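As a quick sanity check on what actually got linked (a sketch, assuming dynamic
MKL linking as in the R_LIBS setting above):

  cd $WIENROOT
  ldd lapw1_mpi | grep -i -e blacs -e scalapack
  # with the RP_LIBS above one would expect libmkl_blacs_intelmpi_lp64.so
  # and libmkl_scalapack_lp64.so to show up here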
Thanks a lot for all the suggestions.
Regards,
Fermin
-----Original Message-----
From: wien-bounces at zeus.theochem.tuwien.ac.at [mailto:wien-bounces at zeus.theochem.tuwien.ac.at] On Behalf Of Peter Blaha
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] Error in mpi+k point parallelization across multiple nodes
It seems as if lapw0_mpi runs properly? Please check that you have NEW
(check the date with ls -als) and valid case.vsp/vns files, which can be used
in e.g. a sequential lapw1 step.
If so, this suggests that MPI and FFTW are ok.
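For example, in the case directory (with "case" replaced by the actual case
name), something along these lines:

  ls -als case.vsp case.vns    # timestamps should match the latest lapw0 run
  x lapw1                      # run lapw1 sequentially as a sanity check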
The problems seem to start in lapw1_mpi, and this program requires ScaLAPACK
in addition to MPI.
I guess you compile with ifort and link with the MKL?
There is one crucial BLACS library, which must be adapted to your MPI, since
the BLACS layers are specific to a particular MPI (intelmpi, openmpi, ...):
Which BLACS library do you link? -lmkl_blacs_lp64 or another one?
Check the MKL documentation.
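As a hedged illustration only (library names as shipped with MKL; which one is
correct depends on the MPI that WIEN2k was compiled against):

  # Intel MPI or MPICH-derived MPIs (MVAPICH2 is MPICH-based):
  current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 $(R_LIBS)
  # Open MPI:
  current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 $(R_LIBS)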
On 04.05.2015 at 05:18, lung Fermin wrote:
> I have tried to set MPI_REMOTE=0 and used 32 cores (on 2 nodes) for
> distributing the MPI job. However, the problem still persists... but the
> error message looks different this time (a sketch of the corresponding
> parallel_options lines follows after this quoted output):
>
> $> cat *.error
> Error in LAPW2
> ** testerror: Error in Parallel LAPW2
>
> and the output on screen:
> Warning: no access to tty (Bad file descriptor).
> Thus no job control in this shell.
> z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17
> z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18
> number of processors: 32
> LAPW0 END
> [16] Failed to dealloc pd (Device or resource busy)
> [0] Failed to dealloc pd (Device or resource busy)
> [17] Failed to dealloc pd (Device or resource busy)
> [2] Failed to dealloc pd (Device or resource busy)
> [18] Failed to dealloc pd (Device or resource busy)
> [1] Failed to dealloc pd (Device or resource busy)
> LAPW1 END
> LAPW2 - FERMI; weighs written
> [z1-17:mpispawn_0][child_handler] MPI process (rank: 0, pid: 28291) terminated with signal 9 -> abort job
> [z1-17:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 9. MPI process died?
> [z1-17:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
> [z1-17:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-17 aborted: Error while reading a PMI socket (4)
> [z1-18:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 21. MPI process died?
> [z1-18:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 21. MPI process died?
> [z1-18:mpispawn_1][handle_mt_peer] Error while reading PMI socket. MPI process died?
> cp: cannot stat `.in.tmp': No such file or directory
>
> > stop error
>
>
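Regarding the MPI_REMOTE=0 setting quoted above: for reference, a minimal
sketch of the corresponding lines in $WIENROOT/parallel_options (the mpirun
path is taken from the MPIRUN setting above; the exact values are
site-specific):

  setenv USE_REMOTE 1     # use ssh to reach remote nodes for k-point parallel runs
  setenv MPI_REMOTE 0     # 0: mpirun is started locally and spans the hosts itself
  setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"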