[Wien] Two kinds of MPI run error

贾亚磊 jia_yalei at 163.com
Sat Aug 10 19:12:24 CEST 2013


Dear all,
     I compiled WIEN2k 11 on Linux CentOS 5.5 with icc, ifort 11.1, OpenMPI's mpif90, and MKL (the version bundled with the ifort compiler), using the following parameters in $WIENROOT/OPTIONS:
current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -i-static
current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -i-static
current:LDFLAGS:$(FOPT) -L/home/yljia/intel/Compiler/11.1/072/mkl/lib/em64t -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:$(MKLROOT)/lib/em64t/libmkl_lapack95_lp64.a -Wl,--start-group $(MKLROOT)/lib/em64t/libmkl_intel_lp64.a $(MKLROOT)/lib/em64t/libmkl_intel_thread.a $(MKLROOT)/lib/em64t/libmkl_core.a -Wl,--end-group -openmp -lpthread -lm -lguide
current:RP_LIBS:$(MKLROOT)/lib/em64t/libmkl_scalapack_lp64.a $(MKLROOT)/lib/em64t/libmkl_solver_lp64.a -Wl,--start-group $(MKLROOT)/lib/em64t/libmkl_intel_lp64.a $(MKLROOT)/lib/em64t/libmkl_intel_thread.a $(MKLROOT)/lib/em64t/libmkl_core.a $(MKLROOT)/lib/em64t/libmkl_blacs_openmpi_lp64.a -Wl,--end-group -lpthread -lm /home/yljia/compiler_library/fftw-2.1.5/lib/libfftw_mpi.a /home/yljia/compiler_library/fftw-2.1.5/lib/libfftw.a $(R_LIBS)
current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
and in my submitted shell script I add:
source ~/.cshrc
source /home/yljia/intel/Compiler/11.1/073/bin/iccvars.csh intel64
source /home/yljia/intel/Compiler/11.1/072/bin/ifortvars.csh intel64
source /home/yljia/intel/Compiler/11.1/072/mkl/tools/environment/mklvarsem64t.csh
setenv LD_LIBRARY_PATH /home/yljia/compiler_library/fftw-2.1.5/lib:$LD_LIBRARY_PATH
set path = (/home/yljia/compiler_library/openmpi-1.6.1/bin $path)
setenv LD_LIBRARY_PATH /home/yljia/compiler_library/openmpi-1.6.1/lib:$LD_LIBRARY_PATH
setenv OMP_NUM_THREADS 1
setenv MKL_NUM_THREADS 1
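In case it helps, here is a quick sanity check (a hypothetical snippet, not part of my actual job script, written for /bin/sh rather than csh) that confirms the OpenMPI library directory really is on LD_LIBRARY_PATH; on each compute node one could additionally run `ldd $WIENROOT/lapw1_mpi | grep libmpi_f90` to see whether libmpi_f90.so.1 resolves there:

```shell
# Hypothetical check: is the OpenMPI lib directory on LD_LIBRARY_PATH?
OMPI_LIB=/home/yljia/compiler_library/openmpi-1.6.1/lib
export LD_LIBRARY_PATH="$OMPI_LIB:${LD_LIBRARY_PATH:-}"
case ":$LD_LIBRARY_PATH:" in
  *":$OMPI_LIB:"*) echo "openmpi lib dir present" ;;
  *)               echo "openmpi lib dir MISSING" ;;
esac
```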
The program runs fine in non-parallel mode and in k-point parallel mode (one node and multiple nodes, with USE_REMOTE=0 and 1). But in MPI parallel mode there are two cases:
1). On one node -- run_lapw works with MPI_REMOTE=0, but fails at lapw1 when MPI_REMOTE=1, with error messages in STDOUT like (NOTE: libmpi_f90.so.1 does exist in $OpenmpiRoot/lib/):
/home/yljia/software/wien2k_11/lapw1_mpi: error while loading shared libraries: libmpi_f90.so.1: cannot open shared object file: No such file or directory
/home/yljia/software/wien2k_11/lapw1_mpi: error while loading shared libraries: libmpi_f90.so.1: cannot open shared object file: No such file or directory
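Since this error suggests LD_LIBRARY_PATH is not reaching the remotely launched processes, one workaround I am considering (my own assumption, not yet verified) is OpenMPI's `-x` option, which re-exports an environment variable to every launched rank; the MPIRUN setting would then look like:

```shell
# Hypothetical variant of the MPIRUN command (csh syntax, as used by
# WIEN2k's parallel setup): OpenMPI's -x flag forwards the named
# environment variable to all ranks started by mpirun.
setenv WIEN_MPIRUN "mpirun -x LD_LIBRARY_PATH -np _NP_ -machinefile _HOSTS_ _EXEC_"
```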
2). On two nodes -- run_lapw fails at lapw1 with both MPI_REMOTE=0 and MPI_REMOTE=1. With MPI_REMOTE=0 the error messages look like:
There are no allocated resources for the application
  /home/yljia/software/wien2k_11/lapw1_mpi
that match the requested mapping:
  .machine5
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
When MPI_REMOTE=1 the error messages look like:
/home/yljia/software/wien2k_11/lapw1_mpi: error while loading shared libraries: libmpi_f90.so.1: cannot open shared object file: No such file or directory
/home/yljia/software/wien2k_11/lapw1_mpi: error while loading shared libraries: libmpi_f90.so.1: cannot open shared object file: No such file or directory
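For the two-node mapping error, I would like to compare the hosts WIEN2k wrote into the .machine files with the nodes actually allocated to the job; a small diagnostic (hypothetical, not part of my actual script) would be:

```shell
# Hypothetical diagnostic: print every .machine* file WIEN2k generated in
# the case directory, so its host list can be compared with the hosts
# passed to mpirun via _HOSTS_.
for f in .machine*; do
  echo "== $f =="
  cat "$f"
done
```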

Best regards,
Jia Yalei