[Wien] Two kinds of MPI run error

贾亚磊 jia_yalei at 163.com
Mon Aug 12 14:04:09 CEST 2013


Thank you for your suggestion.
However, using setenv WIEN_MPIRUN "mpirun -x LD_LIBRARY_PATH -x PATH -np _NP_ -machinefile _HOSTS_ _EXEC_" and compiling with "-static" did not take effect in my case.
I then installed mpich2-1.4.1p1 and used it to compile wien2k_11 with the -static option; the program now runs in MPI mode on multiple nodes.
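
For anyone who hits the same libmpi_f90.so.1 error, one way to see whether the MPI binary still depends on shared libraries that a node cannot find is to filter the output of ldd. The following is only a sketch: the helper name missing_libs and the host name node2 are made up for illustration, and it assumes ldd and awk are available on every node.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN2k): reads `ldd` output on stdin and
# prints the name of every shared library the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage (the path and the host name node2 are placeholders):
#   ldd "$WIENROOT/lapw1_mpi" | missing_libs
#   ssh node2 "ldd $WIENROOT/lapw1_mpi" | missing_libs
```

If the remote call lists libmpi_f90.so.1 while the local one prints nothing, the library path is probably not reaching the remote shell. For a fully static binary, ldd should report that the file is not a dynamic executable, so the helper prints nothing.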



At 2013-08-11 04:07:17,"Laurence Marks" <L-marks at northwestern.edu> wrote:
>By default, openmpi does not export LD_LIBRARY_PATH or any
>environmental variables, and you have to tell it to do this, e.g. in
>parallel_options use
>
>setenv WIEN_MPIRUN "mpirun -x LD_LIBRARY_PATH -x PATH -np _NP_
>-machinefile _HOSTS_ _EXEC_"
>
>That may solve part of the problem associated with the shared library.
>I prefer to avoid shared libraries as much as possible as this avoids
>such problems, e.g. use -static or -i-static when compiling.
>
>The ".machine5" may go away when you correct the LD_LIBRARY_PATH
>issue, or it could be an error in your .machines file.
>
>
>On Sat, Aug 10, 2013 at 12:12 PM, 贾亚磊 <jia_yalei at 163.com> wrote:
>> Dear all,
>>      I compiled wien2k 11 on Linux CentOS 5.5 with icc and ifort 11.1, openmpi
>> mpif90, and mkl (bundled with the ifort compiler), using the following
>> parameters in $WIENROOT/OPTIONS:
>>
>> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
>> -i-static
>> current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
>> -i-static
>> current:LDFLAGS:$(FOPT) -L/home/yljia/intel/Compiler/11.1/072/mkl/lib/em64t
>> -pthread
>> current:DPARALLEL:'-DParallel'
>> current:R_LIBS:$(MKLROOT)/lib/em64t/libmkl_lapack95_lp64.a -Wl,--start-group
>> $(MKLROOT)/lib/em64t/libmkl_intel_lp64.a
>> $(MKLROOT)/lib/em64t/libmkl_intel_thread.a
>> $(MKLROOT)/lib/em64t/libmkl_core.a -Wl,--end-group -openmp -lpthread -lm
>> -lguide
>> current:RP_LIBS:$(MKLROOT)/lib/em64t/libmkl_scalapack_lp64.a
>> $(MKLROOT)/lib/em64t/libmkl_solver_lp64.a -Wl,--start-group
>> $(MKLROOT)/lib/em64t/libmkl_intel_lp64.a
>> $(MKLROOT)/lib/em64t/libmkl_intel_thread.a
>> $(MKLROOT)/lib/em64t/libmkl_core.a
>> $(MKLROOT)/lib/em64t/libmkl_blacs_openmpi_lp64.a -Wl,--end-group -lpthread
>> -lm /home/yljia/compiler_library/fftw-2.1.5/lib/libfftw_mpi.a
>> /home/yljia/compiler_library/fftw-2.1.5/lib/libfftw.a $(R_LIBS)
>> current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
>>
>> and in my submitted shell script I added
>>
>> source ~/.cshrc
>> source /home/yljia/intel/Compiler/11.1/073/bin/iccvars.csh intel64
>> source /home/yljia/intel/Compiler/11.1/072/bin/ifortvars.csh intel64
>> source /home/yljia/intel/Compiler/11.1/072/mkl/tools/environment/mklvarsem64t.csh
>> setenv LD_LIBRARY_PATH
>> /home/yljia/compiler_library/fftw-2.1.5/lib:$LD_LIBRARY_PATH
>> set path = (/home/yljia/compiler_library/openmpi-1.6.1/bin $path)
>> setenv LD_LIBRARY_PATH
>> /home/yljia/compiler_library/openmpi-1.6.1/lib:$LD_LIBRARY_PATH
>> setenv OMP_NUM_THREADS 1
>> setenv MKL_NUM_THREADS 1
>>
>> The program runs in non-parallel mode and in k-point parallel mode (one node and
>> multiple nodes, USE_REMOTE=0 and 1). But in MPI parallel mode there are two cases:
>> 1). On one node: run_lapw runs with MPI_REMOTE=0, but fails at lapw1 when
>> MPI_REMOTE=1, with error messages in STDOUT like the following (NOTE: there
>> is a libmpi_f90.so.1 in $OpenmpiRoot/lib/):
>>
>> /home/yljia/software/wien2k_11/lapw1_mpi: error while loading shared
>> libraries: libmpi_f90.so.1: cannot open shared object file: No such file or
>> directory
>> /home/yljia/software/wien2k_11/lapw1_mpi: error while loading shared
>> libraries: libmpi_f90.so.1: cannot open shared object file: No such file or
>> directory
>>
>> 2). On two nodes: run_lapw fails at lapw1 with either
>> MPI_REMOTE=0 or MPI_REMOTE=1. When MPI_REMOTE=0 the error messages are like:
>>
>> There are no allocated resources for the application
>>   /home/yljia/software/wien2k_11/lapw1_mpi
>> that match the requested mapping:
>>   .machine5
>> Verify that you have mapped the allocated resources properly using the
>> --host or --hostfile specification.
>>
>> When MPI_REMOTE=1 the error messages are like:
>>
>> /home/yljia/software/wien2k_11/lapw1_mpi: error while loading shared
>> libraries: libmpi_f90.so.1: cannot open shared object file: No such file or
>> directory
>> /home/yljia/software/wien2k_11/lapw1_mpi: error while loading shared
>> libraries: libmpi_f90.so.1: cannot open shared object file: No such file
>> or directory
>>
>> Best regards,
>> Jia Yalei
>
>
>
>-- 
>Professor Laurence Marks
>Department of Materials Science and Engineering
>Northwestern University
>www.numis.northwestern.edu 1-847-491-3996
>"Research is to see what everybody else has seen, and to think what
>nobody else has thought"
>Albert Szent-Gyorgi
>_______________________________________________
>Wien mailing list
>Wien at zeus.theochem.tuwien.ac.at
>http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>SEARCH the MAILING-LIST at:  http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html