[Wien] Error while parallel run

Peter Blaha pblaha at theochem.tuwien.ac.at
Thu Jul 26 10:55:20 CEST 2012


You seem to have several errors in your basic installation:

 > setenv USE_REMOTE 0
 > setenv MPI_REMOTE 0

 > [arya:01254] filem:rsh: copy(): Error: File type unknown

rsh ???   What did you specify in siteconfig when configuring the parallel environment ???

shared memory or non-shared memory  ??
ssh  or  rsh  ??    (most likely rsh will not work on most systems)
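
A quick check (just a diagnostic sketch, using one of the host names from your .machines file):

   ssh cpu1 hostname

If this prints "ssh: cpu1: Name or service not known" (as in your output below), the
name cannot be resolved at all, and mpirun must fail in exactly the same way.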

What kind of system do you have ??

a) Is it ONE computer with many cores (typically an SGI or IBM Power machine, or a
   single computer with 2-4 Xeon quad-core processors), or
b) is it a "cluster" of several (Xeon multicore) nodes, connected e.g. via Infiniband?

Only a) is a "shared memory machine", and only in that case can you set USE_REMOTE to 0.

Another problem might be your .machines file:
are your nodes really called "cpu1", ... ?

This would more or less imply that you have a cluster of single-core machines ???

My guess is that you have a 16-core shared memory machine ???
In that case, the .machines file must always contain the same "correct" machine name
(or maybe "localhost") on every line, but not cpu1, cpu2, ...
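
For illustration, a minimal .machines file for 16 k-points on such a machine could look
like this (only a sketch, assuming the machine answers to "localhost"; otherwise use the
name printed by the hostname command):

granularity:1
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
extrafine:1
lapw0: localhost:4

i.e. 16 identical "1:localhost" lines (one per k-point group), and the same machine name
again on the lapw0 line.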


On 26.07.2012 10:17, alpa dashora wrote:
> Dear Wien2k Users and Prof. Marks,
>
> Thank you very much for your reply. I am giving more information below.
> Wien2k version: Wien2k_11.1 on an 8-processor server, each with two nodes.
> mkl library: 10.0.1.014
> openmpi: 1.3
> fftw: 2.1.5
>
> My OPTIONS file is as follows:
>
> current:FOPT:-FR -O3 -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -l/opt/openmpi/include
> current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -traceback
> current:LDFLAGS:-L/root/WIEN2k_11/SRC_lib -L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_em64t -lmkl_blacs_openmpi_lp64 -lmkl_solver -lguide -lpthread
> -i-static
> current:DPARALLEL:'-DParallel'
> current:R_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64
> -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl
> -Wl,--export-dynamic -lnsl -lutil -limf -L/opt/fftw-2.1.5/lib/lib/ -lfftw_mpi -lrfftw_mpi -lfftw -lrfftw
> current:RP_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64
> -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl
> -Wl,--export-dynamic -lnsl -lutil -limf -L/opt/fftw-2.1.5/lib/lib/ -lfftw_mpi -lrfftw_mpi -lfftw -lrfftw
> current:MPIRUN:/opt/openmpi/1.3/bin/mpirun -v -n _NP_ _EXEC_
>
> My parallel_options file is as follows:
>
> setenv USE_REMOTE 0
> setenv MPI_REMOTE 0
> setenv WIEN_GRANULARITY 1
> setenv WIEN_MPIRUN "/opt/openmpi/1.3/bin/mpirun -v -n _NP_ -machinefile _HOSTS_ _EXEC_"
>
> Compilation produced no error messages and all executable files were generated. I have edited the parallel_options file, so the error message has changed and it is now as
> follows:
>
> [arya:01254] filem:rsh: copy(): Error: File type unknown
> ssh: cpu1: Name or service not known
>
> --------------------------------------------------------------------------
> A daemon (pid 9385) died unexpectedly with status 255 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> ssh: cpu2: Name or service not known
>
> ssh: cpu3: Name or service not known
>
> ssh: cpu4: Name or service not known
>
> mpirun: clean termination accomplished
>
> LAPW1 - Error
> LAPW1 - Error
> LAPW1 - Error
> LAPW1 - Error
> LAPW1 - Error
> LAPW1 - Error
> LAPW1 - Error
>
> I have used the following .machines file for 16 k-points:
>
> granularity:1
> 1:cpu1
> 1:cpu2
> 1:cpu3
> 1:cpu4
> 1:cpu5
> 1:cpu6
> 1:cpu7
> 1:cpu8
> 1:cpu9
> 1:cpu10
> 1:cpu11
> 1:cpu12
> 1:cpu13
> 1:cpu14
> 1:cpu15
> 1:cpu16
> extrafine:1
> lapw0: cpu1:1 cpu2:1 cpu3:1 cpu4:1
>
> Could anyone please suggest a solution to this problem?
>
> With kind regards,
>
>
> On Mon, Jul 23, 2012 at 4:50 PM, Laurence Marks <L-marks at northwestern.edu> wrote:
>
>     You probably have an incorrect MPIRUN environment variable. You have not provided enough information, and need to do a bit more analysis yourself.
>
>     ---------------------------
>     Professor Laurence Marks
>     Department of Materials Science and Engineering
>     Northwestern University
>     www.numis.northwestern.edu    1-847-491-3996
>     "Research is to see what everybody else has seen, and to think what nobody else has thought"
>     Albert Szent-Györgyi
>
>     On Jul 23, 2012 6:17 AM, "alpa dashora" <dashoralpa at gmail.com> wrote:
>
>         Dear Wien2k Users,
>
>         I recently installed Wien2k with openmpi on a 16-processor server. Installation completed without any compilation errors. While running the run_lapw -p command, I
>         received the following error:
>         ------------------------------------------------------------------------------------------------------------------------------
>
>         mpirun was unable to launch the specified application as it could not find an executable:
>
>         Executable:-4
>         Node: arya
>
>         while attempting to start process rank 0.
>         -------------------------------------------------------------------------------------------------------------------------------
>
>         Kindly suggest a solution. mpirun is available in /opt/openmpi/1.3/bin.
>
>         Thank you in advance.
>
>         Regards,
>
>         --
>         Dr. Alpa Dashora
>
>
>
> --
> Alpa Dashora

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------

