[Wien] Error while parallel run

Peter Blaha pblaha at theochem.tuwien.ac.at
Fri Jul 27 07:32:34 CEST 2012


How should I know the correct name of your computer?

When you log in to the machine, what name do you use? Most likely, this is the correct name.
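
As a quick check (a minimal sketch, assuming an ordinary Linux shell; this is not WIEN2k-specific), you can ask the machine itself what it is called:

   hostname        # prints the host name the system reports
   hostname -f     # prints the fully qualified name, if one is configured

Whatever name you type after "ssh" when you log in (or whatever "hostname" prints) is normally the name that belongs in .machines.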

If it is a shared-memory machine, you should use the same name for all
processes.
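
For illustration only, a minimal sketch: the name "arya" below is simply the host name that shows up in your error output; replace it with whatever your machine is actually called (or possibly "localhost"). A purely k-point-parallel .machines file for one 16-core shared-memory machine would then look like:

   granularity:1
   1:arya
   1:arya
   1:arya
   1:arya
   extrafine:1

with the "1:arya" line repeated 16 times in total (one line per core), and every line naming the same machine rather than cpu1, cpu2, ...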

On 26.07.2012 19:45, alpa dashora wrote:
> Dear Prof. Blaha, Prof. Marks and All Wien2k users,
>
> Thank you very much for your reply. I have given more details about my system, as you requested:
>
> 1. What kind of system do you have ??
>
>      We have HP ProLiant DL380 G7 servers (8 of them) with 2 processors each, so we have 16 processors in total, and the total memory is shared by all the processors.
>
> 2. rsh ???   What did you specify in siteconfig when configuring the parallel environment ??? shared memory or non-shared memory ??
>      During the site configuration, I chose the shared-memory architecture.
>
> 3. are your nodes really called "cpu1", ...
>
>     I used the 'top' command in the terminal; it shows the performance of all the processors and labels each one cpu1, cpu2, cpu3, ..., so I took the names from there.
>
> Please suggest the correct .machines file or any other way to solve this problem.
>
> With kind regards,
>
> On Thu, Jul 26, 2012 at 2:25 PM, Peter Blaha <pblaha at theochem.tuwien.ac.at> wrote:
>
>     You seem to have several errors in your basic installation:
>
>
>      > setenv USE_REMOTE 0
>      > setenv MPI_REMOTE 0
>
>      > [arya:01254] filem:rsh: copy(): Error: File type unknown
>
>     rsh ???   What did you specify in siteconfig when configuring the parallel environment ???
>
>     shared memory or non-shared memory  ??
>     ssh  or  rsh  ??    (most likely rsh will not work on most systems)
>
>     What kind of system do you have ??
>
>     a) Is it ONE computer with many cores (typically some SGI or IBM-power machines, or a SINGLE Computer
>                                      with 2-4 Xeon-quadcore processors), or
>     b) a "cluster" (connected via Infiniband) of several (Xeon multicore) nodes
>
>     Only a) is a "shared memory machine" and you can set USE_REMOTE to 0
>
>     Another problem might be your   .machines file:
>     are your nodes really called "cpu1", ...
>
>     This implies more or less that you have a cluster of single-core machines ???
>
>     My guess is that you have a 16 core shared memory machine ???
>     In this case, the  .machines file must always contain the same "correct" machine name
>     (or maybe "localhost"), but not cpu1,2....
>
>
>     On 26.07.2012 10:17, alpa dashora wrote:
>
>         Dear Wien2k Users and Prof. Marks,
>
>         Thank you very much for your reply. I am giving more information.
>         Wien2k version: Wien2k_11.1 on an 8-processor server, each with two nodes.
>         mkl library: 10.0.1.014
>         openmpi: 1.3
>         fftw: 2.1.5
>
>         My OPTIONS file is as follows:
>
>         current:FOPT:-FR -O3 -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -l/opt/openmpi/include
>         current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -traceback
>         current:LDFLAGS:-L/root/WIEN2k_11/SRC_lib -L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_em64t -lmkl_blacs_openmpi_lp64 -lmkl_solver -lguide -lpthread -i-static
>         current:DPARALLEL:'-DParallel'
>         current:R_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -limf -L/opt/fftw-2.1.5/lib/lib/ -lfftw_mpi -lrfftw_mpi -lfftw -lrfftw
>         current:RP_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -limf -L/opt/fftw-2.1.5/lib/lib/ -lfftw_mpi -lrfftw_mpi -lfftw -lrfftw
>         current:MPIRUN:/opt/openmpi/1.3/bin/mpirun -v -n _NP_ _EXEC_
>
>         My parallel_options file is as follows:
>
>         setenv USE_REMOTE 0
>         setenv MPI_REMOTE 0
>         setenv WIEN_GRANULARITY 1
>         setenv WIEN_MPIRUN "/opt/openmpi/1.3/bin/mpirun -v -n _NP_ -machinefile _HOSTS_ _EXEC_"
>
>         During compilation no error messages were reported and all the executables were generated. I have edited the parallel_options file, so the error message has changed; it is now as follows:
>
>         [arya:01254] filem:rsh: copy(): Error: File type unknown
>         ssh: cpu1: Name or service not known
>
>         --------------------------------------------------------------------------
>         A daemon (pid 9385) died unexpectedly with status 255 while attempting
>         to launch so we are aborting.
>
>         There may be more information reported by the environment (see above).
>
>         This may be because the daemon was unable to find all the needed shared
>         libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>         location of the shared libraries on the remote nodes and this will
>         automatically be forwarded to the remote nodes.
>         --------------------------------------------------------------------------
>         --------------------------------------------------------------------------
>         mpirun noticed that the job aborted, but has no info as to the process
>         that caused that situation.
>         --------------------------------------------------------------------------
>         ssh: cpu2: Name or service not known
>
>         ssh: cpu3: Name or service not known
>
>         ssh: cpu4: Name or service not known
>
>         mpirun: clean termination accomplished
>
>         LAPW1 - Error
>         LAPW1 - Error
>         LAPW1 - Error
>         LAPW1 - Error
>         LAPW1 - Error
>         LAPW1 - Error
>         LAPW1 - Error
>
>         I have used the following .machines file for 16 k-points:
>
>         granularity:1
>         1:cpu1
>         1:cpu2
>         1:cpu3
>         1:cpu4
>         1:cpu5
>         1:cpu6
>         1:cpu7
>         1:cpu8
>         1:cpu9
>         1:cpu10
>         1:cpu11
>         1:cpu12
>         1:cpu13
>         1:cpu14
>         1:cpu15
>         1:cpu16
>         extrafine:1
>         lapw0: cpu1:1 cpu2:1 cpu3:1 cpu4:1
>
>         Could anyone please suggest a solution to this problem?
>
>         With kind regards,
>
>
>         On Mon, Jul 23, 2012 at 4:50 PM, Laurence Marks <L-marks at northwestern.edu> wrote:
>
>              You probably have an incorrect MPIRUN environment variable. You have not provided enough information, and need to do a bit more analysis yourself.
>
>              ---------------------------
>              Professor Laurence Marks
>              Department of Materials Science and Engineering
>              Northwestern University
>         www.numis.northwestern.edu   1-847-491-3996
>
>              "Research is to see what everybody else has seen, and to think what nobody else has thought"
>              Albert Szent-Gyorgi
>
>              On Jul 23, 2012 6:17 AM, "alpa dashora" <dashoralpa at gmail.com> wrote:
>
>                  Dear Wien2k Users,
>
>                  I recently installed Wien2k with openmpi on a 16-processor server. The installation completed without any compilation errors. While running the run_lapw -p command, I received the following error:
>                  --------------------------------------------------------------------------------------------------------------------------
>
>                  mpirun was unable to launch the specified application as it could not find an executable:
>
>                  Executable:-4
>                  Node: arya
>
>                  while attempting to start process rank 0.
>                  --------------------------------------------------------------------------------------------------------------------------
>
>                  Kindly suggest a solution.
>                  mpirun is available in /opt/openmpi/1.3/bin
>
>                  Thank you in advance.
>
>                  Regards,
>
>                  --
>                  Dr. Alpa Dashora
>
>
>
>
>
>
>         --
>         Alpa Dashora
>
>
>
>
>     --
>
>                                            P.Blaha
>     --------------------------------------------------------------------------
>     Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
>     Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
>     Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
>     --------------------------------------------------------------------------
>
>
>
>
>
> --
> Alpa Dashora
>
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>

-- 
-----------------------------------------
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-----------------------------------------



