[Wien] Error while parallel run
Peter Blaha
pblaha at theochem.tuwien.ac.at
Fri Jul 27 07:32:34 CEST 2012
How should I know the correct name of your computer?
When you log in to the machine, what name are you using? Most likely, this will be the correct name.
If it is a shared memory machine, you should use the same name for all processes.
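
For illustration, any of the following commands, run in a terminal on the machine itself, prints the name that belongs in .machines (the mpirun output quoted below suggests your node may be called "arya", but please verify this on your own system):

   hostname        # short host name
   hostname -f     # fully qualified name, if configured
   uname -n        # the same name as reported by the kernel

Whatever name these commands report (or simply "localhost" on a single shared-memory machine) should then appear on every line of .machines.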
On 26.07.2012 19:45, alpa dashora wrote:
> Dear Prof. Blaha, Prof. Marks and All Wien2k users,
>
> Thank you very much for your reply. I have given more details about my system, as you requested:
>
> 1. What kind of system do you have ??
>
> We have an HP ProLiant DL380 G7 system (8 servers) with 2 processors each, so we have 16 processors in total, and the memory is shared by all the processors.
>
> 2. rsh ??? What did you specify in siteconfig when configuring the parallel environment ??? shared memory or non-shared memory ??
> During the site configuration, I have used shared memory architecture.
>
> 3. Are your nodes really called "cpu1", ... ?
> I have used the 'top' command in the terminal; it shows the load on all the processors and lists them as cpu1, cpu2, cpu3, ..., so I took those as the machine names.
>
> Please suggest the correct .machines file or any other solution to this problem.
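
To make this concrete, a minimal sketch of a .machines file for 16 k-point parallel jobs on a single 16-core shared-memory node could look as follows (it assumes the node is really called "arya", the name that appears in the mpirun output below; "localhost" may work as well). Note that the labels cpu1, cpu2, ... shown by 'top' are CPU core labels, not host names, which is why ssh fails with "Name or service not known" for them.

   granularity:1
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   1:arya
   extrafine:1
   lapw0: arya:4

The lapw0 line would start lapw0 with 4 MPI processes on the same node; if the MPI/ScaLAPACK setup is not working yet, this line can simply be omitted and lapw0 runs serially.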
>
> With kind regards,
>
> On Thu, Jul 26, 2012 at 2:25 PM, Peter Blaha <pblaha at theochem.tuwien.ac.at> wrote:
>
> You seem to have several errors in your basic installation:
>
>
> > setenv USE_REMOTE 0
> > setenv MPI_REMOTE 0
>
> > [arya:01254] filem:rsh: copy(): Error: File type unknown
>
> rsh ??? What did you specify in siteconfig when configuring the parallel environment ???
>
> shared memory or non-shared memory ??
> ssh or rsh ?? (most likely rsh will not work on most systems)
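
Once the correct machine name is known, a quick test such as the one below (assuming, purely for the example, that the node is called "arya") shows whether passwordless ssh works at all:

   ssh arya hostname     # should print the host name without asking for a password

If this prompts for a password, ssh keys still need to be set up; on a single shared-memory machine with USE_REMOTE set to 0, ssh should not be needed for the k-point parallelization at all.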
>
> What kind of system do you have ??
>
> a) Is it ONE computer with many cores (typically some SGI or IBM-power machines, or a SINGLE Computer
> with 2-4 Xeon-quadcore processors), or
> b) a "cluster" (connected via Infiniband) of several (Xeon multicore) nodes
>
> Only a) is a "shared memory machine" and you can set USE_REMOTE to 0
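
As far as I understand the siteconfig options, the corresponding lines in parallel_options for such a shared-memory machine would be roughly:

   setenv USE_REMOTE 0    # k-point parallel jobs are started locally, without ssh
   setenv MPI_REMOTE 0    # mpirun is called directly and handles process placement itself

This matches what is already set in the parallel_options file quoted below, so the remaining problem is most likely only the machine names used in .machines.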
>
> Another problem might be your .machines file:
> are your nodes really called "cpu1", ...
>
> This implies more or less that you have a cluster of single-core machines ???
>
> My guess is that you have a 16 core shared memory machine ???
> In this case, the .machines file must always contain the same "correct" machine name
> (or maybe "localhost"), but not cpu1,2....
>
>
> On 26.07.2012 10:17, alpa dashora wrote:
>
> Dear Wien2k Users and Prof. Marks,
>
> Thank you very much for your reply. I am giving more information below.
> Wien2k version: Wien2k_11.1 on an 8-processor server, each of which has two nodes.
> mkl library: 10.0.1.014
> openmpi: 1.3
> fftw: 2.1.5
>
> My OPTIONS file is as follows:
>
> current:FOPT:-FR -O3 -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -l/opt/openmpi/include
> current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -traceback
> current:LDFLAGS:-L/root/WIEN2k_11/SRC_lib -L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_em64t -lmkl_blacs_openmpi_lp64 -lmkl_solver -lguide -lpthread -i-static
> current:DPARALLEL:'-DParallel'
> current:R_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -limf -L/opt/fftw-2.1.5/lib/lib/ -lfftw_mpi -lrfftw_mpi -lfftw -lrfftw
> current:RP_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -limf -L/opt/fftw-2.1.5/lib/lib/ -lfftw_mpi -lrfftw_mpi -lfftw -lrfftw
> current:MPIRUN:/opt/openmpi/1.3/bin/mpirun -v -n _NP_ _EXEC_
>
> My parallel_options file is as follows:
>
> setenv USE_REMOTE 0
> setenv MPI_REMOTE 0
> setenv WIEN_GRANULARITY 1
> setenv WIEN_MPIRUN "/opt/openmpi/1.3/bin/mpirun -v -n _NP_ -machinefile _HOSTS_ _EXEC_"
>
> The compilation produced no error messages and all the executables were generated. I have since edited the parallel_options file, so the error message has changed and is now as follows:
>
> [arya:01254] filem:rsh: copy(): Error: File type unknown
> ssh: cpu1: Name or service not known
>
> --------------------------------------------------------------------------
> A daemon (pid 9385) died unexpectedly with status 255 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> ssh: cpu2: Name or service not known
>
> ssh: cpu3: Name or service not known
>
> ssh: cpu4: Name or service not known
>
> mpirun: clean termination accomplished
>
> LAPW1 - Error
> LAPW1 - Error
> LAPW1 - Error
> LAPW1 - Error
> LAPW1 - Error
> LAPW1 - Error
> LAPW1 - Error
>
> I have used the following .machines file for 16 k-points:
>
> granularity:1
> 1:cpu1
> 1:cpu2
> 1:cpu3
> 1:cpu4
> 1:cpu5
> 1:cpu6
> 1:cpu7
> 1:cpu8
> 1:cpu9
> 1:cpu10
> 1:cpu11
> 1:cpu12
> 1:cpu13
> 1:cpu14
> 1:cpu15
> 1:cpu16
> extrafine:1
> lapw0: cpu1:1 cpu2:1 cpu3:1 cpu4:1
>
> Could anyone please suggest a solution to this problem?
>
> With kind regards,
>
>
> On Mon, Jul 23, 2012 at 4:50 PM, Laurence Marks <L-marks at northwestern.edu> wrote:
>
> You probably have an incorrect MPIRUN environmental parameter. You have not provided enough information, and need to do a bit more analysis yourself.
>
> ---------------------------
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu 1-847-491-3996
>
> "Research is to see what everybody else has seen, and to think what nobody else has thought"
> Albert Szent-Gyorgi
>
> On Jul 23, 2012 6:17 AM, "alpa dashora" <dashoralpa at gmail.com> wrote:
>
> Dear Wien2k Users,
>
> I recently installed Wien2k with openmpi on a 16-processor server. The installation completed without any compilation errors. While running the run_lapw -p command, I received the following error:
> --------------------------------------------------------------------------
>
> mpirun was unable to launch the specified application as it could not find an executable:
>
> Executable:-4
> Node: arya
>
> while attempting to start process rank 0.
> --------------------------------------------------------------------------
>
> Kindly suggest a solution.
> mpirun is available in /opt/openmpi/1.3/bin
>
> Thank you in advance.
>
> Regards,
>
> --
> Dr. Alpa Dashora
>
>
>
>
>
>
> --
> Alpa Dashora
>
>
>
>
> --
>
> P.Blaha
> --------------------------------------------------------------------------
> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
> Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
> Email: blaha at theochem.tuwien.ac.at WWW: http://info.tuwien.ac.at/theochem/
> --------------------------------------------------------------------------
>
>
>
>
>
> --
> Alpa Dashora
>
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>
--
-----------------------------------------
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-----------------------------------------
More information about the Wien mailing list