[Wien] Error while parallel run
Gavin Abo
gsabo at crimson.ua.edu
Mon Jul 30 10:21:58 CEST 2012
Previously, you were supposedly using
/opt/openmpi/1.3
which seemed to work fine. So I am wondering why you are now using:
/opt/openmpi-1.4.5
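
It is also worth checking which mpirun actually gets picked up when WIEN2k
launches the parallel jobs, since a mismatch between the Open MPI you compiled
against and the mpirun found on your PATH can by itself produce MPI_Init
errors. A minimal check, assuming a bash shell:

which mpirun                  # path of the mpirun that will actually be called
mpirun --version              # Open MPI answers with "mpirun (Open MPI) x.y.z"
ls /opt/openmpi-1.4.5/bin     # confirm this directory really contains mpirun

If "which mpirun" does not point into the Open MPI installation you built
WIEN2k with, fix your PATH (and LD_LIBRARY_PATH) before changing anything else.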
One HP ProLiant DL380 G7 server box
(http://h18004.www1.hp.com/products/quickspecs/13595_div/13595_div.html)
has one or two processors each with 2, 4, or 6 cores. You previously
indicated that you have the two processor model, but you have not
mentioned the number of cores. Each server has its own memory. You
mentioned having 8 server boxes. Do all 8 server boxes have the same
number of processors and cores? Without this information, no one knows
what your .machines might need to be.
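
If you are unsure, a few standard Linux commands, run on each server box, will
report the counts (a sketch; lscpu ships with util-linux on most current
distributions):

lscpu | grep -E 'Socket|Core|^CPU\(s\)'                # sockets, cores per socket, logical CPUs
grep -c '^processor' /proc/cpuinfo                     # number of logical processors
grep '^physical id' /proc/cpuinfo | sort -u | wc -l    # number of physical sockets

Posting that output for one box (or for each, if they differ) would make it
possible to suggest a sensible .machines file.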
On 7/30/2012 1:02 AM, Peter Blaha wrote:
> First: if you are inexperienced in computing, why would you use MPI at all?
> Try the k-point parallel version first.
>
> .machines:
> 1:arya
> 1:arya
> 1:arya
> ....
>
> no lapw0 line !!
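>
> For example, a complete k-point parallel .machines file for 8 jobs on a
> single shared-memory machine called arya could look like this (a sketch
> only; use one "1:arya" line per parallel job and your real host name):
>
> granularity:1
> 1:arya
> 1:arya
> 1:arya
> 1:arya
> 1:arya
> 1:arya
> 1:arya
> 1:arya
> extrafine:1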
>
> On 30.07.2012 08:58, alpa dashora wrote:
>> Dear Wien2k Users, Mr. Abo and Prof. Blaha,
>>
>> I have edited my .machines file with the correct cpu name. My new
>> .machines file is as follows:
>> granularity:1
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> extrafine:1
>> lapw0: arya:2 arya:2
>>
>> After run_lapw -p, it gives the following error message:
>> exe: MPI_Init: MPI_Root is not set
>> exe: MPI_Init: Cannot set mpirun startup protocol
>>
>> Then, I set MPI_ROOT as:
>> export MPI_ROOT=/opt/openmpi-1.4.5
>>
>> After exporting MPI_ROOT, the following error was received:
>>
>> exe: MPI_INIT: Can't read plugin directory
>> /opt/openmpi-1.4.5/lib/linux_amd64/plugins
>> exe: MPI_Init: No plugins will be available
>>
>> I don't have any idea about openmpi; please tell me how to solve this
>> error. Please also comment on the .machines file.
>>
>> With kind regards,
>>
>>
>>
>> On Fri, Jul 27, 2012 at 11:02 AM, Peter Blaha
>> <pblaha at theochem.tuwien.ac.at> wrote:
>>
>> How should I know the correct name of your computer ???
>>
>> When you login to the machine, what are you using ??? Most
>> likely, this will be the correct name.
>>
>> If it is a shared memory machine you should use the same name for
>> all
>> processes.
>>
>> On 26.07.2012 19:45, alpa dashora wrote:
>>
>> Dear Prof. Blaha, Prof. Marks and All Wien2k users,
>>
>> Thank you very much for your reply. I have given more details
>> of my system, as you required:
>>
>> 1. What kind of system do you have ??
>>
>> We have HP ProLiant DL380 G7 (8 servers) with 2
>> processors each. So we have 16 processors and the total memory is
>> shared by all the processors.
>>
>> 2. sh ??? What did you specify in siteconfig when
>> configuring the parallel environment ??? shared memory or non-shared
>> memory ??
>> During the site configuration, I have used shared
>> memory architecture.
>>
>> 3. are your nodes really called "cpu1", ...
>>
>> I used the 'top' command in the terminal; it shows the performance of
>> all the processors and names each of them cpu1, cpu2, cpu3, ..., so I
>> have taken the names from there.
>>
>> Please suggest the correct .machines file, or any other solution
>> to this problem.
>>
>> With kind regards,
>>
>> On Thu, Jul 26, 2012 at 2:25 PM, Peter Blaha
>> <pblaha at theochem.tuwien.ac.at> wrote:
>>
>> You seem to have several errors in your basic installation:
>>
>>
>> > setenv USE_REMOTE 0
>> > setenv MPI_REMOTE 0
>>
>> > [arya:01254] filem:rsh: copy(): Error: File type unknown
>>
>> rsh ??? What did you specify in siteconfig when
>> configuring the parallel environment ???
>>
>> shared memory or non-shared memory ??
>> ssh or rsh ?? (most likely rsh will not work on
>> most systems)
>>
>> What kind of system do you have ??
>>
>> a) Is it ONE computer with many cores (typically some
>> SGI or IBM-power machines, or a SINGLE Computer
>> with 2-4 Xeon-quadcore
>> processors), or
>> b) a "cluster" (connected via Infiniband) of several
>> (Xeon multicore) nodes
>>
>> Only a) is a "shared memory machine" and you can set
>> USE_REMOTE to 0
>>
>> Another problem might be your .machines file:
>> are your nodes really called "cpu1", ...
>>
>> This implies more or less that you have a cluster of
>> single-core machines ???
>>
>> My guess is that you have a 16 core shared memory
>> machine ???
>> In this case, the .machines file must always contain
>> the same "correct" machine name
>> (or maybe "localhost"), but not cpu1,2....
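>>
>> One way to find that name (a sketch using standard Linux commands on
>> the machine itself; getent may not exist on every system):
>>
>> hostname             # the short host name, e.g. arya
>> hostname -f          # the fully qualified name, if configured
>> getent hosts arya    # check that the name used in .machines resolves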
>>
>>
>> On 26.07.2012 10:17, alpa dashora wrote:
>>
>>
>> Dear Wien2k Users and Prof. Marks,
>>
>> Thank you very much for your reply. I am giving more information.
>> Wien2k version: Wien2k_11.1 on an 8-processor server, each having
>> two nodes.
>> mkl library: 10.0.1.014
>> openmpi: 1.3
>> fftw: 2.1.5
>>
>> My OPTION file is as follows:
>>
>> current:FOPT:-FR -O3 -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -l/opt/openmpi/include
>> current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -traceback
>> current:LDFLAGS:-L/root/WIEN2k_11/SRC_lib -L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_em64t -lmkl_blacs_openmpi_lp64 -lmkl_solver -lguide -lpthread -i-static
>> current:DPARALLEL:'-DParallel'
>> current:R_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -limf -L/opt/fftw-2.1.5/lib/lib/ -lfftw_mpi -lrfftw_mpi -lfftw -lrfftw
>> current:RP_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -limf -L/opt/fftw-2.1.5/lib/lib/ -lfftw_mpi -lrfftw_mpi -lfftw -lrfftw
>> current:MPIRUN:/opt/openmpi/1.3/bin/mpirun -v -n _NP_ _EXEC_
>>
>>
>> My parallel_option file is as follows:
>>
>> setenv USE_REMOTE 0
>> setenv MPI_REMOTE 0
>> setenv WIEN_GRANULARITY 1
>> setenv WIEN_MPIRUN "/opt/openmpi/1.3/bin/mpirun -v -n _NP_ -machinefile _HOSTS_ _EXEC_"
>>
>> On compilation, no error message was received and all the executable
>> files were generated. I have edited the parallel_options file, so the
>> error message has now changed; it is as follows:
>>
>> [arya:01254] filem:rsh: copy(): Error: File type unknown
>> ssh: cpu1: Name or service not known
>>
>> --------------------------------------------------------------------------
>>
>> A daemon (pid 9385) died unexpectedly with status
>> 255 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the
>> environment (see above).
>>
>> This may be because the daemon was unable to find
>> all the needed shared
>> libraries on the remote node. You may set your
>> LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes
>> and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>>
>> mpirun noticed that the job aborted, but has no info
>> as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>>
>> ssh: cpu2: Name or service not known
>>
>> ssh: cpu3: Name or service not known
>>
>> ssh: cpu4: Name or service not known
>>
>> mpirun: clean termination accomplished
>>
>> LAPW1 - Error
>> LAPW1 - Error
>> LAPW1 - Error
>> LAPW1 - Error
>> LAPW1 - Error
>> LAPW1 - Error
>> LAPW1 - Error
>>
>> I have used the following .machines file for 16
>> k-points:
>>
>> granularity:1
>> 1:cpu1
>> 1:cpu2
>> 1:cpu3
>> 1:cpu4
>> 1:cpu5
>> 1:cpu6
>> 1:cpu7
>> 1:cpu8
>> 1:cpu9
>> 1:cpu10
>> 1:cpu11
>> 1:cpu12
>> 1:cpu13
>> 1:cpu14
>> 1:cpu15
>> 1:cpu16
>> extrafine:1
>> lapw0: cpu1:1 cpu2:1 cpu3:1 cpu4:1
>>
>> Please, can anyone suggest a solution to this problem?
>>
>> With kind regards,
>>
>>
>> On Mon, Jul 23, 2012 at 4:50 PM, Laurence Marks
>> <L-marks at northwestern.edu> wrote:
>>
>> You probably have an incorrect MPIRUN
>> environmental parameter. You have not provided enough information,
>> and need to do a bit more analysis yourself.
>>
>> ---------------------------
>> Professor Laurence Marks
>> Department of Materials Science and Engineering
>> Northwestern University
>> www.numis.northwestern.edu 1-847-491-3996
>>
>>
>> "Research is to see what everybody else has
>> seen, and to think what nobody else has thought"
>> Albert Szent-Gyorgi
>>
>> On Jul 23, 2012 6:17 AM, "alpa dashora" <dashoralpa at gmail.com> wrote:
>>
>> Dear Wien2k Users,
>>
>> I recently installed Wien2k with openmpi on a 16-processor server.
>> The installation completed without any compilation errors. While
>> running the run_lapw -p command, I received the following error:
>> --------------------------------------------------------------------------
>>
>>
>> mpirun was unable to launch the specified
>> application as it could not find an executable:
>>
>> Executable:-4
>> Node: arya
>>
>> while attempting to start process rank 0.
>> --------------------------------------------------------------------------
>>
>>
>> Kindly suggest a solution.
>> mpirun is available in /opt/openmpi/1.3/bin
>>
>> Thank you in advance.
>>
>> Regards,
>>
>> --
>> Dr. Alpa Dashora
>>
>>
>>
>>
>>
>>
>> --
>> Alpa Dashora
>>
>>
>>
>>
>> --
>> P.Blaha
>> --------------------------------------------------------------------------
>> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
>> Phone: +43-1-58801-165300    FAX: +43-1-58801-165982
>> Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
>> --------------------------------------------------------------------------
>>
>>
>>
>>
>>
>>
>> --
>> Alpa Dashora
>>
>>
>>
>>
>> --
>> -------------------------------------------
>> Peter Blaha
>> Inst. Materials Chemistry, TU Vienna
>> Getreidemarkt 9, A-1060 Vienna, Austria
>> Tel: +43-1-5880115671
>> Fax: +43-1-5880115698
>> email: pblaha at theochem.tuwien.ac.at
>> -------------------------------------------
>>
>>
>>
>>
>>
>>
>>
>> --
>> Alpa Dashora
>>
>>
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>
>