[Wien] Error while parallel run

Gavin Abo gsabo at crimson.ua.edu
Mon Jul 30 10:21:58 CEST 2012


Previously, you were supposedly using

/opt/openmpi/1.3

That seemed to work fine, so I am wondering why you are now using:

/opt/openmpi-1.4.5
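
If you are unsure which mpi is actually picked up at run time, something 
like the following quick check might help (just a sketch; adjust the 
paths to your installation):

  which mpirun
  mpirun --version
  echo $PATH
  echo $LD_LIBRARY_PATH
  ldd $WIENROOT/lapw1_mpi | grep -i mpi

The first two commands show whether the mpirun found first in your PATH 
really belongs to the openmpi you compiled WIEN2k against, and the ldd 
line (use the full path to lapw1_mpi if WIENROOT is not set) shows which 
mpi libraries your mpi executables were actually linked to.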

One HP ProLiant DL380 G7 server box 
(http://h18004.www1.hp.com/products/quickspecs/13595_div/13595_div.html) 
has one or two processors each with 2, 4, or 6 cores.  You previously 
indicated that you have the two processor model, but you have not 
mentioned the number of cores.  Each server has its own memory.  You 
mentioned having 8 server boxes. Do all 8 server boxes have the same 
number of processors and cores?  Without this information, no one knows 
what your .machines file might need to be.
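
For example, if each of your boxes turned out to have two 6 core 
processors (12 cores per box), an mpi .machines for lapw1/lapw2 might 
look something like the sketch below, where box1 ... box8 are only 
placeholder host names until you confirm the real names and core counts:

  granularity:1
  1:box1:12
  1:box2:12
  ...
  1:box8:12
  extrafine:1
  lapw0: box1:12

(For lapw0 one usually does not use more mpi processes than there are 
atoms in the cell, so the lapw0 line may need fewer cores or can be left 
out entirely.)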

On 7/30/2012 1:02 AM, Peter Blaha wrote:
> First: If you are inexperienced in computing, why would you use mpi at 
> all?
> Try the k-point parallel version first.
>
> .machines:
> 1:arya
> 1:arya
> 1:arya
> ....
>
> no lapw0 line !!
>
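> (With such a .machines file in the case directory, a sketch of the run 
> would simply be
>
>    run_lapw -p
>
> assuming the case is already initialized; lapw1 and lapw2 then run as 
> independent k-point jobs on arya, without mpi and without mpirun.)
>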
> On 30.07.2012 08:58, alpa dashora wrote:
>> Dear Wien2k Users, Mr. Abo and Prof. Blaha,
>>
>> I have edited my .machines file with the correct cpu name. My new 
>> .machines file is as follows:
>> granularity:1
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> 1:arya:2
>> extrafine:1
>> lapw0: arya:2 arya:2
>>
>> After run_lapw -p, it gives the following error message:
>> exe: MPI_Init: MPI_Root is not set
>> exe: MPI_Init: Cannot set mpirun startup protocol
>>
>> Then, I have set MPI_ROOT as:
>> export MPI_ROOT=/opt/openmpi-1.4.5
>>
>> After exporting MPI_ROOT, the following error was received:
>>
>> exe: MPI_INIT: Can't read plugin directory 
>> /opt/openmpi-1.4.5/lib/linux_amd64/plugins
>> exe: MPI_Init: No plugins will be available
>>
>> I don't have any idea about openmpi; please tell me how to solve this 
>> error. Please also comment on the .machines file.
>>
>> With kind regards,
>>
>>
>>
>> On Fri, Jul 27, 2012 at 11:02 AM, Peter Blaha 
>> <pblaha at theochem.tuwien.ac.at> wrote:
>>
>>     How should I know the correct name of your computer ???
>>
>>     When you log in to the machine, what name are you using ??? Most 
>> likely, this will be the correct name.
>>
>>     If it is a shared memory machine you should use the same name for 
>> all
>>     processes.
>>
>>         On 26.07.2012 19:45, alpa dashora wrote:
>>
>>         Dear Prof. Blaha, Prof. Marks and All Wien2k users,
>>
>>         Thank you very much for your reply. I have given more details 
>> of my system, as you required:
>>
>>         1. What kind of system do you have ??
>>
>>               We have HP ProLiant DL380 G7 (8 servers) with 2 
>> processors each. So we have 16 processors and the total memory is 
>> shared by all the processors.
>>
>>         2. sh ???   What did you specify in siteconfig when 
>> configuring the parallel environment ??? shared memory or non-shared 
>> memory  ??
>>               During the site configuration, I have used shared 
>> memory architecture.
>>
>>         3. are your nodes really called "cpu1", ...
>>
>>              I have used the 'top' command on the terminal; it shows 
>> the performance of all the processors and names each processor as 
>> cpu1, cpu2, cpu3, ........ so I have taken the names from there.
>>
>>         Please suggest the correct .machines file or any other 
>> solution to this problem.
>>
>>         With kind regards,
>>
>>         On Thu, Jul 26, 2012 at 2:25 PM, Peter Blaha 
>> <pblaha at theochem.tuwien.ac.at> wrote:
>>
>>              You seem to have several errors in your basic installation:
>>
>>
>>               > setenv USE_REMOTE 0
>>               > setenv MPI_REMOTE 0
>>
>>               > [arya:01254] filem:rsh: copy(): Error: File type unknown
>>
>>              rsh ???   What did you specify in siteconfig when 
>> configuring the parallel environment ???
>>
>>              shared memory or non-shared memory  ??
>>              ssh  or  rsh  ??    (most likely rsh will not work on 
>> most systems)
>>
>>              What kind of system do you have ??
>>
>>              a) Is it ONE computer with many cores (typically some 
>> SGI or IBM-power machines, or a SINGLE Computer
>>                                               with 2-4 Xeon-quadcore 
>> processors), or
>>              b) a "cluster" (connected via Infiniband) of several 
>> (Xeon multicore) nodes
>>
>>              Only a) is a "shared memory machine" and you can set 
>> USE_REMOTE to 0
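>>
>>              (For that case the parallel_options file would then 
>> typically contain, as a sketch,
>>
>>              setenv USE_REMOTE 0
>>              setenv MPI_REMOTE 0
>>
>>              while a real cluster needs USE_REMOTE 1 and passwordless 
>> ssh between the nodes.)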
>>
>>              Another problem might be your   .machines file:
>>              are your nodes really called "cpu1", ...
>>
>>              This implies more or less that you have a cluster of 
>> single-core machines ???
>>
>>              My guess is that you have a 16 core shared memory 
>> machine ???
>>              In this case, the  .machines file must always contain 
>> the same "correct" machine name
>>              (or maybe "localhost"), but not cpu1,2....
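>>
>>              (As a sketch, for a 16 core shared memory box a purely 
>> k-point parallel .machines could then look like
>>
>>              granularity:1
>>              1:localhost
>>              1:localhost
>>              ...            one such line per parallel lapw1/lapw2 job
>>              extrafine:1
>>
>>              with no lapw0 line, or with the real host name in place 
>> of localhost.)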
>>
>>
>>              On 26.07.2012 10:17, alpa dashora wrote:
>>
>>
>>                  Dear Wien2k Users and Prof. Marks,
>>
>>                  Thank you very much for your reply. I am giving more 
>> information.
>>                  Wien2k Version: Wien2k_11.1 on an 8 processor server; 
>> each has two nodes.
>>                  mkl library: 10.0.1.014
>>                  openmpi: 1.3
>>                  fftw: 2.1.5
>>
>>                  My OPTIONS file is as follows:
>>
>>                  current:FOPT:-FR -O3 -mp1 -w -prec_div -pc80 -pad 
>> -ip -DINTEL_VML -traceback -l/opt/openmpi/include
>>                  current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip 
>> -traceback
>>                  current:LDFLAGS:-L/root/WIEN2k_11/SRC_lib 
>>                  -L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_em64t 
>>                  -lmkl_blacs_openmpi_lp64 -lmkl_solver -lguide -lpthread
>>                  -i-static
>>                  current:DPARALLEL:'-DParallel'
>>                  current:R_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t 
>>                  -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential 
>>                  -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential 
>>                  -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group 
>>                  -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 
>>                  -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl 
>>                  -Wl,--export-dynamic -lnsl -lutil -limf 
>>                  -L/opt/fftw-2.1.5/lib/lib/ -lfftw_mpi -lrfftw_mpi 
>>                  -lfftw -lrfftw
>>                  current:RP_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t 
>>                  -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential 
>>                  -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential 
>>                  -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group 
>>                  -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 
>>                  -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl 
>>                  -Wl,--export-dynamic -lnsl -lutil -limf 
>>                  -L/opt/fftw-2.1.5/lib/lib/ -lfftw_mpi -lrfftw_mpi 
>>                  -lfftw -lrfftw
>>                  current:MPIRUN:/opt/openmpi/1.3/bin/mpirun -v -n _NP_ _EXEC_
>>
>>
>>                  My parallel_options file is as follows:
>>
>>                  setenv USE_REMOTE 0
>>                  setenv MPI_REMOTE 0
>>                  setenv WIEN_GRANULARITY 1
>>                  setenv WIEN_MPIRUN "/opt/openmpi/1.3/bin/mpirun -v 
>> -n _NP_ -machinefile _HOSTS_ _EXEC_"
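>>
>>                  (At run time the parallel scripts replace the 
>> placeholders, so the command actually issued looks roughly like this 
>> sketch, with the machine file name and .def file only illustrative:
>>
>>                  /opt/openmpi/1.3/bin/mpirun -v -n 2 -machinefile 
>> .machine1 $WIENROOT/lapw1_mpi lapw1_1.def
>>
>>                  i.e. _NP_ becomes the number of processes, _HOSTS_ a 
>> temporary machine file written from .machines, and _EXEC_ the mpi 
>> executable with its .def file.)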
>>
>>                  On compilation no error message was received and all 
>> the executable files were generated. I have edited the 
>> parallel_options file, so now the error message has changed and it is 
>> as follows:
>>
>>                  [arya:01254] filem:rsh: copy(): Error: File type 
>> unknown
>>                  ssh: cpu1: Name or service not known
>>
>> --------------------------------------------------------------------------
>>
>>                  A daemon (pid 9385) died unexpectedly with status 
>> 255 while attempting
>>                  to launch so we are aborting.
>>
>>                  There may be more information reported by the 
>> environment (see above).
>>
>>                  This may be because the daemon was unable to find 
>> all the needed shared
>>                  libraries on the remote node. You may set your 
>> LD_LIBRARY_PATH to have the
>>                  location of the shared libraries on the remote nodes 
>> and this will
>>                  automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>>
>>                  mpirun noticed that the job aborted, but has no info 
>> as to the process
>>                  that caused that situation.
>> --------------------------------------------------------------------------
>>
>>                  ssh: cpu2: Name or service not known
>>
>>                  ssh: cpu3: Name or service not known
>>
>>                  ssh: cpu4: Name or service not known
>>
>>                  mpirun: clean termination accomplished
>>
>>                  LAPW1 - Error
>>                  LAPW1 - Error
>>                  LAPW1 - Error
>>                  LAPW1 - Error
>>                  LAPW1 - Error
>>                  LAPW1 - Error
>>                  LAPW1 - Error
>>
>>                  I have used the following .machines file for 16 
>> k-points:
>>
>>                  granularity:1
>>                  1:cpu1
>>                  1:cpu2
>>                  1:cpu3
>>                  1:cpu4
>>                  1:cpu5
>>                  1:cpu6
>>                  1:cpu7
>>                  1:cpu8
>>                  1:cpu9
>>                  1:cpu10
>>                  1:cpu11
>>                  1:cpu12
>>                  1:cpu13
>>                  1:cpu14
>>                  1:cpu15
>>                  1:cpu16
>>                  extrafine:1
>>                  lapw0: cpu1:1 cpu2:1 cpu3:1 cpu4:1
>>
>>                  Please, can anyone suggest a solution to this problem?
>>
>>                  With kind regards,
>>
>>
>>                  On Mon, Jul 23, 2012 at 4:50 PM, Laurence Marks 
>> <L-marks at northwestern.edu> wrote:
>>
>>                       You probably have an incorrect MPIRUN 
>> environmental parameter. You have not provided enough information, 
>> and need to do a bit more analysis yourself.
>>
>>                       ---------------------------
>>                       Professor Laurence Marks
>>                       Department of Materials Science and Engineering
>>                       Northwestern University
>>         www.numis.northwestern.edu 1-847-491-3996
>>
>>
>>                       "Research is to see what everybody else has 
>> seen, and to think what nobody else has thought"
>>                       Albert Szent-Gyorgi
>>
>>                       On Jul 23, 2012 6:17 AM, "alpa dashora" 
>> <dashoralpa at gmail.com> wrote:
>>
>>                           Dear Wien2k Users,
>>
>>                           I recently installed Wien2k with openmpi on 
>> a 16 processor server. Installation was completed without any 
>> compilation error. While running the run_lapw -p command, I
>>                           received the following error:
>> --------------------------------------------------------------------------
>>
>>
>>                           mpirun was unable to launch the specified 
>> application as it could not find an executable:
>>
>>                           Executable:-4
>>                           Node: arya
>>
>>                           while attempting to start process rank 0.
>> --------------------------------------------------------------------------
>>
>>
>>                           Kindly suggest a solution.
>>                           mpirun is available in /opt/openmpi/1.3/bin
>>
>>                           Thank you in advance.
>>
>>                           Regards,
>>
>>                           --
>>                           Dr. Alpa Dashora
>>
>>
>>
>>
>>
>>
>>                  --
>>                  Alpa Dashora
>>
>>
>>
>>
>>              --
>>
>>                                                     P.Blaha
>> --------------------------------------------------------------------------
>>
>>              Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, 
>> A-1060 Vienna
>>              Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
>>              Email: blaha at theochem.tuwien.ac.at    WWW: 
>>              http://info.tuwien.ac.at/theochem/
>> --------------------------------------------------------------------------
>>
>>
>>
>>
>>
>>
>>         --
>>         Alpa Dashora
>>
>>
>>
>>
>>     --
>>     -------------------------------------------
>>     Peter Blaha
>>     Inst. Materials Chemistry, TU Vienna
>>     Getreidemarkt 9, A-1060 Vienna, Austria
>>     Tel: +43-1-5880115671
>>     Fax: +43-1-5880115698
>>     email: pblaha at theochem.tuwien.ac.at
>>     -------------------------------------------
>>
>>
>>
>>
>>
>>
>>
>> -- 
>> Alpa Dashora
>>
>>
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>
>


