[Wien] Error while parallel run

alpa dashora dashoralpa at gmail.com
Thu Jul 26 19:45:20 CEST 2012


Dear Prof. Blaha, Prof. Marks and All Wien2k users,

Thank you very much for your reply. I have given more details about my system
below, as you requested:

1. What kind of system do you have ??

    We have HP ProLiant DL380 G7 servers (8 servers with 2 processors each), so
we have 16 processors in total, and the total memory is shared by all the
processors.
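
To double-check whether this really behaves as one shared-memory machine or as
a cluster of separate nodes, I assume the layout seen by the operating system
can be inspected on one of the servers with standard Linux tools, for example:

    # Number of sockets, cores and logical CPUs visible to this node
    lscpu | grep -E 'Socket|Core|CPU\(s\)'
    # or simply count the processor entries
    grep -c processor /proc/cpuinfo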

2. rsh ???   What did you specify in siteconfig when configuring the
parallel environment ??? shared memory or non-shared memory  ??

    During siteconfig I selected the shared-memory architecture.

3. Are your nodes really called "cpu1", ...?
   I used the 'top' command in a terminal; it shows the load of all the
processors and labels them cpu1, cpu2, cpu3, ..., so I took those as the node
names.
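
If those cpu labels are only per-core statistics and not host names that ssh
can resolve, then I assume the name to put in .machines is the one the node
itself reports, e.g.:

    hostname        # short host name (the mpirun output below reports the node as 'arya')
    hostname -f     # fully qualified name, if it is configured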

Please suggest the correct .machines file or any other way to solve this
problem.
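
For reference, my current understanding of Prof. Blaha's advice is that on a
single 16-core shared-memory machine every line of .machines should carry the
same host name (I assume 'arya' here, the node name in the mpirun output;
'localhost' may also work), e.g. for 16 parallel k-point jobs:

granularity:1
1:arya
1:arya
1:arya
1:arya
1:arya
1:arya
1:arya
1:arya
1:arya
1:arya
1:arya
1:arya
1:arya
1:arya
1:arya
1:arya
extrafine:1

I am not sure about the lapw0 line; if lapw0 should run MPI-parallel on 4
cores, I assume it would read 'lapw0:arya:4' instead of the current
'lapw0: cpu1:1 cpu2:1 cpu3:1 cpu4:1'.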

With kind regards,

On Thu, Jul 26, 2012 at 2:25 PM, Peter Blaha
<pblaha at theochem.tuwien.ac.at> wrote:

> You seem to have several errors in your basic installation:
>
>
> > setenv USE_REMOTE 0
> > setenv MPI_REMOTE 0
>
> > [arya:01254] filem:rsh: copy(): Error: File type unknown
>
> rsh ???   What did you specify in siteconfig when configuring the parallel
> environment ???
>
> shared memory or non-shared memory  ??
> ssh  or  rsh  ??    (most likely rsh will not work on most systems)
>
> What kind of system do you have ??
>
> a) Is it ONE computer with many cores (typically some SGI or IBM-power
> machines, or a SINGLE Computer
>                                 with 2-4 Xeon-quadcore processors), or
> b) a "cluster" (connected via Infiniband) of several (Xeon multicore) nodes
>
> Only a) is a "shared memory machine" and you can set USE_REMOTE to 0
>
> Another problem might be your   .machines file:
> are your nodes really called "cpu1", ...
>
> This implies more or less that you have a cluster of single-core machines
> ???
>
> My guess is that you have a 16 core shared memory machine ???
> In this case, the  .machines file must always contain the same "correct"
> machine name
> (or maybe "localhost"), but not cpu1,2....
>
>
> On 26.07.2012 10:17, alpa dashora wrote:
>
>> Dear Wien2k Users and Prof. Marks,
>>
>> Thank you very much for your reply. I am giving more information below.
>> Wien2k version: Wien2k_11.1, on 8 servers with two processors each.
>> mkl library: 10.0.1.014
>> openmpi: 1.3
>> fftw: 2.1.5
>>
>> My OPTIONS file is as follows:
>>
>> current:FOPT:-FR -O3 -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
>> -traceback -l/opt/openmpi/include
>> current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -traceback
>> current:LDFLAGS:-L/root/WIEN2k_11/SRC_lib -L/opt/intel/cmkl/10.0.1.014/lib/em64t
>> -lmkl_em64t -lmkl_blacs_openmpi_lp64 -lmkl_solver -lguide -lpthread
>> -i-static
>> current:DPARALLEL:'-DParallel'
>> current:R_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_scalapack_lp64
>> -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64
>>
>> -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group
>> -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte
>> -lopen-pal -ldl
>> -Wl,--export-dynamic -lnsl -lutil -limf -L/opt/fftw-2.1.5/lib/lib/
>> -lfftw_mpi -lrfftw_mpi -lfftw -lrfftw
>> current:RP_LIBS:-L/opt/intel/cmkl/10.0.1.014/lib/em64t -lmkl_scalapack_lp64
>> -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64
>>
>> -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group
>> -lpthread -lm -L/opt/openmpi/1.3/lib/ -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte
>> -lopen-pal -ldl
>> -Wl,--export-dynamic -lnsl -lutil -limf -L/opt/fftw-2.1.5/lib/lib/
>> -lfftw_mpi -lrfftw_mpi -lfftw -lrfftw
>> current:MPIRUN:/opt/openmpi/1.3/bin/mpirun -v -n _NP_ _EXEC_
>>
>> My parallel_options file is as follows:
>>
>> setenv USE_REMOTE 0
>> setenv MPI_REMOTE 0
>> setenv WIEN_GRANULARITY 1
>> setenv WIEN_MPIRUN "/opt/openmpi/1.3/bin/mpirun -v -n _NP_ -machinefile
>> _HOSTS_ _EXEC_"
>>
>> The compilation produced no error messages and all the executables were
>> generated. I have edited the parallel_options file, so the error message has
>> now changed; it is as follows:
>>
>> [arya:01254] filem:rsh: copy(): Error: File type unknown
>> ssh: cpu1: Name or service not known
>>
>> --------------------------------------------------------------------------
>> A daemon (pid 9385) died unexpectedly with status 255 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> ssh: cpu2: Name or service not known
>>
>> ssh: cpu3: Name or service not known
>>
>> ssh: cpu4: Name or service not known
>>
>> mpirun: clean termination accomplished
>>
>> LAPW1 - Error
>> LAPW1 - Error
>> LAPW1 - Error
>> LAPW1 - Error
>> LAPW1 - Error
>> LAPW1 - Error
>> LAPW1 - Error
>>
>> I have used the following .machines file for 16 k-points:
>>
>> granularity:1
>> 1:cpu1
>> 1:cpu2
>> 1:cpu3
>> 1:cpu4
>> 1:cpu5
>> 1:cpu6
>> 1:cpu7
>> 1:cpu8
>> 1:cpu9
>> 1:cpu10
>> 1:cpu11
>> 1:cpu12
>> 1:cpu13
>> 1:cpu14
>> 1:cpu15
>> 1:cpu16
>> extrafine:1
>> lapw0: cpu1:1 cpu2:1 cpu3:1 cpu4:1
>>
>> Could anyone please suggest a solution to this problem?
>>
>> With kind regards,
>>
>>
>> On Mon, Jul 23, 2012 at 4:50 PM, Laurence Marks <L-marks at northwestern.edu> wrote:
>>
>>     You probably have an incorrect MPIRUN environmental parameter. You
>> have not provided enough information, and need to do a bit more analysis
>> yourself.
>>
>>     ---------------------------
>>     Professor Laurence Marks
>>     Department of Materials Science and Engineering
>>     Northwestern University
>>     www.numis.northwestern.edu
>> 1-847-491-3996
>>
>>     "Research is to see what everybody else has seen, and to think what
>> nobody else has thought"
>>     Albert Szent-Gyorgi
>>
>>     On Jul 23, 2012 6:17 AM, "alpa dashora" <dashoralpa at gmail.com> wrote:
>>
>>         Dear Wien2k Users,
>>
>>         I recently installed Wien2k with openmpi on a 16-processor server.
>> The installation completed without any compilation errors. While running the
>> run_lapw -p command, I received the following error:
>>         --------------------------------------------------------------------------------------------------------------------------
>>
>>         mpirun was unable to launch the specified application as it could
>> not find an executable:
>>
>>         Executable:-4
>>         Node: arya
>>
>>         while attempting to start process rank 0.
>>         --------------------------------------------------------------------------------------------------------------------------
>>
>>         Kindly suggest a solution.
>>         mpirun is available in /opt/openmpi/1.3/bin.
>>
>>         Thank you in advance.
>>
>>         Regards,
>>
>>         --
>>         Dr. Alpa Dashora
>>
>>
>>     _______________________________________________
>>     Wien mailing list
>>     Wien at zeus.theochem.tuwien.ac.at
>>     http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>
>>
>>
>>
>> --
>> Alpa Dashora
>>
>>
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>
>>
> --
>
>                                       P.Blaha
> --------------------------------------------------------------------------
> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
> Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
> Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
> --------------------------------------------------------------------------
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>



-- 
Alpa Dashora

