[Wien] MPI error

Peter Blaha pblaha at theochem.tuwien.ac.at
Mon Apr 12 20:33:57 CEST 2021



On 12.04.2021 at 20:00, leila mollabashi wrote:
> Dear Prof. Peter Blaha and WIEN2k users,
> 
> Thank you. Now my .machines file is:
> 
> lapw0:e0591:4
> 1:e0591:4
> 1:e0591:4
> granularity:1
> extrafine:1
> 
> I have installed WIEN2k in my user account on the cluster. When I use the
> command “srun --pty /bin/bash”, it takes me to one node of the cluster.
> There the commands “ls -als $WIENROOT/lapw0”, “x lapw0” and “lapw0 lapw0.def”
> run successfully, but “x lapw0 -p” fails. The following error appears:

At least now the error: "lapw0 not found" is gone. Do you understand why ??

So you opened an interactive session on one node.
How many slots do you get with this srun command?
Is this the node with the name  e0591  ???
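
For illustration only (the exact options depend on how SLURM is configured
on your cluster), an interactive session that actually reserves 4 MPI slots
on one node would look something like:

srun -N 1 -n 4 --pty /bin/bash

A plain "srun --pty /bin/bash" usually gives you just one task, and then
Open MPI rightly complains that 4 slots were requested but are not available.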
Of course the .machines file must be consistent (dynamically adapted) 
with the actual nodename.
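
As a minimal sketch (the fixed count of 4 processes per node is an
assumption, adapt it to your allocation), the job script could rebuild
.machines on every run from the node SLURM actually assigned:

#!/bin/bash
node=$(hostname)                      # node assigned by SLURM
nproc=4                               # assumed: 4 mpi processes on this node
echo "lapw0:${node}:${nproc}"  >  .machines
echo "1:${node}:${nproc}"      >> .machines
echo "1:${node}:${nproc}"      >> .machines
echo "granularity:1"           >> .machines
echo "extrafine:1"             >> .machines

This reproduces the file you show above, but with the real node name
filled in automatically.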

As you could see at the bottom of the message, the command

x lapw0 -p

creates lapw0.def and .machine0, and then executes

mpirun -np 4 -machinefile .machine0 $WIENROOT/lapw0_mpi lapw0.def
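
(The actual mpirun call is taken from $WIENROOT/parallel_options, written
by siteconfig. On a typical Open MPI setup it contains something like

setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"

where _NP_, _HOSTS_ and _EXEC_ get replaced by the number of processes, the
.machine0 file and the lapw0_mpi command. If your cluster requires jobs to
be started with srun instead of mpirun, this is the line to adapt.)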


Somewhere there should be documentation on how one runs an MPI job on
your system.
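
As a rough sketch only (job name and time limit are placeholders, not taken
from your cluster), the header of a SLURM batch script that reserves the 4
slots could look like:

#!/bin/bash
#SBATCH --job-name=wien2k
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4    # provides 4 slots and sets SLURM_NTASKS_PER_NODE
#SBATCH --time=01:00:00

Your earlier "SLURM_NTASKS_PER_NODE: Undefined variable" error points in the
same direction: that variable is only set when --ntasks-per-node is requested.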

It is almost impossible to solve this from outside. All we can do is
give some tips.

> 
> There are not enough slots available in the system to satisfy the 4
> slots that were requested by the application:
> 
>    /home/users/mollabashi/v19.2/lapw0_mpi
> 
> Either request fewer slots for your application, or make more slots
> available for use.
> 
> A "slot" is the Open MPI term for an allocatable unit where we can
> launch a process.  The number of slots available are defined by the
> environment in which Open MPI processes are run:
> 
>    1. Hostfile, via "slots=N" clauses (N defaults to number of
>       processor cores if not provided)
>    2. The --host command line parameter, via a ":N" suffix on the
>       hostname (N defaults to 1 if not provided)
>    3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
>    4. If none of a hostfile, the --host command line parameter, or an
>       RM is present, Open MPI defaults to the number of processor cores
> 
> In all the above cases, if you want Open MPI to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
> 
> Alternatively, you can use the --oversubscribe option to ignore the
> number of available slots when deciding the number of processes to
> launch.
> --------------------------------------------------------------------------
> 
> [1]    Exit 1                        mpirun -np 4 -machinefile .machine0 
> /home/users/mollabashi/v19.2/lapw0_mpi lapw0.def >> .time00
> 
> 0.067u 0.091s 0:02.97 5.0%      0+0k 52272+144io 54pf+0w
> 
> mollabashi at eagle:~/test1/cein$ cat .machines
> 
> Sincerely yours,
> 
> Leila Mollabashi
> 
> 
> On Sun, Apr 11, 2021 at 9:40 PM Peter Blaha
> <pblaha at theochem.tuwien.ac.at> wrote:
> 
>     Your script is still wrong.
>     The .machines file should show:
> 
>     lapw0:e0150:4
> 
>     not
>     lapw0:e0150
>     :4
> 
>     Therefore it tries to execute lapw0 instead of lapw0_mpi.
>     -----------
>     Anyway, the first thing is to get the sequential WIEN2k running. You
>     claimed that WIENROOT is known in the batch job.
>     Please do:
>     ls -als $WIENROOT/lapw0
> 
>     Does it have execute permission ?
> 
>     If yes, execute lapw0 explicitly:
> 
>     x lapw0
> 
>     and a second time:
> 
>     lapw0 lapw0.def
> 
> 
>     On 11.04.2021 at 13:17, leila mollabashi wrote:
>      > Dear Prof. Peter Blaha,
>      >
>      > Thank you for your guidance. You are right. I edited the script and
>      > added “source ~/.bashrc, echo 'lapw0:'`hostname`' :'$nproc >>
>      > .machines” to it.
>      >
>      > The created .machines file is as follows:
>      >
>      > lapw0:e0150
>      > :4
>      > 1:e0150:4
>      > 1:e0150:4
>      > granularity:1
>      > extrafine:1
>      >
>      > The slurm.out file is:
>      >
>      > e0150
>      > # .machines
>      > bash: lapw0: command not found
>      > real 0m0.001s
>      > user 0m0.001s
>      > sys 0m0.000s
>      > grep: *scf1*: No such file or directory
>      > grep: lapw2*.error: No such file or directory
>      >>  stop error
>      >
>      > When I used the following commands:
>      >
>      > echo $WIENROOT
>      > which lapw0
>      > which lapw0_mpi
>      >
>      > the following paths were printed:
>      >
>      > /home/users/mollabashi/v19.2
>      > /home/users/mollabashi/v19.2/lapw0
>      > /home/users/mollabashi/v19.2/lapw0_mpi
>      >
>      > But the error still exists:
>      >
>      > bash: lapw0: command not found
>      >
>      > When I used your script from the FAQ page, the .machines file was
>      > generated once, but the run stopped with an error:
>      >
>      > test.scf1_1: No such file or directory.
>      > grep: *scf1*: No such file or directory
>      > FERMI - Error
>      >
>      > When I loaded openmpi, ifort and icc in the script, this error
>      > appeared:
>      >
>      >>SLURM_NTASKS_PER_NODE:  Undefined variable.
>      >
>      > Every time after that, the same
>      >
>      >>SLURM_NTASKS_PER_NODE:  Undefined variable
>      >
>      > error occurred when I used your script without changing it. I have
>      > tried several times, even in a new directory, with no positive effect.
>      >
>      > Sincerely yours,
>      >
>      > Leila Mollabashi
>      >
>      >
> 
> 
> 

-- 
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at
-------------------------------------------------------------------------

