[Wien] Error in MPI run

Peter Blaha pblaha at theochem.tuwien.ac.at
Tue Mar 30 08:17:48 CEST 2021


For sure your script is not OK. Most likely you should take the SLURM 
batch file from our website (FAQ page) as a basis.
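
For illustration, a minimal sketch of how such a batch file could begin 
(job name and resource numbers are placeholders; the script from the FAQ 
page is the one to actually use):

   #!/bin/bash -l
   #SBATCH --job-name=wien2k    # placeholder job name
   #SBATCH --nodes=1            # one node
   #SBATCH --ntasks=4           # 4 MPI tasks (slots) for this job
   # ... then create the .machines file and call run_lapw -p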
-----------------------
Please check the created .machines file.

You do not have a line:

lapw0:host:xx     where xx is the number of cores, .....

Therefore the message:  lapw0 not found  (instead of lapw0_mpi not found).

So the script tried to run lapw0 in serial (not MPI-parallel).
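
For a single 4-core node, a complete .machines file would look e.g. like 
this (node01 stands for your actual hostname): lapw0 MPI-parallel on 4 
cores, lapw1/2 as two k-parallel jobs with 2 MPI cores each:

   lapw0:node01:4
   1:node01:2
   1:node01:2
   granularity:1
   extrafine:1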
------------------------------
Still, the "lapw0: command not found" message means that the environment 
in your batch job is not OK: the WIEN2k executables are not in the PATH.

Maybe you need lines like:

source ~/.bashrc

and checks like:

echo $WIENROOT
which lapw0
which lapw0_mpi
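
E.g. a sketch of the beginning of the batch script (assuming your 
~/.bashrc sets WIENROOT and PATH, as the WIEN2k userconfig setup does):

   source ~/.bashrc
   echo $WIENROOT    # should print the WIEN2k installation directory
   which lapw0       # should resolve to $WIENROOT/lapw0
   which lapw0_mpi   # should resolve to $WIENROOT/lapw0_mpi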



On 28.03.2021 at 22:37, leila mollabashi wrote:
> Dear Wien2k users,
> 
> I have a problem with MPI parallelization, although I have compiled the 
> code with no errors. Version 19.2 of WIEN2k was compiled with ifort, cc, 
> and OpenMPI; the MKL and FFTW libraries were also used. On the SLURM 
> queue cluster I can run in k-point parallel mode, but I could not run in 
> MPI-parallel mode, even on 1 node. I used this script to run:
> 
> sbatch submit_script.sl
> 
> The submit_script.sl file is, for example, as follows:
> 
> #!/bin/bash -l
> hostname
> rm -fr .machines
> 
> # for 4 cpus and kpoints (in input file)
> nproc=4
> 
> # write .machines file
> echo '#' .machines
> 
> # example for an MPI parallel lapw0
> echo 'lapw0:'`hostname`'
> #:'$nproc >> .machines
> 
> # k-point and mpi parallel lapw1/2
> echo '1:'`hostname`':2' >> .machines
> echo '1:'`hostname`':2' >> .machines
> echo 'granularity:1' >> .machines
> echo 'extrafine:1' >> .machines
> 
> run_lapw -p
> 
> Then this error appears:
> 
> error: command /home/users/mollabashi/v19.2/lapw0para lapw0.def   failed
> 
> The slurm-17032361.out file is as follows:
> 
> # .machines
> bash: lapw0: command not found
> 
> real    0m0.001s
> user    0m0.000s
> sys     0m0.001s
> 
> grep: *scf1*: No such file or directory
> grep: lapw2*.error: No such file or directory
> 
>>   stop error
> 
> Then, when I run it manually, this error appears:
> 
> There are not enough slots available in the system to satisfy the 4
> slots that were requested by the application:
> 
>   /home/users/mollabashi/v19.2/lapw0_mpi
> 
> Either request fewer slots for your application, or make more slots
> available for use.
> 
> A "slot" is the Open MPI term for an allocatable unit where we can
> launch a process. The number of slots available are defined by the
> environment in which Open MPI processes are run:
> 
>    1. Hostfile, via "slots=N" clauses (N defaults to number of
>       processor cores if not provided)
>    2. The --host command line parameter, via a ":N" suffix on the
>       hostname (N defaults to 1 if not provided)
>    3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
>    4. If none of a hostfile, the --host command line parameter, or an
>       RM is present, Open MPI defaults to the number of processor cores
> 
> In all the above cases, if you want Open MPI to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
> 
> Alternatively, you can use the --oversubscribe option to ignore the
> number of available slots when deciding the number of processes to
> launch.
> --------------------------------------------------------------------------
> [1]    Exit 1                        mpirun -np 4 -machinefile .machine0
> /home/users/mollabashi/v19.2/lapw0_mpi lapw0.def >> .time00
> --------------------------------------------------------------------------
> 
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
> 
>   /home/users/mollabashi/v19.2/lapw1_mpi
> 
> Either request fewer slots for your application, or make more slots
> available for use.
> 
> [... the same Open MPI "slots" explanation as above ...]
> --------------------------------------------------------------------------
> [1]  + Done                          ( cd $PWD; $t $ttt; rm -f
> .lock_$lockfile[$p] ) >> .time1_$loop
> 
> --------------------------------------------------------------------------
> [... the same "not enough slots" message and "Done" line repeat for the
> second lapw1_mpi job ...]
> 
> ce.scf1_1: No such file or directory.
> grep: *scf1*: No such file or directory
> LAPW2 - Error. Check file lapw2.error
> cp: cannot stat ‘.in.tmp’: No such file or directory
> grep: *scf1*: No such file or directory
> 
>>   stop error
> 
> Would you please kindly guide me?
> 
> Sincerely yours,
> 
> Leila Mollabashi
> 
> 
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:  http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
> 

-- 
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at
-------------------------------------------------------------------------

