[Wien] Error in MPI run
Peter Blaha
pblaha at theochem.tuwien.ac.at
Tue Mar 30 08:17:48 CEST 2021
For sure your script is not ok. Most likely you should take the slurm
batch file from our website (FAQ page) as a basis.
-----------------------
Please check the created .machines file.
You do not have a line:
lapw0:host:xx   (where xx is the number of cores, .....)
Therefore the message: lapw0 not found (instead of lapw0_mpi not found).
So the script tried to run lapw0 in serial (not mpi-parallel).
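For illustration, a .machines file that does contain such a lapw0 line might look like the following (the host name and core counts here are made up; use the values matching your actual allocation):

```
# .machines
lapw0:node001:4
1:node001:2
1:node001:2
granularity:1
extrafine:1
```

With this file, lapw0 runs mpi-parallel on 4 cores, and lapw1/lapw2 run as two k-point groups with 2 mpi ranks each.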
------------------------------
Still, this means that the environment in your batch job is not ok.
Maybe you need lines like:
source ~/.bashrc
and checks like:
echo $WIENROOT
which lapw0
which lapw0_mpi
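Putting these pieces together, a minimal sketch of such a submit script could look like the following (the #SBATCH values are assumptions for a single-node 4-core run; the slurm example on the WIEN2k FAQ page is the authoritative template):

```shell
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=4          # must provide at least as many slots as .machines requests

# Batch shells are often non-interactive and skip your login files,
# so load the WIEN2k environment explicitly.
[ -f ~/.bashrc ] && source ~/.bashrc

# Sanity checks: both must resolve, otherwise PATH/$WIENROOT is wrong.
echo "WIENROOT=$WIENROOT"
which lapw0
which lapw0_mpi

# Write the .machines file, including the lapw0 line.
nproc=4
host=$(hostname)
{
  echo '#'
  echo "lapw0:$host:$nproc"   # mpi-parallel lapw0 on $nproc cores
  echo "1:$host:2"            # two k-point groups, 2 mpi ranks each
  echo "1:$host:2"
  echo 'granularity:1'
  echo 'extrafine:1'
} > .machines

run_lapw -p
```

Submitting this with sbatch (rather than running it interactively) also ensures that SLURM actually grants the 4 slots that Open MPI later asks for.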
On 28.03.2021 at 22:37, leila mollabashi wrote:
> Dear Wien2k users,
>
> I have a problem with MPI parallelization, although I have compiled the
> code with no errors. Version 19.2 of WIEN2k has been compiled with ifort,
> cc and openmpi; the mkl and FFTW libraries were also used. On the SLURM
> queue cluster I can run in the k-point parallel mode, but I could not run
> in mpi parallel mode even on 1 node. I used this script to run:
>
> sbatch submit_script.sl
>
> The submit_script.sl file is, for example, as follows:
>
> #! /bin/bash -l
>
> hostname
>
> rm -fr .machines
>
> # for 4 cpus and kpoints (in input file)
> nproc=4
>
> # write .machines file
> echo '#' .machines
>
> # example for an MPI parallel lapw0
> echo 'lapw0:'`hostname`'
> #:'$nproc >> .machines
>
> # k-point and mpi parallel lapw1/2
> echo '1:'`hostname`':2' >> .machines
> echo '1:'`hostname`':2' >> .machines
> echo 'granularity:1' >>.machines
> echo 'extrafine:1' >>.machines
>
> run_lapw -p
>
> Then this error appears:
>
> error: command /home/users/mollabashi/v19.2/lapw0para lapw0.def failed
>
> The slurm-17032361.out file is as follows:
>
> # .machines
> bash: lapw0: command not found
> real 0m0.001s
> user 0m0.000s
> sys 0m0.001s
> grep: *scf1*: No such file or directory
> grep: lapw2*.error: No such file or directory
>> stop error
>
> Then, when I run it manually, this error appears:
>
> There are not enough slots available in the system to satisfy the 4
> slots that were requested by the application:
>
> /home/users/mollabashi/v19.2/lapw0_mpi
>
> Either request fewer slots for your application, or make more slots
> available for use.
>
> A "slot" is the Open MPI term for an allocatable unit where we can
> launch a process. The number of slots available are defined by the
> environment in which Open MPI processes are run:
>
> 1. Hostfile, via "slots=N" clauses (N defaults to number of
> processor cores if not provided)
> 2. The --host command line parameter, via a ":N" suffix on the
> hostname (N defaults to 1 if not provided)
> 3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
> 4. If none of a hostfile, the --host command line parameter, or an
> RM is present, Open MPI defaults to the number of processor cores
>
> In all the above cases, if you want Open MPI to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
>
> Alternatively, you can use the --oversubscribe option to ignore the
> number of available slots when deciding the number of processes to
> launch.
>
> --------------------------------------------------------------------------
>
> [1] Exit 1 mpirun -np 4 -machinefile .machine0
> /home/users/mollabashi/v19.2/lapw0_mpi lapw0.def >> .time00
>
> --------------------------------------------------------------------------
>
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
> /home/users/mollabashi/v19.2/lapw1_mpi
>
> Either request fewer slots for your application, or make more slots
> available for use.
>
> [... same Open MPI "slots" explanation as above ...]
>
> --------------------------------------------------------------------------
>
> [1] + Done ( cd $PWD; $t $ttt; rm -f
> .lock_$lockfile[$p] ) >> .time1_$loop
>
> --------------------------------------------------------------------------
>
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
> /home/users/mollabashi/v19.2/lapw1_mpi
>
> Either request fewer slots for your application, or make more slots
> available for use.
>
> [... same Open MPI "slots" explanation as above ...]
>
> --------------------------------------------------------------------------
>
> [1] + Done ( cd $PWD; $t $ttt; rm -f
> .lock_$lockfile[$p] ) >> .time1_$loop
>
> ce.scf1_1: No such file or directory.
> grep: *scf1*: No such file or directory
> LAPW2 - Error. Check file lapw2.error
> cp: cannot stat ‘.in.tmp’: No such file or directory
> grep: *scf1*: No such file or directory
>> stop error
>
> Would you please kindly guide me?
>
> Sincerely yours,
>
> Leila Mollabashi
>
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>
--
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at
-------------------------------------------------------------------------