[Wien] MPI error
leila mollabashi
le.mollabashi at gmail.com
Tue Apr 13 21:47:13 CEST 2021
Dear Prof. Peter Blaha and WIEN2k users,
Then, when I run x lapw1 -p, this happens:
starting parallel lapw1 at Tue Apr 13 21:04:15 CEST 2021
-> starting parallel LAPW1 jobs at Tue Apr 13 21:04:15 CEST 2021
running LAPW1 in parallel mode (using .machines)
2 number_of_parallel_jobs
[1] 14530
[e0467:14538] mca_base_component_repository_open: unable to open
mca_btl_uct: libucp.so.0: cannot open shared object file: No such file or
directory (ignored)
WARNING: There was an error initializing an OpenFabrics device.
Local host: e0467
Local device: mlx4_0
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 0.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[e0467:14567] 1 more process has sent help message help-mpi-btl-openib.txt
/ error in device init
[e0467:14567] 1 more process has sent help message help-mpi-btl-openib.txt
/ error in device init
[e0467:14567] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
help / error messages
[warn] Epoll MOD(1) on fd 27 failed. Old events were 6; read change was 0
(none); write change was 2 (del): Bad file descriptor
>Somewhere there should be some documentation how one runs an mpi job on
your system.
I have only found this:
Before submitting a job, it should be wrapped in an appropriate script that
the queue system understands, e.g.:
/home/users/user/submit_script.sl
Sample SLURM script:
#!/bin/bash -l
#SBATCH -N 1
#SBATCH --mem 5000
#SBATCH --time=20:00:00
/sciezka/do/pliku/binarnego/plik_binarny.in > /sciezka/do/pliku/wyjsciowego.out
To submit a job to a specific queue, use the #SBATCH -p parameter, e.g.:
#!/bin/bash -l
#SBATCH -N 1
#SBATCH --mem 5000
#SBATCH --time=20:00:00
#SBATCH -p standard
/sciezka/do/pliku/binarnego/plik_binarny.in > /sciezka/do/pliku/wyjsciowego.out
The job must then be submitted using the *sbatch* command:
sbatch /home/users/user/submit_script.sl
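For WIEN2k, I suppose the generic script above would additionally have to
build the .machines file from the nodes that SLURM actually allocates and
then call run_lapw. A rough sketch of what I mean (the case directory, the
ntasks-per-node value and the 14-core MPI groups are only my assumptions and
would have to be adapted):

#!/bin/bash -l
#SBATCH -N 1
#SBATCH --ntasks-per-node=28
#SBATCH --mem 5000
#SBATCH --time=20:00:00
# go to the WIEN2k case directory (illustrative path)
cd /home/users/mollabashi/case
# rebuild .machines from the allocated nodes; each "1:host:14" line asks
# for one MPI-parallel lapw1/lapw2 job with 14 cores on that node
rm -f .machines
for host in $(scontrol show hostnames $SLURM_JOB_NODELIST); do
    echo "1:${host}:14" >> .machines
    echo "1:${host}:14" >> .machines
done
echo "granularity:1" >> .machines
echo "extrafine:1" >> .machines
run_lapw -p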
*Submitting interactive jobs*
Interactive jobs can be divided into two groups:
· an interactive job (working in text mode)
· an interactive job
*Interactive job (working in text mode)*
Submitting an interactive job is very simple and in the simplest case comes
down to issuing the command below:
srun --pty /bin/bash
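For a WIEN2k test this can be combined with an explicit core request, as in
my earlier mail quoted below (the time limit here is only an example):

srun -n 8 --time=02:00:00 --pty /bin/bash
# then, inside the allocation and the case directory:
x lapw0 -p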
Sincerely yours,
Leila Mollabashi
On Wed, Apr 14, 2021 at 12:03 AM leila mollabashi <le.mollabashi at gmail.com>
wrote:
> Dear Prof. Peter Blaha and WIEN2k users,
>
> Thank you for your assistances.
>
> > At least now the error: "lapw0 not found" is gone. Do you understand
> > why ??
>
> Yes, I think it is because the path is now clearly known.
>
> >How many slots do you get by this srun command ?
>
> Usually I get a node with 28 CPUs.
>
> >Is this the node with the name e0591 ???
>
> Yes, it is.
>
> >Of course the .machines file must be consistent (dynamically adapted)
> >with the actual nodename.
>
> Yes, to do this I use my script.
>
> >When I use "srun --pty -n 8 /bin/bash", it goes to a node with 8 free
> cores, and when I run x lapw0 -p, this happens:
>
> starting parallel lapw0 at Tue Apr 13 20:50:49 CEST 2021
>
> -------- .machine0 : 4 processors
>
> [1] 12852
>
> [e0467:12859] mca_base_component_repository_open: unable to open
> mca_btl_uct: libucp.so.0: cannot open shared object file: No such file or
> directory (ignored)
>
> [e0467][[56319,1],1][btl_openib_component.c:1699:init_one_device] error
> obtaining device attributes for mlx4_0 errno says Protocol not supported
>
> [e0467:12859] mca_base_component_repository_open: unable to open
> mca_pml_ucx: libucp.so.0: cannot open shared object file: No such file or
> directory (ignored)
>
> LAPW0 END
>
> [1] Done mpirun -np 4 -machinefile .machine0
> /home/users/mollabashi/v19.2/lapw0_mpi lapw0.def >> .time00
>
> Sincerely yours,
>
> Leila Mollabashi
>
>
>