[Wien] MPI error

leila mollabashi le.mollabashi at gmail.com
Tue Apr 13 21:47:13 CEST 2021


Dear Prof. Peter Blaha and WIEN2k users,

Then, when I run x lapw1 -p, this happens:

starting parallel lapw1 at Tue Apr 13 21:04:15 CEST 2021

->  starting parallel LAPW1 jobs at Tue Apr 13 21:04:15 CEST 2021

running LAPW1 in parallel mode (using .machines)

2 number_of_parallel_jobs

[1] 14530

[e0467:14538] mca_base_component_repository_open: unable to open
mca_btl_uct: libucp.so.0: cannot open shared object file: No such file or
directory (ignored)

WARNING: There was an error initializing an OpenFabrics device.

  Local host:   e0467

  Local device: mlx4_0

MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD

with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.

You may or may not see output from other processes, depending on

exactly when Open MPI kills them.

--------------------------------------------------------------------------

[e0467:14567] 1 more process has sent help message help-mpi-btl-openib.txt
/ error in device init

[e0467:14567] 1 more process has sent help message help-mpi-btl-openib.txt
/ error in device init

[e0467:14567] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
help / error messages

[warn] Epoll MOD(1) on fd 27 failed.  Old events were 6; read change was 0
(none); write change was 2 (del): Bad file descriptor
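For reference, the libucp.so.0 and OpenFabrics messages above come from Open MPI's UCX and openib components. On installations where these transports are not usable, the generic Open MPI workaround is to exclude those components via MCA parameters, for example (standard Open MPI settings, not something confirmed for this cluster):

export OMPI_MCA_btl="^uct,openib"   # skip the uct and openib byte transfer layers
export OMPI_MCA_pml="^ucx"          # do not select the UCX point-to-point layer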

>Somewhere there should be some documentation how one runs an mpi job on
your system.

I only found this:

Before submitting a job, it has to be wrapped in a script that the queue
system understands, e.g.:

/home/users/user/submit_script.sl

Sample SLURM script:

#!/bin/bash -l
#SBATCH -N 1
#SBATCH --mem 5000
#SBATCH --time=20:00:00

/path/to/binary/binary_file.in > /path/to/output_file.out

To submit a job to a specific queue (partition), use the #SBATCH -p parameter, e.g.:

#!/bin/bash -l
#SBATCH -N 1
#SBATCH --mem 5000
#SBATCH --time=20:00:00
#SBATCH -p standard

/path/to/binary/binary_file.in > /path/to/output_file.out

The job is then submitted with the *sbatch* command:

sbatch /home/users/user/submit_script.sl
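For WIEN2k this generic template would additionally have to create the .machines file from the nodes that SLURM actually assigns (the "dynamically adapted" file mentioned in the mail quoted below). The following is only a rough sketch of such a submit script; the case directory, the 8-task request and the 4-cores-per-MPI-job split are my assumptions, not values from the cluster documentation:

#!/bin/bash -l
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --mem 5000
#SBATCH --time=20:00:00
#SBATCH -p standard

cd /home/users/mollabashi/case        # assumed WIEN2k case directory

# Build .machines from the nodes SLURM actually allocated to this job.
rm -f .machines
lapw0line="lapw0:"
for host in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    echo "1:${host}:4" >> .machines        # one MPI lapw1/lapw2 job on 4 cores of this node (assumed split)
    lapw0line="${lapw0line} ${host}:4"
done
echo "${lapw0line}"  >> .machines          # lapw0 in MPI mode on the same cores
echo "granularity:1" >> .machines
echo "extrafine:1"   >> .machines

run_lapw -p > run_lapw.log 2>&1            # standard parallel SCF driver of WIEN2k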

*Submitting interactive jobs*


Interactive jobs can be divided into two groups:

·         interactive job (working in text mode)

·         interactive job

*Interactive job (working in text mode)*


Submitting an interactive job is very simple; in the simplest case it comes
down to issuing the command below.

srun --pty /bin/bash
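Inside such an interactive shell, how many slots the srun command really granted (the question quoted below) can be checked with the standard SLURM environment variables, for example:

echo "tasks: ${SLURM_NTASKS}   cpus on node: ${SLURM_JOB_CPUS_PER_NODE}"
scontrol show hostnames "$SLURM_JOB_NODELIST"    # node names to put into .machines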



Sincerely yours,

Leila Mollabashi

On Wed, Apr 14, 2021 at 12:03 AM leila mollabashi <le.mollabashi at gmail.com>
wrote:

> Dear Prof. Peter Blaha and WIEN2k users,
>
> Thank you for your assistance.
>
> > At least now the error: "lapw0 not found" is gone. Do you understand
> why ??
>
> Yes, I think it is because now the path is clearly known.
>
> >How many slots do you get by this srun command ?
>
> Usually I get a node with 28 CPUs.
>
> >Is this the node with the name  e0591  ???
>
> Yes, it is.
>
> >Of course the .machines file must be consistent (dynamically adapted)
>
> > with the actual nodename.
>
> Yes, to do this I use my script.
>
> When I use “srun --pty -n 8 /bin/bash”, it goes to a node with 8 free
> cores, and when I run x lapw0 -p this happens:
>
> starting parallel lapw0 at Tue Apr 13 20:50:49 CEST 2021
>
> -------- .machine0 : 4 processors
>
> [1] 12852
>
> [e0467:12859] mca_base_component_repository_open: unable to open
> mca_btl_uct: libucp.so.0: cannot open shared object file: No such file or
> directory (ignored)
>
> [e0467][[56319,1],1][btl_openib_component.c:1699:init_one_device] error
> obtaining device attributes for mlx4_0 errno says Protocol not supported
>
> [e0467:12859] mca_base_component_repository_open: unable to open
> mca_pml_ucx: libucp.so.0: cannot open shared object file: No such file or
> directory (ignored)
>
> LAPW0 END
>
> [1]    Done                          mpirun -np 4 -machinefile .machine0
> /home/users/mollabashi/v19.2/lapw0_mpi lapw0.def >> .time00
>
> Sincerely yours,
>
> Leila Mollabashi
>
>
>