[Wien] Slurm + Intel impi

Laurence Marks laurence.marks at gmail.com
Tue Dec 12 14:22:59 CET 2023


This may be too technical, but I thought I would ask as someone might have
seen something similar.

On a supercomputer using Slurm/srun I am seeing irreproducible crashes,
sometimes a SIGSEGV in lapw1_mpi/elpa, sometimes a bus error in
lapw2_mpi. These are large calculations (matrix size ~94K) using hybrid
OMP/MPI with 2 OMP threads x 128 MPI ranks, as the hybrid layout is more
memory efficient. This is with Intel MPI (impi).
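
For context, the job layout is along these lines (a sketch only; these
are the sbatch directives such a 2x128 hybrid run would typically
request, not a copy of my actual script):

    #SBATCH --ntasks=128        # 128 MPI ranks
    #SBATCH --cpus-per-task=2   # 2 OpenMP threads per rank
    export OMP_NUM_THREADS=2    # match cpus-per-task
    # srun inherits this task layout when the WIEN2k parallel
    # scripts launch the lapw1_mpi/lapw2_mpi steps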

According to https://slurm.schedmd.com/mpi_guide.html I should use PMI2,
with I_MPI_PMI_LIBRARY=/path/to/slurm/lib/libpmi2.so. (Currently
I_MPI_PMI_LIBRARY is not set.) Apparently PMI1 is not very thread safe.
Has anyone come across anything similar?
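
Concretely, what the Slurm guide suggests would amount to the following
(the libpmi2.so path is site-specific, and I have not yet verified that
this cures the crashes):

    # point Intel MPI at Slurm's PMI2 library instead of PMI1
    export I_MPI_PMI_LIBRARY=/path/to/slurm/lib/libpmi2.so
    # and launch with the PMI2 plugin
    srun --mpi=pmi2 ...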

--
Emeritus Professor Laurence Marks (Laurie)
Northwestern University
www.numis.northwestern.edu
https://scholar.google.com/citations?user=zmHhI9gAAAAJ&hl=en
"Research is to see what everybody else has seen, and to think what nobody
else has thought" Albert Szent-Györgyi