[Wien] possible overload/underload with OpenMP or threading in general
Pavel Ondračka
pavel.ondracka at email.cz
Tue Oct 8 08:41:29 CEST 2019
Dear Wien2k mailing list,
in a recent discussion with Profs. Marks and Blaha it turned out that,
under some circumstances, the threading parallelization in Wien2k and
its interaction with threaded BLAS/LAPACK libraries and their
environment variables (MKL, but possibly also OpenBLAS and others) can
behave unexpectedly. This can lead either to imperfect utilization of
nodes (underload) or to too many contending threads (overload), both of
which slow down calculations.
Short story with just three points:
- Occasionally check the load of your nodes when running (either with
"top", a similar program, or your job scheduler's reporting). If it is
much higher or lower than the number of cores, then this could be a
problem, so please continue reading.
- If you have previously set MKL_NUM_THREADS, OPENBLAS_NUM_THREADS or
any other equivalent BLAS/LAPACK specific threading variable, please
unset them.
- If you linked with non-default MKL settings or linked with a different
threaded BLAS/LAPACK such as OpenBLAS, please make sure that your
BLAS/LAPACK library is internally threaded with OpenMP (not pthreads,
TBB or any other threading library) and that it uses the same OpenMP
library as Wien2k (one example of a problematic config would be
compiling Wien2k with gfortran and MKL, using libiomp5 for MKL
threading but libgomp for OpenMP threading in Wien2k itself).
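
For the first point, a minimal sketch of such a load check in Python.
The function name and the 25% tolerance are my own assumptions, not
anything Wien2k provides; pick whatever margin counts as "much higher
or lower" on your machines.

```python
import os

def classify_load(load, cores, tolerance=0.25):
    """Compare a load average with the number of cores.

    tolerance is an assumed, arbitrary margin: a load within
    +/- 25% of the core count is reported as "ok".
    """
    if load > cores * (1 + tolerance):
        return "overload"
    if load < cores * (1 - tolerance):
        return "underload"
    return "ok"

def node_status():
    """Classify the 1-minute load average of the local node (Unix only)."""
    one_minute_load = os.getloadavg()[0]
    # os.cpu_count() reports logical CPUs, so hyperthreads are counted too.
    cores = os.cpu_count() or 1
    return classify_load(one_minute_load, cores)
```

Calling node_status() on a compute node while a job is running gives
a rough first indication; it says nothing about which process causes
the load.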
Best regards
Pavel
P.S.: Long story for people interested in technical details:
Wien2k links with the threaded MKL by default, and a threaded OpenBLAS
is usually also the default that distributions provide.
In Wien2k versions before 19, when running k-parallel without
OMP_NUM_THREADS (or the BLAS-specific equivalent env variables) set, the
parallel BLAS/LAPACK libraries usually try to use the maximum number of
cores, leading to overload if multiple k-points are running on a single
node. This was fixed in Wien2k 19.1, where the threading is now
explicitly controlled from the .machines file, and when no threading is
specified it defaults to one thread per process.
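
To make the arithmetic concrete, here is a rough Python sketch that
collects the omp_xxx:y entries from the text of a .machines file and
multiplies processes by threads. Only the omp_xxx:y shape is taken from
the description above; the sample file content, the entry names in it,
the '#' comment handling and both function names are illustrative
assumptions.

```python
# Illustrative content only; the entry names are placeholders.
SAMPLE_MACHINES = """\
# two k-point processes on the same node
1:node1
1:node1
omp_global:2
omp_lapw1:4
"""

def omp_settings(machines_text):
    """Return {name: threads} for every omp_xxx:y line in the text."""
    settings = {}
    for raw in machines_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumes '#' starts a comment
        if not line.startswith("omp_"):
            continue
        name, _, value = line.partition(":")
        if value.strip().isdigit():
            settings[name.strip()] = int(value)
    return settings

def threads_on_node(processes_on_node, threads_per_process=1):
    """Total threads competing for the cores of one node.

    The default of 1 mirrors the "one thread per process" default
    described for Wien2k 19.1.
    """
    return processes_on_node * threads_per_process
```

As a made-up example, 4 k-point processes on a 16-core node whose BLAS
grabs all 16 cores each give threads_on_node(4, 16) = 64 threads
(overload), while the 19.1 default gives threads_on_node(4) = 4
(underload unless more threads are requested).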
Another problem is with the BLAS/LAPACK-specific threading variables
such as MKL_NUM_THREADS, OPENBLAS_NUM_THREADS, etc. They have higher
priority than OMP_NUM_THREADS, which is set by Wien2k internally
based on the omp_xxx:y lines in the .machines file, and can therefore
override the threading chosen by the user. Unsetting them makes
the parallel BLAS/LAPACK obey the settings from the .machines file.
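
A small Python sketch for spotting such variables in an environment and
producing a copy without them. The helper names are hypothetical, and
the list contains only the two variables named here; other BLAS/LAPACK
libraries may have their own equivalents that would need to be added.

```python
import os

# Only the variables named in this post; extend for other libraries.
BLAS_THREAD_VARS = ("MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")

def blas_thread_overrides(env):
    """Return the BLAS-specific threading variables set in env."""
    return {name: env[name] for name in BLAS_THREAD_VARS if name in env}

def without_blas_overrides(env):
    """Return a copy of env with those variables removed."""
    return {key: value for key, value in env.items()
            if key not in BLAS_THREAD_VARS}
```

blas_thread_overrides(os.environ) shows what is currently set in your
session, and the result of without_blas_overrides(os.environ) can be
passed as the env argument of subprocess.run when launching a job from
a script. In a shell, a plain "unset" of the same variables does the
same job.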
More problems can occur when combining different threading models in
Wien2k and BLAS/LAPACK (such as OpenMP and POSIX threads) or when using
OpenMP threading in both but with different OpenMP libraries (for
example Intel and GNU). This is most likely to happen when using
gfortran and a distro-provided OpenBLAS, as its default threading is
with pthreads.
The OpenMP parallelization in Wien2k works in such a way that there are
some explicit OpenMP parallel regions which may also contain
BLAS/LAPACK calls. In other places the BLAS/LAPACK calls are made from
serial regions and we depend on parallelization at the BLAS/LAPACK
level. If OpenMP and the same OpenMP library are used everywhere, then
in the first case the BLAS/LAPACK library recognizes that it is being
called from a parallel region and runs single-threaded, while in the
second case it runs with multiple threads as expected. If threading
models or threading libraries are mixed, the BLAS/LAPACK calls
from OpenMP parallel regions have no way of knowing that there are
already multiple threads, and each can spawn more threads, leading
again to overload.
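
One way to look for mixed OpenMP libraries is to inspect the shared
libraries a binary is linked against. The following Python sketch scans
the text printed by "ldd" for the usual runtime names (libgomp for GNU,
libiomp5 for Intel, libomp for LLVM). The function names are my own,
and the check is only a heuristic: it sees dynamically linked libraries
only, and a pthreads-threaded OpenBLAS will not show up as an OpenMP
runtime at all.

```python
import re
import subprocess

# Shared-library name stems of common OpenMP runtimes.
OPENMP_RUNTIMES = {
    "libgomp": "GNU",
    "libiomp5": "Intel",
    "libomp": "LLVM",
}

# Matches the library stem at the start of an ldd line, e.g. "libgomp"
# in "\tlibgomp.so.1 => /usr/lib/libgomp.so.1 (0x...)".
_LIB_NAME = re.compile(r"^\s*(lib[A-Za-z0-9_+-]+)\.so")

def openmp_runtimes(ldd_text):
    """Return the set of OpenMP runtime stems found in ldd output."""
    found = set()
    for line in ldd_text.splitlines():
        match = _LIB_NAME.match(line)
        if match and match.group(1) in OPENMP_RUNTIMES:
            found.add(match.group(1))
    return found

def ldd_text_for(path):
    """Run ldd on a binary and return its output (Linux only)."""
    result = subprocess.run(["ldd", path], capture_output=True,
                            text=True, check=True)
    return result.stdout
```

If openmp_runtimes(ldd_text_for(path)) returns more than one entry for
one of your Wien2k executables (path being wherever that binary lives
on your system), you are probably in the mixed-library situation
described above.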