[Wien] delays in parallel work

Laurence Marks laurence.marks at gmail.com
Tue Oct 6 17:21:36 CEST 2020


Dear Lyudmila,

This is almost certainly an OS problem, and there is little that you can do
except find a better supercomputer!

It could be an NFS problem, in which case setting SCRATCH to a local
directory on each compute node might help. Alternatively, while you are
supposed to have all of any given node, the job might not actually be running
that way; a lot depends upon how srun is configured.
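
As a rough sketch of the SCRATCH suggestion (not a tested recipe for this cluster), the lines below report the filesystem type of the working directory and then point SCRATCH at a per-job directory under /tmp. That /tmp is node-local is an assumption, and LOCAL_SCRATCH_BASE is a hypothetical variable introduced only for this illustration; the administrators can say which path is really local.

```shell
# Report the filesystem type of the current case directory (GNU stat);
# "nfs" here would mean that SCRATCH=./ writes over the network.
fstype=$(stat -f -c %T . 2>/dev/null || echo unknown)
echo "working directory filesystem: $fstype"

# ASSUMPTION: /tmp is local to each compute node. LOCAL_SCRATCH_BASE is a
# hypothetical override, not a WIEN2k or Slurm variable.
scratch_base="${LOCAL_SCRATCH_BASE:-/tmp}"

# One directory per user and job; SLURM_JOB_ID is set by Slurm inside a job,
# and the shell PID is used as a fallback outside one.
export SCRATCH="${scratch_base}/${USER:-wien}_${SLURM_JOB_ID:-$$}"
mkdir -p "$SCRATCH"
echo "SCRATCH=$SCRATCH"
```

These lines would belong in the job script rather than in .bashrc, since the job ID exists only inside a job. The mkdir runs only on the node executing the script, so a multi-node job would need the directory created on every node.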

One thing to test is intra-node MPI (same node) versus cross-node MPI. The
first should always be fast.
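
One possible way to run that comparison is sketched below: a small helper times a command, and inside a Slurm allocation it is applied to the same MPI program on one node and on two nodes. The helper name time_cmd and the program ./mpi_test are hypothetical (any small MPI test program would do, it is not part of WIEN2k), and the two-node run assumes the job was allocated at least two nodes.

```shell
# time_cmd LABEL CMD...: run CMD and print the wall-clock seconds it took.
time_cmd() {
    label=$1; shift
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start)) s (exit status $status)"
}

# Only attempt the comparison inside a Slurm job where srun is available.
# ./mpi_test is a hypothetical small MPI program, not part of WIEN2k.
if command -v srun >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
    time_cmd "same node" srun -N1 -n16 ./mpi_test   # 16 tasks on one node
    time_cmd "two nodes" srun -N2 -n16 ./mpi_test   # 16 tasks over two nodes
fi
```

If the single-node run is already slow or erratic, the interconnect between nodes is unlikely to be the cause, and node sharing or the filesystem becomes the more likely suspect.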

And... buy a sysadmin a beer (or vodka) and have them explain in more detail
how they have things configured.

On Tue, Oct 6, 2020 at 10:14 AM Lyudmila Dobysheva <lyuka17 at mail.ru> wrote:

> Dear all,
>
> I have started working on a supercomputer, and sometimes I see delays
> during execution. They occur randomly, most often during lapw0 but
> also in other programs (an extra 7-20 min). The administrators say that
> there can sometimes be problems with the network speed.
> But I cannot understand this: for now I take only one node with 16
> processors. I would expect that if I send the task to a single node,
> network problems between computers should have no effect until the whole
> task ends.
> Maybe I have set the scratch variable wrongly?
> In .bashrc:
> export SCRATCH=./
>
> During execution I can follow the cycle as it progresses; that is, after
> lapw0 I see its output files. This means that after lapw0 the compute node
> sends the files to the governing computer and, maybe, waits here? Is
> this behavior correct? I expected that I would not see the intermediate
> stages until the job ends.
> And the programs themselves (lapw0, lapw1, lapw2, lcore, mixer): maybe they
> are reloaded onto the compute node anew every cycle?
>
> Best regards
> Lyudmila Dobysheva
>
> some details WIEN2k_19.2
> ifort 64 19.1.0.166
> ---------------
> parallel_options:
> setenv TASKSET "srun "
> if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
> if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0
> setenv WIEN_GRANULARITY 1
> setenv DELAY 0.1
> setenv SLEEPY 1
> if ( ! $?WIEN_MPIRUN) setenv WIEN_MPIRUN "srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_"
> if ( ! $?CORES_PER_NODE) setenv CORES_PER_NODE  16
> --------------
> WIEN2k_OPTIONS:
> current:FOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
> current:FPOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
> current:OMP_SWITCH:-qopenmp
> current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -lpthread -lm -ldl -liomp5
> current:DPARALLEL:'-DParallel'
> current:R_LIBS:-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
> current:FFTWROOT:/home/uffff/.local/
> current:FFTW_VERSION:FFTW3
> current:FFTW_LIB:lib
> current:FFTW_LIBNAME:fftw3
> current:LIBXCROOT:
> current:LIBXC_FORTRAN:
> current:LIBXC_LIBNAME:
> current:LIBXC_LIBDNAME:
> current:SCALAPACKROOT:$(MKLROOT)/lib/
> current:SCALAPACK_LIBNAME:mkl_scalapack_lp64
> current:BLACSROOT:$(MKLROOT)/lib/
> current:BLACS_LIBNAME:mkl_blacs_intelmpi_lp64
> current:ELPAROOT:
> current:ELPA_VERSION:
> current:ELPA_LIB:
> current:ELPA_LIBNAME:
> current:MPIRUN:srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_
> current:CORES_PER_NODE:16
> current:MKL_TARGET_ARCH:intel64
>
> ------------------
>
> http://ftiudm.ru/content/view/25/103/lang,english/
> Physics-Techn.Institute,
> Udmurt Federal Research Center, Ural Br. of Rus.Ac.Sci.
> 426000 Izhevsk Kirov str. 132
> Russia
> ---
> Tel. +7 (34I2)43-24-59 (office), +7 (9I2)OI9-795O (home)
> Skype: lyuka18 (office), lyuka17 (home)
> E-mail: lyuka17 at mail.ru (office), lyuka17 at gmail.com (home)


-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
Corrosion in 4D: www.numis.northwestern.edu/MURI
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi