[Wien] delays in parallel work
Peter Blaha
pblaha at theochem.tuwien.ac.at
Tue Oct 6 20:55:46 CEST 2020
Compare the times in the dayfile with ls -alsrt (the times when the
corresponding files were created).
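A minimal sketch of such a comparison (the case name "case" is a placeholder):

   grep lapw case.dayfile     # time stamps of the individual SCF steps
   ls -alsrt                  # modification times of the case.* files
   ls -alsrt $SCRATCH         # the same for the scratch files

If a file shows up long after the dayfile says the step was done, that
would point to I/O or network-filesystem delays.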
You did not say how you run the parallel calculations (.machines file).
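For reference, a plain k-point parallel .machines file for a single node
could look like this (the hostname node001 and the 4-way split are just
assumptions; with one core per line you would need more lines, or MPI core
counts, to fill all 16 cores):

   1:node001
   1:node001
   1:node001
   1:node001
   granularity:1
   extrafine:1

i.e. four lapw1/lapw2 jobs running side by side on the same node.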
Definitely ask the administrator how to access a directory that is local
to the compute node and set the SCRATCH variable to this directory.
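For example, in ~/.bashrc (the actual path is an assumption; the
administrator knows the right one):

   # node-local disk instead of the shared working directory
   export SCRATCH=/tmp/$USER
   mkdir -p $SCRATCH

Then the big case.vector* files no longer travel over the network
filesystem during the cycle.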
If this does not help, you may even copy the whole case directory to a
local dir, change into it, run the scf there, and copy all files back at
the end.
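A sketch of that workflow, with placeholder paths:

   cp -r $HOME/case /tmp/$USER/         # copy the case to local disk
   cd /tmp/$USER/case
   run_lapw -p                          # run the SCF cycle locally
   cp -p /tmp/$USER/case/* $HOME/case/  # copy everything back at the end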
For more info: on many supercomputers you can ssh to a node that is
allocated to you. If you can do this, run top and check where the delays
are, ....
If you cannot ssh to the node, one can usually request an interactive
session via the queueing system. Then you should have access to the
allocated node.
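Since your parallel_options below uses srun, you are probably on SLURM;
there an interactive session could be requested like this (the exact
flags are site-dependent):

   salloc -N 1 -n 16 -t 01:00:00   # reserve one 16-core node
   srun --pty /bin/bash            # open a shell on the allocated node
   top                             # check whether lapw0/lapw1 really
                                   # compute or sit idle waiting for I/O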
It could also be a problem of "pinning".
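On SLURM you can at least make the binding visible (a diagnostic sketch,
not a fix):

   srun -n 16 --cpu-bind=verbose,cores hostname

Each task reports the core mask it was bound to; several tasks sharing
the same cores would explain slowdowns.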
As Laurence said: probably only a good sys.admin can help. But the
problem could be that the system is not set up properly, and then the
sys.admins cannot help you ...
Am 06.10.2020 um 17:09 schrieb Lyudmila Dobysheva:
> Dear all,
>
> I have started working on a supercomputer and sometimes I see delays
> during execution. They occur randomly, most frequently during lapw0, but
> also in other programs (an extra 7-20 min). The administrators say that
> there can sometimes be problems with the network speed.
> But I cannot understand this: at the moment I take only one node with 16
> processors. I would say that if I send the task to a single node, network
> problems between computers should not affect it until the whole task ends.
> Maybe I have set the SCRATCH variable wrongly?
> In .bashrc:
> export SCRATCH=./
>
> During execution I can see how the cycle progresses, that is, after lapw0
> I see its output files. This means that after lapw0 the compute node
> sends the files to the head node, and maybe it waits here? Is this
> behavior correct? I expected that I should not see the intermediate
> stages until the job ends.
> And the programs themselves - lapw0, lapw1, lapw2, lcore, mixer - maybe
> they are reloaded onto the compute node anew every cycle?
>
> Best regards
> Lyudmila Dobysheva
>
> Some details: WIEN2k_19.2
> ifort 64 19.1.0.166
> ---------------
> parallel_options:
> setenv TASKSET "srun "
> if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
> if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0
> setenv WIEN_GRANULARITY 1
> setenv DELAY 0.1
> setenv SLEEPY 1
> if ( ! $?WIEN_MPIRUN) setenv WIEN_MPIRUN "srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_"
> if ( ! $?CORES_PER_NODE) setenv CORES_PER_NODE 16
> --------------
> WIEN2k_OPTIONS:
> current:FOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
> current:FPOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
> current:OMP_SWITCH:-qopenmp
> current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -lpthread
> -lm -ldl -liomp5
> current:DPARALLEL:'-DParallel'
> current:R_LIBS:-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
> current:FFTWROOT:/home/uffff/.local/
> current:FFTW_VERSION:FFTW3
> current:FFTW_LIB:lib
> current:FFTW_LIBNAME:fftw3
> current:LIBXCROOT:
> current:LIBXC_FORTRAN:
> current:LIBXC_LIBNAME:
> current:LIBXC_LIBDNAME:
> current:SCALAPACKROOT:$(MKLROOT)/lib/
> current:SCALAPACK_LIBNAME:mkl_scalapack_lp64
> current:BLACSROOT:$(MKLROOT)/lib/
> current:BLACS_LIBNAME:mkl_blacs_intelmpi_lp64
> current:ELPAROOT:
> current:ELPA_VERSION:
> current:ELPA_LIB:
> current:ELPA_LIBNAME:
> current:MPIRUN:srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_
> current:CORES_PER_NODE:16
> current:MKL_TARGET_ARCH:intel64
>
> ------------------
> http://ftiudm.ru/content/view/25/103/lang,english/
> Physics-Techn.Institute,
> Udmurt Federal Research Center, Ural Br. of Rus.Ac.Sci.
> 426000 Izhevsk Kirov str. 132
> Russia
> ---
> Tel. +7 (34I2)43-24-59 (office), +7 (9I2)OI9-795O (home)
> Skype: lyuka18 (office), lyuka17 (home)
> E-mail: lyuka17 at mail.ru (office), lyuka17 at gmail.com (home)
--
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
WWW:
http://www.imc.tuwien.ac.at/tc_blaha
--------------------------------------------------------------------------