[Wien] delays in parallel work

Peter Blaha pblaha at theochem.tuwien.ac.at
Tue Oct 6 20:55:46 CEST 2020


Compare the times in the dayfile with the output of ls -alsrt (the times 
when the corresponding files were created).
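
For example (hypothetical case name "case", run from the case directory):

   cat case.dayfile     # the start time of each step is printed in parentheses
   ls -alsrt            # list the files sorted by modification time, oldest first

If a step has written its output files long before the next step starts 
according to the dayfile, the time is lost in between (file system / 
network), not in the program itself.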

You did not say how you run the parallel calculations (.machines file).
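
For instance, a simple k-point parallel setup on one 16-core node could 
use a .machines file like this (only a sketch; replace localhost by the 
name of the allocated node and put one "1:..." line per parallel 
lapw1/lapw2 job):

   granularity:1
   1:localhost
   1:localhost
   1:localhost
   1:localhost
   extrafine:1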

Definitely, ask the administrator how to access a local (local to the 
compute node) directory and set the SCRATCH variable to this directory.
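
For example, if the nodes have a node-local /tmp or /scratch (the actual 
path is site-specific, ask your admins), put something like this into .bashrc:

   export SCRATCH=/tmp/$USER    # node-local directory; the path is only an example
   mkdir -p $SCRATCH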

If this does not help, you may even copy the whole directory to a local 
dir, change into it, run scf there and at the end copy back all files.
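
A rough sketch of such a job script under SLURM (the local path and the 
case directory are placeholders only):

   LOCALDIR=/tmp/$USER/mycase               # node-local scratch, path is site-specific
   mkdir -p $LOCALDIR
   cp -r $SLURM_SUBMIT_DIR/. $LOCALDIR/     # copy the whole case directory to the node
   cd $LOCALDIR
   run_lapw -p                              # run the parallel scf cycle locally
   cp -r . $SLURM_SUBMIT_DIR/               # copy all files back at the end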

For more info: On many supercomputers you can ssh to a node which is 
allocated to you. When you can do this, run top and check where the delays 
are, ....
If you cannot ssh to the node, usually one can ask for an interactive 
session in the queueing system. Then you should have access to the 
allocated node.
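
With SLURM (which your srun-based parallel_options suggests) this could 
look like the following; the flags and limits are site-specific:

   squeue -u $USER        # find out on which node(s) your job is running
   ssh <nodename>         # if the site allows ssh to allocated nodes
   top                    # watch the WIEN2k processes, per-core load, memory

   srun -N 1 -n 16 -t 01:00:00 --pty bash   # or request an interactive session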

It could also be a problem of "pinning".
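
A quick check: in top press "1" to see the load of each core; with bad 
pinning several processes pile up on the same cores while other cores 
stay idle. srun can also report the binding it applies, e.g.:

   srun --cpu-bind=verbose <executable>     # prints which cpus each task is bound to

(<executable> is just a placeholder here.)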

As Laurence said: probably only a good sys.admin can help. But the 
problem could be that the system is not set up properly, and then the 
sys.admins cannot help you ...


On 06.10.2020 at 17:09, Lyudmila Dobysheva wrote:
> Dear all,
> 
> I have started working on a supercomputer and sometimes I see delays 
> during execution. They occur randomly, most often during lapw0, but 
> in other programs as well (an extra 7-20 min). The administrators say 
> that there can sometimes be problems with the network speed.
> But I cannot understand it: at the moment I use only one node with 16 
> processors. I would think that if I send the task to a single node, 
> network problems between computers should not affect it until the whole 
> task ends.
> Maybe I have set the scratch variable wrongly?
> In .bashrc:
> export SCRATCH=./
> 
> During execution I can see how the cycle proceeds, that is, after lapw0 
> I see its output files. This means that after lapw0 the compute node 
> sends the files to the head node, and maybe this is where it waits? Is 
> this behavior correct? I expected that I would not see the intermediate 
> stages until the job ends.
> And the programs themselves - lapw0, lapw1, lapw2, lcore, mixer - maybe 
> they are loaded onto the compute node anew in every cycle?
> 
> Best regards
> Lyudmila Dobysheva
> 
> some details WIEN2k_19.2
> ifort 64 19.1.0.166
> ---------------
> parallel_options:
> setenv TASKSET "srun "
> if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
> if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0
> setenv WIEN_GRANULARITY 1
> setenv DELAY 0.1
> setenv SLEEPY 1
> if ( ! $?WIEN_MPIRUN) setenv WIEN_MPIRUN "srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_"
> if ( ! $?CORES_PER_NODE) setenv CORES_PER_NODE  16
> --------------
> WIEN2k_OPTIONS:
> current:FOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
> current:FPOPT:-O -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
> current:OMP_SWITCH:-qopenmp
> current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -lpthread -lm -ldl -liomp5
> current:DPARALLEL:'-DParallel'
> current:R_LIBS:-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
> current:FFTWROOT:/home/uffff/.local/
> current:FFTW_VERSION:FFTW3
> current:FFTW_LIB:lib
> current:FFTW_LIBNAME:fftw3
> current:LIBXCROOT:
> current:LIBXC_FORTRAN:
> current:LIBXC_LIBNAME:
> current:LIBXC_LIBDNAME:
> current:SCALAPACKROOT:$(MKLROOT)/lib/
> current:SCALAPACK_LIBNAME:mkl_scalapack_lp64
> current:BLACSROOT:$(MKLROOT)/lib/
> current:BLACS_LIBNAME:mkl_blacs_intelmpi_lp64
> current:ELPAROOT:
> current:ELPA_VERSION:
> current:ELPA_LIB:
> current:ELPA_LIBNAME:
> current:MPIRUN:srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_
> current:CORES_PER_NODE:16
> current:MKL_TARGET_ARCH:intel64
> 
> ------------------
> http://ftiudm.ru/content/view/25/103/lang,english/
> Physics-Techn.Institute,
> Udmurt Federal Research Center, Ural Br. of Rus.Ac.Sci.
> 426000 Izhevsk Kirov str. 132
> Russia
> ---
> Tel. +7 (34I2)43-24-59 (office), +7 (9I2)OI9-795O (home)
> Skype: lyuka18 (office), lyuka17 (home)
> E-mail: lyuka17 at mail.ru (office), lyuka17 at gmail.com (home)

-- 
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/tc_blaha
--------------------------------------------------------------------------


