[Wien] slurm mpi

webfinder at ukr.net webfinder at ukr.net
Tue May 7 16:30:47 CEST 2019


Dear Prof. Blaha,
I'm using Intel MPI 2019.3.199.
The ScaLAPACK and BLACS libraries are located in the Intel compilers_and_libraries_2019.3.199 folder.

OPTIONS file:
current:FOPT:-O1 -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
current:FPOPT:-O1 -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -assume buffered_io -I$(MKLROOT)/include
current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/intel64 -lpthread -lm -ldl -liomp5
current:DPARALLEL:-DParallel
current:R_LIBS:-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
current:FFTWROOT:/gpfs/home/ser/Install/FFTW/
current:FFTW_VERSION:FFTW3
current:FFTW_LIB:lib
current:FFTW_LIBNAME:fftw3
current:LIBXCROOT:/gpfs/home/ser/Install/LIBXC/
current:LIBXC_FORTRAN:xcf03
current:LIBXC_LIBNAME:xc
current:LIBXC_LIBDNAME:lib/
current:SCALAPACKROOT:/gpfs/softs/cluster/intel/psxe/2019.3/compilers_and_libraries_2019.3.199/linux/mkl/lib/
current:SCALAPACK_LIBNAME:mkl_scalapack_lp64
current:BLACSROOT:/gpfs/softs/cluster/intel/psxe/2019.3/compilers_and_libraries_2019.3.199/linux/mkl/lib/
current:BLACS_LIBNAME:mkl_blacs_intelmpi_lp64
current:ELPAROOT:
current:ELPA_VERSION:
current:MPIRUN:srun -K -N_nodes_ -n_NP_ -r_offset_ _PINNING_ _EXEC_     (changed to mpirun)
current:CORES_PER_NODE:1
current:MKL_TARGET_ARCH:intel64


Part of .bashrc:

module load gcc/7.3.0
module add compiler/intel/2019.3.199
module load mpi/intel/2019.3.199
source /gpfs/softs/cluster/intel/psxe/2019.3/compilers_and_libraries_2019.3.199/linux/bin/compilervars.sh intel64
source /gpfs/softs/cluster/intel/psxe/2019.3/compilers_and_libraries_2019.3.199/linux/mpi/intel64/bin/mpivars.sh

In interactive mode, the command
mpirun -np 4 $WIENROOT/lapw0_mpi lapw0.def
finishes with LAPW0 END.

Actually, after I commented out the following line in my script
"if($natom < $nproc) set nproc0=$natom"
the "permission denied" error disappeared and MPI now starts, with the following output:

32 nodes for this job: n073 n073 n073 n073 n073 n073 n073 n073 n073 n073 n073 n073 n073 n073 n073 n073 n074 n074 n074 n074 n074 n074 n074 n074 n074 n074 n074 n074 n074 n074 n074 n074
 LAPW0 END
[1]    Done                          mpirun -n 32 -machinefile .machine0 /gpfs/home/ser/wienroot_v18/lapw0_mpi lapw0.def >> .time00
Force-convergence not possible. Forces not present.
 LAPW1 END
[1]  + Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
 LAPW1 END
[1]  + Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
LAPW2 - FERMI; weights written
LAPW2 - FERMI; weights written
 CORE  END
 CORE  END
 MIXER END

At the same time, the dayfile contains:
Intel MKL ERROR: Parameter 3 was incorrect on entry to DGEMM .

Intel MKL ERROR: Parameter 3 was incorrect on entry to DGEMM .

Intel MKL ERROR: Parameter 3 was incorrect on entry to DGEMM .

Intel MKL ERROR: Parameter 8 was incorrect on entry to DGEMM .

Intel MKL ERROR: Parameter 8 was incorrect on entry to DGEMM .
....
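
To see at a glance which arguments MKL is rejecting, the errors can be tallied by routine and parameter number. The function below is only a sketch: the name mkl_error_summary is made up for illustration, and the dayfile name passed to it is a placeholder. For reference, in the standard BLAS interface DGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC), parameter 3 is M and parameter 8 is LDA.

```shell
# Tally "Intel MKL ERROR" lines by routine and parameter number.
# Usage (file name is a placeholder): mkl_error_summary case.dayfile
mkl_error_summary() {
    # Keep only the fixed part of each message, dropping the trailing " ."
    grep -o 'Intel MKL ERROR: Parameter [0-9]* was incorrect on entry to [A-Za-z0-9_]*' "$1" \
        | awk '{ count[$NF " parameter " $5]++ }   # $NF = routine, $5 = parameter number
               END { for (k in count) print count[k], k }' \
        | sort -k2,2 -k4,4n
}
```

Each output line has the form "count ROUTINE parameter N", sorted by routine name and then by parameter number.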

The scf file contains:
:seclit_par:  estimate of singular value, factor:   0.3125E-01  0.1000E-14
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii     1  0.8207E-20
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii     2  0.6228E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii     3  0.6073E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii     4  0.6256E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii     5  0.9136E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii     6  0.9098E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii     7  0.7724E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii     8  0.7724E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii     9  0.7534E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii    10  0.2265E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii    11  0.2059E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii    12  0.2059E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii    13  0.8401E-18
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii    14  0.8294E-18
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii    15  0.1019E-19
:WARN  :seclit_par-stability trick active for: eigenvalue, sproj_ii    16  0.2041E-19
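
The warnings above can be condensed into a count and a value range, which makes it easier to compare runs. This is a rough sketch: seclit_warn_summary is a hypothetical helper name, and it assumes that the last field of each warning line is the sproj_ii value, as in the lines shown.

```shell
# Count seclit_par stability warnings and report the smallest and largest
# sproj_ii value (taken from the last field of each warning line).
# Usage (file name is a placeholder): seclit_warn_summary case.scf
seclit_warn_summary() {
    grep 'seclit_par-stability trick active' "$1" \
        | awk '{ v = $NF + 0                       # force numeric conversion
                 if (n == 0 || v < min) min = v
                 if (n == 0 || v > max) max = v
                 n++ }
               END { if (n) printf "%d warnings, sproj_ii from %.4E to %.4E\n", n, min, max
                     else   print "0 warnings" }'
}
```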


These messages are absent in the case of k-point parallelization.
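
For comparison, the two modes differ in how the .machines file assigns cores. The sketch below prints two hypothetical .machines layouts for the two nodes seen in the output above (n073 and n074, 16 cores each). The actual .machines file of this job is not shown here, so both layouts and the function names are assumptions for illustration only.

```shell
# k-point parallel: one "weight:host" line per serial lapw1/lapw2 job.
machines_kpoint() {
    for host in n073 n074; do
        i=0
        while [ "$i" -lt 16 ]; do
            echo "1:$host"
            i=$((i + 1))
        done
    done
    echo "granularity:1"
    echo "extrafine:1"
}

# MPI parallel: "weight:host:ncores" starts one MPI job with ncores processes
# per line; the lapw0: line lists the hosts used by lapw0_mpi.
machines_mpi() {
    echo "lapw0:n073:16 n074:16"
    echo "1:n073:16"
    echo "1:n074:16"
    echo "granularity:1"
    echo "extrafine:1"
}
```

Redirecting either function into .machines (for example machines_mpi > .machines) would select the corresponding mode for run_lapw -p.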
