[Wien] [WIEN2k] abort of CPU core parallel jobs in NMR calculations of the current
Peter Blaha
peter.blaha at tuwien.ac.at
Sat May 11 14:58:37 CEST 2024
Hmm.
Are you using k-parallel AND mpi-parallel at the same time? This could
overload the machine.
What does your .machines file look like?
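
A quick way to check for overload is to count how many processes the .machines file would start in total. The Python sketch below is only an illustration, not part of WIEN2k. It assumes the usual layout in which each k-parallel job is a line of the form "weight:host[:nprocs] ..." and all other keys (lapw0:, granularity:, extrafine:, ...) are ignored. The function name count_processes and the example file contents are hypothetical.

```python
def count_processes(machines_text):
    """Return the number of MPI processes for each k-parallel job line.

    Assumption: a k-parallel job line starts with an integer weight,
    e.g. "1:localhost" or "1:localhost:2"; everything else is skipped.
    """
    jobs = []
    for raw in machines_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line or ":" not in line:
            continue
        key, rest = line.split(":", 1)
        if not key.strip().isdigit():
            continue  # lapw0:, granularity:, extrafine:, ...
        procs = 0
        for entry in rest.split():
            _host, _, n = entry.partition(":")
            procs += int(n) if n else 1  # "host" alone means one process
        jobs.append(procs)
    return jobs


# Hypothetical example: two k-parallel jobs, each with two MPI processes.
example = """\
# hypothetical .machines file
1:localhost:2
1:localhost:2
granularity:1
extrafine:1
"""

jobs = count_processes(example)
total = sum(jobs)
print(len(jobs), "k-parallel jobs,", total, "processes in total")
```

If the total is larger than the number of cores you intend to use, or if the memory needed per process times that total exceeds the available RAM, the machine is overloaded.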
On 10.05.2024 at 18:15, Michael Fechtelkord via Wien wrote:
> Dear all,
>
>
> I have run into the following problem with the NMR part of WIEN2k (23.2)
> on an openSUSE Leap 15.5 Intel platform. WIEN2k was compiled with
> oneAPI 2024.1 ifort and gcc 13.2.1. I am using ELPA 2024.03.01, Libxc
> 6.22, fftw 3.3.10, MPICH 4.2.1 and the oneAPI 2024.1 MKL
> libraries. The CPU is an i9-14900K with 24 cores, of which I use eight
> for the calculations. The machine has 130 GB of RAM and a 16 GB swap
> file on a Samsung PCIe 4.0 NVMe SSD. The memory transfer rate is 5600 MT/s.
>
> The structure is a layer silicate. To simulate a Si:Al ratio of
> 3:1 I currently use a 1:1:2 supercell. The new structure has monoclinic
> symmetry P 2/c (the original is C 2/c) and contains 40 atoms (K,
> Al, Si, O, and F).
>
> I use 3 NMR LOs for K and O and 10 for Si, Al, and F (for which I need
> the chemical shifts). The k-mesh has 40 k-points.
>
> The interesting thing is that the RAM is sufficient during the NMR vector
> calculations (always under 100 GB of RAM occupied) and at the beginning
> of the electron current calculation. However, RAM use then grows to
> a critical level during the calculation, and more and more data is
> swapped out to the swap file, which is sometimes 80% full.
>
> As you can see, this time only one of the eight parallel jobs failed
> because of memory overflow. With 48 k-points, however, three jobs crashed,
> and with them the whole current calculation. The reason for the crash is
> clear to me, but I do not understand why the current calculation is so
> sensitive with so few atoms and a small k-mesh. I have done calculations
> with more atoms and a 1000 k-point mesh on 4 cores, and they worked fine.
> Could the Intel MKL library be the source of the failure? Or should I
> go back to 4 cores, even with longer calculation times?
>
> Have a nice weekend, everyone!
>
>
> Best wishes from
>
> Michael Fechtelkord
>
> -----------------------------------------------
>
> cd ./ ... x lcore -f MS_2M1_Al2
> CORE END
> 0.685u 0.028s 0:00.71 98.5% 0+0k 2336+16168io 5pf+0w
>
> lcore .... ready
>
>
> EXECUTING: /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode
> current -green -scratch /scratch/WIEN2k/ -noco
>
> [1] 20253
> [2] 20257
> [3] 20261
> [4] 20265
> [5] 20269
> [6] 20273
> [7] 20277
> [8] 20281
> [8] + Abgebrochen ( cd $dir; $exec2 >>
> nmr.out.${loop} ) >& nmr.err.$loop
> [7] + Fertig ( cd $dir; $exec2 >>
> nmr.out.${loop} ) >& nmr.err.$loop
> [6] + Fertig ( cd $dir; $exec2 >>
> nmr.out.${loop} ) >& nmr.err.$loop
> [5] + Fertig ( cd $dir; $exec2 >>
> nmr.out.${loop} ) >& nmr.err.$loop
> [4] + Fertig ( cd $dir; $exec2 >>
> nmr.out.${loop} ) >& nmr.err.$loop
> [3] + Fertig ( cd $dir; $exec2 >>
> nmr.out.${loop} ) >& nmr.err.$loop
> [2] + Fertig ( cd $dir; $exec2 >>
> nmr.out.${loop} ) >& nmr.err.$loop
> [1] + Fertig ( cd $dir; $exec2 >>
> nmr.out.${loop} ) >& nmr.err.$loop
>
> EXECUTING: /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode sumpara
> -p 8 -green -scratch /scratch/WIEN2k/
>
>
> current .... ready
>
>
> EXECUTING: mpirun -np 1 -machinefile .machine_nmrinteg
> /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode integ -green
>
>
> nmr: integration ... done in 4032.3s
>
>
> stop
>
--
-----------------------------------------------------------------------
Peter Blaha, Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email: peter.blaha at tuwien.ac.at
WWW: http://www.imc.tuwien.ac.at WIEN2k: http://www.wien2k.at
-------------------------------------------------------------------------