[Wien] [WIEN2k] abort of CPU core parallel jobs in NMR calculations of the current
Michael Fechtelkord
Michael.Fechtelkord at ruhr-uni-bochum.de
Sat May 11 16:19:36 CEST 2024
Hello Peter,
this is the .machines file content:
granulartity:1
omp_lapw0:8
omp_global:2
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
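
As a quick cross-check of the load this file requests, the lines can be tallied with a short script. This is only a sketch: the function name summarize_machines is made up for illustration, the file content is pasted in as a string, and it assumes that every "1:localhost" line starts one k-parallel job and that each such job runs with omp_global threads.

```python
from collections import Counter

# The .machines content from this message, pasted verbatim.
MACHINES_TEXT = """\
granulartity:1
omp_lapw0:8
omp_global:2
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
"""


def summarize_machines(text):
    """Tally OpenMP settings, k-parallel job lines and nmr_integ lines."""
    omp = {}                 # e.g. {"omp_global": 2}
    k_jobs = Counter()       # host -> number of "weight:host" lines
    integ_jobs = Counter()   # host -> number of "nmr_integ:host" lines
    other = []               # keys this sketch does not interpret
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        if key.startswith("omp_"):
            omp[key] = int(value)
        elif key == "nmr_integ":
            integ_jobs[value.split(":")[0]] += 1
        elif key.isdigit():
            # Assumption: one such line corresponds to one k-parallel job.
            k_jobs[value.split(":")[0]] += 1
        else:
            other.append(key)
    return omp, k_jobs, integ_jobs, other


omp, k_jobs, integ_jobs, other = summarize_machines(MACHINES_TEXT)
n_k_jobs = sum(k_jobs.values())
# Assumption: each k-parallel job uses omp_global OpenMP threads.
threads = n_k_jobs * omp.get("omp_global", 1)

print("k-parallel jobs:", n_k_jobs)
print("nmr_integ lines:", sum(integ_jobs.values()))
print("threads if every job uses omp_global:", threads)
print("keys not interpreted:", other)
```

Under those assumptions the file asks for eight simultaneous jobs on localhost, so whatever memory one current calculation needs is multiplied by eight.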
Best regards,
Michael
On 11.05.2024 at 14:58, Peter Blaha wrote:
> Hmm. ?
>
> Are you using k-parallel AND mpi-parallel? This could overload
> the machine.
>
> What does the .machines file look like?
>
>
> On 10.05.2024 at 18:15, Michael Fechtelkord via Wien wrote:
>> Dear all,
>>
>>
>> I have run into the following problem using the NMR part of WIEN2k
>> (23.2) on an openSUSE Leap 15.5 Intel platform. WIEN2k was compiled
>> using oneAPI 2024.1 ifort and gcc 13.2.1. I am using ELPA
>> 2024.03.01, Libxc 6.22, fftw 3.3.10 and MPICH 4.2.1 and the oneAPI
>> 2024.1 MKL libraries. The CPU is an i9-14900K with 24 cores, of which
>> I use eight for the calculations. The machine has 130 GB of RAM and a
>> 16 GB swap file on a Samsung PCIe 4.0 NVMe SSD. The memory transfer
>> rate is 5600 MT/s.
>>
>> The structure is a layer silicate, and to simulate a Si:Al ratio of
>> 3:1 I currently use a 1:1:2 supercell. The new structure has
>> monoclinic symmetry P 2/c (the original is C 2/c) and contains 40
>> atoms (K, Al, Si, O, and F).
>>
>> I use 3 NMR LOs for K and O and 10 for Si, Al, and F (where I need
>> the chemical shifts). The k-mesh has 40 k-points.
>>
>> Interestingly, the RAM is sufficient during the NMR vector
>> calculations (always under 100 GB occupied) and at the beginning
>> of the electron current calculation. However, RAM use then increases
>> to a critical point during the calculation, and more and more data
>> is swapped out to the swap file, which is sometimes 80% occupied.
>>
>> As you can see, this time only one core failed because of memory
>> overflow. But with 48 k-points, 3 cores crashed and with them the
>> whole current calculation. The reason for the crash is clear to me.
>> But I do not understand why the current calculation is so sensitive
>> with so few atoms and such a small k-mesh. I have run calculations
>> with more atoms and a 1000 k-point mesh on 4 cores, and they worked
>> fine. Could the Intel MKL library be the source of the failure? Or
>> should I go back to 4 cores, even with longer calculation times?
>>
>> Have a nice weekend, everyone!
>>
>>
>> Best wishes from
>>
>> Michael Fechtelkord
>>
>> -----------------------------------------------
>>
>> cd ./ ... x lcore -f MS_2M1_Al2
>> CORE END
>> 0.685u 0.028s 0:00.71 98.5% 0+0k 2336+16168io 5pf+0w
>>
>> lcore .... ready
>>
>>
>> EXECUTING: /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode
>> current -green -scratch /scratch/WIEN2k/ -noco
>>
>> [1] 20253
>> [2] 20257
>> [3] 20261
>> [4] 20265
>> [5] 20269
>> [6] 20273
>> [7] 20277
>> [8] 20281
>> [8] + Aborted ( cd $dir; $exec2 >>
>> nmr.out.${loop} ) >& nmr.err.$loop
>> [7] + Done ( cd $dir; $exec2 >>
>> nmr.out.${loop} ) >& nmr.err.$loop
>> [6] + Done ( cd $dir; $exec2 >>
>> nmr.out.${loop} ) >& nmr.err.$loop
>> [5] + Done ( cd $dir; $exec2 >>
>> nmr.out.${loop} ) >& nmr.err.$loop
>> [4] + Done ( cd $dir; $exec2 >>
>> nmr.out.${loop} ) >& nmr.err.$loop
>> [3] + Done ( cd $dir; $exec2 >>
>> nmr.out.${loop} ) >& nmr.err.$loop
>> [2] + Done ( cd $dir; $exec2 >>
>> nmr.out.${loop} ) >& nmr.err.$loop
>> [1] + Done ( cd $dir; $exec2 >>
>> nmr.out.${loop} ) >& nmr.err.$loop
>>
>> EXECUTING: /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode sumpara
>> -p 8 -green -scratch /scratch/WIEN2k/
>>
>>
>> current .... ready
>>
>>
>> EXECUTING: mpirun -np 1 -machinefile .machine_nmrinteg
>> /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode integ -green
>>
>>
>> nmr: integration ... done in 4032.3s
>>
>>
>> stop
>>
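
The memory figures quoted above can be turned into a rough per-job budget. This is back-of-the-envelope arithmetic only, a sketch assuming that memory is shared evenly between the parallel jobs, that RAM is fully occupied once the swap file is 80% full, and that the operating system's own use is ignored. None of this is measured per process.

```python
# Figures taken from the message above.
RAM_GB = 130.0
SWAP_GB = 16.0
SWAP_USED_FRACTION = 0.80


def per_job_gb(total_gb, n_jobs):
    """Average memory per job if total_gb is split evenly (an assumption)."""
    return total_gb / n_jobs


# RAM available per job before anything is swapped out.
ram_per_job_8 = per_job_gb(RAM_GB, 8)
ram_per_job_4 = per_job_gb(RAM_GB, 4)

# Estimated total footprint when RAM is full and swap is 80% occupied.
peak_total_gb = RAM_GB + SWAP_USED_FRACTION * SWAP_GB
peak_per_job_8 = per_job_gb(peak_total_gb, 8)

print(f"RAM per job with 8 jobs: {ram_per_job_8:.2f} GB")
print(f"RAM per job with 4 jobs: {ram_per_job_4:.2f} GB")
print(f"Estimated peak per job with 8 jobs: {peak_per_job_8:.2f} GB")
```

Under these assumptions each of the eight jobs can hold a little over 16 GB in RAM, while four jobs would have twice that each, which is consistent with the earlier runs on 4 cores finishing without swapping.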
--
Dr. Michael Fechtelkord
Institut für Geologie, Mineralogie und Geophysik
Ruhr-Universität Bochum
Universitätsstr. 150
D-44780 Bochum
Phone: +49 (234) 32-24380
Fax: +49 (234) 32-04380
Email: Michael.Fechtelkord at ruhr-uni-bochum.de
Web Page: https://www.ruhr-uni-bochum.de/kristallographie/kc/mitarbeiter/fechtelkord/