[Wien] [WIEN2k] abort of CPU core parallel jobs in NMR calculations of the current

Michael Fechtelkord Michael.Fechtelkord at ruhr-uni-bochum.de
Sat May 11 16:19:36 CEST 2024


Hello Peter,

This is the content of the .machines file:

granulartity:1
omp_lapw0:8
omp_global:2
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
nmr_integ:localhost
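For reference, here is a minimal Python sketch that counts the parallel jobs and threads this file requests. It assumes that every `weight:host` line (here `1:localhost`) starts one k-parallel job and that `omp_global:N` gives each job N OpenMP threads. The helper `summarize` is hypothetical and not part of WIEN2k. Other keys, including `omp_lapw0` and the misspelled `granulartity` (WIEN2k's key is `granularity`), are simply skipped.

```python
# Minimal sketch (not part of WIEN2k): summarize the parallel layout of a .machines file.
# Assumptions: each "weight:host" line (e.g. "1:localhost") starts one k-parallel job,
# and "omp_global:N" sets N OpenMP threads for every job. All other keys are skipped.

MACHINES = "\n".join(
    ["granulartity:1", "omp_lapw0:8", "omp_global:2"]
    + ["1:localhost"] * 8
    + ["nmr_integ:localhost"] * 8
)


def summarize(text: str) -> dict:
    """Count k-parallel jobs, threads per job and nmr_integ lines (hypothetical helper)."""
    k_jobs = 0
    omp_threads = 1  # assumed default when omp_global is absent
    nmr_integ_lines = 0
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = (part.strip() for part in line.split(":", 1))
        if key.isdigit():            # "1:localhost" -> one k-parallel job
            k_jobs += 1
        elif key == "omp_global":    # threads per job
            omp_threads = int(value)
        elif key == "nmr_integ":     # host lines for the NMR integration step
            nmr_integ_lines += 1
    return {
        "k_jobs": k_jobs,
        "omp_threads_per_job": omp_threads,
        "total_threads": k_jobs * omp_threads,
        "nmr_integ_lines": nmr_integ_lines,
    }


if __name__ == "__main__":
    summary = summarize(MACHINES)
    print(summary)
    cores = 24  # core count quoted in the original post
    print(f"{summary['total_threads']} threads requested on {cores} cores")
```

With the file above this reports 8 k-parallel jobs with 2 threads each, i.e. 16 threads on the 24-core CPU, which matches the eight background jobs ([1] to [8]) in the log. Whether the NMR program actually honours `omp_global` is not something this sketch can determine.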


Best regards,

Michael


On 11.05.2024 at 14:58, Peter Blaha wrote:
> Hmm. ?
>
> Are you using k-parallel AND MPI-parallel? This could overload
> the machine.
>
> What does the .machines file look like?
>
>
> On 10.05.2024 at 18:15, Michael Fechtelkord via Wien wrote:
>> Dear all,
>>
>>
>> The following problem occurs when I use the NMR part of WIEN2k
>> (23.2) on an openSUSE Leap 15.5 Intel platform. WIEN2k was compiled
>> with oneAPI 2024.1 ifort and gcc 13.2.1. I am using ELPA
>> 2024.03.01, Libxc 6.22, FFTW 3.3.10, MPICH 4.2.1, and the oneAPI
>> 2024.1 MKL libraries. The CPU is an i9-14900K with 24 cores, of which
>> I use eight for the calculations. The machine has 130 GB of RAM and a
>> 16 GB swap file on a Samsung PCIe 4.0 NVMe SSD. The memory speed is
>> 5600 MT/s.
>>
>> The structure is a layer silicate. To simulate an Si:Al ratio of 3:1,
>> I currently use a 1:1:2 supercell. The new structure has monoclinic
>> symmetry P2/c (the original is C2/c) and contains 40 atoms (K, Al,
>> Si, O, and F).
>>
>> I use 3 NMR LOs for K and O, and 10 for Si, Al, and F (for which I
>> need the chemical shifts). The k-mesh has 40 k-points.
>>
>> Interestingly, the RAM is sufficient during the NMR vector
>> calculations (always under 100 GB occupied) and at the beginning of
>> the electron current calculation. However, as the current calculation
>> proceeds, the RAM usage rises to a critical level, and more and more
>> data is swapped out to the swap file, which is sometimes 80% full.
>>
>> As you can see, this time only one core failed because of a memory
>> overflow. With 48 k-points, however, 3 cores crashed, and with them
>> the whole current calculation. The reason for the crash is clear to
>> me, but I do not understand why the current calculation is so
>> sensitive with so few atoms and a small k-mesh. I have run
>> calculations with more atoms and a 1000 k-point mesh on 4 cores, and
>> they worked fine. Could the Intel MKL library be the source of the
>> failure? Or should I go back to 4 cores, even if the calculations
>> take longer?
>>
>> Have a nice weekend, all!
>>
>>
>> Best wishes from
>>
>> Michael Fechtelkord
>>
>> -----------------------------------------------
>>
>> cd ./  ...  x lcore  -f MS_2M1_Al2
>>  CORE  END
>> 0.685u 0.028s 0:00.71 98.5%     0+0k 2336+16168io 5pf+0w
>>
>> lcore        ....  ready
>>
>>
>>  EXECUTING:     /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current    -green         -scratch /scratch/WIEN2k/ -noco
>>
>> [1] 20253
>> [2] 20257
>> [3] 20261
>> [4] 20265
>> [5] 20269
>> [6] 20273
>> [7] 20277
>> [8] 20281
>> [8]  + Aborted                       ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>> [7]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>> [6]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>> [5]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>> [4]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>> [3]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>> [2]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>> [1]  + Done                          ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>>
>>  EXECUTING:     /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode sumpara  -p 8    -green -scratch /scratch/WIEN2k/
>>
>>
>> current        ....  ready
>>
>>
>>  EXECUTING:     mpirun -np 1 -machinefile .machine_nmrinteg /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode integ -green
>>
>>
>> nmr:  integration  ... done in   4032.3s
>>
>>
>> stop
>>
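The trade-off raised in the original message (eight versus four cores) can be put into rough numbers. The Python sketch below divides the memory quoted in the post (130 GB RAM, 16 GB swap) evenly among the concurrently running k-parallel jobs. The helper `max_peak_per_job` is hypothetical, and the even split is a simplifying assumption, not a measured property of the NMR code.

```python
# Rough memory-budget arithmetic for concurrent k-parallel jobs (illustrative sketch).
# Assumptions: every job reaches its peak at the same time, all jobs need the same
# amount of memory, and nothing else (OS, page cache) uses RAM. Real usage differs.

RAM_GB = 130.0  # installed RAM quoted in the post
SWAP_GB = 16.0  # swap file size quoted in the post


def max_peak_per_job(n_jobs: int, budget_gb: float) -> float:
    """Largest per-job peak (GB) that lets n_jobs concurrent jobs fit into budget_gb."""
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    return budget_gb / n_jobs


if __name__ == "__main__":
    for n in (8, 4):
        in_ram = max_peak_per_job(n, RAM_GB)
        with_swap = max_peak_per_job(n, RAM_GB + SWAP_GB)
        print(f"{n} jobs: <= {in_ram:.2f} GB/job in RAM, <= {with_swap:.2f} GB/job incl. swap")
```

For 8 jobs this gives at most 16.25 GB per job in RAM (18.25 GB including swap); for 4 jobs the limits double to 32.50 GB and 36.50 GB. This only illustrates why fewer concurrent jobs leave more headroom per job; it does not explain why the memory use of the current step grows, nor does it rule MKL in or out.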
-- 
Dr. Michael Fechtelkord

Institut für Geologie, Mineralogie und Geophysik
Ruhr-Universität Bochum
Universitätsstr. 150
D-44780 Bochum

Phone: +49 (234) 32-24380
Fax:  +49 (234) 32-04380
Email: Michael.Fechtelkord at ruhr-uni-bochum.de
Web Page: https://www.ruhr-uni-bochum.de/kristallographie/kc/mitarbeiter/fechtelkord/
