[Wien] [WIEN2k] abort of CPU core parallel jobs in NMR calculations of the current

Michael Fechtelkord Michael.Fechtelkord at ruhr-uni-bochum.de
Sat May 11 20:10:34 CEST 2024


Hello Peter,


I just run "x_nmr_lapw -p"; everything else is started by the nmr script. 
The line "/usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current 
-green         -scratch /scratch/WIEN2k/ -noco" is just part of the 
whole procedure and was not started by me manually. (I only copied the 
last lines of the calculation output.)
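
With the repeated nmr_integ lines merged into a single entry, as suggested in the quoted reply below, the .machines file would presumably look like the following. This is an untested sketch: the omp settings and the eight k-parallel lines are carried over from the file quoted further down, the granularity keyword is written as it appears in the WIEN2k user's guide, and the nmr_integ:localhost:8 form is taken from the reply.

```
# sketch only, not tested on this machine
granularity:1
omp_lapw0:8
omp_global:2
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
# single entry instead of eight repeated nmr_integ:localhost lines
nmr_integ:localhost:8
```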


Best regards,

Michael


On 11.05.2024 at 18:08, Peter Blaha wrote:
> Hello Michael,
>
> I don't understand the line:
>
> /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current 
> -green         -scratch /scratch/WIEN2k/ -noco
>
> Mode current should run only k-parallel, not under MPI?
>
> PS: The repetition of
>
> nmr_integ:localhost    is useless.
>
> nmr mode integ runs only once (not k-parallel, sumpara has already 
> summed up the currents)
>
> But one can use       nmr_integ:localhost:8
>
>
> Best regards
>
> On 11.05.2024 at 16:19, Michael Fechtelkord via Wien wrote:
>> Hello Peter,
>>
>> this is the .machines file content:
>>
>> granulartity:1
>> omp_lapw0:8
>> omp_global:2
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>>
>>
>> Best regards,
>>
>> Michael
>>
>>
>> On 11.05.2024 at 14:58, Peter Blaha wrote:
>>> Hmm. ?
>>>
>>> Are you using k-parallel AND mpi-parallel? This could 
>>> overload the machine.
>>>
>>> What does the .machines file look like?
>>>
>>>
>>> On 10.05.2024 at 18:15, Michael Fechtelkord via Wien wrote:
>>>> Dear all,
>>>>
>>>>
>>>> the following problem occurs when I use the NMR part of WIEN2k 
>>>> (23.2) on an openSUSE Leap 15.5 Intel platform. WIEN2k was compiled 
>>>> with oneAPI 2024.1 ifort and gcc 13.2.1. I am using ELPA 
>>>> 2024.03.01, Libxc 6.22, fftw 3.3.10, MPICH 4.2.1 and the oneAPI 
>>>> 2024.1 MKL libraries. The CPU is an i9-14900K with 24 cores, of 
>>>> which I use eight for the calculations. The machine has 130 GB of 
>>>> RAM and a 16 GB swap file on a Samsung PCIe 4.0 NVMe SSD. The 
>>>> memory transfer rate is 5600 MT/s.
>>>>
>>>> The structure is a layer silicate; to simulate a Si:Al ratio of 
>>>> 3:1 I currently use a 1:1:2 supercell. The new structure has 
>>>> monoclinic symmetry P 2/c (the original is C 2/c) and contains 40 
>>>> atoms (K, Al, Si, O, and F).
>>>>
>>>> I use 3 NMR LOs for K and O and 10 for Si, Al, and F (where I need 
>>>> the chemical shifts). The k-mesh has 40 k-points.
>>>>
>>>> Interestingly, the RAM is sufficient during the NMR vector 
>>>> calculations (always less than 100 GB occupied) and at the 
>>>> beginning of the electron current calculation. However, RAM use 
>>>> then increases to a critical point, and more and more data is 
>>>> moved to the swap file, which is sometimes 80% full.
>>>>
>>>> As you can see, this time only one core failed because of memory 
>>>> overflow. With 48 k-points, however, 3 cores crashed and with them 
>>>> the whole current calculation. The reason for the crash is clear to 
>>>> me, but I do not understand why the current calculation is so 
>>>> sensitive with so few atoms and a small k-mesh. I have done 
>>>> calculations with more atoms and a 1000 k-point mesh on 4 cores, 
>>>> and they worked fine. Could the Intel MKL library be the source of 
>>>> the failure? Or should I go back to 4 cores, even with longer 
>>>> calculation times?
>>>>
>>>> Have a nice weekend, everyone!
>>>>
>>>>
>>>> Best wishes from
>>>>
>>>> Michael Fechtelkord
>>>>
>>>> -----------------------------------------------
>>>>
>>>> cd ./  ...  x lcore  -f MS_2M1_Al2
>>>>  CORE  END
>>>> 0.685u 0.028s 0:00.71 98.5%     0+0k 2336+16168io 5pf+0w
>>>>
>>>> lcore        ....  ready
>>>>
>>>>
>>>>  EXECUTING:     /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode 
>>>> current    -green         -scratch /scratch/WIEN2k/ -noco
>>>>
>>>> [1] 20253
>>>> [2] 20257
>>>> [3] 20261
>>>> [4] 20265
>>>> [5] 20269
>>>> [6] 20273
>>>> [7] 20277
>>>> [8] 20281
>>>> [8]  + Aborted                       ( cd $dir; $exec2 >> 
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [7]  + Done                          ( cd $dir; $exec2 >> 
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [6]  + Done                          ( cd $dir; $exec2 >> 
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [5]  + Done                          ( cd $dir; $exec2 >> 
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [4]  + Done                          ( cd $dir; $exec2 >> 
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [3]  + Done                          ( cd $dir; $exec2 >> 
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [2]  + Done                          ( cd $dir; $exec2 >> 
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [1]  + Done                          ( cd $dir; $exec2 >> 
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>>
>>>>  EXECUTING:     /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode 
>>>> sumpara  -p 8    -green -scratch /scratch/WIEN2k/
>>>>
>>>>
>>>> current        ....  ready
>>>>
>>>>
>>>>  EXECUTING:     mpirun -np 1 -machinefile .machine_nmrinteg 
>>>> /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode integ -green
>>>>
>>>>
>>>> nmr:  integration  ... done in   4032.3s
>>>>
>>>>
>>>> stop
>>>>
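
On the question in the original message above of whether to go back to 4 cores: a rough way to compare the two setups is the average RAM available per k-parallel job. The sketch below only does that division with the 130 GB figure from the message. It assumes every job needs about the same amount of memory, which nothing in this thread confirms, and it leaves the swap file out because heavy swapping was already part of the problem. The function name is made up for illustration.

```python
# Rough per-job memory budget for k-parallel jobs on a single node.
# Assumption: all jobs need a similar amount of memory (not confirmed here).


def per_job_budget_gb(total_ram_gb: float, n_jobs: int) -> float:
    """Return the average RAM (in GB) available to each of n_jobs jobs."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return total_ram_gb / n_jobs


if __name__ == "__main__":
    ram_gb = 130.0  # RAM size reported in the original message
    for jobs in (8, 4):
        budget = per_job_budget_gb(ram_gb, jobs)
        print(f"{jobs} jobs: {budget:.2f} GB per job")
```

With these numbers, eight jobs leave about 16 GB each and four jobs about 32 GB each, so halving the number of parallel jobs doubles the headroom of every single job at the cost of a longer run.
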
-- 
Dr. Michael Fechtelkord

Institut für Geologie, Mineralogie und Geophysik
Ruhr-Universität Bochum
Universitätsstr. 150
D-44780 Bochum

Phone: +49 (234) 32-24380
Fax:  +49 (234) 32-04380
Email: Michael.Fechtelkord at ruhr-uni-bochum.de
Web Page: https://www.ruhr-uni-bochum.de/kristallographie/kc/mitarbeiter/fechtelkord/


