[Wien] [WIEN2k] abort of CPU core parallel jobs in NMR calculations of the current
Michael Fechtelkord
Michael.Fechtelkord at ruhr-uni-bochum.de
Sat May 11 20:10:34 CEST 2024
Hello Peter,
I just run "x_nmr_lapw -p"; the rest is started by the nmr script.
The line "/usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current
-green -scratch /scratch/WIEN2k/ -noco" is just part of the
whole procedure and was not started by me manually. (I only copied the
last lines of the calculation output.)
Best regards,
Michael
On 11.05.2024 at 18:08, Peter Blaha wrote:
> Hello Michael,
>
> I don't understand the line:
>
> /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current
> -green -scratch /scratch/WIEN2k/ -noco
>
> The current mode should run only k-parallel, not with mpi?
>
> PS: The repetition of
>
> nmr_integ:localhost is useless.
>
> nmr mode integ runs only once (not k-parallel, sumpara has already
> summed up the currents)
>
> But one can use nmr_integ:localhost:8
>
>
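A minimal sketch of the consolidation suggested in the PS above: repeated "nmr_integ:<host>" lines are merged into a single "nmr_integ:<host>:<n>" line. The function name collapse_nmr_integ is hypothetical and not part of WIEN2k, and summing the repeated lines into the core count is an assumption about the intent. It only handles one host per line.

```python
def collapse_nmr_integ(lines):
    """Merge repeated 'nmr_integ:<host>[:<n>]' lines into one line per host.

    Hypothetical helper, not part of WIEN2k. Assumes one host per line.
    """
    counts = {}        # host -> summed core count
    out = []           # all lines that are not nmr_integ lines
    first_pos = None   # where the first nmr_integ line was seen
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("nmr_integ:"):
            parts = stripped.split(":")
            host = parts[1]
            n = int(parts[2]) if len(parts) > 2 and parts[2] else 1
            if first_pos is None:
                first_pos = len(out)
            counts[host] = counts.get(host, 0) + n
        else:
            out.append(line)
    if counts:
        merged = []
        for host, n in counts.items():
            # Keep the plain form when only one core is requested.
            merged.append(f"nmr_integ:{host}" if n == 1 else f"nmr_integ:{host}:{n}")
        out[first_pos:first_pos] = merged
    return out


if __name__ == "__main__":
    sample = ["omp_global:2", "1:localhost"] + ["nmr_integ:localhost"] * 8
    print("\n".join(collapse_nmr_integ(sample)))
```

Applied to the .machines file quoted further down, this would leave the k-parallel lines untouched and replace the eight nmr_integ lines with one.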
> Best regards
>
> On 11.05.2024 at 16:19, Michael Fechtelkord via Wien wrote:
>> Hello Peter,
>>
>> this is the .machines file content:
>>
>> granulartity:1
>> omp_lapw0:8
>> omp_global:2
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>> nmr_integ:localhost
>>
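To make the overload question concrete, here is a rough sketch that counts the k-parallel lines in a .machines file like the one quoted above and multiplies them by omp_global. The function name summarize_machines is hypothetical. That every k-parallel job really runs with omp_global OpenMP threads during the NMR current step is an assumption, not something confirmed in this thread.

```python
def summarize_machines(text):
    """Count k-parallel jobs and read omp_global from .machines-style text.

    Hypothetical helper, not part of WIEN2k. Lines whose key is a plain
    integer (e.g. '1:localhost') are treated as k-parallel jobs.
    """
    k_jobs = 0
    omp_global = 1  # assumed default when no omp_global line is present
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, rest = line.partition(":")
        if key.isdigit():
            k_jobs += 1
        elif key == "omp_global":
            omp_global = int(rest)
    return {
        "k_jobs": k_jobs,
        "omp_global": omp_global,
        # Assumption: each k-parallel job may use omp_global threads.
        "threads": k_jobs * omp_global,
    }


if __name__ == "__main__":
    machines = "omp_lapw0:8\nomp_global:2\n" + "1:localhost\n" * 8
    print(summarize_machines(machines))
```

For the file above this gives 8 k-parallel jobs and, under the stated assumption, up to 16 threads.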
>>
>> Best regards,
>>
>> Michael
>>
>>
>> On 11.05.2024 at 14:58, Peter Blaha wrote:
>>> Hmm. ?
>>>
>>> Are you using k-parallel AND mpi-parallel? This could
>>> overload the machine.
>>>
>>> What does the .machines file look like?
>>>
>>>
>>> On 10.05.2024 at 18:15, Michael Fechtelkord via Wien wrote:
>>>> Dear all,
>>>>
>>>>
>>>> I have run into the following problem with the NMR part of WIEN2k
>>>> (23.2) on an openSUSE Leap 15.5 Intel platform. WIEN2k was compiled
>>>> with oneAPI 2024.1 ifort and gcc 13.2.1. I am using ELPA
>>>> 2024.03.01, Libxc 6.22, fftw 3.3.10, MPICH 4.2.1 and the oneAPI
>>>> 2024.1 MKL libraries. The CPU is an i9-14900K with 24 cores, of
>>>> which I use eight for the calculations. The machine has 130 GB of
>>>> RAM and a 16 GB swap file on a Samsung PCIe 4.0 NVMe SSD. The
>>>> memory transfer rate is 5600 MT/s.
>>>>
>>>> The structure is a layer silicate, and to simulate a Si:Al ratio
>>>> of 3:1 I currently use a 1:1:2 supercell. The new structure has
>>>> monoclinic symmetry P 2/c (the original is C 2/c) and contains 40
>>>> atoms (K, Al, Si, O, and F).
>>>>
>>>> I use 3 NMR LOs for K and O and 10 for Si, Al, and F (where I need
>>>> the chemical shifts). The k-mesh has 40 k-points.
>>>>
>>>> The interesting thing is that the RAM is sufficient during the NMR
>>>> vector calculations (always under 100 GB of RAM occupied) and at the
>>>> beginning of the electron current calculation. However, RAM use
>>>> then rises to a critical point in the calculation, and more and more
>>>> data is swapped out to the swap file, which is sometimes 80% occupied.
>>>>
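One way to see when the swap file starts filling up during the current step is to sample /proc/meminfo while the jobs run. A minimal sketch, assuming a Linux system; the function names are hypothetical and nothing here is part of WIEN2k.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; the unit suffix is dropped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        fields = value.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_used_fraction(info):
    """Return the used fraction of swap (0.0 if no swap is configured)."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info.get("SwapFree", 0)) / total


if __name__ == "__main__":
    # Single sample; run it in a loop (e.g. once a minute) to log growth.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as fh:
            info = parse_meminfo(fh.read())
        print(f"MemAvailable: {info.get('MemAvailable', 0)} kB, "
              f"swap used: {swap_used_fraction(info):.0%}")
```

Logging this alongside the nmr.out files would show at which stage of the current calculation memory use climbs.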
>>>> As you can see, this time only one core failed because of memory
>>>> overflow. But with 48 k-points, 3 cores crashed and with them the
>>>> whole current calculation. The reason for the crash is clear to me.
>>>> But I do not understand why the current calculation reacts so
>>>> sensitively with so few atoms and such a small k-mesh. I have run
>>>> calculations with more atoms and a 1000 k-point mesh on 4 cores, and
>>>> they worked fine. Could the Intel MKL library be the source of the
>>>> failure? Should I go back to 4 cores, even with longer calculation
>>>> times?
>>>>
>>>> Have a nice weekend, everyone!
>>>>
>>>>
>>>> Best wishes from
>>>>
>>>> Michael Fechtelkord
>>>>
>>>> -----------------------------------------------
>>>>
>>>> cd ./ ... x lcore -f MS_2M1_Al2
>>>> CORE END
>>>> 0.685u 0.028s 0:00.71 98.5% 0+0k 2336+16168io 5pf+0w
>>>>
>>>> lcore .... ready
>>>>
>>>>
>>>> EXECUTING: /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode
>>>> current -green -scratch /scratch/WIEN2k/ -noco
>>>>
>>>> [1] 20253
>>>> [2] 20257
>>>> [3] 20261
>>>> [4] 20265
>>>> [5] 20269
>>>> [6] 20273
>>>> [7] 20277
>>>> [8] 20281
>>>> [8] + Abgebrochen ( cd $dir; $exec2 >>
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [7] + Fertig ( cd $dir; $exec2 >>
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [6] + Fertig ( cd $dir; $exec2 >>
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [5] + Fertig ( cd $dir; $exec2 >>
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [4] + Fertig ( cd $dir; $exec2 >>
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [3] + Fertig ( cd $dir; $exec2 >>
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [2] + Fertig ( cd $dir; $exec2 >>
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>> [1] + Fertig ( cd $dir; $exec2 >>
>>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>>
>>>> EXECUTING: /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
>>>> sumpara -p 8 -green -scratch /scratch/WIEN2k/
>>>>
>>>>
>>>> current .... ready
>>>>
>>>>
>>>> EXECUTING: mpirun -np 1 -machinefile .machine_nmrinteg
>>>> /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode integ -green
>>>>
>>>>
>>>> nmr: integration ... done in 4032.3s
>>>>
>>>>
>>>> stop
>>>>
--
Dr. Michael Fechtelkord
Institut für Geologie, Mineralogie und Geophysik
Ruhr-Universität Bochum
Universitätsstr. 150
D-44780 Bochum
Phone: +49 (234) 32-24380
Fax: +49 (234) 32-04380
Email: Michael.Fechtelkord at ruhr-uni-bochum.de
Web Page: https://www.ruhr-uni-bochum.de/kristallographie/kc/mitarbeiter/fechtelkord/