[Wien] [WIEN2k] abort of CPU core parallel jobs in NMR calculations of the current
Peter Blaha
peter.blaha at tuwien.ac.at
Sat May 11 18:08:11 CEST 2024
Hello Michael,
I don't understand this line:
/usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current -green
-scratch /scratch/WIEN2k/ -noco
The current mode should run only k-parallel, not under MPI?
PS: Repeating
nmr_integ:localhost
is useless. The integ mode of nmr runs only once (it is not k-parallel;
sumpara has already summed up the currents).
However, one can use nmr_integ:localhost:8
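A minimal sketch of a .machines file that follows the two points above: eight k-parallel lines and a single nmr_integ line. The helper name build_machines and its parameters are made up for illustration; the keywords and the nmr_integ:localhost:8 form are taken from this thread, and the other values simply mirror the file quoted below.

```python
def build_machines(n_kpar, host="localhost", omp_lapw0=8, omp_global=2, integ_cores=8):
    """Return .machines text with n_kpar k-parallel lines and ONE nmr_integ line.

    Hypothetical helper: the layout is copied from the file quoted in this
    thread, not from the WIEN2k documentation.
    """
    lines = [
        "granularity:1",
        f"omp_lapw0:{omp_lapw0}",
        f"omp_global:{omp_global}",
    ]
    # One "1:host" line per k-parallel job.
    lines += [f"1:{host}"] * n_kpar
    # The integ mode runs only once, so a single line is enough.
    lines.append(f"nmr_integ:{host}:{integ_cores}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_machines(8), end="")
```

Writing the returned text to .machines in the case directory would replace the eight repeated nmr_integ:localhost lines with one entry.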
Best regards
On 11.05.2024 at 16:19, Michael Fechtelkord via Wien wrote:
> Hello Peter,
>
> this is the .machines file content:
>
> granularity:1
> omp_lapw0:8
> omp_global:2
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
>
>
> Best regards,
>
> Michael
>
>
> On 11.05.2024 at 14:58, Peter Blaha wrote:
>> Hmm?
>>
>> Are you using k-parallel AND mpi-parallel? This could overload
>> the machine.
>>
>> What does the .machines file look like?
>>
>>
>> On 10.05.2024 at 18:15, Michael Fechtelkord via Wien wrote:
>>> Dear all,
>>>
>>>
>>> The following problem occurs when I use the NMR part of WIEN2k
>>> (23.2) on an openSUSE Leap 15.5 Intel platform. WIEN2k was compiled
>>> using oneAPI 2024.1 ifort and gcc 13.2.1. I am using ELPA
>>> 2024.03.01, Libxc 6.22, fftw 3.3.10, MPICH 4.2.1 and the oneAPI
>>> 2024.1 MKL libraries. The CPU is an i9-14900K with 24 cores, of which
>>> I use eight for the calculations. The machine has 130 GB of RAM and a
>>> 16 GB swap file on a Samsung PCIe 4.0 NVMe SSD. The bus speed is
>>> 5600 MT/s.
>>>
>>> The structure is a layer silicate, and to simulate the ratio Si:Al
>>> = 3:1 I currently use a 1:1:2 supercell. The new structure has
>>> monoclinic symmetry P 2/c (the original is C 2/c) and contains 40
>>> atoms (K, Al, Si, O, and F).
>>>
>>> I use 3 NMR LOs for K and O and 10 for Si, Al, and F (for which I
>>> need the chemical shifts). The k-mesh has 40 k-points.
>>>
>>> The interesting thing is that the RAM is sufficient during the NMR
>>> vector calculations (always under 100 GB of RAM occupied) and at the
>>> beginning of the electron current calculation. However, RAM use
>>> rises to a critical level during the calculation, and more and more
>>> data is moved to the swap file, which is sometimes 80% full.
>>>
>>> As you can see, this time only one core failed because of memory
>>> overflow. But with 48 k-points, 3 cores crashed and with them the
>>> whole current calculation. The reason for the crash is clear to me,
>>> but I do not understand why the current calculation is so sensitive
>>> with so few atoms and such a small k-mesh. I have done calculations
>>> with more atoms and a 1000 k-point mesh on 4 cores, and they worked
>>> fine. Could the Intel MKL library be the source of the failure? So
>>> should I go back to 4 cores, even with longer calculation times?
>>>
>>> Have a nice weekend, everyone!
>>>
>>>
>>> Best wishes from
>>>
>>> Michael Fechtelkord
>>>
>>> -----------------------------------------------
>>>
>>> cd ./ ... x lcore -f MS_2M1_Al2
>>> CORE END
>>> 0.685u 0.028s 0:00.71 98.5% 0+0k 2336+16168io 5pf+0w
>>>
>>> lcore .... ready
>>>
>>>
>>> EXECUTING: /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode
>>> current -green -scratch /scratch/WIEN2k/ -noco
>>>
>>> [1] 20253
>>> [2] 20257
>>> [3] 20261
>>> [4] 20265
>>> [5] 20269
>>> [6] 20273
>>> [7] 20277
>>> [8] 20281
>>> [8] + Aborted ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>>> [7] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>>> [6] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>>> [5] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>>> [4] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>>> [3] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>>> [2] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>>> [1] + Done ( cd $dir; $exec2 >> nmr.out.${loop} ) >& nmr.err.$loop
>>>
>>> EXECUTING: /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode
>>> sumpara -p 8 -green -scratch /scratch/WIEN2k/
>>>
>>>
>>> current .... ready
>>>
>>>
>>> EXECUTING: mpirun -np 1 -machinefile .machine_nmrinteg
>>> /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode integ -green
>>>
>>>
>>> nmr: integration ... done in 4032.3s
>>>
>>>
>>> stop
>>>
--
-----------------------------------------------------------------------
Peter Blaha, Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email: peter.blaha at tuwien.ac.at
WWW: http://www.imc.tuwien.ac.at WIEN2k: http://www.wien2k.at
-------------------------------------------------------------------------