[Wien] [WIEN2k] abort of CPU core parallel jobs in NMR calculations of the current

Peter Blaha peter.blaha at tuwien.ac.at
Sat May 11 18:08:11 CEST 2024


Hello Michael,

I don't understand the line:

/usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode current -green -scratch /scratch/WIEN2k/ -noco

The current mode should only run k-parallel, not mpi-parallel, so why is
nmr_mpi called here?

PS: Repeating the line

nmr_integ:localhost

is useless. The nmr mode integ runs only once (it is not k-parallel,
because sumpara has already summed up the currents).

However, one can use

nmr_integ:localhost:8
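
As a rough sketch of what this would mean for the .machines file, a few
lines of Python could generate it. The helper name build_machines and its
parameters are hypothetical; the keywords and values are simply taken over
from the file quoted below (with the first keyword assumed to be spelled
granularity), so treat this as an illustration rather than a recommended
setup:

```python
def build_machines(host="localhost", k_jobs=8, integ_cores=8,
                   omp_global=2, omp_lapw0=8):
    """Return the text of a .machines file (hypothetical helper).

    Assumption: keywords and defaults are copied from the quoted file;
    only the nmr_integ part follows the suggestion above.
    """
    lines = [
        "granularity:1",
        f"omp_lapw0:{omp_lapw0}",
        f"omp_global:{omp_global}",
    ]
    # One "1:<host>" line per k-parallel job.
    lines += [f"1:{host}"] * k_jobs
    # A single nmr_integ line; the core count follows the host name.
    lines.append(f"nmr_integ:{host}:{integ_cores}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file content; redirect it into .machines if desired.
    print(build_machines(), end="")
```

With the defaults this prints the three header lines, eight 1:localhost
lines and exactly one nmr_integ:localhost:8 line.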


Best regards

On 11.05.2024 at 16:19, Michael Fechtelkord via Wien wrote:
> Hello Peter,
>
> this is the .machines file content:
>
> granularity:1
> omp_lapw0:8
> omp_global:2
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
> nmr_integ:localhost
>
>
> Best regards,
>
> Michael
>
>
> On 11.05.2024 at 14:58, Peter Blaha wrote:
>> Hmm?
>>
>> Are you using k-parallel AND mpi-parallel at the same time? This could
>> overload the machine.
>>
>> What does the .machines file look like?
>>
>>
>> On 10.05.2024 at 18:15, Michael Fechtelkord via Wien wrote:
>>> Dear all,
>>>
>>>
>>> I have run into the following problem with the NMR part of WIEN2k
>>> (23.2) on an openSUSE Leap 15.5 Intel platform. WIEN2k was compiled
>>> with oneAPI 2024.1 ifort and gcc 13.2.1. I am using ELPA
>>> 2024.03.01, Libxc 6.22, fftw 3.3.10, MPICH 4.2.1 and the oneAPI
>>> 2024.1 MKL libraries. The CPU is an i9-14900K with 24 cores, of which
>>> I use eight for the calculations. The machine has 130 GB of RAM and a
>>> 16 GB swap file on a Samsung PCIe 4.0 NVMe SSD. The memory runs at
>>> 5600 MT/s.
>>>
>>> The structure is a layer silicate, and to simulate a Si:Al ratio of
>>> 3:1 I currently use a 1:1:2 supercell. The new structure has
>>> monoclinic symmetry P 2/c (the original is C 2/c) and contains 40
>>> atoms (K, Al, Si, O, and F).
>>>
>>> I use 3 NMR LOs for K and O and 10 for Si, Al, and F (for which I
>>> need the chemical shifts). The k-mesh has 40 k-points.
>>>
>>> The interesting thing is that the RAM is sufficient during the NMR
>>> vector calculations (always under 100 GB occupied) and at the
>>> beginning of the electron current calculation. However, the RAM use
>>> then rises to a critical point during the calculation, and more and
>>> more data is moved to the swap file, which is sometimes 80% occupied.
>>>
>>> As you can see, this time only one core failed because of memory
>>> overflow. With 48 k-points, however, three cores crashed and with
>>> them the whole current calculation. The reason for the crash is clear
>>> to me. But I do not understand why the current calculation is so
>>> sensitive with so few atoms and such a small k-mesh. I have run
>>> calculations with more atoms and a 1000 k-point mesh on 4 cores, and
>>> they worked fine. Could the Intel MKL library be the source of the
>>> failure? Should I go back to 4 cores, even with longer calculation
>>> times?
>>>
>>> Have a nice weekend, everyone!
>>>
>>>
>>> Best wishes from
>>>
>>> Michael Fechtelkord
>>>
>>> -----------------------------------------------
>>>
>>> cd ./  ...  x lcore  -f MS_2M1_Al2
>>>  CORE  END
>>> 0.685u 0.028s 0:00.71 98.5%     0+0k 2336+16168io 5pf+0w
>>>
>>> lcore        ....  ready
>>>
>>>
>>>  EXECUTING:     /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode 
>>> current    -green         -scratch /scratch/WIEN2k/ -noco
>>>
>>> [1] 20253
>>> [2] 20257
>>> [3] 20261
>>> [4] 20265
>>> [5] 20269
>>> [6] 20273
>>> [7] 20277
>>> [8] 20281
>>> [8]  + Abgebrochen                   ( cd $dir; $exec2 >> 
>>> nmr.out.${loop} ) >& nmr.err.$loop
>>> [7]  + Fertig                        ( cd $dir; $exec2 >> 
>>> nmr.out.${loop} ) >& nmr.err.$loop
>>> [6]  + Fertig                        ( cd $dir; $exec2 >> 
>>> nmr.out.${loop} ) >& nmr.err.$loop
>>> [5]  + Fertig                        ( cd $dir; $exec2 >> 
>>> nmr.out.${loop} ) >& nmr.err.$loop
>>> [4]  + Fertig                        ( cd $dir; $exec2 >> 
>>> nmr.out.${loop} ) >& nmr.err.$loop
>>> [3]  + Fertig                        ( cd $dir; $exec2 >> 
>>> nmr.out.${loop} ) >& nmr.err.$loop
>>> [2]  + Fertig                        ( cd $dir; $exec2 >> 
>>> nmr.out.${loop} ) >& nmr.err.$loop
>>> [1]  + Fertig                        ( cd $dir; $exec2 >> 
>>> nmr.out.${loop} ) >& nmr.err.$loop
>>>
>>>  EXECUTING:     /usr/local/WIEN2k/nmr -case MS_2M1_Al2 -mode 
>>> sumpara  -p 8    -green -scratch /scratch/WIEN2k/
>>>
>>>
>>> current        ....  ready
>>>
>>>
>>>  EXECUTING:     mpirun -np 1 -machinefile .machine_nmrinteg 
>>> /usr/local/WIEN2k/nmr_mpi -case MS_2M1_Al2 -mode integ -green
>>>
>>>
>>> nmr:  integration  ... done in   4032.3s
>>>
>>>
>>> stop
>>>
-- 
-----------------------------------------------------------------------
Peter Blaha,  Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email: peter.blaha at tuwien.ac.at
WWW:   http://www.imc.tuwien.ac.at      WIEN2k: http://www.wien2k.at
-------------------------------------------------------------------------


