[Wien] Segmentation error - Memory allocation for each processor

Mon Jan 27 16:10:11 CET 2025

Dear Wien2k members,

I recently installed the latest version, WIEN2k 24.1, using Intel oneAPI.
Thanks to Prof. Gavin Abo for providing the reference document for
installing WIEN2k with ifx.
After the successful installation, I checked the parallel calculation with
two structures: (1) the symmetry-based Fe2VAl structure (Fm-3m, No. 225, 3
equivalent atomic positions) and (2) the Fe2VAl conventional unit cell (P1,
with 16 non-equivalent atomic positions). I used the following system
configuration for these calculations:

Processor : AMD Ryzen 9 7950X 16-core processor x 32

OS Name : Ubuntu 22.04.5 LTS

OS type: 64-bit

RAM : 128 GB

I ran spin-polarized SCF calculations to test the system performance and
recorded the time taken. The details of the calculations are given below.

================   1. Fe2VAl symmetry based structure (Fm-3m, 225, 3 equivalent atomic positions) ================

runsp_lapw -p -ec 0.00001

Converged at 26th cycle

Total time taken for 26 cycles : 9 mins

47 k-points

++++  .machines (run at 24 processors)++++++

47:localhost:24
granularity:1
extrafine:1

venkatesh@venkatesh-PC:~/wiendata/FVA$ grep "Matrix size" *output1* -A18

FVA.output1up_1: Matrix size          248
FVA.output1up_1-Optimum Blocksize for setup**** Excess %  0.100D+03
FVA.output1up_1-Optimum Blocksize for diag  22 Excess %  0.115D+02
FVA.output1up_1-Base Blocksize   64 Diagonalization   32
FVA.output1up_1-          allocate H         0.0 MB          dimensions  64    64
FVA.output1up_1-          allocate S         0.0 MB          dimensions  64    64
FVA.output1up_1-     allocate spanel         0.0 MB          dimensions  64    64
FVA.output1up_1-     allocate hpanel         0.0 MB          dimensions  64    64
FVA.output1up_1-   allocate spanelus         0.0 MB          dimensions  64    64
FVA.output1up_1-       allocate slen         0.0 MB          dimensions  64    64
FVA.output1up_1-         allocate x2         0.0 MB          dimensions  64    64
FVA.output1up_1-   allocate legendre         0.4 MB          dimensions  64    13    64
FVA.output1up_1-allocate al,bl (row)         0.0 MB          dimensions  64    11
FVA.output1up_1-allocate al,bl (col)         0.0 MB          dimensions  64    11
FVA.output1up_1-         allocate YL         0.0 MB          dimensions  15    64     2
FVA.output1up_1- number of local orbitals, nlo (hamilt)       44
FVA.output1up_1-       allocate YL           0.1 MB          dimensions  15   248     2
FVA.output1up_1-       allocate phsc         0.0 MB          dimensions 248
FVA.output1up_1-Time for al,bl    (hamilt, cpu/wall) :         0.00  0.00

================   2. A. Fe2VAl conventional unitcell (P1, with 16 non-equivalent atomic positions) FAILED ================

runsp_lapw -p -ec 0.00001

32 k-points

++++  .machines (run at 24 processors)++++++

32:localhost:24
granularity:1
extrafine:1

+++++++++++++++++++
in cycle 15    ETEST: .0071660350000000   CTEST: .2687245   STRTEST 2.59
 LAPW0 END
 LAPW1 END
[2]  - Done                          ( cd $PWD; $t $exe ${def}_$loop.def; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
 LAPW1 END
[1]    Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
 LAPW1 END
[2]  - Done                          ( cd $PWD; $t $exe ${def}_$loop.def; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
 LAPW1 END
[1]    Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
LAPW2 - FERMI; weights written
 LAPW2 END
 LAPW2 END
[2]  + Done                          ( cd $PWD; $t $exe ${def}_${loop}.def $loop; rm -f .lock_$lockfile[$p] ) >> .time2_$loop
[1]  + Done                          ( cd $PWD; $t $ttt $vector_split; rm -f .lock_$lockfile[$p] ) >> .time2_$loop
 SUMPARA END
LAPW2 - FERMI; weights written
Segmentation fault

+++++++++++++++++++

FVA_1.output1up_1: Matrix size          968
FVA_1.output1up_1-Optimum Blocksize for setup  82 Excess %  0.291D+01
FVA_1.output1up_1-Optimum Blocksize for diag  18 Excess %  0.413D+01
FVA_1.output1up_1-Base Blocksize   64 Diagonalization   32
FVA_1.output1up_1-          allocate H         0.8 MB          dimensions 192   256
FVA_1.output1up_1-          allocate S         0.8 MB          dimensions 192   256
FVA_1.output1up_1-     allocate spanel         0.2 MB          dimensions 192    64
FVA_1.output1up_1-     allocate hpanel         0.2 MB          dimensions 192    64
FVA_1.output1up_1-   allocate spanelus         0.2 MB          dimensions 192    64
FVA_1.output1up_1-       allocate slen         0.1 MB          dimensions 192    64
FVA_1.output1up_1-         allocate x2         0.1 MB          dimensions 192    64
FVA_1.output1up_1-   allocate legendre         1.2 MB          dimensions 192    13    64
FVA_1.output1up_1-allocate al,bl (row)         0.1 MB          dimensions 192    11
FVA_1.output1up_1-allocate al,bl (col)         0.0 MB          dimensions  64    11
FVA_1.output1up_1-         allocate YL         0.0 MB          dimensions  15   192     1
FVA_1.output1up_1- number of local orbitals, nlo (hamilt)      176
FVA_1.output1up_1-       allocate YL           0.2 MB          dimensions  15   968     1
FVA_1.output1up_1-       allocate phsc         0.0 MB          dimensions 968
FVA_1.output1up_1-Time for al,bl    (hamilt, cpu/wall) :         0.00  0.00

================   2. B. Fe2VAl conventional unitcell (P1, with 16 non-equivalent atomic positions) SUCCESSFULLY COMPLETED ================

runsp_lapw -p -ec 0.00001

Converged at 43rd cycle

Total time taken for 43 cycles :  61 mins

32 k-points

++++  .machines (run at 4 processors)++++++

8:localhost
8:localhost
8:localhost
8:localhost
granularity:1

+++++++++++++++++++++++
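
The k-point-parallel layout of case 2.B (one `weight:host` line per job, then
`granularity`) can be generated for any number of jobs. The following is a
minimal sketch, not a recommended setting: the helper name `kpoint_machines`
is hypothetical, the first field is assumed to be a relative weight as
described in the WIEN2k user's guide, and whether a larger job count runs
without the crash seen in case 2.A is not established here.

```python
def kpoint_machines(n_jobs, host="localhost", weight=1, granularity=1):
    """Build the text of a k-point-parallel .machines file.

    One 'weight:host' line is written per parallel job, followed by the
    granularity line, mirroring the layout used in case 2.B.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [f"{weight}:{host}" for _ in range(n_jobs)]
    lines.append(f"granularity:{granularity}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the file of case 2.B: 4 jobs, weight 8 on each line.
    print(kpoint_machines(4, weight=8), end="")
```

Calling `kpoint_machines(8)` would give eight `1:localhost` lines, i.e. eight
k-point jobs on the same machine.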

venkatesh@venkatesh-PC:~/wiendata/FVA_1$ grep "Matrix size" *output1* -A18

FVA_1.output1up_4: Matrix size          968
FVA_1.output1up_4-         allocate HS        14.3 MB
FVA_1.output1up_4-         allocate Z         14.3 MB
FVA_1.output1up_4-     allocate spanel         1.9 MB          dimensions 968   128
FVA_1.output1up_4-     allocate hpanel         1.9 MB          dimensions 968   128
FVA_1.output1up_4-   allocate spanelus         1.9 MB          dimensions 968   128
FVA_1.output1up_4-       allocate slen         0.9 MB          dimensions 968   128
FVA_1.output1up_4-         allocate x2         0.9 MB          dimensions 968   128
FVA_1.output1up_4-   allocate legendre        12.3 MB          dimensions 968    13   128
FVA_1.output1up_4-allocate al,bl (row)         0.3 MB          dimensions 968    11
FVA_1.output1up_4-allocate al,bl (col)         0.0 MB          dimensions 128    11
FVA_1.output1up_4-         allocate YL         0.2 MB          dimensions  15   968     1
FVA_1.output1up_4- number of local orbitals, nlo (hamilt)      176
FVA_1.output1up_4-       allocate YL           0.2 MB          dimensions  15   968     1
FVA_1.output1up_4-       allocate phsc         0.0 MB          dimensions 968
FVA_1.output1up_4-Time for al,bl    (hamilt, cpu/wall) :         0.01  0.01
FVA_1.output1up_4-Time for legendre (hamilt, cpu/wall) :         0.03  0.03
FVA_1.output1up_4-Time for phase    (hamilt, cpu/wall) :         0.08  0.08
FVA_1.output1up_4-Time for us       (hamilt, cpu/wall) :         0.11  0.12
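
The "allocate ... MB" figures in the listings above can be cross-checked from
the printed dimensions. This is a rough sketch under assumptions that are not
stated in the output: H, S, HS and the panels are taken to hold 16-byte
complex doubles, legendre and slen 8-byte reals, "MB" is taken as 2**20
bytes, and HS is taken to be a full 968 x 968 matrix. The helper name
`array_mb` is hypothetical.

```python
MIB = 2**20  # assumption: the "MB" in the output means 2**20 bytes


def array_mb(dims, bytes_per_element):
    """Return the size in MB of an array with the given dimensions."""
    count = 1
    for d in dims:
        count *= d
    return count * bytes_per_element / MIB


if __name__ == "__main__":
    # (label, dimensions, assumed bytes per element, value printed by lapw1)
    checks = [
        ("2.B HS (968 x 968)", (968, 968), 16, 14.3),
        ("2.B spanel", (968, 128), 16, 1.9),
        ("2.B slen", (968, 128), 8, 0.9),
        ("2.B legendre", (968, 13, 128), 8, 12.3),
        ("2.A H (192 x 256)", (192, 256), 16, 0.8),
        ("2.A legendre", (192, 13, 64), 8, 1.2),
    ]
    for label, dims, nbytes, printed in checks:
        computed = array_mb(dims, nbytes)
        print(f"{label:20s} computed {computed:6.2f} MB, printed {printed} MB")
```

Under these assumptions the computed values agree with the printed ones to
the displayed precision. The arrays listed for case 2.B add up to roughly
50 MB per process; this covers only the allocations shown in the excerpt, not
everything the run allocates.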

I need a few clarifications on the memory used by each processor in order
to avoid the segmentation fault shown in case 2.A.

1. I got a segmentation fault for the 16-atom calculation (2.A) with 24
processors. Repeating the calculation with the same .machines file
sometimes leads to lapw1 hanging in a given cycle for more than 50 minutes
(I killed the process manually to stop the calculation). I suspect this
happens because the memory allocated to each processor is not sufficient
while the calculation is running. However, I have 128 GB of RAM, so why is
the memory not properly allocated in this case? Can I get any clue from the
"Matrix size" details listed for case 2.A?

2. As shown in case 2.B, changing the .machines file made the calculation
run without any segmentation fault, using only 4 processors. By comparing
the "Matrix size" details of the three cases (1, 2.A and 2.B), can someone
suggest how I can tune the .machines file so that each processor is
allocated more memory and I can use more cores to speed up the calculation?

3. My goal is to run calculations for conventional unit cells of 64 and 128
atoms (even if they take more time) without segmentation faults. I
therefore need clarification on how to increase the memory allocated to
each processor using the 128 GB of RAM available on my PC. Please suggest
how I can improve the memory allocation per processor in order to run
calculations for bigger unit cells.
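
For the 64- and 128-atom cells mentioned in point 3, a back-of-the-envelope
estimate of the full-matrix storage can be extrapolated from case 2.B. This
is only a sketch under stated assumptions: the matrix size is assumed to grow
linearly with the number of atoms (same composition and basis cutoff as the
16-atom cell), each element is assumed to be a 16-byte complex double, and
the matrix is assumed to be stored in full. The helper name `estimate_hs_mb`
is hypothetical.

```python
def estimate_hs_mb(n_atoms, ref_atoms=16, ref_matrix_size=968,
                   bytes_per_element=16):
    """Estimate the matrix size and the storage of one full square matrix.

    The matrix size is scaled linearly from the 16-atom reference
    (matrix size 968 in case 2.B); storage is returned in units of
    2**20 bytes.
    """
    matrix_size = round(ref_matrix_size * n_atoms / ref_atoms)
    megabytes = matrix_size**2 * bytes_per_element / 2**20
    return matrix_size, megabytes


if __name__ == "__main__":
    for atoms in (16, 64, 128):
        size, mb = estimate_hs_mb(atoms)
        print(f"{atoms:4d} atoms: matrix size ~{size}, ~{mb:.1f} MB per matrix")
```

The numbers are per matrix and per process; in a k-point-parallel run each
job would hold its own copies, so the total scales with the number of jobs.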

Thanks in advance for your help, and let me know if you need any further
information about the calculations.

Regards,
Venkat
Physics Department,
IISc Bangalore, India.