[Wien] Segmentation error - Memory allocation for each processor
venky ch
chvenkateshphy at gmail.com
Mon Jan 27 16:10:11 CET 2025
Dear Wien2k members,
I have recently installed the latest version, WIEN2k 24.1, using Intel oneAPI. Thanks to Prof. Gavin Abo for providing the reference document for installing WIEN2k with ifx.
After the successful installation, I checked the parallel calculations using 1. the Fe2VAl symmetry-based structure (Fm-3m, 225, 3 equivalent atomic positions) and 2. the Fe2VAl conventional unit cell (P1, with 16 non-equivalent atomic positions). I used the following system configuration for these calculations:
Processor : AMD Ryzen 9 7950X 16-core processor x 32
OS Name : Ubuntu 22.04.5 LTS
OS type: 64-bit
RAM : 128 GB
I ran spin-polarized SCF calculations to test the system performance and estimate the time taken. The details of the calculations are given below.
================ 1. Fe2VAl symmetry-based structure (Fm-3m, 225, 3 equivalent atomic positions) ================
runsp_lapw -p -ec 0.00001
Converged at 26th cycle
Total time taken for 26 cycles : 9 mins
47 k-points
++++ .machines (run on 24 processors) ++++++
47:localhost:24
granularity:1
extrafine:1
venkatesh at venkatesh-PC:~/wiendata/FVA$ grep "Matrix size" *output1* -A18
FVA.output1up_1: Matrix size 248
FVA.output1up_1-Optimum Blocksize for setup**** Excess % 0.100D+03
FVA.output1up_1-Optimum Blocksize for diag 22 Excess % 0.115D+02
FVA.output1up_1-Base Blocksize 64 Diagonalization 32
FVA.output1up_1- allocate H 0.0 MB dimensions
64 64
FVA.output1up_1- allocate S 0.0 MB dimensions
64 64
FVA.output1up_1- allocate spanel 0.0 MB dimensions
64 64
FVA.output1up_1- allocate hpanel 0.0 MB dimensions
64 64
FVA.output1up_1- allocate spanelus 0.0 MB dimensions
64 64
FVA.output1up_1- allocate slen 0.0 MB dimensions
64 64
FVA.output1up_1- allocate x2 0.0 MB dimensions
64 64
FVA.output1up_1- allocate legendre 0.4 MB dimensions
64 13 64
FVA.output1up_1-allocate al,bl (row) 0.0 MB dimensions
64 11
FVA.output1up_1-allocate al,bl (col) 0.0 MB dimensions
64 11
FVA.output1up_1- allocate YL 0.0 MB dimensions
15 64 2
FVA.output1up_1- number of local orbitals, nlo (hamilt) 44
FVA.output1up_1- allocate YL 0.1 MB dimensions
15 248 2
FVA.output1up_1- allocate phsc 0.0 MB dimensions
248
FVA.output1up_1-Time for al,bl (hamilt, cpu/wall) : 0.00
0.00
================ 2.A. Fe2VAl conventional unit cell (P1, with 16 non-equivalent atomic positions): FAILED ================
runsp_lapw -p -ec 0.00001
32 k-points
++++ .machines (run on 24 processors) ++++++
32:localhost:24
granularity:1
extrafine:1
+++++++++++++++++++
in cycle 15 ETEST: .0071660350000000 CTEST: .2687245 STRTEST 2.59
LAPW0 END
LAPW1 END
[2] - Done ( cd $PWD; $t $exe ${def}_$loop.def;
rm -f .lock_$lockfile[$p] ) >> .time1_$loop
LAPW1 END
[1] Done ( cd $PWD; $t $ttt; rm -f
.lock_$lockfile[$p] ) >> .time1_$loop
LAPW1 END
[2] - Done ( cd $PWD; $t $exe ${def}_$loop.def;
rm -f .lock_$lockfile[$p] ) >> .time1_$loop
LAPW1 END
[1] Done ( cd $PWD; $t $ttt; rm -f
.lock_$lockfile[$p] ) >> .time1_$loop
LAPW2 - FERMI; weights written
LAPW2 END
LAPW2 END
[2] + Done ( cd $PWD; $t $exe ${def}_${loop}.def
$loop; rm -f .lock_$lockfile[$p] ) >> .time2_$loop
[1] + Done ( cd $PWD; $t $ttt $vector_split; rm
-f .lock_$lockfile[$p] ) >> .time2_$loop
SUMPARA END
LAPW2 - FERMI; weights written
Segmentation fault
+++++++++++++++++++
FVA_1.output1up_1: Matrix size 968
FVA_1.output1up_1-Optimum Blocksize for setup 82 Excess % 0.291D+01
FVA_1.output1up_1-Optimum Blocksize for diag 18 Excess % 0.413D+01
FVA_1.output1up_1-Base Blocksize 64 Diagonalization 32
FVA_1.output1up_1- allocate H 0.8 MB dimensions
192 256
FVA_1.output1up_1- allocate S 0.8 MB dimensions
192 256
FVA_1.output1up_1- allocate spanel 0.2 MB dimensions
192 64
FVA_1.output1up_1- allocate hpanel 0.2 MB dimensions
192 64
FVA_1.output1up_1- allocate spanelus 0.2 MB dimensions
192 64
FVA_1.output1up_1- allocate slen 0.1 MB dimensions
192 64
FVA_1.output1up_1- allocate x2 0.1 MB dimensions
192 64
FVA_1.output1up_1- allocate legendre 1.2 MB dimensions
192 13 64
FVA_1.output1up_1-allocate al,bl (row) 0.1 MB dimensions
192 11
FVA_1.output1up_1-allocate al,bl (col) 0.0 MB dimensions
64 11
FVA_1.output1up_1- allocate YL 0.0 MB dimensions
15 192 1
FVA_1.output1up_1- number of local orbitals, nlo (hamilt) 176
FVA_1.output1up_1- allocate YL 0.2 MB dimensions
15 968 1
FVA_1.output1up_1- allocate phsc 0.0 MB dimensions
968
FVA_1.output1up_1-Time for al,bl (hamilt, cpu/wall) : 0.00
0.00
================ 2.B. Fe2VAl conventional unit cell (P1, with 16 non-equivalent atomic positions): SUCCESSFULLY COMPLETED ================
runsp_lapw -p -ec 0.00001
Converged at 43rd cycle
Total time taken for 26 cycles : 61 mins
32 k-points
++++ .machines (run on 4 processors) ++++++
8:localhost
8:localhost
8:localhost
8:localhost
granularity:1
+++++++++++++++++++++++
venkatesh at venkatesh-PC:~/wiendata/FVA_1$ grep "Matrix size" *output1* -A18
FVA_1.output1up_4: Matrix size 968
FVA_1.output1up_4- allocate HS 14.3 MB
FVA_1.output1up_4- allocate Z 14.3 MB
FVA_1.output1up_4- allocate spanel 1.9 MB dimensions
968 128
FVA_1.output1up_4- allocate hpanel 1.9 MB dimensions
968 128
FVA_1.output1up_4- allocate spanelus 1.9 MB dimensions
968 128
FVA_1.output1up_4- allocate slen 0.9 MB dimensions
968 128
FVA_1.output1up_4- allocate x2 0.9 MB dimensions
968 128
FVA_1.output1up_4- allocate legendre 12.3 MB dimensions
968 13 128
FVA_1.output1up_4-allocate al,bl (row) 0.3 MB dimensions
968 11
FVA_1.output1up_4-allocate al,bl (col) 0.0 MB dimensions
128 11
FVA_1.output1up_4- allocate YL 0.2 MB dimensions
15 968 1
FVA_1.output1up_4- number of local orbitals, nlo (hamilt) 176
FVA_1.output1up_4- allocate YL 0.2 MB dimensions
15 968 1
FVA_1.output1up_4- allocate phsc 0.0 MB dimensions
968
FVA_1.output1up_4-Time for al,bl (hamilt, cpu/wall) : 0.01
0.01
FVA_1.output1up_4-Time for legendre (hamilt, cpu/wall) : 0.03
0.03
FVA_1.output1up_4-Time for phase (hamilt, cpu/wall) : 0.08
0.08
FVA_1.output1up_4-Time for us (hamilt, cpu/wall) : 0.11
0.12
I need a few clarifications on the memory used by each processor in order to avoid the segmentation fault seen in case 2.A.
1. I got a segmentation fault for the 16-atom calculation (2.A) with 24 processors, and repeating the calculation with the same .machines file sometimes led to lapw1 hanging for more than 50 minutes in a given cycle (I killed the process manually to stop the calculation). I suspect this happens because the memory available to each process is not sufficient while the calculation is running. However, I have 128 GB of RAM, so why is the memory not allocated properly in this case? Is there any clue in the matrix-size details shown for case 2.A? My rough estimate is given just below.
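For reference, this is how I estimated the per-process memory from the matrix size: each complex double-precision matrix of dimension N needs 16*N^2 bytes, so for N = 968 a single Hamiltonian or overlap matrix is about 14.3 MB, which matches the "allocate HS 14.3 MB" line in case 2.B. The one-liner below is what I used; the factor of three (H, S and the eigenvectors) is only my assumption, not an official number:
# rough per-lapw1-process memory estimate from the matrix size N
# assumption: ~3 full N x N complex*16 arrays (H, S, eigenvectors) per process
N=968
awk -v n="$N" 'BEGIN { printf "%.1f MB per lapw1 process\n", 3*16*n*n/1024/1024 }'
Even multiplied by the number of parallel jobs this seems far below 128 GB, so I am not sure whether the total memory is really the limiting factor.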
2. As shown in case 2.B, a change to the .machines file made the run finish without any segmentation fault, but using only 4 processors. Comparing the matrix-size details of the three cases (1, 2.A and 2.B), can someone suggest how I can tune the .machines file so that each process gets more memory while still using more cores to speed up the calculation? A tentative example follows below.
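For instance, would a .machines file like the one below be a reasonable compromise, keeping several k-point-parallel lapw1 jobs but giving each job a few OpenMP threads so that all 24 cores are still used? The omp_global line is only my reading of the user's guide, so please correct me if this is not how it should be set up:
++++ .machines (tentative: 8 k-point parallel jobs x 3 OpenMP threads = 24 cores) ++++
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
1:localhost
granularity:1
extrafine:1
omp_global:3
+++++++++++++++++++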
3. My goal is to run the calculations for 64- and 128-atom conventional unit cells (even if they take more time) without a segmentation fault. Therefore, I need a clarification on how to increase the memory available to each process using the 128 GB of RAM on my PC. Please suggest how I can improve the per-process memory allocation in order to run calculations for bigger unit cells.
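Related to this, should I also be checking the per-process limits on my machine? For example (standard Linux/bash commands; I am not sure whether they are relevant to the segmentation fault):
# show the current per-process stack limit
ulimit -s
# remove the stack limit and enlarge the OpenMP thread stacks before running
ulimit -s unlimited
export OMP_STACKSIZE=512m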
Thanks in advance for your help, and let me know if you need any further information on the details of the calculations.
Regards,
Venkat
Physics Department,
IISc Bangalore, India.