[Wien] Segmentation error - Memory allocation for each processor
Peter Blaha
peter.blaha at tuwien.ac.at
Mon Jan 27 18:17:03 CET 2025
i) You have a 16-core processor, so running more than 16 parallel jobs
is useless and even reduces performance.
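You can verify the physical core count on Linux with standard tools
(just a quick check; the exact lscpu output format may vary a bit):

   lscpu | grep -E '^CPU\(s\)|Core\(s\) per socket|Socket\(s\)'
   nproc      # logical CPUs, i.e. including hyperthreads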
ii) You should really read the parallelization section in the usersguide.
iii) You have to learn what the syntax in .machines really means:
A line like
47:localhost:24
means you are running an mpi-parallel job on 24 cores (did you link with
ELPA??? Otherwise the mpi job is pretty slow). For your processor you
should use at most 16 cores.
The "47" has NOTHING to do with k-points. It is a relative speed
indicator, used only when you run k-parallel on 2 nodes of different
speed. Leave it at 1 in your case.
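If you are not sure whether ELPA is in: when it was linked as a shared
library, a quick check could look like this (only a sketch; with static
linking this shows nothing and you have to check the linking options you
set in siteconfig instead):

   ldd $WIENROOT/lapw1_mpi | grep -i elpa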
So the .machines file for your processor should look like:
1:localhost:16
omp_lapw1:1
omp_lapw2:1
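Just to illustrate what the weight is for (hypothetical hostnames, NOT
needed in your case): for a k-parallel run on two nodes where fastnode
is roughly twice as fast as slownode, one would give it twice the weight
so that it gets about twice as many k-points:

   2:fastnode
   1:slownode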
iv) You have matrix size 248, i.e. you have to set up and solve a
248x248 matrix. In mpi-parallel mode this is decomposed over the 16
cores (a 4x4 processor grid) into sub-matrices of about 62x62 each
(248/4 = 62). This is MUCH TOO SMALL to run efficiently in mpi mode.
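As a rough estimate (assuming a square 4x4 processor grid for the 2D
decomposition, as above):

   echo $(( 248 / 4 ))    # ~62 rows/columns per core - far too small for mpi
   echo $(( 968 / 4 ))    # ~242 per core - still small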
For such a case one should use a mixed k-point and OpenMP
parallelization with a .machines file like:
1:localhost
1:localhost
1:localhost
1:localhost
omp_global:4
It will spawn 4 k-parallel jobs, each using 4 cores (4x4 = 16 cores in total).
This is also the best configuration for your bigger case, which is still
too small (matrix size 968) for mpi.
The segmentation fault in your case has nothing to do with memory, as
you can see from your output1 files (16 MB, ...). The crash happens in
lapw2, probably due to overloading the processor or an mpi problem
because the case is too small.
For 64- and 128-atom supercells, mpi parallelization may become useful.
-------------
v) Checking your runs:
While the calculation is running, use: top
It shows the memory usage and how many cores each job is using (with the
example above you should see 4 lapw1 executables, each at about 400%
cpu usage).
Also while it is running, view case.dayfile: you should see the cpu and
wall time of each job step.
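For example, from a second terminal in the case directory (replace
"case" by your actual case name, FVA_1 in your example):

   top                    # with the .machines above: 4 lapw1 processes at ~400% cpu each
   tail -f case.dayfile   # cpu and wall times appear here as each job step finishes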
> using 1. Fe2VAl symmetry-based structure (Fm-3m, 225, 3 equivalent
> atomic positions), 2. Fe2VAl conventional unitcell (P1, with 16
> non-equivalent atomic positions). I have used the following system
> configuration for these calculations:
> ================ 1. Fe2VAl symmetry based structure (Fm-3m, 225, 3
> equivalent atomic positions) ================
>
> runsp_lapw -p -ec 0.00001
>
> Converged at 26th cycle
>
> Total time taken for 26 cycles : 9 mins
>
> 47 k-points
>
> ++++ .machines (run at 24 processors)++++++
>
> 47:localhost:24
> granularity:1
> extrafine:1
>
> venkatesh at venkatesh-PC:~/wiendata/FVA$ grep "Matrix size" *output1* -A18
>
> FVA.output1up_1: Matrix size 248
> FVA.output1up_1-Optimum Blocksize for setup**** Excess % 0.100D+03
> FVA.output1up_1-Optimum Blocksize for diag 22 Excess % 0.115D+02
> FVA.output1up_1-Base Blocksize 64 Diagonalization 32
> FVA.output1up_1- allocate H 0.0 MB dimensions 64 64
> FVA.output1up_1- allocate S 0.0 MB dimensions 64 64
> FVA.output1up_1- allocate spanel 0.0 MB dimensions 64 64
> FVA.output1up_1- allocate hpanel 0.0 MB dimensions 64 64
> FVA.output1up_1- allocate spanelus 0.0 MB dimensions 64 64
> FVA.output1up_1- allocate slen 0.0 MB dimensions 64 64
> FVA.output1up_1- allocate x2 0.0 MB dimensions 64 64
> FVA.output1up_1- allocate legendre 0.4 MB dimensions 64 13 64
> FVA.output1up_1-allocate al,bl (row) 0.0 MB dimensions 64 11
> FVA.output1up_1-allocate al,bl (col) 0.0 MB dimensions 64 11
> FVA.output1up_1- allocate YL 0.0 MB dimensions 15 64 2
> FVA.output1up_1- number of local orbitals, nlo (hamilt) 44
> FVA.output1up_1- allocate YL 0.1 MB dimensions 15 248 2
> FVA.output1up_1- allocate phsc 0.0 MB dimensions 248
> FVA.output1up_1-Time for al,bl (hamilt, cpu/wall) : 0.00 0.00
>
>
>
> ================ 2. A. Fe2VAl conventional unitcell (P1, with 16
> non-equivalent atomic positions) FAILED ================
>
> runsp_lapw -p -ec 0.00001
>
> 32 k-points
>
> ++++ .machines (run at 24 processors)++++++
>
>
> 32:localhost:24
> granularity:1
> extrafine:1
>
> +++++++++++++++++++
> in cycle 15 ETEST: .0071660350000000 CTEST: .2687245 STRTEST 2.59
> LAPW0 END
> LAPW1 END
> [2]  - Done   ( cd $PWD; $t $exe ${def}_$loop.def; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
> LAPW1 END
> [1]    Done   ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
> LAPW1 END
> [2]  - Done   ( cd $PWD; $t $exe ${def}_$loop.def; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
> LAPW1 END
> [1]    Done   ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop
> LAPW2 - FERMI; weights written
> LAPW2 END
> LAPW2 END
> [2]  + Done   ( cd $PWD; $t $exe ${def}_${loop}.def $loop; rm -f .lock_$lockfile[$p] ) >> .time2_$loop
> [1]  + Done   ( cd $PWD; $t $ttt $vector_split; rm -f .lock_$lockfile[$p] ) >> .time2_$loop
> SUMPARA END
> LAPW2 - FERMI; weights written
> Segmentation fault
>
> +++++++++++++++++++
>
> FVA_1.output1up_1: Matrix size 968
> FVA_1.output1up_1-Optimum Blocksize for setup 82 Excess % 0.291D+01
> FVA_1.output1up_1-Optimum Blocksize for diag 18 Excess % 0.413D+01
> FVA_1.output1up_1-Base Blocksize 64 Diagonalization 32
> FVA_1.output1up_1- allocate H 0.8 MB dimensions 192 256
> FVA_1.output1up_1- allocate S 0.8 MB dimensions 192 256
> FVA_1.output1up_1- allocate spanel 0.2 MB dimensions 192 64
> FVA_1.output1up_1- allocate hpanel 0.2 MB dimensions 192 64
> FVA_1.output1up_1- allocate spanelus 0.2 MB dimensions 192 64
> FVA_1.output1up_1- allocate slen 0.1 MB dimensions 192 64
> FVA_1.output1up_1- allocate x2 0.1 MB dimensions 192 64
> FVA_1.output1up_1- allocate legendre 1.2 MB dimensions 192 13 64
> FVA_1.output1up_1-allocate al,bl (row) 0.1 MB dimensions 192 11
> FVA_1.output1up_1-allocate al,bl (col) 0.0 MB dimensions 64 11
> FVA_1.output1up_1- allocate YL 0.0 MB dimensions 15 192 1
> FVA_1.output1up_1- number of local orbitals, nlo (hamilt) 176
> FVA_1.output1up_1- allocate YL 0.2 MB dimensions 15 968 1
> FVA_1.output1up_1- allocate phsc 0.0 MB dimensions 968
> FVA_1.output1up_1-Time for al,bl (hamilt, cpu/wall) : 0.00 0.00
>
>
>
> ================ 2. B. Fe2VAl conventional unitcell (P1, with 16
> non-equivalent atomic positions) SUCCESSFULLY COMPLETED ================
>
> runsp_lapw -p -ec 0.00001
>
> Converged at 43rd cycle
>
> Total time taken for 26 cycles : 61 mins
>
> 32 k-points
>
> ++++ .machines (run at 4 processors)++++++
>
> 8:localhost
> 8:localhost
> 8:localhost
> 8:localhost
> granularity:1
>
> +++++++++++++++++++++++
>
> venkatesh at venkatesh-PC:~/wiendata/FVA_1$ grep "Matrix size" *output1* -A18
>
>
>
> FVA_1.output1up_4: Matrix size 968
> FVA_1.output1up_4- allocate HS 14.3 MB
> FVA_1.output1up_4- allocate Z 14.3 MB
> FVA_1.output1up_4- allocate spanel 1.9 MB dimensions 968 128
> FVA_1.output1up_4- allocate hpanel 1.9 MB dimensions 968 128
> FVA_1.output1up_4- allocate spanelus 1.9 MB dimensions 968 128
> FVA_1.output1up_4- allocate slen 0.9 MB dimensions 968 128
> FVA_1.output1up_4- allocate x2 0.9 MB dimensions 968 128
> FVA_1.output1up_4- allocate legendre 12.3 MB dimensions 968 13 128
> FVA_1.output1up_4-allocate al,bl (row) 0.3 MB dimensions 968 11
> FVA_1.output1up_4-allocate al,bl (col) 0.0 MB dimensions 128 11
> FVA_1.output1up_4- allocate YL 0.2 MB dimensions 15 968 1
> FVA_1.output1up_4- number of local orbitals, nlo (hamilt) 176
> FVA_1.output1up_4- allocate YL 0.2 MB dimensions 15 968 1
> FVA_1.output1up_4- allocate phsc 0.0 MB dimensions 968
> FVA_1.output1up_4-Time for al,bl (hamilt, cpu/wall) : 0.01 0.01
> FVA_1.output1up_4-Time for legendre (hamilt, cpu/wall) : 0.03 0.03
> FVA_1.output1up_4-Time for phase (hamilt, cpu/wall) : 0.08 0.08
> FVA_1.output1up_4-Time for us (hamilt, cpu/wall) : 0.11 0.12
>
>
>
> I need a few clarifications on the memory used by each processor in
> order to avoid the segmentation fault error (as shown in case 2.A).
>
> 1. I got a segmentation error for the 16-atom calculation (2.A)
> with 24 processors, and repeating the calculation with the same .machines
> file sometimes leads to lapw1 hanging in a given cycle for
> more than 50 minutes (I killed the process manually to stop the
> calculation). I suspect this is because the memory allocated to
> each processor is not sufficient while the calculations are running.
> However, I am using 128 GB RAM, so why is the memory not properly
> allocated in this case? Can I get any clue from the Matrix size details
> shown for case 2.A?
>
>
> 2. Now, as shown in case 2.B, a change in the .machines file worked without
> any segmentation error using only 4 processors. By comparing the
> Matrix size details of the 3 cases (1, 2.A & 2.B), can someone suggest
> how I can tune the .machines file so that each processor gets more
> memory and I can use more cores (each with enough memory)
> to speed up the calculation?
>
>
> 3. My goal is to run the calculations for 64- and 128-atom
> conventional unitcells (even if they take more time) without segmentation
> errors. Therefore, I need clarification on how to increase the memory
> available to each processor using the 128 GB RAM on my PC.
> Please suggest how to improve the memory allocation per
> processor in order to run calculations for bigger unit cells.
>
>
> Thanks in advance for your help and let me know if you need any further
> information on the details of calculations.
>
> Regards,
> Venkat
> Physics Department,
> IISc Bangalore, India.
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
--
-----------------------------------------------------------------------
Peter Blaha, Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email: peter.blaha at tuwien.ac.at
WWW: http://www.imc.tuwien.ac.at WIEN2k: http://www.wien2k.at
-------------------------------------------------------------------------