<div dir="ltr">Dear WIEN2k members,<br><br>I recently installed the latest version, WIEN2k 24.1, using Intel oneAPI. Thanks to Prof. Gavin Abo for providing the reference document for installing WIEN2k with ifx.<br>After the successful installation, I checked the parallel calculations using two structures: 1. the Fe2VAl symmetry-based structure (Fm-3m, 225, 3 equivalent atomic positions), and 2. the Fe2VAl conventional unit cell (P1, with 16 non-equivalent atomic positions). I used the following system configuration for these calculations:<br><br>Processor : AMD Ryzen 9 7950X 16-core processor x 32<br><br>OS Name : Ubuntu 22.04.5 LTS<br><br>OS type: 64-bit<br><br>RAM : 128 GB<br><br><br>I ran spin-polarized SCF calculations to test the system performance and recorded the time taken. The details of the calculations are given below:<br><br>================  1. Fe2VAl symmetry based structure (Fm-3m, 225, 3 equivalent atomic positions) ================<br><br>runsp_lapw -p -ec 0.00001<br><br>Converged at 26th cycle <br><br>Total time taken for 26 cycles : 9 mins<br><br>47 k-points<br><br>++++  .machines (run at 24 processors)++++++<br><br>47:localhost:24<br>granularity:1<br>extrafine:1<br><br>venkatesh@venkatesh-PC:~/wiendata/FVA$ grep "Matrix size" *output1* -A18<br><br>FVA.output1up_1: Matrix size      248<br>FVA.output1up_1-Optimum Blocksize for setup**** Excess %  0.100D+03<br>FVA.output1up_1-Optimum Blocksize for diag  22 Excess %  0.115D+02<br>FVA.output1up_1-Base Blocksize  64 Diagonalization  32<br>FVA.output1up_1-      allocate H     0.0 MB      dimensions   64   64<br>FVA.output1up_1-      allocate S     0.0 MB      dimensions   64   64<br>FVA.output1up_1-   allocate spanel     0.0 MB      dimensions   64   64<br>FVA.output1up_1-   allocate hpanel     0.0 MB      dimensions   64   64<br>FVA.output1up_1-  allocate spanelus     0.0 MB      dimensions   64   64<br>FVA.output1up_1-    allocate slen     0.0 MB      dimensions   64   64<br>FVA.output1up_1-     
allocate x2     0.0 MB      dimensions   64   64<br>FVA.output1up_1-  allocate legendre     0.4 MB      dimensions   64   13   64<br>FVA.output1up_1-allocate al,bl (row)     0.0 MB      dimensions   64   11<br>FVA.output1up_1-allocate al,bl (col)     0.0 MB      dimensions   64   11<br>FVA.output1up_1-     allocate YL     0.0 MB      dimensions   15   64   2<br>FVA.output1up_1- number of local orbitals, nlo (hamilt)    44<br>FVA.output1up_1-    allocate YL      0.1 MB      dimensions   15  248   2<br>FVA.output1up_1-    allocate phsc     0.0 MB      dimensions  248<br>FVA.output1up_1-Time for al,bl   (hamilt, cpu/wall) :     0.00     0.00<br><br><br><br>================  2. A. Fe2VAl conventional unitcell (P1, with 16 non-equivalent atomic positions) FAILED ================<br><br>runsp_lapw -p -ec 0.00001<br><br>32 k-points<br><br>++++  .machines (run at 24 processors)++++++<br><br><br>32:localhost:24<br>granularity:1<br>extrafine:1<br><br>+++++++++++++++++++<br>in cycle 15   ETEST: .0071660350000000  CTEST: .2687245  STRTEST 2.59<br> LAPW0 END<br> LAPW1 END<br>[2]  - Done              ( cd $PWD; $t $exe ${def}_$loop.def; rm -f .lock_$lockfile[$p] ) >> .time1_$loop<br> LAPW1 END<br>[1]   Done              ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop<br> LAPW1 END<br>[2]  - Done              ( cd $PWD; $t $exe ${def}_$loop.def; rm -f .lock_$lockfile[$p] ) >> .time1_$loop<br> LAPW1 END<br>[1]   Done              ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop<br>LAPW2 - FERMI; weights written<br> LAPW2 END<br> LAPW2 END<br>[2]  + Done              ( cd $PWD; $t $exe ${def}_${loop}.def $loop; rm -f .lock_$lockfile[$p] ) >> .time2_$loop<br>[1]  + Done              ( cd $PWD; $t $ttt $vector_split; rm -f .lock_$lockfile[$p] ) >> .time2_$loop<br> SUMPARA END<br>LAPW2 - FERMI; weights written<br>Segmentation fault<br><br>+++++++++++++++++++<br><br>FVA_1.output1up_1: Matrix size      968<br>FVA_1.output1up_1-Optimum Blocksize for setup  
82 Excess %  0.291D+01<br>FVA_1.output1up_1-Optimum Blocksize for diag  18 Excess %  0.413D+01<br>FVA_1.output1up_1-Base Blocksize  64 Diagonalization  32<br>FVA_1.output1up_1-      allocate H     0.8 MB      dimensions  192  256<br>FVA_1.output1up_1-      allocate S     0.8 MB      dimensions  192  256<br>FVA_1.output1up_1-   allocate spanel     0.2 MB      dimensions  192   64<br>FVA_1.output1up_1-   allocate hpanel     0.2 MB      dimensions  192   64<br>FVA_1.output1up_1-  allocate spanelus     0.2 MB      dimensions  192   64<br>FVA_1.output1up_1-    allocate slen     0.1 MB      dimensions  192   64<br>FVA_1.output1up_1-     allocate x2     0.1 MB      dimensions  192   64<br>FVA_1.output1up_1-  allocate legendre     1.2 MB      dimensions  192   13   64<br>FVA_1.output1up_1-allocate al,bl (row)     0.1 MB      dimensions  192   11<br>FVA_1.output1up_1-allocate al,bl (col)     0.0 MB      dimensions   64   11<br>FVA_1.output1up_1-     allocate YL     0.0 MB      dimensions   15  192   1<br>FVA_1.output1up_1- number of local orbitals, nlo (hamilt)    176<br>FVA_1.output1up_1-    allocate YL      0.2 MB      dimensions   15  968   1<br>FVA_1.output1up_1-    allocate phsc     0.0 MB      dimensions  968<br>FVA_1.output1up_1-Time for al,bl   (hamilt, cpu/wall) :     0.00     0.00<br><br><br><br>================  2. B. 
Fe2VAl conventional unitcell (P1, with 16 non-equivalent atomic positions) SUCCESSFULLY COMPLETED ================<br><br>runsp_lapw -p -ec 0.00001<br><br>Converged at 43rd cycle <br><br>Total time taken for 26 cycles :  61 mins<br><br>32 k-points<br><br>++++  .machines (run at 4 processors)++++++<br><br>8:localhost<br>8:localhost<br>8:localhost<br>8:localhost<br>granularity:1<br><br>+++++++++++++++++++++++<br><br>venkatesh@venkatesh-PC:~/wiendata/FVA_1$ grep "Matrix size" *output1* -A18<br><br><br><br>FVA_1.output1up_4: Matrix size      968<br>FVA_1.output1up_4-     allocate HS     14.3 MB <br>FVA_1.output1up_4-     allocate Z     14.3 MB <br>FVA_1.output1up_4-   allocate spanel     1.9 MB      dimensions  968  128<br>FVA_1.output1up_4-   allocate hpanel     1.9 MB      dimensions  968  128<br>FVA_1.output1up_4-  allocate spanelus     1.9 MB      dimensions  968  128<br>FVA_1.output1up_4-    allocate slen     0.9 MB      dimensions  968  128<br>FVA_1.output1up_4-     allocate x2     0.9 MB      dimensions  968  128<br>FVA_1.output1up_4-  allocate legendre     12.3 MB      dimensions  968   13  128<br>FVA_1.output1up_4-allocate al,bl (row)     0.3 MB      dimensions  968   11<br>FVA_1.output1up_4-allocate al,bl (col)     0.0 MB      dimensions  128   11<br>FVA_1.output1up_4-     allocate YL     0.2 MB      dimensions   15  968   1<br>FVA_1.output1up_4- number of local orbitals, nlo (hamilt)    176<br>FVA_1.output1up_4-    allocate YL      0.2 MB      dimensions   15  968   1<br>FVA_1.output1up_4-    allocate phsc     0.0 MB      dimensions  968<br>FVA_1.output1up_4-Time for al,bl   (hamilt, cpu/wall) :     0.01     0.01<br>FVA_1.output1up_4-Time for legendre (hamilt, cpu/wall) :     0.03     0.03<br>FVA_1.output1up_4-Time for phase   (hamilt, cpu/wall) :     0.08     0.08<br>FVA_1.output1up_4-Time for us    (hamilt, cpu/wall) :     0.11     0.12<br><br><br><br>I need a few clarifications on the memory used by each processor in order to avoid any Segmentation fault 
error (as shown in case 2.A).<br><br>1. I got a segmentation fault in the 16-atom calculation (2.A) with 24 processors. Repeating the calculation with the same .machines file sometimes causes lapw1 to hang in a given cycle for more than 50 minutes (I killed the process manually to stop the calculation). I suspect that the memory allocated to each processor is insufficient while the calculation is running. However, I have 128 GB of RAM, so why is the memory not properly allocated in this case? Can I get any clue from the "Matrix size" details shown for case 2.A?<br> <br><br>2. As shown in case 2.B, changing the .machines file allowed the calculation to complete without a segmentation fault, using only 4 processors. Comparing the "Matrix size" details of the three cases (1, 2.A and 2.B), can someone suggest how I can tune the .machines file so that each processor is allocated more memory and I can use more cores to speed up the calculation?<br><br><br>3. My goal is to run calculations for conventional unit cells with 64 and 128 atoms (even if they take more time) without segmentation faults. I therefore need clarification on how to increase the memory allocation for each processor using the 128 GB of RAM available on my PC. Please suggest how to improve the memory allocation per processor so that I can run calculations for bigger unit cells.<br><br><br>Thanks in advance for your help, and let me know if you need any further information on the calculations.<br><br>Regards,<br>Venkat<br>Physics Department,<br>IISc Bangalore, India.</div>