[Wien] New findings on the lapw0 seg fault core dump error

Michael Fechtelkord Michael.Fechtelkord at ruhr-uni-bochum.de
Sun Jun 8 13:13:37 CEST 2025


Hello Gerhard and Peter,


I am using ifx 2025.1.1, and I have also read that OpenMP reductions
cause a segfault with Intel compilers. The recommendation there is to
serialize the loops or to remove the line that performs the reduction,
which eliminates the segfault:

https://github.com/flang-compiler/flang/issues/56
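
To make sure I understand the suggested workaround, here is a minimal
sketch of both variants (the loop and variable names are invented for
illustration; this is not the actual lapw0.F code):

   ! Variant 1: OpenMP array reduction, the pattern of the failing
   ! directive. Each thread gets a private copy of the whole array.
   subroutine accum_reduction(m, n, idx, w, cwk)
      implicit none
      integer,    intent(in)    :: m, n, idx(m)
      complex*16, intent(in)    :: w(m)
      complex*16, intent(inout) :: cwk(n)
      integer :: k
   !$omp parallel do reduction(+:cwk)
      do k = 1, m
         cwk(idx(k)) = cwk(idx(k)) + w(k)
      end do
   !$omp end parallel do
   end subroutine

   ! Variant 2: the "serialized" workaround. Each thread accumulates
   ! into an explicitly private temporary and merges it into the shared
   ! array inside a critical section, so the compiler-generated
   ! reduction machinery is never used.
   subroutine accum_serialized(m, n, idx, w, cwk)
      implicit none
      integer,    intent(in)    :: m, n, idx(m)
      complex*16, intent(in)    :: w(m)
      complex*16, intent(inout) :: cwk(n)
      complex*16 :: tmp(n)
      integer :: k
   !$omp parallel private(tmp, k)
      tmp = (0.d0, 0.d0)
   !$omp do
      do k = 1, m
         tmp(idx(k)) = tmp(idx(k)) + w(k)
      end do
   !$omp end do
   !$omp critical
      cwk = cwk + tmp
   !$omp end critical
   !$omp end parallel
   end subroutine

Variant 2 computes the same sums but serializes only the final merge,
so most of the loop still runs in parallel.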


I have answered Peter's questions below, inserted between his comments.

So can I simply comment out the reduction directive (is it not needed)?
I have already serialized the first cycle by setting omp_lapw0:1; after
the first cycle, lapw0 runs smoothly even with 8 OMP threads.


Best regards,

Michael


On 08.06.2025 at 10:27, Fecher, Gerhard wrote:
> Dear Peter and Michael,
> I get the segmentation fault with OneAPI 2024.2 and with OneAPI 2025.1;
> it already appears with -O1.
>
> As I mentioned some time ago: when I comment out the $omp directives at lines 1649 ff., the program runs smoothly.
>
> This seems to be an old unresolved problem, as it is mentioned in a comment by jdoumont from 30/7/20
> (however, it does not seem to depend on the size of the calculation).
>
> Ciao
> Gerhard
>
> DEEP THOUGHT in D. Adams; Hitchhiker's Guide to the Galaxy:
> "I think the problem, to be quite honest with you,
> is that you have never actually known what the question is."
>
> ====================================
> Dr. Gerhard H. Fecher
> Institute of Physics
> Johannes Gutenberg - University
> 55099 Mainz
> ________________________________________
> From: Wien [wien-bounces at zeus.theochem.tuwien.ac.at] on behalf of Peter Blaha [peter.blaha at tuwien.ac.at]
> Sent: Saturday, 7 June 2025 20:40
> To: wien at zeus.theochem.tuwien.ac.at
> Subject: Re: [Wien] New findings on the lapw0 seg fault core dump error
>
> Very curious.
>
> Is "number of PW"  in case.clmsum   after init_lapw   and after the
> first cycle identical ?
Number of PW is 2239 in the starting case.clmsum as well as in the 
case.clmsum after the first cycle
>
> Since this is a small case: can you manually look at the Fourier
> coefficients in clmsum? Any "huge" numbers? Any *** numbers?
No big numbers, no ****.
>
> After dstart, I guess none of the FK are zero. After the mixer (after
> the 1st iteration) the later ones should be zero.
>
> My guess is a problem in the libthread library of your compiler version
> (ifx 2025.xxx?). The problems did not show up with previous compilers?
I am using ifx 2025.1.1.
>
>
> On 07.06.2025 at 18:18, Michael Fechtelkord via Wien wrote:
>> Smiles... no, it is MgF2: just two atoms in a cubic cell, and it is
>> not dependent on the structure. It crashes for every structure in the
>> first cycle when using the clmsum from init_lapw.
>>
>> On 07.06.2025 at 17:34, Peter Blaha wrote:
>>> Is this a big supercell?
>>>
>>> The only thing I could imagine is that the number of PWs is bigger
>>> after dstart than after the 1st cycle.
>>> Grep for "PW" in the clmsum files from dstart and after the 1st cycle.
>>> If necessary, reduce the number of PWs until it works, as a temporary fix.
>>> It might be a "stack" problem; I think one can increase the stack size
>>> somehow, but I can't remember how.
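>>>
>>> A typical way to raise those limits, assuming the stack really is the
>>> culprit, is to enlarge both the process stack and the per-thread
>>> OpenMP stacks before the run, e.g.:
>>>
>>>    ulimit -s unlimited          # bash: process (main-thread) stack
>>>    export OMP_STACKSIZE=512m    # stack of each additional OpenMP thread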
>>>
>>> On 06.06.2025 at 22:25, Michael Fechtelkord via Wien wrote:
>>>> And an additional comment:
>>>>
>>>>
>>>> lapw0 crashes only in the first cycle, and only with OMP_NUM_THREADS
>>>> higher than 1. When I set lapw0:1 for the first cycle (using -i 1 in
>>>> run_lapw) and then set it back to lapw0:8 after the first run, it runs
>>>> without a problem for the complete scf cycle. It seems to be a problem
>>>> with the initial case.clmsum file (init_lapw -b -prec 1).
>>>>
>>>>
>>>> On 06.06.2025 at 22:07, Michael Fechtelkord via Wien wrote:
>>>>> Hello Peter,
>>>>>
>>>>>
>>>>> omp_lapw0 in .machines was 8. I reduced it from 8 to 4, then to 2,
>>>>> and finally to 1. Only with omp_lapw0:1 does lapw0 not crash.
>>>>>
>>>>> omp_global:2
>>>>>
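>>>>> So the relevant lines in my .machines file now read as follows
>>>>> (if I read the userguide correctly, omp_lapw0 overrides omp_global
>>>>> for lapw0 only, so the rest of the cycle still runs with 2 threads):
>>>>>
>>>>>    omp_global:2
>>>>>    omp_lapw0:1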
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>> On 06.06.2025 at 17:59, Peter Blaha wrote:
>>>>>> What was your OMP_NUM_THREADS variable?
>>>>>>
>>>>>> Set it to 1, 2, ... and check if the error occurs again.
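>>>>>>
>>>>>> For example, in the (t)csh environment that the WIEN2k scripts use:
>>>>>>
>>>>>>    setenv OMP_NUM_THREADS 1
>>>>>>    x lapw0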
>>>>>>
>>>>>> On 06.06.2025 at 14:07, Michael Fechtelkord via Wien wrote:
>>>>>>> I debugged the core-dump file with gdb, using debugging symbols in
>>>>>>> the compilation of lapw0.
>>>>>>>
>>>>>>> The debugger gave me the line which causes the core dump:
>>>>>>>
>>>>>>> ----------------------------------------
>>>>>>>
>>>>>>> Debuginfod has been enabled.
>>>>>>> To make this setting permanent, add 'set debuginfod enabled on'
>>>>>>> to .gdbinit.
>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>>>>> Core was generated by `/usr/local/WIEN2k/lapw0 lapw0.def'.
>>>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>>>>
>>>>>>> #0  0x000000000048b89b in
>>>>>>> MAIN__.DIR.OMP.PARALLEL.LOOP.12.split63842.split63939 () at
>>>>>>> lapw0.F:1649
>>>>>>>
>>>>>>> 1649    !$omp parallel do reduction(+:rhopw00,cwk,cvout) &
>>>>>>>
>>>>>>>
>>>>>>> [Current thread is 1 (Thread 0x14823edbe740 (LWP 339344))]
>>>>>>>
>>>>>>> ------------------------------------
>>>>>>>
>>>>>>> Maybe somebody has an idea of how to fix this.
>>>>>>>
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>>> On 17.05.2025 at 13:48, Michael Fechtelkord via Wien wrote:
>>>>>>>> Hello everybody,
>>>>>>>>
>>>>>>>>
>>>>>>>> I have new results concerning the lapw0 crash, which happens only
>>>>>>>> intermittently (segmentation fault error, core dump).
>>>>>>>>
>>>>>>>> It seems that the crucial thing is the case.clmsum file (I am no
>>>>>>>> expert here), but if it is somehow the key, it might be what
>>>>>>>> sometimes triggers the lapw0 crash.
>>>>>>>>
>>>>>>>> I calculated MgF2 and substituted the newly generated clmsum with
>>>>>>>> an older one, and then there was no crash. I cannot attach the
>>>>>>>> files because they are too large.
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not experienced enough with debugging to find out why and
>>>>>>>> where it happens.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Michael
>>>>>>>>
>>>>>>>>
> --
> -----------------------------------------------------------------------
> Peter Blaha,  Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
> Phone: +43-158801165300
> Email: peter.blaha at tuwien.ac.at
> WWW:   http://www.imc.tuwien.ac.at      WIEN2k: http://www.wien2k.at
> -------------------------------------------------------------------------
>

-- 
Dr. Michael Fechtelkord

Institut für Geologie, Mineralogie und Geophysik
Ruhr-Universität Bochum
Universitätsstr. 150
D-44780 Bochum

Phone: +49 (234) 32-24380
Fax:  +49 (234) 32-04380
Email: Michael.Fechtelkord at ruhr-uni-bochum.de
Web Page: https://www.ruhr-uni-bochum.de/kristallographie/kc/mitarbeiter/fechtelkord/


