[Wien] New findings on the lapw0 seg fault core dump error

Fecher, Gerhard fecher at uni-mainz.de
Sun Jun 8 10:27:20 CEST 2025


Dear Peter and Michael,
I get the segmentation fault with oneAPI 2024.2 and oneAPI 2025.1;
it already appears with -O1.

As I already mentioned some time ago: when I comment out the $omp directives at lines 1649 ff. of lapw0.F, the program runs smoothly.

This seems to be an old, unresolved problem, as it is mentioned in a comment by jdoumont dated 30/7/20
(however, it does not seem to depend on the size of the calculation).

Ciao
Gerhard

DEEP THOUGHT in D. Adams, The Hitchhiker's Guide to the Galaxy:
"I think the problem, to be quite honest with you,
is that you have never actually known what the question is."

====================================
Dr. Gerhard H. Fecher
Institute of Physics
Johannes Gutenberg - University
55099 Mainz
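
Gerhard's observation (the crash disappears when the $omp directives at lapw0.F:1649 ff. are commented out) and Michael's (only omp_lapw0:1 survives) are both consistent with, though they do not prove, Peter's guess further down the thread that this is a stack problem. With an OpenMP reduction clause every thread works on a private copy of each listed variable; if cwk and cvout are arrays (an assumption, their declarations are not shown in this thread), Intel compilers typically allocate those private copies on each thread's stack. The stack of the OpenMP worker threads is controlled by the OMP_STACKSIZE environment variable (Intel's runtime also accepts KMP_STACKSIZE), while `ulimit -s` governs the initial thread. The helper below is hypothetical and not part of WIEN2k: it raises the soft stack limit and sets OMP_STACKSIZE before launching a command; the 512M default is an arbitrary guess.

```python
import os
import resource
import subprocess


def raise_stack_soft_limit() -> int:
    """Raise this process's soft RLIMIT_STACK (what `ulimit -s` shows) to the hard limit.

    The limit is inherited by child processes and sizes their initial (main)
    thread's stack; OpenMP worker threads are sized by OMP_STACKSIZE when set.
    Returns the soft limit in effect afterwards (resource.RLIM_INFINITY = unlimited).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
        except (ValueError, OSError):
            pass  # some platforms refuse the change; keep the current limit
    return resource.getrlimit(resource.RLIMIT_STACK)[0]


def env_with_omp_stacksize(stacksize: str = "512M") -> dict:
    """Return a copy of the current environment with OMP_STACKSIZE (per worker thread) set."""
    env = dict(os.environ)
    env["OMP_STACKSIZE"] = stacksize
    return env


def run_with_bigger_stacks(cmd: list, stacksize: str = "512M") -> int:
    """Run `cmd` with a raised stack limit and OMP_STACKSIZE; return its exit code."""
    raise_stack_soft_limit()
    return subprocess.run(cmd, env=env_with_omp_stacksize(stacksize)).returncode
```

For example, `run_with_bigger_stacks(["run_lapw", "-i", "1"])` (plus your usual options) would repeat Michael's first-iteration test with omp_lapw0:8, assuming the WIEN2k scripts pass the environment through unchanged. If lapw0 then survives, that would point towards the stack explanation rather than a library bug.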
________________________________________
From: Wien [wien-bounces at zeus.theochem.tuwien.ac.at] on behalf of Peter Blaha [peter.blaha at tuwien.ac.at]
Sent: Saturday, 7 June 2025 20:40
To: wien at zeus.theochem.tuwien.ac.at
Subject: Re: [Wien] New findings on the lapw0 seg fault core dump error

Very curious.

Is the "number of PW" in case.clmsum identical after init_lapw and after the
first cycle?

Since this is a small case, can you manually look at the
Fourier coefficients in clmsum? Any "huge" numbers? Any *** numbers?

After dstart, I guess none of the FK are zero. After mixer (i.e. after the 1st
iteration), the later ones should be zero.

My guess is a problem in the libthread library of your compiler version
(ifx 2025.xxx?). Did the problem not show up with previous compilers?
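
Peter's checks on case.clmsum can be scripted. The sketch below is hypothetical (the script name check_clmsum.py and the copied file name are illustrative). It assumes that the Fourier coefficients follow the line containing "PW" (the line Peter suggests grepping for further down) and that an overflowed Fortran field is printed as a run of asterisks. It reports the PW line, overflowed fields, values above an arbitrary `huge` threshold, and how many coefficient lines are entirely zero, so the file from init_lapw can be compared with the one after the first cycle.

```python
import re
import sys
from pathlib import Path

# A Fortran field printed as '***...' means the value did not fit the output format.
OVERFLOW = re.compile(r"\*{3,}")
# Reals in F or E/D format, e.g. 0.123456789012E-03; findall also splits values
# that are glued together without a blank (such as ...E-03-0.45...).
REAL = re.compile(r"[-+]?\d*\.\d+(?:[EeDd][-+]?\d+)?")


def scan_clmsum(text: str, huge: float = 1.0e3) -> dict:
    """Summarise what Peter asked to check by hand.

    ASSUMPTION: the plane-wave (Fourier) coefficients follow the first line
    that contains "PW"; only those lines are checked for huge values and for
    all-zero entries. `huge` is an arbitrary threshold.
    """
    report = {"pw_lines": [], "overflow_lines": [], "huge": [],
              "coeffs": 0, "zero_coeffs": 0}
    in_pw = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        is_header = "PW" in line
        if is_header:
            report["pw_lines"].append(line.strip())
            in_pw = True
        if OVERFLOW.search(line):
            report["overflow_lines"].append(lineno)
            continue  # an overflowed field cannot be parsed reliably
        if is_header or not in_pw:
            continue
        values = [float(t.replace("D", "E").replace("d", "e"))
                  for t in REAL.findall(line)]
        if not values:
            continue
        report["coeffs"] += 1
        if all(v == 0.0 for v in values):
            report["zero_coeffs"] += 1
        report["huge"].extend((lineno, v) for v in values if abs(v) > huge)
    return report


def compare_files(path_a: str, path_b: str) -> None:
    """Print the PW lines and anomaly counts for two clmsum files."""
    for path in (path_a, path_b):
        r = scan_clmsum(Path(path).read_text())
        print(path, r["pw_lines"])
        print(f"  overflow lines: {len(r['overflow_lines'])}, huge values: "
              f"{len(r['huge'])}, zero coefficients: {r['zero_coeffs']}/{r['coeffs']}")


if __name__ == "__main__" and len(sys.argv) == 3:
    compare_files(sys.argv[1], sys.argv[2])
```

Usage: copy case.clmsum right after init_lapw (e.g. to case.clmsum_init), let the first cycle finish, then run `python3 check_clmsum.py case.clmsum_init case.clmsum`. By Peter's reasoning, the first file should show no zero coefficients and the second should have zeros among the later ones; differing PW lines would answer his first question.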


On 07.06.2025 at 18:18, Michael Fechtelkord via Wien wrote:
> Smiles... no, it is MgF2, just two atoms in a cubic cell. And it does not
> depend on the structure: it crashes for all of them in the first cycle, using
> the clmsum from init_lapw.
>
> On 07.06.2025 at 17:34, Peter Blaha wrote:
>> Is this a big supercell?
>>
>> The only thing I could imagine is that the number of PWs is bigger
>> after dstart than after the 1st cycle.
>> grep for "PW" in the clmsum files from dstart and after the 1st cycle.
>> If necessary, reduce the number of PWs until it works, as a temporary fix.
>> It might be a "stack" problem, and I think one can increase the stack size
>> somehow, but I can't remember how.
>>
>> On 06.06.2025 at 22:25, Michael Fechtelkord via Wien wrote:
>>> And an additional comment:
>>>
>>>
>>> lapw0 crashes only in the first cycle, and only with OMP_NUM_THREADS
>>> greater than 1. When I set omp_lapw0:1 for the first cycle (using -i 1
>>> in run_lapw) and then set it back to omp_lapw0:8 after the first run, it
>>> runs without a problem through the complete SCF cycle. It seems to be a
>>> problem with the initial case.clmsum file (init_lapw -b -prec 1).
>>>
>>>
>>> On 06.06.2025 at 22:07, Michael Fechtelkord via Wien wrote:
>>>> Hello Peter,
>>>>
>>>>
>>>> omp_lapw0 in .machines was 8. I reduced it from 8 to 4, then to 2,
>>>> and finally to 1. Only with omp_lapw0:1 does lapw0 not crash.
>>>>
>>>> omp_global:2
>>>>
>>>>
>>>> Best regards,
>>>>
>>>> Michael
>>>>
>>>>
>>>> On 06.06.2025 at 17:59, Peter Blaha wrote:
>>>>> What was your OMP_NUM_THREADS variable?
>>>>>
>>>>> Set it to 1, 2, ... and check if the error occurs again.
>>>>>
>>>>> On 06.06.2025 at 14:07, Michael Fechtelkord via Wien wrote:
>>>>>> I debugged the core dump with gdb, after compiling lapw0 with
>>>>>> debugging symbols.
>>>>>>
>>>>>> The debugger gave me the line that causes the core dump:
>>>>>>
>>>>>> ----------------------------------------
>>>>>>
>>>>>> Debuginfod has been enabled.
>>>>>> To make this setting permanent, add 'set debuginfod enabled on'
>>>>>> to .gdbinit.
>>>>>> [Thread debugging using libthread_db enabled]
>>>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>>>> Core was generated by `/usr/local/WIEN2k/lapw0 lapw0.def'.
>>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>>>
>>>>>> #0  0x000000000048b89b in MAIN__.DIR.OMP.PARALLEL.LOOP.12.split63842.split63939 () at lapw0.F:1649
>>>>>> 1649    !$omp parallel do reduction(+:rhopw00,cwk,cvout) &
>>>>>>
>>>>>>
>>>>>> [Current thread is 1 (Thread 0x14823edbe740 (LWP 339344))]
>>>>>>
>>>>>> ------------------------------------
>>>>>>
>>>>>> Maybe somebody has an idea how to fix it.
>>>>>>
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On 17.05.2025 at 13:48, Michael Fechtelkord via Wien wrote:
>>>>>>> Hello everybody,
>>>>>>>
>>>>>>>
>>>>>>> I have new results concerning the lapw0 crash, which happens only
>>>>>>> sometimes (segmentation fault, core dump).
>>>>>>>
>>>>>>> It seems that the crucial thing is the case.clmsum file (I am no
>>>>>>> expert here), and this may somehow be the key: it can produce the
>>>>>>> lapw0 crash, so it might be what sometimes triggers it.
>>>>>>>
>>>>>>> I calculated MgF2 and replaced the newly generated clmsum with an
>>>>>>> older one, and then there was no crash. I cannot attach the files
>>>>>>> because they are too large.
>>>>>>>
>>>>>>>
>>>>>>> I am not experienced enough in debugging to find out why and where it happens.
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Dr. Michael Fechtelkord
>>>>>>
>>>>>> Institut für Geologie, Mineralogie und Geophysik
>>>>>> Ruhr-Universität Bochum
>>>>>> Universitätsstr. 150
>>>>>> D-44780 Bochum
>>>>>>
>>>>>> Phone: +49 (234) 32-24380
>>>>>> Fax:  +49 (234) 32-04380
>>>>>> Email: Michael.Fechtelkord at ruhr-uni-bochum.de
>>>>>> Web Page: https://www.ruhr-uni-bochum.de/kristallographie/kc/mitarbeiter/fechtelkord/
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Wien mailing list
>>>>>> Wien at zeus.theochem.tuwien.ac.at
>>>>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>>>>> SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>>>>>
>>

--
-----------------------------------------------------------------------
Peter Blaha,  Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email: peter.blaha at tuwien.ac.at
WWW:   http://www.imc.tuwien.ac.at      WIEN2k: http://www.wien2k.at
-------------------------------------------------------------------------
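
Michael's workaround earlier in the thread (first iteration with omp_lapw0:1, using -i 1 in run_lapw, then back to omp_lapw0:8 for the rest of the SCF cycle) only requires editing one line of .machines between the two runs. A minimal Python sketch of that edit follows; the helper names are hypothetical, and it assumes the setting appears as a single `omp_lapw0:N` line, as quoted in this thread.

```python
import re
from pathlib import Path

# Matches an existing "omp_lapw0:<n>" line; [ \t]* (not \s*) so newlines are never eaten.
OMP_LAPW0 = re.compile(r"^omp_lapw0:[ \t]*\d+[ \t]*$", re.MULTILINE)


def set_omp_lapw0(machines_text: str, threads: int) -> str:
    """Return .machines text with omp_lapw0 set to `threads`.

    Replaces the first existing omp_lapw0 line, or appends one if there is none.
    All other lines (e.g. omp_global:2) are left untouched.
    """
    new_line = f"omp_lapw0:{threads}"
    if OMP_LAPW0.search(machines_text):
        return OMP_LAPW0.sub(new_line, machines_text, count=1)
    if machines_text and not machines_text.endswith("\n"):
        machines_text += "\n"
    return machines_text + new_line + "\n"


def set_omp_lapw0_file(threads: int, path: str = ".machines") -> None:
    """Rewrite the omp_lapw0 line of a .machines file in place."""
    p = Path(path)
    p.write_text(set_omp_lapw0(p.read_text(), threads))
```

In order: call `set_omp_lapw0_file(1)`, run the first iteration with run_lapw `-i 1` plus your usual options, call `set_omp_lapw0_file(8)`, then continue with run_lapw. Editing the existing text, rather than regenerating .machines, keeps every other setting exactly as it was.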

