[Wien] New findings on the lapw0 seg fault core dump error

Sat Jun 7 20:40:40 CEST 2025

Very curious.

Is the "number of PW" in case.clmsum identical after init_lapw and after 
the first cycle?

Since this is a small case: can you manually look at the Fourier 
coefficients in clmsum? Any "huge" numbers? Any *** numbers?

After dstart, I guess none of the FK (Fourier coefficients) are zero. 
After mixer (after the 1st iteration) the later ones should be zero.
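
The checks above can be sketched as a small script. This is only a rough sketch under assumptions: it treats any line containing "PW" as the plane-wave count line (the same match a plain grep for "PW" would give), assumes the coefficients are printed as Fortran-style reals, and uses an arbitrary threshold of 1e6 for "huge". The name scan_clmsum and the threshold are illustrative and not part of WIEN2k.

```python
import re

# Fortran-style reals such as 0.123456789012E+00 or -1.5D-03.
# Fixed-width fields may touch each other, so no whitespace is required between them.
FORTRAN_REAL = re.compile(r"[-+]?\d*\.\d+(?:[EeDd][-+]?\d+)?")


def scan_clmsum(lines, huge=1.0e6):
    """Collect lines that mention PW, contain '*' overflow marks, or hold huge reals.

    `lines` is any iterable of strings (an open file works).
    `huge` is an arbitrary cutoff (assumption), not a WIEN2k constant.
    Returns a dict of lists of (line_number, stripped_line) tuples.
    """
    pw_lines, star_lines, huge_lines = [], [], []
    for n, line in enumerate(lines, start=1):
        if "PW" in line:
            pw_lines.append((n, line.strip()))
        if "*" in line:
            # Fortran prints asterisks when a number does not fit its field.
            star_lines.append((n, line.strip()))
        for tok in FORTRAN_REAL.findall(line):
            # Python's float() does not accept the Fortran 'D' exponent marker.
            val = float(tok.replace("D", "E").replace("d", "e"))
            if abs(val) > huge:
                huge_lines.append((n, line.strip()))
                break  # one report per line is enough
    return {"pw": pw_lines, "stars": star_lines, "huge": huge_lines}
```

One could call scan_clmsum(open("case.clmsum")) on the file saved after init_lapw and again on the file after the first cycle, then compare the "pw" entries and look at whatever shows up under "stars" and "huge".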

My guess is a problem in the libthread library of your compiler version 
(ifx 2025.xxx?). Did the problems not show up with previous compilers?
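
Independent of the compiler question, the "stack" suspicion quoted further down could be tested by rerunning lapw0 with a larger per-thread OpenMP stack. A minimal sketch, assuming the binary is started directly as shown in the gdb output (`/usr/local/WIEN2k/lapw0 lapw0.def`): OMP_STACKSIZE and OMP_NUM_THREADS are standard OpenMP environment variables, while the helper names, the 512M value and the choice to lift the main-thread limit through Python's `resource` module are illustrative assumptions, not a WIEN2k recipe.

```python
import os
import resource
import subprocess


def omp_env(num_threads, stacksize="512M", base=None):
    """Return a copy of the environment with the OpenMP settings applied."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(num_threads)  # threads used in parallel regions
    env["OMP_STACKSIZE"] = stacksize           # stack of each thread created by OpenMP
    return env


def _raise_stack_limit():
    """Runs in the child before exec: lift the soft stack limit to the hard limit."""
    try:
        _, hard = resource.getrlimit(resource.RLIMIT_STACK)
        resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
    except (ValueError, OSError):
        pass  # keep the existing limit if it cannot be raised


def run_with_stack(cmd, num_threads, stacksize="512M", cwd=None):
    """Run cmd with the OpenMP environment above and return the CompletedProcess."""
    return subprocess.run(
        cmd,
        env=omp_env(num_threads, stacksize),
        cwd=cwd,
        preexec_fn=_raise_stack_limit,  # POSIX only
        capture_output=True,
        text=True,
    )
```

For the case in this thread that would be something like run_with_stack(["/usr/local/WIEN2k/lapw0", "lapw0.def"], num_threads=8), started from the case directory. On POSIX systems a negative returncode means the process was killed by a signal such as SIGSEGV, so a run that crashes with the default stack but finishes with a larger one would support the stack hypothesis.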

On 07.06.2025 at 18:18, Michael Fechtelkord via Wien wrote:
> Smiles... no, it is MgF2. Just two atoms in a cubic cell. And it does 
> not depend on the structure: it crashes for all of them in the first 
> cycle when using the clmsum from init_lapw.
> 
> On 07.06.2025 at 17:34, Peter Blaha wrote:
>> Is this a big supercell ?
>>
>> The only thing I could imagine is that the number of PWs is bigger 
>> after dstart than after the 1st cycle.
>> grep for "PW" in the clmsum files from dstart and after the 1st cycle.
>> If necessary, reduce the number of PW until it works, as a temporary fix.
>> It might be a "stack" problem and I think one can increase the stack 
>> size somehow, but I can't remember how.
>>
>> On 06.06.2025 at 22:25, Michael Fechtelkord via Wien wrote:
>>> And an additional comment.
>>>
>>>
>>> lapw0 crashes only in the first cycle with OMP_NUM_THREADS higher 
>>> than 1. When I set lapw0:1 for the first cycle (using -i 1 in 
>>> run_lapw) and then, after the first run, set it back to lapw0:8, it 
>>> runs without a problem for the complete scf cycle. It seems to be a 
>>> problem with the initial case.clmsum file (init_lapw -b -prec 1).
>>>
>>>
>>> On 06.06.2025 at 22:07, Michael Fechtelkord via Wien wrote:
>>>> Hello Peter,
>>>>
>>>>
>>>> omp_lapw0 in .machines was 8. I reduced it from 8 to 4, then to 2 
>>>> and finally to 1. Only with omp_lapw0:1 does lapw0 not crash.
>>>>
>>>> omp_global:2
>>>>
>>>>
>>>> Best regards,
>>>>
>>>> Michael
>>>>
>>>>
>>>> On 06.06.2025 at 17:59, Peter Blaha wrote:
>>>>> What was your OMP_NUM_THREADS variable?
>>>>>
>>>>> Set it to 1, 2, ... and check if the error occurs again.
>>>>>
>>>>> On 06.06.2025 at 14:07, Michael Fechtelkord via Wien wrote:
>>>>>> I debugged the core dump file with gdb, after compiling lapw0 with 
>>>>>> debugging symbols.
>>>>>>
>>>>>> The debugger gave me the line that causes the core dump:
>>>>>>
>>>>>> ----------------------------------------
>>>>>>
>>>>>> Debuginfod has been enabled.
>>>>>> To make this setting permanent, add 'set debuginfod enabled on' 
>>>>>> to .gdbinit.
>>>>>> [Thread debugging using libthread_db enabled]
>>>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>>>> Core was generated by `/usr/local/WIEN2k/lapw0 lapw0.def'.
>>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>>>
>>>>>> #0  0x000000000048b89b in MAIN__.DIR.OMP.PARALLEL.LOOP.12.split63842.split63939 () at lapw0.F:1649
>>>>>>
>>>>>> 1649    !$omp parallel do reduction(+:rhopw00,cwk,cvout) &
>>>>>>
>>>>>>
>>>>>> [Current thread is 1 (Thread 0x14823edbe740 (LWP 339344))]
>>>>>>
>>>>>> ------------------------------------
>>>>>>
>>>>>> Maybe somebody has an idea how to fix it..
>>>>>>
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On 17.05.2025 at 13:48, Michael Fechtelkord via Wien wrote:
>>>>>>> Hello everybody,
>>>>>>>
>>>>>>>
>>>>>>> I have new results concerning the lapw0 crash, which happens 
>>>>>>> only some of the time (segmentation fault error - core dump).
>>>>>>>
>>>>>>> It seems that the crucial thing is the case.clmsum file (I am no 
>>>>>>> expert here). If this is indeed the key, it may be what sometimes 
>>>>>>> triggers the lapw0 crash.
>>>>>>>
>>>>>>> I calculated MgF2 and replaced the newly generated clmsum with an 
>>>>>>> older one, and then there was no crash. I cannot attach the files 
>>>>>>> because they are too large.
>>>>>>>
>>>>>>>
>>>>>>> I am not experienced enough with debugging to find out why and where it happens.
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>> -- 
>>>>>> Dr. Michael Fechtelkord
>>>>>>
>>>>>> Institut für Geologie, Mineralogie und Geophysik
>>>>>> Ruhr-Universität Bochum
>>>>>> Universitätsstr. 150
>>>>>> D-44780 Bochum
>>>>>>
>>>>>> Phone: +49 (234) 32-24380
>>>>>> Fax:  +49 (234) 32-04380
>>>>>> Email:Michael.Fechtelkord at ruhr-uni-bochum.de
>>>>>> Web Page: https://www.ruhr-uni-bochum.de/kristallographie/kc/mitarbeiter/fechtelkord/
>>>>>>
>>>>>>
>>>>>
>>

-- 
-----------------------------------------------------------------------
Peter Blaha,  Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email: peter.blaha at tuwien.ac.at
WWW:   http://www.imc.tuwien.ac.at      WIEN2k: http://www.wien2k.at
-------------------------------------------------------------------------