[Wien] Segmentation fault in lapw2c

Sat Nov 1 12:39:55 CET 2008

Hello all!

I was trying to run a runsp_lapw job for a spin-polarized 16-atom Cr supercell on our local cluster, a Xeon system with 50 dual-processor nodes. I'm using the 64-bit 10.1 versions of ifort and MKL, and I tried to use k-point parallelization. I flipped half the spins in case.inst before going through a complete initialization, since I am trying to model an antiferromagnetic alignment. Previous tests with the same supercell size for ferromagnetic Fe went fine, and a complete SCF cycle finished without errors. We're using the latest version, WIEN2k_08.3.

The run crashed in lapw2 -c -up with a SIGSEGV (segmentation fault). The error file reads as follows:

 LAPW0 END
 LAPW1 END
 LAPW1 END
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source      
lapw2c             0000000000430441  efermi_                   812  fermi_tmp_.F
lapw2c             0000000000430199  dos_                      752  fermi_tmp_.F
lapw2c             000000000042FE3A  fermi_tetra_              556  fermi_tmp_.F
lapw2c             000000000042D7FA  fermi_                    110  fermi_tmp_.F
lapw2c             0000000000457EE7  MAIN__                    258  lapw2_tmp_.F
lapw2c             000000000040FAAA  Unknown               Unknown  Unknown
libc.so.6          000000336481C3FB  Unknown               Unknown  Unknown
lapw2c             000000000040F9EA  Unknown               Unknown  Unknown

I thought something might be wrong with my input files. I ported everything to my PC and got the same error output on the screen, except for the line showing the "dos" routine. I also tried changing from TETRA to TEMP 0.003, for instance, in case.in2c, but it did not help.
The funny thing is that I once had a similar error when running a calculation with spin-orbit coupling plus an orbital polarization correction, and despite countless efforts from P. Blaha and L. Marks no conclusive workaround was found. I am very sorry to say I don't remember when or how I solved that problem, if I did at all. Most likely I skipped it and turned my attention to a different issue. The earlier thread can be found at:

http://zeus.theochem.tuwien.ac.at/pipermail/wien/2006-October/008036.html

I tried to run the sequence

x lapw0
x lapw1 -c -up
x lapw1 -c -dn
x lapw2 -c -up
.....

I tested

lapw2c uplapw2.def

as well, and in both cases I got the same error. 
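
For anyone who wants to reproduce this, the manual sequence above can be wrapped so that it stops at the first failing step. This is only a minimal sketch: run_steps is a hypothetical helper, not part of WIEN2k, and it assumes that each command returns a non-zero exit status when it fails, which I have not verified for the x script.

```shell
#!/bin/bash
# run_steps: hypothetical helper (not part of WIEN2k).
# Runs each argument as a command, in order, and stops at the first failure.
# Assumption: a failing step returns a non-zero exit status.
run_steps() {
    local step
    for step in "$@"; do
        echo ">>> $step"
        # $step is left unquoted on purpose so it splits into command + arguments
        if ! $step; then
            echo "FAILED: $step" >&2
            return 1
        fi
    done
    echo "all steps finished"
}

# Intended use from the case directory (needs WIEN2k's x script on the PATH):
# run_steps "x lapw0" "x lapw1 -c -up" "x lapw1 -c -dn" "x lapw2 -c -up"
```

The last step echoed before the FAILED line is the one that crashed, which makes it easier to attach the matching error file.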

Soon afterwards, I did a complete clean initialization on my PC and left it running. Again there was a crash in lapw2c:

$ runsp_lapw -it -I -i 200 -ec 0.00001 -cc 0.0001
hup: Command not found.
Invalid null command.
 LAPW0 END
 LAPW1 END
 LAPW1 END
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC        Routine            Line        Source             
lapw2c             082FB8D0  Unknown               Unknown  Unknown
lapw2c             080B28B6  read_vec_                  88  read_vec_tmp_.F
lapw2c             0809086A  l2main_                   507  l2main_tmp_.F
lapw2c             080A4A37  MAIN__                    543  lapw2_tmp_.F
lapw2c             0804D1F1  Unknown               Unknown  Unknown
libc.so.6          4008A450  Unknown               Unknown  Unknown
lapw2c             0804D151  Unknown               Unknown  Unknown

>   stop error

The routines in the traceback have changed: no fermi-related routine appears any more, but something is still going wrong. I found the same problem again with the x lapw* and lapw2c uplapw2.def tests.

Could it be that this is really a memory limit or system size issue?
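
One way to probe the memory-limit hypothesis would be to look at the shell's resource limits, since ifort-compiled binaries can segfault when large arrays overflow a small stack. A minimal sketch, assuming a bash shell (in csh or tcsh the equivalent command is `limit`); whether this is actually the cause here is an open question:

```shell
#!/bin/bash
# Print the current soft limits of this shell.
echo "stack size (kB):     $(ulimit -s)"   # a number or "unlimited"
echo "virtual memory (kB): $(ulimit -v)"   # a number or "unlimited"

# To retry with a larger stack, raise the soft limit as far as the hard
# limit allows in the same shell that starts the job:
# ulimit -s unlimited
```

If the crash persists with an unlimited stack, a stack overflow is unlikely to be the explanation.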

I would welcome any suggestions. Please let me know if you need any extra information.

Greetings

Roberto

Roberto Iglesias
Departamento de Física
Universidad de Oviedo
Calvo Sotelo, s/n
33013 Oviedo SPAIN