[Wien] Segmentation fault in lapw2c
ROBERTO LUIS IGLESIAS PASTRANA
roberto at uniovi.es
Sat Nov 1 12:39:55 CET 2008
Hello all!
I was trying to run a runsp_lapw job for a spin-polarized 16-atom Cr supercell on our local cluster, a Xeon system with 50 dual-processor nodes. I'm using the 64-bit 10.1 versions of ifort and mkl, and I tried to use k-point parallelization. To mimic antiferromagnetic alignment, I flipped half of the spins in case.inst before going through a complete initialization procedure. Previous tests with a ferromagnetic Fe supercell of the same size went OK, and a complete SCF cycle finished without errors. We're using the latest version, WIEN2k_08.3.
lapw2 -c -up crashed with a SIGSEGV (segmentation fault). The error file reads as follows:
LAPW0 END
LAPW1 END
LAPW1 END
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
lapw2c 0000000000430441 efermi_ 812 fermi_tmp_.F
lapw2c 0000000000430199 dos_ 752 fermi_tmp_.F
lapw2c 000000000042FE3A fermi_tetra_ 556 fermi_tmp_.F
lapw2c 000000000042D7FA fermi_ 110 fermi_tmp_.F
lapw2c 0000000000457EE7 MAIN__ 258 lapw2_tmp_.F
lapw2c 000000000040FAAA Unknown Unknown Unknown
libc.so.6 000000336481C3FB Unknown Unknown Unknown
lapw2c 000000000040F9EA Unknown Unknown Unknown
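To make it easier to compare tracebacks like the one above, the frames that carry source information can be pulled out with a small filter. This is only a sketch, assuming a POSIX shell with awk; the function name traceback_frames is made up for illustration and is not part of WIEN2k.

```shell
# Print "routine line source" for every traceback frame that has a
# numeric source line. Reads an ifort traceback on standard input.
# Header lines and frames marked "Unknown" are skipped.
traceback_frames() {
    awk 'NF == 5 && $4 ~ /^[0-9]+$/ { print $3, $4, $5 }'
}
```

With the traceback saved in a file (say, a hypothetical traceback.txt), `traceback_frames < traceback.txt` would list only the frames with known routines and line numbers.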
I thought something might be wrong with my input files, so I copied everything to my PC. There I got the same error printed to the screen, apart from the line showing the "dos" routine. I also tried changing TETRA to, for instance, TEMP 0.003 in case.in2c, but it did not help.
The funny thing is that I once had a similar error when running a spin-orbit plus orbital polarization correction calculation, and despite countless efforts from P. Blaha and L. Marks no conclusive workaround was found. I am sorry to say I don't remember when or how I solved this problem, if I did at all. Most probably I skipped it and turned my attention to a different issue. If desired, the earlier thread can be checked at:
http://zeus.theochem.tuwien.ac.at/pipermail/wien/2006-October/008036.html
I tried to run the sequence
x lapw0
x lapw1 -c -up
x lapw1 -c -dn
x lapw2 -c -up
.....
I tested
lapw2c uplapw2.def
as well, and in both cases I got the same error.
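When stepping through the programs by hand like this, one quick way to see which step left an error behind is to list the non-empty error files in the case directory. The following is a minimal sketch, assuming a POSIX-like shell and a find that supports -maxdepth; it also assumes that each program writes a file ending in .error which stays empty on success. The function name nonempty_error_files is hypothetical.

```shell
# List non-empty *.error files in a case directory (default: current one).
# Assumption: error files end in ".error" and are empty after a clean run.
nonempty_error_files() {
    dir=${1:-.}
    find "$dir" -maxdepth 1 -type f -name '*.error' -size +0c | sort
}
```

Calling `nonempty_error_files` after each `x` step should then show the first program that failed, if any.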
Soon afterwards, I did a complete clean initialization on my PC and left it running. lapw2c crashed again:
$ runsp_lapw -it -I -i 200 -ec 0.00001 -cc 0.0001
hup: Command not found.
Invalid null command.
LAPW0 END
LAPW1 END
LAPW1 END
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
lapw2c 082FB8D0 Unknown Unknown Unknown
lapw2c 080B28B6 read_vec_ 88 read_vec_tmp_.F
lapw2c 0809086A l2main_ 507 l2main_tmp_.F
lapw2c 080A4A37 MAIN__ 543 lapw2_tmp_.F
lapw2c 0804D1F1 Unknown Unknown Unknown
libc.so.6 4008A450 Unknown Unknown Unknown
lapw2c 0804D151 Unknown Unknown Unknown
> stop error
The routines have changed: no Fermi-related routine appears in the traceback any more, but something is still going wrong. I found the same problem again with the x lapw* and lapw2c uplapw2.def tests.
Could it be that this is really a memory limit or system size issue?
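One thing that can be checked in this direction is the shell's resource limits, since a small stack limit is a commonly reported cause of segmentation faults in ifort-compiled programs. Whether that applies here is not established. The sketch below assumes a bash-like shell (a csh or tcsh shell would use `limit` instead), and show_limits is just an illustrative name.

```shell
# Print the current soft limits that most often matter for large Fortran runs.
# Uses the ulimit shell built-in; values are in kB or "unlimited".
show_limits() {
    echo "stack size (kB):     $(ulimit -s)"
    echo "max memory (kB):     $(ulimit -m)"
    echo "virtual memory (kB): $(ulimit -v)"
}

show_limits
```

If the stack size turns out to be small, raising it in the same shell (for example with `ulimit -s unlimited`, where the system allows it) before rerunning lapw2c would be one experiment to try.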
I would welcome any suggestions. Please let me know if you need any extra info.
Greetings
Roberto
Roberto Iglesias
Departamento de Física
Universidad de Oviedo
Calvo Sotelo, s/n
33013 Oviedo SPAIN