[Wien] Segmentation fault in lapw2c
Laurence Marks
L-marks at northwestern.edu
Sat Nov 1 15:04:21 CET 2008
Thankfully you compiled with -traceback because that helps (probably)
locate what is going wrong. If you look at the relevant piece of code
(in fermi.F) it reads:
! ** NB NUMBER OF BANDS *
! ** NKP NUMBER OF IRREDUCIBLE K-POINTS *
(Some lines omitted)
DIMENSION EB(NB,NKP),E(4),IKP(4)
INTEGER W(NWX)
CHARACTER*67 ERRMSG
DATA NP/1000/
CALL DEF0(NWX)
! -----------------------------------------------------------------
! -- FIND EMIN EMAX (ENERGYBANDS ARE ASSUMED -
! -- TO BE ORDERED WITH RESPECT TO SYMMETRY -
! -----------------------------------------------------------------
EMIN=EB(1,1)      << This is line 812
EMAX=EB(NB,1)
where I've added the "<< This is line 812" and condensed it slightly.
If, somehow, the number of bands (NB) or k-points (NKP) has been corrupted,
for instance is negative or zero, then the declared size of EB,
"DIMENSION EB(NB,NKP)", is wrong and the first access to EB(1,1) can
fail. Almost certainly this has happened because something has gone
wrong earlier in either lapw1c or lapwso, which have run to completion
but not produced sensible output. You should look at the output files
they produced and see if they are sensible; probably not. Peter may
have some specific ideas.
N.B. While the problem is almost certainly not in lapw2, but rather
something earlier, there are several things you could do to help sort
out what the problem is. One is to add -C to the compilation options (for
testing purposes) for lapw2. This is noticeably slower but will give
more information (but might also show some non-bugs, so be careful if
your Fortran programming skills are weak). An alternative would be to
add a debug line, for instance
write(*,*)'Checking Dimensions ',NB,NKP
before line 812 in fermi.F
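As a rough sketch of that kind of check (a stand-alone toy program, not
the actual WIEN2k routine; the program name, the stand-in values of NB
and NKP, and the error message are all assumptions made for
illustration), one could guard the first access to EB like this:

```fortran
! Hypothetical stand-alone sketch; NOT the real EFERMI code from fermi.F.
program check_dims
  implicit none
  integer :: nb, nkp
  real(8), allocatable :: eb(:,:)

  ! Stand-in values (assumption). In lapw2 these would be derived from
  ! what lapw1c/lapwso wrote, so a bad run could leave them zero or negative.
  nb  = 0
  nkp = 4

  ! The debug line suggested above.
  write(*,*) 'Checking Dimensions ', nb, nkp

  ! Refuse to touch EB if either dimension is not positive.
  if (nb < 1 .or. nkp < 1) then
     write(*,*) 'Bad dimensions: inspect the lapw1c/lapwso output files'
     stop 1
  end if

  allocate(eb(nb, nkp))
  eb = 0.0d0

  ! Same accesses as in the excerpt; safe only once nb, nkp >= 1.
  write(*,*) 'EMIN, EMAX candidates: ', eb(1,1), eb(nb,1)
end program check_dims
```

With the stand-in value NB = 0 the program stops at the guard instead of
reading EB(1,1). In the real code only the write statement (and, if you
like, the IF test) would go before line 812.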
On Sat, Nov 1, 2008 at 6:39 AM, ROBERTO LUIS IGLESIAS PASTRANA
<roberto at uniovi.es> wrote:
> Hello all!
>
> I was trying to run a runsp_lapw job for a spin-polarized 16-atom Cr supercell on our local cluster. This is a Xeon system with 50 dual-processor nodes. I'm using ifort and the 64-bit mkl 10.1 version. I tried to use k-point parallelization. I flipped half the spins in case.inst before going through a complete initialization procedure, since I am trying to mimic an antiferromagnetic alignment. Previous tests with the same supercell size in ferromagnetic Fe went OK and a complete SCF cycle finished without errors. We're using the latest WIEN2k_08.3 version.
>
> I found a crash in lapw2 -c -up with a SIGSEGV, segmentation fault error. The error file reads as follows:
>
> LAPW0 END
> LAPW1 END
> LAPW1 END
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image      PC                Routine       Line     Source
> lapw2c     0000000000430441  efermi_       812      fermi_tmp_.F
> lapw2c     0000000000430199  dos_          752      fermi_tmp_.F
> lapw2c     000000000042FE3A  fermi_tetra_  556      fermi_tmp_.F
> lapw2c     000000000042D7FA  fermi_        110      fermi_tmp_.F
> lapw2c     0000000000457EE7  MAIN__        258      lapw2_tmp_.F
> lapw2c     000000000040FAAA  Unknown       Unknown  Unknown
> libc.so.6  000000336481C3FB  Unknown       Unknown  Unknown
> lapw2c     000000000040F9EA  Unknown       Unknown  Unknown
>
> I thought something could be wrong in my input files. I ported everything to my PC and I found the same error output to the screen, except for the line showing the "dos" routine. Of course, I tried to change from TETRA to TEMP 0.003, for instance, in case.in2c but it did not help.
> The funny thing is that I once had a similar error when running a spin-orbit plus orbital polarization correction calculation, and after countless efforts from P. Blaha and L. Marks there was no conclusive workaround. I am very sorry to say I don't remember when or how I solved this problem, if I did at all. Most probably I skipped it and turned my attention to a different issue. If desired, it can be checked at:
>
> http://zeus.theochem.tuwien.ac.at/pipermail/wien/2006-October/008036.html
>
> I tried to run the sequence
>
> x lapw0
> x lapw1 -c -up
> x lapw1 -c -dn
> x lapw2 -c -up
> .....
>
> I tested
>
> lapw2c uplapw2.def
>
> as well, and in both cases I got the same error.
>
> Soon afterwards, I did a complete clean initialization on my PC and left it running. There was again a crash in lapw2c:
>
> $ runsp_lapw -it -I -i 200 -ec 0.00001 -cc 0.0001
> hup: Command not found.
> Invalid null command.
> LAPW0 END
> LAPW1 END
> LAPW1 END
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image      PC        Routine    Line     Source
> lapw2c     082FB8D0  Unknown    Unknown  Unknown
> lapw2c     080B28B6  read_vec_  88       read_vec_tmp_.F
> lapw2c     0809086A  l2main_    507      l2main_tmp_.F
> lapw2c     080A4A37  MAIN__     543      lapw2_tmp_.F
> lapw2c     0804D1F1  Unknown    Unknown  Unknown
> libc.so.6  4008A450  Unknown    Unknown  Unknown
> lapw2c     0804D151  Unknown    Unknown  Unknown
>
>> stop error
>
> The routines in the traceback have now changed and no Fermi-routine-related error appears any more, but something is still going wrong. I found the same problem again with the x lapw* and lapw2c uplapw2.def tests.
>
> Could it be that this is really a memory limit or system size issue?
>
> I would be very glad to welcome all possible suggestions. Please let me know if you need any extra info.
>
> Greetings
>
> Roberto
>
> Roberto Iglesias
> Departamento de Física
> Universidad de Oviedo
> Calvo Sotelo, s/n
> 33013 Oviedo SPAIN
--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering to study the structure of matter.