[Wien] ‘lapw2 -so’ hangs
Elias Assmann
elias.assmann at gmail.com
Mon Nov 11 20:14:49 CET 2013
Dear Peter,
I have tried to narrow things down a bit. The subroutine
‘fermi_tetra’ gets stuck in the loop labeled ‘14’. Here is a
code snippet:
498 4 K=K+1
499 if(iloop0.ne.0) KPP(ILOOP0)=K
500 !para begin
501 ! testing
502 ! write(*,*)'reading k=',K,itap,ispin,iloop,iloop0
503 !para end
504 IF(K.GT.2*NKPT) GOTO 900
505 READ(ITAP,5001,END=999) SS,T,ZZ,KNAME,N,NEHELP(K,ispin),WEI
506 nmat=MAX(n,nmat)
507
508 !para begin
509 NE(K+k1)=NEHELP(K,ispin)
510 !para end
511 if(nehelp(k,ispin).gt.nume) GOTO 920
512 if(nemax.lt.nehelp(k,ispin)) nemax=nehelp(k,ispin)
513 IF(N.GT.MAXWAV) MAXWAV=N
514 IF(N.LT.MINWAV) MINWAV=N
515 14 READ(ITAP,*) NUM,E1
516 Eb(num,K,ispin)=E1
517 if(itap.eq.30.and.(e1.gt.ebmax(num))) ebmax(num)=e1
518 if(itap.eq.30.and.e1.lt.ebmin(num)) ebmin(num)=e1
519 ! READ(ITAP) (A(I),I=1,N)
520 IF(NUM.EQ.NEHELP(K,ispin)) GOTO 4
521 GOTO 14
I put a debug statement
write(0,*) 'Hello ', k,ispin, nehelp(k,ispin), nume
before l. 511. The last few lines of output look either like this:
Hello 2434 2 54 60
Hello 2435 2 54 60
Hello 2436 2 56 60
Hello 2437 2 0 60
Hello 2438 2 0 60
Hello 2439 2 1198992928 60
FERMI - Error
where an error is raised on l. 511, or like this:
Hello 2434 2 54 60
Hello 2435 2 54 60
Hello 2436 2 56 60
Hello 2437 2 0 60
Hello 2438 2 0 60
Hello 2439 2 -820289632 60
where the program goes into the infinite loop instead.
What happens is that the NEHELP array is too small, so the READ on
l. 505 fails and NEHELP(K,ispin) ends up containing uninitialized
data. So I guess the problem stems from the ‘energysodn’ file being too
small, and I need to look into what is going wrong in lapwso.
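For comparison, here is a minimal defensive sketch of the header check, written as a hypothetical Fortran module. The names `checked_header` and `read_band_count` are illustrative and do not come from lapw2. The sketch assumes a simplified layout with one band count per header line, not the real format 5001 record. Note that l. 511 only rejects counts above nume. A count of 0, or a negative value such as the -820289632 in the second trace, therefore passes it. The sketch rejects anything outside 1..nume. It also reports end of file and read errors through IOSTAT instead of leaving the count undefined.

```fortran
! Hypothetical sketch: validate a band count before using it.
! Assumes one integer per header line (NOT the real WIEN2k record).
module checked_header
  implicit none
contains
  ! stat = 0: success; -1: end of file; -2: read error;
  ! -3: count outside 1..nume. nbands is set to 0 on any failure.
  subroutine read_band_count(unit, nume, nbands, stat)
    integer, intent(in)  :: unit, nume
    integer, intent(out) :: nbands, stat
    integer :: ios

    read(unit, *, iostat=ios) nbands
    if (is_iostat_end(ios)) then
      stat = -1
    else if (ios /= 0) then
      stat = -2
    else if (nbands < 1 .or. nbands > nume) then
      stat = -3
    else
      stat = 0
    end if
    ! Never hand back an undefined or out-of-range count.
    if (stat /= 0) nbands = 0
  end subroutine read_band_count
end module checked_header
```

A caller would stop with an error message whenever stat is nonzero, rather than entering the band loop with an unchecked count.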
But I thought I should share this anyway. In particular, I do not
understand what is going on with the SIGSEGVs the program gets. They
would be caused by NEHELP being too small, but why doesn't the program
die? The Wien2k signal handling is not invoked (since this is not
parallel); I do see a call
rt_sigaction(SIGSEGV, {0x4d2480, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x2b472c1f5ca0}, NULL, 8) = 0
in the trace, but ifort seems to do this even for the simplest test
program, and that does not prevent it from dying on a SIGSEGV.
Secondly, I thought the READ on l. 515 would raise an error on EOF;
instead it seems to “busy wait” (it never returns but keeps the CPU
usage at 100%).
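Along the same lines, here is a minimal sketch of loop 14 (l. 515-521) with explicit end-of-file handling. It is again written as a hypothetical module, and `checked_bands` and `read_bands` are illustrative names. The sketch assumes a simplified layout of one "num energy" pair per line. It range-checks NUM before the store. In the original, by contrast, Eb(num,K,ispin) is assigned from whatever NUM was read, without a bounds check. With IOSTAT present, end of file becomes a status code that the caller can act on, instead of depending on the compiler's default behavior.

```fortran
! Hypothetical sketch of loop 14 with end-of-file and bounds checks.
! Assumes one "num energy" pair per line (a simplification).
module checked_bands
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Reads "num energy" lines until band number nbands has been read.
  ! stat = 0: success; -1: end of file before band nbands;
  ! -2: read error; -3: nbands or a band index outside 1..nume.
  subroutine read_bands(unit, nbands, nume, eb, stat)
    integer, intent(in)     :: unit, nbands, nume
    real(dp), intent(inout) :: eb(nume)
    integer, intent(out)    :: stat
    integer  :: num, ios
    real(dp) :: e1

    ! A count outside 1..nume could never be matched by a valid index.
    if (nbands < 1 .or. nbands > nume) then
      stat = -3
      return
    end if
    do
      read(unit, *, iostat=ios) num, e1
      if (is_iostat_end(ios)) then
        stat = -1
        return
      else if (ios /= 0) then
        stat = -2
        return
      end if
      ! Refuse to write outside eb for a bad band index.
      if (num < 1 .or. num > nume) then
        stat = -3
        return
      end if
      eb(num) = e1
      if (num == nbands) exit
    end do
    stat = 0
  end subroutine read_bands
end module checked_bands
```

Turning a nonzero stat into an error message would at least make a truncated energy file fail loudly rather than hang.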
What is more, I found that the problem interacts in a subtle way with
ifort's (version 11.1) ‘-ipo’ and ‘-g’ switches. Originally, I had ‘-ipo’
set, resulting in the infinite loop. For debugging, I took that out
and added ‘-g’ instead, which resulted in the behavior described
above.
Turning ‘-ipo’ back on, the debug output looks like this:
Hello 2434 2 54 60
Hello 2435 2 54 60
Hello 2436 2 56 60
Hello 2437 2 56 60
and the infinite loop always happens.
When I use neither switch, the result is what I would normally expect:
the program dies from the segfault.
Summarizing, this is what I see:
  flags       result
  -g -ipo     silent fail
  -ipo        silent fail
  -g          “FERMI - Error” / silent fail
  (neither)   “normal” segfault
Sorry for the overlong e-mail.
Elias