[Wien] ‘lapw2 -so’ hangs

Elias Assmann elias.assmann at gmail.com
Mon Nov 11 20:14:49 CET 2013


Dear Peter,

I have tried to narrow things down a bit.  The subroutine
‘fermi_tetra’ gets stuck in the loop labeled ‘14’.  Here is a
code snippet:

    498      4 K=K+1
    499        if(iloop0.ne.0) KPP(ILOOP0)=K
    500  !para begin
    501  ! testing
    502  !      write(*,*)'reading k=',K,itap,ispin,iloop,iloop0
    503  !para end
    504        IF(K.GT.2*NKPT) GOTO 900
    505        READ(ITAP,5001,END=999) SS,T,ZZ,KNAME,N,NEHELP(K,ispin),WEI
    506        nmat=MAX(n,nmat)
    507
    508  !para begin
    509        NE(K+k1)=NEHELP(K,ispin)
    510  !para end
    511        if(nehelp(k,ispin).gt.nume) GOTO 920
    512        if(nemax.lt.nehelp(k,ispin)) nemax=nehelp(k,ispin)
    513        IF(N.GT.MAXWAV) MAXWAV=N
    514        IF(N.LT.MINWAV) MINWAV=N
    515     14 READ(ITAP,*) NUM,E1
    516        Eb(num,K,ispin)=E1
    517        if(itap.eq.30.and.(e1.gt.ebmax(num))) ebmax(num)=e1
    518        if(itap.eq.30.and.e1.lt.ebmin(num)) ebmin(num)=e1
    519  !      READ(ITAP) (A(I),I=1,N)
    520        IF(NUM.EQ.NEHELP(K,ispin)) GOTO 4
    521        GOTO 14

I put a debug statement

       write(0,*) 'Hello ', k,ispin, nehelp(k,ispin), nume

before l. 511.  The last few lines of output look either like this:

  Hello         2434           2          54          60
  Hello         2435           2          54          60
  Hello         2436           2          56          60
  Hello         2437           2           0          60
  Hello         2438           2           0          60
  Hello         2439           2  1198992928          60
FERMI - Error

where an error is raised on l. 511, or like this:

  Hello         2434           2          54          60
  Hello         2435           2          54          60
  Hello         2436           2          56          60
  Hello         2437           2           0          60
  Hello         2438           2           0          60
  Hello         2439           2  -820289632          60

where the program goes into the infinite loop instead.
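
The difference between the two runs seems consistent with the test on
l. 511, which only looks at the upper bound: a large positive garbage
value exceeds nume and jumps to 920, while a negative one passes, and
NUM then presumably never matches it on l. 520.  A minimal sketch of a
two-sided check follows.  The program name, the sample array, and the
choice to treat anything below 1 as invalid are my assumptions, not
part of the Wien2k source:

```fortran
program range_check
  implicit none
  ! nume as printed in the debug output; everything else is illustrative.
  integer, parameter :: nume = 60
  integer :: samples(4), i

  ! One sane band count plus values seen in the debug output above.
  samples = (/ 56, 0, 1198992928, -820289632 /)

  do i = 1, size(samples)
     if (samples(i) > nume) then
        ! This is the only case the existing .gt. test catches.
        write(*,*) samples(i), ': rejected by the upper-bound test'
     else if (samples(i) < 1) then
        ! Assumption: a band count below 1 is never valid.
        write(*,*) samples(i), ': rejected only by the added lower-bound test'
     else
        write(*,*) samples(i), ': accepted'
     end if
  end do
end program range_check
```

With a lower-bound test of this kind, the negative value would take
the same error path as the positive one instead of entering the loop.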

What happens is that the NEHELP array is too small, so the READ on
l. 505 fails and NEHELP(K,ispin) ends up containing uninitialized
data.  So I guess the problem stems from the ‘energysodn’ file, which is
too small, and I need to go look at what is going wrong in lapwso.
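
If the array extent really is the issue, one defensive option would be
to compare K against the declared size before anything is stored.  The
following is only a sketch: the array shape, the parameter nkpt_max,
and the stored values are made up for illustration and do not come
from fermi_tetra:

```fortran
program guard_index
  implicit none
  ! Hypothetical stand-in for NEHELP; the real dimensions are not shown
  ! in the excerpt, so this extent is an assumption.
  integer, parameter :: nkpt_max = 4
  integer :: nehelp(nkpt_max, 2)
  integer :: k, ispin

  nehelp = 0
  ispin = 2

  do k = 1, nkpt_max + 2
     ! Refuse to touch the array once k runs past its first extent.
     if (k > size(nehelp, 1)) then
        write(0,*) 'k =', k, ' exceeds extent of NEHELP =', size(nehelp, 1)
        exit
     end if
     nehelp(k, ispin) = 50 + k
  end do

  write(*,*) 'entries stored:', count(nehelp(:, ispin) /= 0)
end program guard_index
```

The point of checking against size() rather than a separate constant
is that the test cannot drift out of step with the declaration.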

But I thought I should share this anyway.  In particular, I do not
understand what is going on with the SIGSEGVs the program gets.  They
would be caused by NEHELP being too small, but why doesn't the program
die?  The Wien2k signal handling is not invoked (since this is not
parallel); I do see a call

   rt_sigaction(SIGSEGV, {0x4d2480, [], 
SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x2b472c1f5ca0}, NULL, 8) = 0

in the trace, but ifort seems to do this even for the simplest test
program, and that does not prevent it from dying on a SIGSEGV.

Secondly, I thought the READ on l. 515 would raise an error on EOF;
instead it seems to “busy wait” (it never returns but keeps the CPU
usage at 100%).
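
One way to avoid relying on the default end-of-file behaviour would be
to give the READ an explicit IOSTAT= and leave the loop on any nonzero
status.  Here is a minimal sketch under my own assumptions: it uses a
scratch file with two records while three are expected, and the names
nexpected and last_num are hypothetical:

```fortran
program read_bands
  implicit none
  integer :: unit_no, ios, num, last_num, nexpected
  double precision :: e1

  unit_no = 30
  nexpected = 3      ! pretend the header promised three eigenvalues
  last_num = 0

  ! Build a short input: only two (index, energy) records.
  open(unit=unit_no, status='scratch', form='formatted')
  write(unit_no,*) 1, -0.5d0
  write(unit_no,*) 2, 0.25d0
  rewind(unit_no)

  do
     read(unit_no, *, iostat=ios) num, e1
     if (ios /= 0) then
        ! Negative ios means end of file, positive means a read error.
        write(0,*) 'read stopped, iostat =', ios, ' last index =', last_num
        exit
     end if
     last_num = num
     if (num == nexpected) exit
  end do

  close(unit_no)
end program read_bands
```

Written this way the loop has two exits, so it terminates even when
the expected count is garbage and NUM can never match it.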

What is more, I found that the problem interacts in a subtle way with
ifort's (V. 11.1) ‘-ipo’ and ‘-g’ switches.  Originally, I had ‘-ipo’
set, resulting in the infinite loop.  For debugging, I took that out
and added ‘-g’ instead, which resulted in the behavior described
above.

Turning ‘-ipo’ back on, the debug output looks like this:

  Hello         2434           2          54          60
  Hello         2435           2          54          60
  Hello         2436           2          56          60
  Hello         2437           2          56          60

and the infinite loop always happens.

When I use neither switch, the result is what I would normally expect:
the program dies from the segfault.

Summarizing, this is what I see:

-g -ipo : silent fail
   -ipo : silent fail
-g      : “FERMI - Error” / silent fail
(none)  : “normal” segfault


Sorry for the overlong e-mail.

	Elias

