[Wien] stubborn segmentation fault

Gavin Abo gsabo at crimson.ua.edu
Thu Oct 25 16:18:12 CEST 2012


The following post might be relevant:

http://zeus.theochem.tuwien.ac.at/pipermail/wien/2012-May/017010.html

On 10/25/2012 7:21 AM, Laurence Marks wrote:
>
> I do not think the compilation issue is really a code problem, rather 
> a limit in the compiler. I am 99% certain that it is still standard 
> Fortran to use one type of array (e.g. complex) in the call and 
> another (e.g. float) within the subroutine. However, if you have the 
> compiler generate an interface I can see how this may not work.
>
> There is a possible issue with the call where the compilation stops, 
> and you may want to test adding
>
> -assume dummy_aliases
>
> Calling subroutines using different parts of an array is certainly 
> standard Fortran 77, but can be dangerous. One ifort man page I saw 
> claimed it is not standard.
>
> Stepping back, an earlier part of your email indicated that the 
> problem was at line 893 of l2main.F so the CFFT call is not the 
> problem. I suggest adding lines such as
> write(*,*) 'Stephan debug C'
> so you can determine exactly where the crash is taking place. (You can 
> also edit the Makefile so it does not delete the temporary files, 
> which may help to find the line.) If you look in the code you will 
> find that there are many of these commented out.
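
A minimal shell sketch of how one might locate those commented-out debug statements before re-enabling them. It assumes fixed-form Fortran, where a comment line starts with c, C, * or ! in column 1. The helper name find_commented_writes is hypothetical; l2main.F is the file named in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list commented-out WRITE statements in a fixed-form
# Fortran source file, with line numbers, so they can be re-enabled by hand.
find_commented_writes() {
    # Column-1 comment markers in fixed-form Fortran: c, C, * or !
    # -i makes both the marker and the keyword case-insensitive.
    grep -n -i '^[c*!].*write *(' "$1"
}

# Usage (assumes you are in the directory holding the lapw2 sources):
# find_commented_writes l2main.F
```

Each hit is printed as "line:text", so the statements closest to the crash location can be uncommented first.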
>
> Also, are you using the latest version? I seem to remember a blocksize 
> bug for certain sizes that was fixed in the last few months.
>
> ---------------------------
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu
> 1-847-491-3996
> "Research is to see what everybody else has seen, and to think what 
> nobody else has thought"
> Albert Szent-Györgyi
>
> On Oct 25, 2012 6:47 AM, "Stefaan Cottenier"
> <Stefaan.Cottenier at ugent.be> wrote:
>
>
>     Dear wien2k community,
>
>     I have not managed to get wien2k running flawlessly on our university
>     cluster (Intel Xeon Harpertown (L5420)). For some cases, a reproducible
>     segmentation fault appears in lapw2. Our very capable sysadmins
>     gave up and blame it on 'a wien2k coding problem'. That is why I want to
>     describe the problem for you:
>
>     A) Description of the problem:
>
>     * It is a "forrtl: severe (174): SIGSEGV, segmentation fault occurred"
>     error, which appears in lapw2 with FOR in case.in2 (never with TOT). The
>     full screen output (compiled with ifort, including -g -traceback) for
>     k-point parallelization over 2 cores is:
>
>     LAPW2 - FERMI; weighs written
>     forrtl: severe (174): SIGSEGV, segmentation fault occurred
>     Image              PC                Routine            Line        Source
>     lapw2              0000000000484D28  l2main_                   893  l2main_tmp_.F
>     lapw2              00000000004A1C2D  MAIN__                    564  lapw2_tmp_.F
>     lapw2              0000000000403C4C  Unknown               Unknown  Unknown
>     libc.so.6          000000300081D994  Unknown               Unknown  Unknown
>     lapw2              0000000000403B59  Unknown               Unknown  Unknown
>     forrtl: severe (174): SIGSEGV, segmentation fault occurred
>     Image              PC                Routine            Line        Source
>     lapw2              0000000000484D28  l2main_                   893  l2main_tmp_.F
>     lapw2              00000000004A1C2D  MAIN__                    564  lapw2_tmp_.F
>     lapw2              0000000000403C4C  Unknown               Unknown  Unknown
>     libc.so.6          000000300081D994  Unknown               Unknown  Unknown
>     lapw2              0000000000403B59  Unknown               Unknown  Unknown
>
>     * It appears only for a limited number of cases (say 20% of all the ones
>     I tried). The others run just fine.
>
>     * The problem appears only in parallel runs. If a case shows the
>     problem, one additional serial iteration is sufficient to complete the
>     scf-cycle.
>
>     * If the problem appears, it can be reproduced only by 'run_lapw -p'. If
>     one tries a manual 'parallel' execution as shown below (which I thought
>     should execute exactly the same processes), the error does not show up:
>
>     lapw0 lapw0.def
>     lapw1 lapw1.def [1]
>     lapw2 lapw2.def [1]
>     lapw1 lapw1.def [2]
>     lapw2 lapw2.def [2]
>     ...
>
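
For the traceback in A), a small shell sketch that pulls out the first frame with a known source line. It assumes the traceback is in the unwrapped five-column form that ifort prints (Image, PC, Routine, Line, Source); the helper name first_known_frame is hypothetical. Note that the reported line number belongs to the preprocessed file named in the Source column (l2main_tmp_.F), not necessarily to the same line of l2main.F.

```shell
#!/bin/sh
# Hypothetical helper: read an ifort traceback on stdin and print the first
# frame that has a known source line, formatted as "source:line (routine)".
first_known_frame() {
    awk '
        # Frame lines have 5 fields: Image PC Routine Line Source.
        # The header line and "Unknown" frames are skipped because their
        # Line field is not a number.
        NF == 5 && $4 ~ /^[0-9]+$/ {
            printf "%s:%s (%s)\n", $5, $4, $3
            exit
        }
    '
}

# Usage (traceback.txt is a hypothetical file holding the screen output):
# first_known_frame < traceback.txt
```

Fed the traceback above in unwrapped form, this prints "l2main_tmp_.F:893 (l2main_)".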
>
>     B) Detailed analysis
>
>     Trying different compiler versions was the first guess. Three different
>     ifort versions were tested (including the celebrated 2011.3.174 that was
>     reported on the wien2k mailing list to work fine for v12.1), but all
>     result in the same error:
>
>     v2011.1.073
>     v2011.3.174
>     v2011.10.319
>
>     Next, I searched for the possible reason by going through all steps
>     described at the following link (a very useful piece of information for
>     this mailing list; I suggest mentioning it in the FAQ):
>
>     http://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors/
>
>     None of the steps described there brings any improvement, up to the
>     first half of "possible cause #5". The second test described in #5
>     yields something, however. When compiling with the additional options
>
>     -fp-stack-check -g -traceback -gen-interfaces -warn interfaces
>
>     compilation of lapw2 aborts with the following error:
>
>     c3fft_tmp_.F(267): error #6633: The type of the actual argument differs from the type of the dummy argument.   [WSAVE]
>            CALL CFFTB1 (N,C,WSAVE,WSAVE(IW1),WSAVE(IW2))
>     ----------------------------------------^
>     compilation aborted for c3fft_tmp_.F (code 1)
>
>     When searching the wien2k mailing list for c3fft, it turns out there had
>     been problems before with this routine, and an updated version had been
>     provided one year ago (i.e. before v12.1):
>
>     http://zeus.theochem.tuwien.ac.at/pipermail/wien/2011-April/014541.html
>
>     It seems to have been a different problem, however, and both the present
>     version and that (slightly different) version of April 2011 give the
>     same compilation error.
>
>     Can anyone use this information to find a solution?
>
>     Thanks !
>
>     Stefaan
>
>     _______________________________________________
>     Wien mailing list
>     Wien at zeus.theochem.tuwien.ac.at
>     http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>
>
>
