[Wien] stubborn segmentation fault
Gavin Abo
gsabo at crimson.ua.edu
Thu Oct 25 16:18:12 CEST 2012
The following post might be relevant:
http://zeus.theochem.tuwien.ac.at/pipermail/wien/2012-May/017010.html
On 10/25/2012 7:21 AM, Laurence Marks wrote:
>
> I do not think the compilation issue is really a code problem, but rather
> a limitation of the compiler. I am 99% certain that it is still standard
> Fortran to use one type of array (e.g. complex) in the call and
> another (e.g. float) within the subroutine. However, if you have the
> compiler generate an interface, I can see how this may not work.
>
> There is a possible issue with the call where the compilation stops,
> and you may want to test adding
>
> -assume dummy_aliases
>
> Calling subroutines using different parts of an array is certainly
> standard Fortran 77, but can be dangerous. One ifort man page I saw
> claimed it is not standard.
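For readers unfamiliar with the idiom discussed above, here is a minimal sketch of passing different parts of one work array to a subroutine. It is not WIEN2k or FFTPACK code: the names workspace_demo and fill_blocks, the block size n, and the offsets iw1 and iw2 are made up for illustration. All actual and dummy arguments have the same type here, so this should compile even with interface checking switched on. As far as I know, the Fortran standard requires actual and dummy arguments to agree in type, although many compilers accept a mismatch for external procedures.

```fortran
program workspace_demo
  implicit none
  integer, parameter :: n = 4      ! hypothetical block size
  real :: wsave(3*n)               ! one work array holding three blocks
  integer :: iw1, iw2

  wsave = 0.0
  iw1 = n + 1                      ! first element of the second block
  iw2 = 2*n + 1                    ! first element of the third block

  ! Passing an array element to an array dummy argument hands the
  ! subroutine the storage starting at that element (sequence association).
  call fill_blocks(n, wsave, wsave(iw1), wsave(iw2))

  write(*,*) wsave
end program workspace_demo

! External subroutine with an implicit interface, as in Fortran 77 code.
subroutine fill_blocks(n, a, b, c)
  implicit none
  integer, intent(in) :: n
  real :: a(n), b(n), c(n)         ! three non-overlapping blocks of length n
  a = 1.0
  b = 2.0
  c = 3.0
end subroutine fill_blocks
```

The three dummy arrays do not overlap because each is declared with length n, which is the property that makes this kind of call safe.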
>
> Stepping back, an earlier part of your email indicated that the
> problem was at line 893 of l2main.F so the CFFT call is not the
> problem. I suggest adding lines such as
> Write(*,*) 'Stephan debug C'
> so you can determine exactly where the crash is taking place. (You can
> also edit the Makefile so it does not delete the temporary files,
> which may help to find the line.) If you look in the code you will
> find that there are many of these commented out.
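A minimal sketch of the kind of debug marker suggested above. It is a stand-alone toy program, not part of l2main.F; the program name, the marker texts and the dummy loop are placeholders. The flush after each write is my addition: output is often buffered, so without it the last marker written before a segmentation fault may never reach the screen.

```fortran
program debug_markers
  use iso_fortran_env, only: output_unit
  implicit none
  integer :: i
  real :: work(10)

  ! Marker before the suspect region; flush so it survives a crash.
  write(output_unit,*) 'debug A: before loop'
  flush(output_unit)

  ! Placeholder for the code that is suspected of crashing.
  do i = 1, size(work)
     work(i) = real(i)
  end do

  ! Marker after the suspect region, printing a value worth checking.
  write(output_unit,*) 'debug B: after loop, sum =', sum(work)
  flush(output_unit)
end program debug_markers
```

If marker A appears in the output and marker B does not, the crash lies between the two; moving the markers closer together narrows it down.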
>
> Also, are you using the latest version? I seem to remember a blocksize
> bug for certain sizes that was fixed in the last few months.
>
> ---------------------------
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu
> 1-847-491-3996
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought"
> Albert Szent-Györgyi
>
> On Oct 25, 2012 6:47 AM, "Stefaan Cottenier" <Stefaan.Cottenier at ugent.be> wrote:
>
>
> Dear wien2k community,
>
> I have not managed to get wien2k running flawlessly on our university
> cluster (Intel Xeon Harpertown (L5420)). For some cases, a reproducible
> segmentation fault appears in lapw2. Our very capable sysadmins
> gave up and blame it on 'a wien2k coding problem'. That is why I want to
> describe the problem for you:
>
> A) Description of the problem:
>
> * It is a "forrtl: severe (174): SIGSEGV, segmentation fault occurred"
> error, which appears in lapw2 with FOR in case.in2 (never with TOT). The
> full screen output (compiled with ifort, including -g -traceback) for
> k-point parallelization over 2 cores is:
>
> LAPW2 - FERMI; weighs written
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image      PC                Routine  Line     Source
> lapw2      0000000000484D28  l2main_  893      l2main_tmp_.F
> lapw2      00000000004A1C2D  MAIN__   564      lapw2_tmp_.F
> lapw2      0000000000403C4C  Unknown  Unknown  Unknown
> libc.so.6  000000300081D994  Unknown  Unknown  Unknown
> lapw2      0000000000403B59  Unknown  Unknown  Unknown
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image      PC                Routine  Line     Source
> lapw2      0000000000484D28  l2main_  893      l2main_tmp_.F
> lapw2      00000000004A1C2D  MAIN__   564      lapw2_tmp_.F
> lapw2      0000000000403C4C  Unknown  Unknown  Unknown
> libc.so.6  000000300081D994  Unknown  Unknown  Unknown
> lapw2      0000000000403B59  Unknown  Unknown  Unknown
>
> * It appears only for a limited number of cases (say 20% of all the ones
> I tried). The others run just fine.
>
> * The problem appears only in parallel runs. If a case shows the
> problem, one additional serial iteration is sufficient to complete the
> scf-cycle.
>
> * If the problem appears, it can be reproduced only by 'run_lapw -p'. If
> one tries a manual 'parallel' execution as shown below (which I thought
> should execute exactly the same processes), the error does not show up:
>
> lapw0 lapw0.def
> lapw1 lapw1.def [1]
> lapw2 lapw2.def [1]
> lapw1 lapw1.def [2]
> lapw2 lapw2.def [2]
> ...
>
>
> B) Detailed analysis
>
> Trying different compiler versions was the first guess. Three different
> ifort versions were tested (including the celebrated 2011.3.174 that was
> reported on the wien2k mailing list to work fine for v12.1), but all
> result in the same error:
>
> v2011.1.073
> v2011.3.174
> v2011.10.319
>
> Next, I searched for the possible reason by going through all steps
> described at the following link (a very useful piece of information for
> this mailing list; I suggest mentioning it in the FAQ):
>
> http://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors/
>
> All steps described there lead to no improvement up to the first half of
> "possible cause #5". The second test described in #5 yields something,
> however. When compiling with the additional options
>
> -fp-stack-check -g -traceback -gen-interfaces -warn interfaces
>
> compilation of lapw2 fails with the following error:
>
> c3fft_tmp_.F(267): error #6633: The type of the actual argument differs
> from the type of the dummy argument. [WSAVE]
> CALL CFFTB1 (N,C,WSAVE,WSAVE(IW1),WSAVE(IW2))
> ----------------------------------------^
> compilation aborted for c3fft_tmp_.F (code 1)
>
> When searching the wien2k mailing list for c3fft, it turns out there had
> been problems with this routine before, and an updated version had been
> provided one year ago (i.e. before v12.1):
>
> http://zeus.theochem.tuwien.ac.at/pipermail/wien/2011-April/014541.html
>
> It seems to have been a different problem, however, and both the present
> version and that (slightly different) version of April 2011 give the
> same compilation error.
>
> Can anyone use this information to find a solution?
>
> Thanks !
>
> Stefaan
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>