[Wien] stubborn segmentation fault
Laurence Marks
L-marks at northwestern.edu
Thu Oct 25 15:21:11 CEST 2012
I do not think the compilation issue is really a code problem, but rather a
limitation of the compiler. I am 99% certain that it is still standard Fortran
to use one type of array (e.g. complex) in the call and another (e.g.
real) within the subroutine. However, if you have the compiler generate an
interface, I can see how this may not work.
There is a possible issue with the call where the compilation stops, and
you may want to test adding
-assume dummy_aliases
Calling subroutines using different parts of an array is certainly standard
Fortran 77, but can be dangerous. One ifort man page I saw claimed it is
not standard.
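
As a minimal sketch of the pattern in question (the names split_workspace
and fill_halves are hypothetical and not taken from WIEN2k or FFTPACK), the
following passes two non-overlapping parts of one work array to an external
subroutine, in the same way the CFFTB1 call passes WSAVE(IW1) and WSAVE(IW2).
It assumes the two blocks do not overlap, which is the condition under which
this style is safe:

```fortran
! Sketch only: program and subroutine names are hypothetical.
program split_workspace
  implicit none
  integer, parameter :: n = 4
  real :: wsave(2*n)
  integer :: iw1, iw2

  iw1 = 1          ! start of the first block
  iw2 = iw1 + n    ! start of the second block, no overlap with the first

  ! Array elements as actual arguments: each dummy array starts at that
  ! element and runs over the storage that follows it.
  call fill_halves(n, wsave(iw1), wsave(iw2))

  write(*,*) wsave
end program split_workspace

! External subroutine with an implicit interface, Fortran 77 style.
subroutine fill_halves(n, a, b)
  implicit none
  integer, intent(in) :: n
  real, intent(out) :: a(n), b(n)
  integer :: i

  do i = 1, n
     a(i) = real(i)
     b(i) = -real(i)
  end do
  ! If a and b overlapped (e.g. iw2 = iw1 + 1), writing through both
  ! would violate the no-aliasing assumption the optimizer may rely on.
end subroutine fill_halves
```

The danger arises when the blocks overlap and the subroutine writes to them;
that is the situation in which an option such as -assume dummy_aliases could
change the behaviour.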
Stepping back, an earlier part of your email indicated that the problem was
at line 893 of l2main.F so the CFFT call is not the problem. I suggest
adding lines such as
write(*,*) 'Stephan debug C'
so you can determine exactly where the crash is taking place. (You can also
edit the Makefile so it does not delete the temporary files, which may help
to find the line.) If you look in the code you will find that there are
many of these commented out.
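
A small sketch of such debug markers follows (the program name and the loop
are hypothetical, for illustration only). It assumes a compiler that supports
the Fortran 2003 flush statement. Flushing after each marker is my addition:
output is often buffered, so a marker written shortly before a crash may
otherwise never reach the screen.

```fortran
! Sketch only: debug_markers and the work array are hypothetical.
program debug_markers
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none
  real :: work(4)
  integer :: i

  write(output_unit,*) 'debug A: before loop'
  flush(output_unit)   ! push the marker out before anything can crash

  do i = 1, size(work)
     work(i) = sqrt(real(i))
  end do

  write(output_unit,*) 'debug B: after loop, work(4) =', work(4)
  flush(output_unit)
end program debug_markers
```

The last marker that appears before the segmentation fault brackets the
failing statement; adding markers between it and the next one narrows the
location further.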
Also, are you using the latest version? I seem to remember a blocksize bug
for certain sizes that was fixed in the last few months.
---------------------------
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Györgyi
On Oct 25, 2012 6:47 AM, "Stefaan Cottenier" <Stefaan.Cottenier at ugent.be>
wrote:
>
> Dear wien2k community,
>
> I have not succeeded in getting wien2k to run flawlessly on our university
> cluster (Intel Xeon Harpertown (L5420)). For some cases, a reproducible
> segmentation fault appears in lapw2. Our very capable sysadmins
> gave up and blame it on 'a wien2k coding problem'. That is why I want to
> describe the problem for you:
>
> A) Description of the problem:
>
> * It is a "forrtl: severe (174): SIGSEGV, segmentation fault occurred"
> error, which appears in lapw2 with FOR in case.in2 (never with TOT). The
> full screen output (compiled with ifort, including -g -traceback) for
> k-point parallelization over 2 cores is:
>
> LAPW2 - FERMI; weighs written
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image              PC                Routine            Line        Source
> lapw2              0000000000484D28  l2main_                   893  l2main_tmp_.F
> lapw2              00000000004A1C2D  MAIN__                    564  lapw2_tmp_.F
> lapw2              0000000000403C4C  Unknown               Unknown  Unknown
> libc.so.6          000000300081D994  Unknown               Unknown  Unknown
> lapw2              0000000000403B59  Unknown               Unknown  Unknown
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image              PC                Routine            Line        Source
> lapw2              0000000000484D28  l2main_                   893  l2main_tmp_.F
> lapw2              00000000004A1C2D  MAIN__                    564  lapw2_tmp_.F
> lapw2              0000000000403C4C  Unknown               Unknown  Unknown
> libc.so.6          000000300081D994  Unknown               Unknown  Unknown
> lapw2              0000000000403B59  Unknown               Unknown  Unknown
>
> * It appears only for a limited number of cases (say 20% of all the ones
> I tried). The others run just fine.
>
> * The problem appears only in parallel runs. If a case shows the
> problem, one additional serial iteration is sufficient to complete the
> scf-cycle.
>
> * If the problem appears, it can be reproduced only by 'run_lapw -p'. If
> one tries a manual 'parallel' execution as shown below (which I thought
> should execute exactly the same processes), the error does not show up:
>
> lapw0 lapw0.def
> lapw1 lapw1.def [1]
> lapw2 lapw2.def [1]
> lapw1 lapw1.def [2]
> lapw2 lapw2.def [2]
> ...
>
>
> B) Detailed analysis
>
> Trying different compiler versions was the first guess. Three different
> ifort versions were tested (including the celebrated 2011.3.174 that was
> reported on the wien2k mailing list to work fine for v12.1), but all
> result in the same error:
>
> v2011.1.073
> v2011.3.174
> v2011.10.319
>
> Next, I searched for the possible cause by going through all the steps
> described at the following link (a very useful piece of information for
> this mailing list; I suggest mentioning it in the FAQ):
>
>
> http://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors/
>
> All steps described there lead to no improvement up to the first half of
> "possible cause #5". The second test described in #5 yields something,
> however. When compiling with the additional options
>
> -fp-stack-check -g -traceback -gen-interfaces -warn interfaces
>
> the compilation of lapw2 fails with the following error:
>
> c3fft_tmp_.F(267): error #6633: The type of the actual argument differs
> from the type of the dummy argument. [WSAVE]
> CALL CFFTB1 (N,C,WSAVE,WSAVE(IW1),WSAVE(IW2))
> ----------------------------------------^
> compilation aborted for c3fft_tmp_.F (code 1)
>
> Searching the wien2k mailing list for c3fft shows that there had
> been problems with this routine before, and that an updated version had been
> provided one year ago (i.e. before v12.1):
>
> http://zeus.theochem.tuwien.ac.at/pipermail/wien/2011-April/014541.html
>
> It seems to have been a different problem, however, and both the present
> version and that (slightly different) version of April 2011 give the
> same compilation error.
>
> Can anyone use this information to find a solution?
>
> Thanks !
>
> Stefaan
>