<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">The following post might be relevant:<br>
<br>
<a class="moz-txt-link-freetext" href="http://zeus.theochem.tuwien.ac.at/pipermail/wien/2012-May/017010.html">http://zeus.theochem.tuwien.ac.at/pipermail/wien/2012-May/017010.html</a><br>
<br>
On 10/25/2012 7:21 AM, Laurence Marks wrote:<br>
</div>
<blockquote
cite="mid:CANkSMZA=ffPY1zHF4+iogAVLtSXBTihnrpeZX8reCCp8pOw3ow@mail.gmail.com"
type="cite">
<p>I do not think the compilation issue is really a code problem,
rather a limit in the compiler. I am 99% certain that it is
still standard Fortran to use one type of array (e.g. complex)
in the call and another (e.g. float) within the subroutine.
However, if you have the compiler generate an interface I can
see how this may not work.</p>
<p>There is a possible issue with the call where the compilation
stops, and you may want to test adding</p>
<p>-assume dummy_aliases</p>
<p>Calling subroutines using different parts of an array is
certainly standard Fortran 77, but can be dangerous. One ifort
man page I saw claimed it is not standard.</p>
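<p>A minimal sketch of this pattern, not taken from the WIEN2k
sources: one work array is passed more than once, each actual
argument starting at a different element, in the same way the
CFFTB1 call quoted below passes WSAVE, WSAVE(IW1) and WSAVE(IW2).
The names seq_demo, fill2 and work are invented for illustration.
It is safe here only because the two halves do not overlap; if
they overlapped and the routine wrote to either one, the outcome
could depend on the compiler and its optimization settings.</p>
<pre wrap="">
program seq_demo
  implicit none
  real :: work(8)

  work = 0.0
  ! Pass two non-overlapping halves of the same work array.
  ! work(5) is an array element; the dummy array b starts there.
  call fill2(4, work, work(5))
  write(*,*) work
end program seq_demo

! External routine with an implicit interface, as in Fortran 77 code.
subroutine fill2(n, a, b)
  implicit none
  integer :: n, i
  real :: a(n), b(n)

  do i = 1, n
     a(i) = real(i)
     b(i) = -real(i)
  end do
end subroutine fill2
</pre>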
<p>Stepping back, an earlier part of your email indicated that the
problem was at line 893 of l2main.F so the CFFT call is not the
problem. I suggest adding lines such as<br>
Write(*,*) 'Stephan debug C'<br>
so you can determine exactly where the crash is taking place.
(You can also edit the Makefile so it does not delete the
temporary files, which may help to find the line.) If you look
in the code you will find that there are many of these commented
out.</p>
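<p>A sketch of what such markers could look like, assuming a
compiler that supports the Fortran 2003 flush statement and the
iso_fortran_env module. The program name, labels and loop are
invented for illustration and do not come from l2main.F. Flushing
after each marker is a precaution: buffered output can be lost
when the program crashes, which would make the last marker seen
misleading.</p>
<pre wrap="">
program debug_markers
  use iso_fortran_env, only: output_unit
  implicit none
  integer :: i
  real :: x(10)

  write(*,*) 'debug A: before loop'
  flush(output_unit)   ! push the marker out before anything can crash

  do i = 1, 10
     x(i) = sqrt(real(i))
  end do

  write(*,*) 'debug B: after loop'
  flush(output_unit)
end program debug_markers
</pre>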
<p>Also, are you using the latest version? I seem to remember a
blocksize bug for certain sizes that was fixed in the last few
months.</p>
<p>---------------------------<br>
Professor Laurence Marks<br>
Department of Materials Science and Engineering<br>
Northwestern University<br>
<a moz-do-not-send="true"
href="http://www.numis.northwestern.edu">www.numis.northwestern.edu</a><br>
1-847-491-3996<br>
"Research is to see what everybody else has seen, and to think
what nobody else has thought"<br>
Albert Szent-Gy&ouml;rgyi</p>
<div class="gmail_quote">On Oct 25, 2012 6:47 AM, "Stefaan
Cottenier" &lt;<a moz-do-not-send="true"
href="mailto:Stefaan.Cottenier@ugent.be" target="_blank">Stefaan.Cottenier@ugent.be</a>&gt;
wrote:<br type="attribution">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Dear wien2k community,<br>
<br>
I have not managed to get wien2k running flawlessly on our
university<br>
cluster (Intel Xeon Harpertown (L5420)). For some cases, a
reproducible<br>
segmentation fault error appears in lapw2. Our very capable
sysadmins<br>
gave up, and blame it on 'a wien2k coding problem'. That's why
I want to<br>
describe the problem for you:<br>
<br>
A) Description of the problem:<br>
<br>
* It is a "forrtl: severe (174): SIGSEGV, segmentation fault
occurred"<br>
error, which appears in lapw2 with FOR in case.in2 (never with
TOT). The<br>
full screen output (compiled with ifort, including -g
-traceback) for<br>
k-point parallelization over 2 cores is:<br>
<br>
<pre wrap="">LAPW2 - FERMI; weighs written
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine     Line     Source
lapw2              0000000000484D28  l2main_     893      l2main_tmp_.F
lapw2              00000000004A1C2D  MAIN__      564      lapw2_tmp_.F
lapw2              0000000000403C4C  Unknown     Unknown  Unknown
libc.so.6          000000300081D994  Unknown     Unknown  Unknown
lapw2              0000000000403B59  Unknown     Unknown  Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine     Line     Source
lapw2              0000000000484D28  l2main_     893      l2main_tmp_.F
lapw2              00000000004A1C2D  MAIN__      564      lapw2_tmp_.F
lapw2              0000000000403C4C  Unknown     Unknown  Unknown
libc.so.6          000000300081D994  Unknown     Unknown  Unknown
lapw2              0000000000403B59  Unknown     Unknown  Unknown
</pre>
<br>
* It appears only for a limited number of cases (say 20% of
all the ones<br>
I tried). The others run just fine.<br>
<br>
* The problem appears only in parallel runs. If a case shows
the<br>
problem, one additional serial iteration is sufficient to
complete the<br>
scf-cycle.<br>
<br>
* If the problem appears, it can be reproduced only by
'run_lapw -p'. If<br>
one tries a manual 'parallel' execution as shown below (which I
thought<br>
should execute exactly the same processes), the error does not
show up:<br>
<br>
lapw0 lapw0.def<br>
lapw1 lapw1.def [1]<br>
lapw2 lapw2.def [1]<br>
lapw1 lapw1.def [2]<br>
lapw2 lapw2.def [2]<br>
...<br>
<br>
<br>
B) Detailed analysis<br>
<br>
Trying different compiler versions was the first guess. Three
different<br>
ifort versions were tested (including the celebrated
2011.3.174 that was<br>
reported on the wien2k mailing list to work fine for v12.1),
but all<br>
result in the same error:<br>
<br>
v2011.1.073<br>
v2011.3.174<br>
v2011.10.319<br>
<br>
Next, I searched for the possible reason by going through all
steps<br>
described at the following link (a very useful piece of
information for<br>
this mailing list, I suggest mentioning it in the FAQ):<br>
<br>
<a moz-do-not-send="true"
href="http://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors/"
target="_blank">http://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors/</a><br>
<br>
All steps described there lead to no improvement up to the
first half of<br>
"possible cause #5". The second test described in #5 yields
something,<br>
however. When compiling with the additional options<br>
<br>
-fp-stack-check -g -traceback -gen-interfaces -warn interfaces<br>
<br>
there is the following compile crash for lapw2 :<br>
<br>
c3fft_tmp_.F(267): error #6633: The type of the actual
argument differs<br>
from the type of the dummy argument. [WSAVE]<br>
CALL CFFTB1 (N,C,WSAVE,WSAVE(IW1),WSAVE(IW2))<br>
----------------------------------------^<br>
compilation aborted for c3fft_tmp_.F (code 1)<br>
<br>
When searching the wien2k mailing list for c3fft, it turns out
there had<br>
been problems before with this routine, and an updated version
had been<br>
provided one year ago (i.e. before v12.1):<br>
<br>
<a moz-do-not-send="true"
href="http://zeus.theochem.tuwien.ac.at/pipermail/wien/2011-April/014541.html"
target="_blank">http://zeus.theochem.tuwien.ac.at/pipermail/wien/2011-April/014541.html</a><br>
<br>
It seems to have been a different problem, however, and both
the present<br>
version and that (slightly different) version of April 2011
give the<br>
same compilation error.<br>
<br>
Can anyone use this information to find a solution?<br>
<br>
Thanks !<br>
<br>
Stefaan<br>
<br>
_______________________________________________<br>
Wien mailing list<br>
<a moz-do-not-send="true"
href="mailto:Wien@zeus.theochem.tuwien.ac.at"
target="_blank">Wien@zeus.theochem.tuwien.ac.at</a><br>
<a moz-do-not-send="true"
href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien"
target="_blank">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a><br>
</blockquote>
</div>
<br>
</blockquote>
<br>
</body>
</html>