[Wien] MIXER runtime error + solution on Mac OS X
Laurence Marks
L-marks at northwestern.edu
Mon Sep 1 07:47:03 CEST 2014
I am currently at a conference in Montenegro, so I don't have enough time to
check this properly. While this could be a code bug, I suspect an OS bug
connected to the known problem in W2kutils for Mac with setting the stack
size. Do you have this commented out?
To expand: W2kutils sets the stack size because a stack that was too small
used to be a very common problem (search the mailing list from some years ago
for ulimit). Some sysadmins were setting the limit too low, and openmpi was
not by default passing ulimit values. If the stack is not large enough,
problems occur. The option you are using, -heap-arrays, puts automatic and
temporary arrays on the heap instead of the stack (the effect is similar to
the Fortran save command, in that the arrays no longer live on the stack).
This is slower, although that does not matter much in mixer.
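
As a minimal sketch of checking the stack limit discussed above: this assumes a
bash-like shell in which the ulimit builtin accepts -s, and it only affects the
current shell and the processes started from it. Whether a given Mac allows the
limit to be raised is something to verify locally.

```shell
#!/bin/bash
# Report the current soft and hard stack limits (in kB, or "unlimited").
soft=$(ulimit -S -s)
hard=$(ulimit -H -s)
echo "soft stack limit: $soft"
echo "hard stack limit: $hard"

# Try to raise the soft limit up to the hard limit. The soft limit can never
# exceed the hard limit, so asking for "unlimited" fails wherever the hard
# limit is finite.
if ulimit -S -s "$hard" 2>/dev/null; then
    echo "soft stack limit now: $(ulimit -S -s)"
else
    echo "could not raise the soft stack limit" >&2
fi
```

If the soft limit is already equal to the hard limit, the script changes
nothing and simply reports both values.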
Unless you can identify something specific, I am not sure what I can do, as
I have no access to a Mac. Maybe run mixer under ddd (or gdb)? One caveat:
with this type of issue, the crash sometimes does not show up at the place in
the source where the problem originates.
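
A rough sketch of such a debugger run, under stated assumptions: gdb is
installed, mixer was compiled with -g, the script is started in the case
directory, mixer.def already exists there, and the executable sits under
$WIENROOT. The MIXER variable is only a name chosen for this illustration.

```shell
#!/bin/bash
# Hypothetical location of the mixer executable; adjust to the local install.
MIXER="${WIENROOT:-.}/mixer"

if command -v gdb >/dev/null 2>&1 && [ -x "$MIXER" ] && [ -f mixer.def ]; then
    # Run mixer non-interactively and print a backtrace when it stops.
    gdb -batch -ex run -ex bt --args "$MIXER" mixer.def
else
    echo "gdb, mixer or mixer.def not found; nothing to run"
fi
```

Given the caveat above, the backtrace shows where the program stopped, which
is not necessarily where the memory was corrupted.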
N.B. mixer is a bit of a memory hog, and sometime I should try and clean up
some of the arrays. Unfortunately this is hard with code that is changing.
On Sun, Aug 31, 2014 at 6:30 PM, Kevin Jorissen <kevinjorissenpdx at gmail.com>
wrote:
> Thanks, Martin, for sharing some advanced ideas.
>
> I spent a few minutes trying to find out more, throwing a diagnostic
> compile line at the problem :
>
> -gen-interfaces -warn interfaces -fp-stack-check -g -traceback -check
> arg_temp_created -check bounds
> trying to catch anything potentially suspicious. The problem with most
> codes I've worked on is that you typically catch a bunch of unrelated
> things that obscure the analysis :). In this case, e.g., the argument F to
> TrustStep (called before the NormS mentioned earlier) is an allocated array
> on one side and implicit on the other, and that offends the compile options
> above. I don't have much time for analysis right now - maybe the mixer
> developers will immediately spot what's going on in my earlier e-mail.
> "check bounds" or "check all" by themselves don't give any runtime
> diagnostics, so I'm guessing we're not overstepping array bounds explicitly.
>
> If you have a more specific idea for a test, I or maybe Jianxin can try
> to run it for you. I guess a basic one would be to just do the run_lapw
> calculation on Linux vs. Mac (with -heap-arrays) and see if the results are
> identical.
>
> Cheers,
>
> Kevin
>
> On Sun, Aug 31, 2014 at 4:29 PM, Martin Kroeker <
> martin at ruby.chemie.uni-freiburg.de> wrote:
>
>> This might warrant closer scrutiny - was it reproducible with any odd
>> tutorial problem, or does it require a particular case or type of
>> calculation ?
>> The "illegal instruction" abort signals that data was somehow spilling
>> over into the memory ranges holding the executable code. Now I would not
>> expect a "simple" heap-stack-collision (from an array that is simply too
>> big to put on the stack with impunity) to occur on any modern system
>> except perhaps severely constrained embedded ones. At worst, the abort
>> should have been accompanied by a "segmentation fault" message as the
>> attempt to overwrite the running program got caught. So other possible
>> explanations could be that the code tries to store more array elements
>> than the array was designed to hold, or that the indexes into the array
>> are miscalculated (overflowing or not clamped to positive values).
>> Moving data to the heap may have just changed the location of the
>> inadvertently overwritten memory to ranges where the effects are more
>> subtle (unrelated data) or not noticeable (lucky hit on unused memory).
>> --
>> Dr. Martin Kroeker martin at ruby.chemie.uni-freiburg.de
>> c/o Prof.Dr. Caroline Roehr
>> Institut fuer Anorganische und Analytische Chemie der Universitaet
>> Freiburg
>>
>
>
--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
Corrosion in 4D: MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Györgyi