[Wien] MIXER runtime error + solution on Mac OS X
Kevin Jorissen
kevinjorissenpdx at gmail.com
Mon Sep 1 08:12:42 CEST 2014
Hi Laurence,
thanks for your comments.
I hope I didn't call the issue we observed a code bug -- I meant to use
unsensational language and avoid assumptions. For sure this could be a
problem on the Mac side or in ifort (we all know these exist). I haven't
edited the W2kutils. But didn't we fix the Mac problems with that file a
few years ago? In any case, I'm not using MPI and stacksize is set to
unlimited in my shell startup file, so I doubt this is the culprit. Or
could the W2kutils somehow override my shell startup configuration?
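
To check which stack limit a process actually ends up with, independent of what the shell startup file says, something like the following could be run from the same shell that launches the calculation. This is only a minimal sketch: it assumes Python 3 is available, uses the standard `resource` module, and says nothing about what W2kutils itself does at startup. The function names are made up for illustration.

```python
import resource


def describe_limit(value):
    """Render an rlimit value (in bytes) as text, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} kB"


def stack_limits():
    """Return the (soft, hard) stack limits of the current process as text."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = stack_limits()
    print(f"stack soft limit: {soft}")
    print(f"stack hard limit: {hard}")
```

If the soft limit printed here is finite even though the startup file asks for unlimited, the setting is being capped or overridden somewhere before the process starts.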
It's probably not urgent since we have a remedy that will do for now. If
you can think of any tests you'd like to see done on Mac, let us know.
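
For the basic Linux vs. Mac comparison suggested earlier in this thread, a small script along these lines could compare the final total energy of the two runs. It is a sketch under assumptions: it takes the lines of the two case.scf files as input, assumes the total energy sits on lines starting with the :ENE label, and the tolerance of 1e-6 is an arbitrary choice. The function names and the sample values are made up for illustration.

```python
import re

# Matches numbers such as -1234.56789012 or 1.5E-03 (Fortran D exponents too).
FLOAT = re.compile(r"[-+]?\d+\.\d+(?:[eEdD][-+]?\d+)?")


def last_value(lines, label=":ENE"):
    """Return the last number on the last line starting with `label`, or None."""
    value = None
    for line in lines:
        if line.startswith(label):
            numbers = FLOAT.findall(line)
            if numbers:
                text = numbers[-1].replace("D", "E").replace("d", "e")
                value = float(text)
    return value


def agree(a, b, tol=1e-6):
    """True if both values exist and differ by at most `tol`."""
    return a is not None and b is not None and abs(a - b) <= tol


if __name__ == "__main__":
    # Made-up sample lines; in practice read them from the two scf files.
    linux = [":ENE  : ********** TOTAL ENERGY IN Ry =        -1234.56789012"]
    mac = [":ENE  : ********** TOTAL ENERGY IN Ry =        -1234.56789013"]
    print("energies agree:", agree(last_value(linux), last_value(mac)))
```

Agreement of the final energy alone would not prove the two builds behave identically, but a mismatch would be a clear sign that something beyond the stack size is involved.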
By the way, this W2kutils thing is ***NOT*** on the list of known issues
and bugs on the WIEN2k website. It would be very, very valuable and
time-saving if that list could be updated to reflect the knowledge inside
the experts' heads.
Cheers,
Kevin
On Sun, Aug 31, 2014 at 10:47 PM, Laurence Marks <L-marks at northwestern.edu>
wrote:
> I am currently at a conference in Montenegro, so don't have enough time to
> check properly. While this could be a code bug, I suspect an OS bug
> connected to the known problem in W2kutils for Mac of setting the stack
> size. Do you have this commented out?
>
> To expand, W2kutils sets the stack size because a stack that was too small
> used to be a very common problem (search the mailing list from some years
> ago for ulimit): some sys_admins were setting it too low, and openmpi was
> not passing ulimit values by default. If the stack is not large enough,
> problems occur. The -heap-arrays argument you are using puts automatic
> arrays and temporaries on the heap instead of the stack (in that respect it
> is similar to the Fortran save command). This is slower, although that does
> not matter much in mixer.
>
> Unless you can identify something specific, I am not sure what I can do as
> I have no access to a Mac. Maybe run mixer using ddd (or gdb)? One caveat:
> with this type of issue, the failure sometimes does not show up at its source.
>
> N.B. mixer is a bit of a memory hog, and sometime I should try and clean
> up some of the arrays. Unfortunately this is hard with code that is
> changing.
>
>
> On Sun, Aug 31, 2014 at 6:30 PM, Kevin Jorissen <
> kevinjorissenpdx at gmail.com> wrote:
>
>> Thanks, Martin, for sharing some advanced ideas.
>>
>> I spent a few minutes trying to find out more, throwing a diagnostic
>> compile line at the problem:
>>
>> -gen-interfaces -warn interfaces -fp-stack-check -g -traceback -check arg_temp_created -check bounds
>>
>> trying to catch anything potentially suspicious. The problem with most
>> codes I've worked on is that you typically catch a bunch of unrelated
>> things that obscure the analysis :). In this case, e.g., the argument F to
>> TrustStep (called before the NormS mentioned earlier) is an allocated array
>> on one side and implicit on the other, and that offends the compile options
>> above. I don't have much time for analysis right now - maybe the mixer
>> developers will immediately spot what's going on in my earlier e-mail.
>> "check bounds" or "check all" by themselves don't give any runtime
>> diagnostics, so I'm guessing we're not overstepping array bounds explicitly.
>>
>> If you have a more specific idea for a test, I or maybe Jianxin can try
>> to run it for you. I guess a basic one would be to just do the run_lapw
>> calculation on Linux vs. Mac (with -heap-arrays) and see if the results are
>> identical.
>>
>> Cheers,
>>
>> Kevin
>>
>>
>>
>>
>>
>> On Sun, Aug 31, 2014 at 4:29 PM, Martin Kroeker <
>> martin at ruby.chemie.uni-freiburg.de> wrote:
>>
>>> This might warrant closer scrutiny - was it reproducible with any odd
>>> tutorial problem, or does it require a particular case or type of
>>> calculation?
>>> The "illegal instruction" abort signals that data was somehow spilling
>>> over into the memory ranges holding the executable code. Now I would not
>>> expect a "simple" heap-stack-collision (from an array that is simply too
>>> big to put on the stack with impunity) to occur on any modern system
>>> except perhaps severely constrained embedded ones. At worst, the abort
>>> should have been accompanied by a "segmentation fault" message as the
>>> attempt to overwrite the running program got caught. So other possible
>>> explanations could be that the code tries to store more array elements
>>> than the array was designed to hold, or that the indexes into the array
>>> are miscalculated (overflowing or not clamped to positive values).
>>> Moving data to the heap may have just changed the location of the
>>> inadvertently overwritten memory to ranges where the effects are more
>>> subtle (unrelated data) or not noticeable (lucky hit on unused memory).
>>> --
>>> Dr. Martin Kroeker martin at ruby.chemie.uni-freiburg.de
>>> c/o Prof.Dr. Caroline Roehr
>>> Institut fuer Anorganische und Analytische Chemie der Universitaet
>>> Freiburg
>>>
>>> _______________________________________________
>>> Wien mailing list
>>> Wien at zeus.theochem.tuwien.ac.at
>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>> SEARCH the MAILING-LIST at:
>>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>>>
>>
>>
>
>
> --
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu
> Corrosion in 4D: MURI4D.numis.northwestern.edu
> Co-Editor, Acta Cryst A
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought"
> Albert Szent-Györgyi
>