[Wien] Systematic slowing down of calculations with time

Laurence Marks L-marks at northwestern.edu
Tue Mar 19 14:12:09 CET 2013


Minor correction: the x-axis in the attached plot is iteration*4.

On Tue, Mar 19, 2013 at 8:11 AM, Laurence Marks
<L-marks at northwestern.edu> wrote:
> I have a reproducible slowdown of calculations which appears to be
> in lapw1, due to something (a memory leak?) that is going to be hard
> to track down, so I welcome suggestions.
>
> I first noticed it when one newish E5-2660 node was systematically
> and reproducibly running at ~1/2 the speed of the others. After a
> reboot it went back to running at the same speed as the others.
>
> I have now reproduced a systematic slowing down of lapw1 (I cannot see
> anything in lapw2) for a long calculation (-it -noHinv, but I don't
> think this matters). It is shown in the attached plot, with iteration
> on the x axis and time in minutes on the y axis. (The image may get
> shuffled to a link by the listserver software.) Starting from ~7
> minutes, the slowdown is approximately 8 seconds/iteration. This is a
> fairly big calculation, with a matrix size of 45456 and 835m/core
> (virtual) running on 64 cores. There is no indication that this is
> communications related: the slowdown is in CPU time, and WALL time
> remains very close to it.
>
> Obviously recompiling with debug on is not going to be a viable
> approach. A scattershot debugging strategy, for instance adding calls
> to release memory from MKL calls, is also going to be very painful,
> as each test takes ~1 day. Ideally I would like innovative ideas for
> tracking down why it has slowed.
>
> Ideas?
>
> For reference, I am using composer_xe_2013.2.146 and Intel impi. I
> don't see this on older E5410 nodes, but I have not run enough
> iterations there to notice.
>
> N.B., others might want to look in long recent runs to see if they
> also have evidence for this.
>
> --
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu 1-847-491-3996
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought"
> Albert Szent-Gyorgi
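
The slowdown described in the quoted message (roughly 8 seconds/iteration on top of ~7 minutes) could be quantified with a simple least-squares fit of lapw1 time against iteration number. The sketch below is only an illustration: the function name `fit_line` is hypothetical, and the timing values are synthetic numbers chosen to mimic the figures quoted above, not data from the actual run. Anyone checking their own long runs would substitute per-iteration lapw1 times collected however is convenient.

```python
# Hypothetical sketch: estimate a per-iteration slowdown from lapw1 timings.
# The timings below are SYNTHETIC (7 min baseline, +8 s per iteration);
# they are not data from the run discussed in this thread.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Iteration index and lapw1 time in seconds for each iteration (synthetic).
iterations = list(range(20))
times_s = [7 * 60 + 8 * i for i in iterations]

intercept, slope = fit_line(iterations, times_s)
print(f"baseline: {intercept / 60:.2f} min, slowdown: {slope:.2f} s/iteration")
```

A slope that stays near zero would suggest no systematic slowdown, while a clearly positive slope that resets after a reboot would match the behaviour reported above.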




