[Wien] Error in lapw1para_lapw script causing errors when running parallel lapw2

Wed Jun 17 22:34:14 CEST 2009

Of course I do have the "official" lapw1para_lapw.

And it has a line 444 with minus signs:

             @ tail = $klist - $kold - 1 # here

If I put an
    echo $tail,$klist,$kold
after that line, it gives on my tcsh (and apparently on those of all other
WIEN2k users so far):
3,45,45

So, obviously the tcsh evaluates this "wrong" (in terms of mathematics),
but "correct" (in terms of what it should do): it apparently works through
the expression from right to left.
One can verify this on the commandline:
set a=10
set b=5
@ c = $a - $b
echo $c        #   will give 5
@ c = $a - $b - 1
echo $c    # gives 6 !!!!
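
For readers without such a tcsh at hand, the two evaluation orders can be
compared in a minimal Python sketch. It is not tcsh and only mimics the
grouping; the function names are made up for this illustration.

```python
from functools import reduce


def subtract_left_to_right(values):
    """Chain of subtractions grouped from the left: ((v0 - v1) - v2)."""
    return reduce(lambda acc, v: acc - v, values)


def subtract_right_to_left(values):
    """Chain of subtractions grouped from the right: v0 - (v1 - v2)."""
    return reduce(lambda acc, v: v - acc, reversed(values))


# Same numbers as in the command-line test: a=10, b=5, then "- 1".
print(subtract_left_to_right([10, 5, 1]))   # 4, ordinary arithmetic
print(subtract_right_to_left([10, 5, 1]))   # 6, what the tcsh above reports
```

With only two operands both orders agree, which is why "@ c = $a - $b"
gives 5 either way.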

What is your Linux distribution and your tcsh (csh?) version?

> It depends on what you have in line 444th of lapw1para_lapw.
> 
> 430         set kold = $kbegin
> 431         if ($loop > $multi && $?extrafine) then
> 432             @ head = $kbegin
> 433             set tail = 1
> 434             @ kbegin = $kbegin + 1
> 435         else
> 436             @ head = $kbegin + $weigh[$p] - 1
> 437             set tail = $weigh[$p]
> 438             @ kbegin = $kbegin + $weigh[$p]
> 439         endif
> 440
> 441
> 442         if ($head >= $klist) then
> 443             set head    = $klist
> 444             @ tail = $klist - $kold - 1 # here
> 445         endif
> 
> Generation of the 5th part of klist proceeds as follows:
> $klist = 47, $kbegin = 45, $kold = 45 before line 430
> line 436: head = 45 + 11 - 1 = 55
> line 437: tail = 11
> line 438: kbegin = 45 + 11 = 56
> 
> The problem shows up in lines 442-445:
> line 442: head = 55 >= klist = 47, so we execute lines 443-445.
> line 443: head = 47     -> this is fine
> line 444: tail = 47 - 45 - 1 = 1    (while it should be 47 - 45 + 1 = 3)
> Of course this will produce only the last line of klist, which is
> k-point number 47. We are missing k-points number 45 and 46.
> This error is quite obvious to me, and I find it really amazing that
> you are getting correct results.
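
The trace above can be replayed in a short Python sketch, under two
assumptions that are not confirmed by the script excerpt: that a chunk
consists of the last "tail" lines of the first "head" lines of the klist
(inferred from "will produce only the last line"), and that the only
difference between the two environments is how line 444 is grouped. The
function name and its parameters are hypothetical.

```python
def last_chunk(klist, kbegin, weight, right_to_left):
    """Replay script lines 430-445 for one chunk (non-extrafine branch)."""
    kold = kbegin                       # line 430
    head = kbegin + weight - 1          # line 436
    tail = weight                       # line 437
    # line 438 advances kbegin for the next chunk; not needed for this one
    if head >= klist:                   # line 442
        head = klist                    # line 443
        if right_to_left:
            tail = klist - (kold - 1)   # line 444 grouped from the right
        else:
            tail = (klist - kold) - 1   # line 444 grouped from the left
    # ASSUMPTION: the chunk is the last `tail` of the first `head` k-points
    kpoints = list(range(head - tail + 1, head + 1))
    return head, tail, kpoints


# Numbers from the trace: klist = 47, kbegin = 45, weight = 11
print(last_chunk(47, 45, 11, right_to_left=False))  # (47, 1, [47])
print(last_chunk(47, 45, 11, right_to_left=True))   # (47, 3, [45, 46, 47])
```

Left-to-right grouping reproduces the missing k-points 45 and 46;
right-to-left grouping gives the 3 remaining k-points, the same value as
"$klist - $kold + 1" under ordinary arithmetic.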
> 
> So if you are getting a correct split of k-points in lapw1para_lapw, then
> 1) you have @ tail = $klist - $kold + 1   in line 444 (and have also
> corrected the "same thing" in line 248 of testpara_lapw),
> or
> 2) your (t)csh evaluates expressions from right to left,
> or
> 3) the code I downloaded from the WIEN2k site is different from the one
> you are testing.
>> It produces 5 !!! klists, the last one with the remaining 3 k-points,
>> and the "fastest" cpu will get this chunk.
>>
>> So from my point of view it works perfectly well.
> It's very strange. Could you download a copy of the code from the WIEN2k
> site and check it on a freshly downloaded TiC (or other) test case in
> k-parallel mode?
> 
> We have here 3 different environments (different distributions) of Linux 
> on x86_64, all producing the same errors.
>> > Indeed I am using the $SCRATCH variable. I've also checked the -it
>> > switch, and it works with k-points split 2/2/2/2/1 on 4 cpus.
>>
>> No, you cannot use iterative diag, because with 4 lines in .machines, but
>> actually 5 !! chunks, you don't know on which computer the 5th chunk
>> will be executed (it will be the fastest, but that can change from
>> iteration to iteration, and you will not have an old vector file).
> Unless $SCRATCH is on a distributed filesystem, where each node can see
> all parts of the vector file ($case.vector_*), I guess.
> 
> 
> I'd be very grateful if you could check what I'm describing on freshly
> downloaded code/data (available to download for registered users).
> 
> 
> Pawel Lesniak