[Wien] Error in lapw1para_lapw script causing errors when running parallel lapw2
"Paweł Leśniak, IFMPAN"
lesniak at ifmpan.poznan.pl
Wed Jun 17 18:17:33 CEST 2009
On 2009-06-17 16:59, Peter Blaha wrote:
> Testing this in TiC with 47 k-points:
>
> 4 lines in .machines; granularity:1
>
> testpara_lapw produces:
>
> 1 : homer(11) 11k
> 2 : homer(11) 11k
> 3 : homer(11) 11k
> 4 : homer(11) 11k
> 5 : homer(11) 3k
>
> and also x lapw1 -p / x lapw2 -p runs fine.
>
OK, let's take the TiC test case (it shows the same problem as the TiO2
test case).
It depends on what you have on line 444 of lapw1para_lapw:
430 set kold = $kbegin
431 if ($loop > $multi && $?extrafine) then
432     @ head = $kbegin
433     set tail = 1
434     @ kbegin = $kbegin + 1
435 else
436     @ head = $kbegin + $weigh[$p] - 1
437     set tail = $weigh[$p]
438     @ kbegin = $kbegin + $weigh[$p]
439 endif
440
441
442 if ($head >= $klist) then
443     set head = $klist
444     @ tail = $klist - $kold - 1 # here
445 endif
Generating the fifth part of the klist goes as follows:
$klist = 47 and $kbegin = 45 on entry, so $kold = 45 after line 430
line 436: head = 45 + 11 - 1 = 55
line 437: tail = 11
line 438: kbegin = 45 + 11 = 56
The problem shows up in lines 442-445:
line 442: head = 55 >= klist = 47, so lines 443-444 are executed.
line 443: head = 47 -> this is fine
line 444: tail = 47 - 45 - 1 = 1 (while it should be 47 - 45 + 1 = 3)
With tail = 1, only the last line of the klist is taken, which is
k-point number 47. K-points number 45 and 46 are missing.
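
The arithmetic above can be replayed outside the script. What follows is
only a rough Python model of the quoted lines 430-445 (non-extrafine
branch), not the script itself. The name split_chunk and its parameters
are made up for illustration. It assumes plain left-to-right integer
arithmetic, and it assumes the chunk consists of the last `tail` lines
of the first `head` lines of the klist.

```python
def split_chunk(kbegin, weight, klist, correction):
    """Hypothetical model of lines 430-445 for a single chunk.

    correction is the constant on line 444: -1 as quoted, +1 as proposed.
    Returns (head, tail, first), where the chunk is assumed to hold the
    k-points first..head.
    """
    kold = kbegin                          # line 430
    head = kbegin + weight - 1             # line 436
    tail = weight                          # line 437
    if head >= klist:                      # line 442
        head = klist                       # line 443
        tail = klist - kold + correction   # line 444
    first = head - tail + 1                # assumption: last `tail` lines up to `head`
    return head, tail, first


# Fifth chunk of the TiC case: 47 k-points, weight 11, kbegin = 45.
as_quoted = split_chunk(45, 11, 47, -1)
proposed = split_chunk(45, 11, 47, +1)
print("line 444 with - 1:", as_quoted)
print("line 444 with + 1:", proposed)
```

Under these assumptions the "- 1" variant leaves a chunk that starts and
ends at k-point 47, while the "+ 1" variant covers k-points 45 to 47.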
This error looks quite obvious to me, and I am really surprised that
you are getting correct results.
So if you are getting a correct split of k-points in lapw1para_lapw, then
1) you have @ tail = $klist - $kold + 1 on line 444 (and the same
correction on line 248 of testpara_lapw),
or
2) your (t)csh evaluates expressions from right to left,
or
3) the code I downloaded from the WIEN2k site is different from the one
you are testing.
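
Possibility 2 is easy to quantify. Here is a small Python sketch (not
csh; the helper names are hypothetical) of what the expression on line
444, $klist - $kold - 1, gives under each grouping. It says nothing
about which grouping a particular (t)csh version actually uses.

```python
def subtract_left_to_right(terms):
    """Evaluate a - b - c as (a - b) - c."""
    result = terms[0]
    for value in terms[1:]:
        result = result - value
    return result


def subtract_right_to_left(terms):
    """Evaluate a - b - c as a - (b - c)."""
    result = terms[-1]
    for value in reversed(terms[:-1]):
        result = value - result
    return result


# $klist = 47, $kold = 45, constant 1, as in the fifth chunk of the TiC case.
terms = [47, 45, 1]
left = subtract_left_to_right(terms)
right = subtract_right_to_left(terms)
print("left to right:", left)
print("right to left:", right)
```

Grouped from the right, the expression as written already gives 3, the
same value as 47 - 45 + 1 evaluated from the left.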
> It produces 5 !!! klists, the latter one with the remaining 3 k-points
> and the "fastest" cpu will get this junk.
>
> So from my point of view it works perfectly well.
That's very strange. Could you download a copy of the code from the WIEN2k
site and check it on a freshly downloaded TiC (or other) test case in
k-parallel mode?
We have 3 different Linux environments here (different distributions)
on x86_64, all producing the same errors.
> > Indeed I am using $SCRATCH variable. I've also checked -it switch
> > and it works with k-points splitted 2/2/2/2/1 on 4 cpus.
>
> No, you cannot use iterative diag, because with 4 lines in .machines, but
> actually 5 !! junks, you don't know on which computer the 5th junk
> will be executed (it will be the fastest, but that can change from
> iteration to iteration and you will not have an old vector file.
Unless $SCRATCH is on a distributed filesystem, so that each node can see
all parts of the vector file ($case.vector_*), I guess.
I'd be very grateful if you could check what I've described on freshly
downloaded code/data (available for download to registered users).
Pawel Lesniak