[Wien] lapw2 mpi parallelization limits

Scott Beardsley scott at cse.ucdavis.edu
Wed Mar 18 00:50:13 CET 2009


Laurence Marks wrote:
> Do a "rm *.rec" before running lapw2

$ rm *.rec *.broyd*
rm: cannot remove `*.rec': No such file or directory
$

> and also a ls -l *.error if it fails.

$ ls -l *.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 12 11:12 dndstart.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:14 dnlapw1_1.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:14 dnlapw1.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:08 dnlapwdm_1.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:08 dnlapwdm.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:08 dnlcore.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:13 dnorb.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:08 dnsumpara.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 12 11:12 dstart.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:12 lapw0.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:08 mixer.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 12 11:12 updstart.error
-rw-rw-r-- 1 sbeards sbeards 15 Mar 17 16:14 uplapw2_1.error
-rw-rw-r-- 1 sbeards sbeards 39 Mar 17 16:14 uplapw2.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:08 uplcore.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:13 uporb.error
-rw-rw-r-- 1 sbeards sbeards  0 Mar 17 16:08 upsumpara.error
$ cat *.error
Error in LAPW2
**  testerror: Error in Parallel LAPW2
$
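
Only the two uplapw2 files are non-empty (15 and 39 bytes), so the two
messages above come from those. A small loop along these lines prints
each non-empty file under its own name, which makes it clearer which
message belongs to which file. It is a sketch using only POSIX sh, and
the function name show_errors is made up for illustration:

```shell
# List only the non-empty *.error files in the current directory,
# printing each file name before its contents.
show_errors() {
    for f in ./*.error; do
        # -s is true only when the file exists and has a size above zero
        if [ -s "$f" ]; then
            printf '== %s ==\n' "${f#./}"
            cat "$f"
        fi
    done
}

show_errors
```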

> How many different k-points and how many different atoms are you using?

How do I find this? I'm the sysadmin, not the researcher, and I'm using
a config the researcher gave me. I think it is 4 atoms (Ce, Co, In, In)
and 84 k-points, but I'm not positive. There is only one parallel job
(i.e. strictly MPI parallelization). I believe the setup was taken from
an example in a presentation.

> No ideas at the moment beyond turn on all plausible debug flags (not
> fun); someone like Peter Blaha may have some suggestions tomorrow.

See the strace I sent before. I enabled debug and csh tracing (-xv), but
it takes me straight to mpirun, which then crashes.

Incidentally, I used the attached patch to lure WIEN into running. It
forces a maximum of 4 CPUs during the lapw2 stage. Ugly, but it will
work for us until the underlying problem gets solved.

Scott
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lapw2para.patch
Type: text/x-patch
Size: 1091 bytes
Desc: not available
Url : http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20090318/2c8231e5/lapw2para.bin

