[Wien] lapw2 mpi parallelization limits

Peter Blaha pblaha at theochem.tuwien.ac.at
Wed Mar 18 07:38:24 CET 2009


Your fix is a good choice, since lapw2 is fast anyway.

I suspect it might have to do with the specific test case.

Do you have the same problems with another test case (you may need to ask 
your research group for a big one)?

If need be, you can send me the basic input data (*.in*, *.struct, *.klist, 
case.clmsum/up/dn (if spin-polarized)) as a tar file (to my private email) 
and I'll try this case. One never knows which strange bugs are present.

Reading and interpreting the strace is not straightforward for me. It 
seems to happen in subroutine atpar (reading ...)? I prefer good old 
"print*, ..." statements; after a couple of tests I can trace the 
lines where the problem appears.
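Regarding the question quoted below about the number of k-points: a quick, hypothetical way to check from the shell, assuming the usual WIEN2k convention that case.klist lists one k-point per line up to a terminating "END" line (the file here is a toy example, not real input data):

```shell
# Toy case.klist in the assumed format: one k-point per line, "END" terminator.
cat > /tmp/case.klist <<'EOF'
1  0 0 0  1  2.0
2  1 0 0  2  2.0
3  1 1 0  2  2.0
END
EOF
# Count the k-points: lines before the terminating END.
awk '/^END/{exit} {n++} END{print n}' /tmp/case.klist
```

The number of inequivalent atoms is stated in the case.struct header; checking there (or asking the researcher) is the more reliable route.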


> 
>> How many different k-points and how many different atoms are you using?
> 
> How do I find this? I'm the sys admin, not the researcher. I'm using a
> config given to me by the researcher. I think it is 4 atoms (Ce, Co, In,
> In) and 84 k-points, but I'm not positive. Only one parallel job (i.e.,
> strictly MPI parallelization). I believe it was from an example taken
> from a presentation.
> 
>> No ideas at the moment beyond turn on all plausible debug flags (not
>> fun); someone like Peter Blaha may have some suggestions tomorrow.
> 
> See the strace I sent before. I enabled debug and csh tracing (-xv), but
> it takes me straight to mpirun, which then crashes.
> 
> Incidentally, I used the attached patch to lure WIEN into running. It
> forces a maximum of 4 CPUs during the lapw2 stage. Ugly, but it will work
> for us until the underlying problem gets solved.
> 
> Scott
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
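The workaround described in the quoted message (capping lapw2 at 4 CPUs) could be sketched roughly as below. This is a hypothetical illustration of the idea only, not the actual attached patch; the variable names and the process count of 16 are invented:

```shell
# Hypothetical sketch of the workaround, NOT the actual attached patch:
# clamp the MPI process count used for the lapw2 stage.
NPROC=16        # assumed: processes granted by the scheduler
LAPW2_MAX=4     # cap observed to work around the crash
if [ "$NPROC" -gt "$LAPW2_MAX" ]; then
    NPROC=$LAPW2_MAX
fi
echo "lapw2 will run on $NPROC processes"
```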

