[Wien] lapw2 mpi parallelization limits

Laurence Marks L-marks at northwestern.edu
Tue Mar 17 22:05:03 CET 2009


You clearly are more knowledgeable about your OS than I am.

1) I mentioned something like mvapich since it has worked well for me
while mpich has not; in fact the version I got from my vendor
(OFED) was broken. I've never used openmpi.
2) I'm not sure you'll find anyone on this mailing list who knows as
much about NFS as you do.
3) I don't know the details of what the other codes do with their
I/O. I suspect that some (many) use MPI writes. Wien2k does not; it
uses a lot of standard Fortran I/O and so is very susceptible to
NFS problems. Digging back through the list, the bug I am referring
to is in 2.6.X kernels compiled before November 2005. You probably
have a later version than this. (I tried to find the original emails
from 2006 about this but failed, sorry.)

On Tue, Mar 17, 2009 at 3:20 PM, Scott Beardsley <scott at cse.ucdavis.edu> wrote:
> Laurence Marks wrote:
>> NFS can be very buggy.
>
> Not if configured correctly. ;) It works as advertised in my experience.
>
>> If you have an old OS, there is a definite
>> possibility that you may have a broken NFS.
>
> CentOS 5. It is old but not *that* old. And NFS is not broken afaict.
> BTW, when I run an 8cpu k-point parallelization WIEN job it too is using
> NFS.
>
>> There is a delay
>> parameter set in parallel_options and if you increase this it might
>> help or at least give some ideas.
>
> Are there any docs on the parallel_options settings?
>
>> (I know of one computer cluster at a
>> US national lab which I stopped using 12 months ago because its NFS
>> was completely broken.)
>
> Byte-range locking won't work in NFSv3 (the version I'm using). You
> can't run a database over NFS. NFS is not a "normal" filesystem. If you
> have any specific configuration recommendations I'm all ears. Does WIEN
> do byte-range locking?
>
> I can also use a local scratch disk if the WIEN devs prefer, just give
> me some pointers on the way to configure this. Is there a --tmp=/foo
> setting somewhere? Do I have to enable USE_REMOTE? I didn't want to have
> to resort to using ssh (because I'm using OMPI with tight integration)
> but I'll set it up to test at least.
>
>> Do you have root access,
>
> Yep.
>
>> so you can use something more advanced than openmpi
>
> Heh. OMPI is arguably the industry standard these days. It is tightly
> integrated with SGE, is recommended by our hardware vendor (QLogic), is
> interconnect independent (i.e. compile once, run anywhere), supports
> mpiexec, has supported our hardware since 2007, has devs that fix
> reported bugs within hours, etc. I have no interest in switching based
> solely on a statement that it isn't advanced enough. I'd need proof.
>
> It might also help to mention that both the lapw0 and lapw1 steps work
> fine on >4 cpus (I can actually see the remote jobs running via top).
> Also, mpi-benchmark works fine on up to 32cpus (4nodes). We've run
> numerous other software (VASP, GROMACS, ABINIT, LAMMPS, Gaussian, etc)
> with this MPI layer. All work fine on >4cpus. The problem I'm seeing is
> with lapw2_mpi _only_.
>
>> Also, was your mpif90 compiled with the same version
>> of the compiler as you are currently using?
>
> Yep.
>
> Scott
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>
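
PS on the scratch-disk question: WIEN2k writes its large intermediate
files (the case.vector* files in particular) to wherever the SCRATCH
environment variable points, which is normally configured via siteconfig.
A minimal sketch, assuming a bash login environment and that every
compute node has a local /tmp (the path is illustrative, not a
recommendation):

```shell
# Hedged sketch: point WIEN2k scratch files at a node-local disk instead
# of the NFS-mounted home directory. SCRATCH is the variable WIEN2k's
# siteconfig sets; the directory below is a hypothetical choice.
export SCRATCH="/tmp/wien_${USER}"   # node-local, avoids NFS traffic
mkdir -p "$SCRATCH"                  # must exist on every compute node
echo "scratch dir: $SCRATCH"
```

Whatever path you choose has to be created on all nodes (a prolog script
in your queuing system is one way), and anything you need afterwards has
to be copied back, since node-local files are not visible cluster-wide.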



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering to study the structure of matter.

