[Wien] lapw2 mpi parallelization limits

Scott Beardsley scott at cse.ucdavis.edu
Tue Mar 17 21:20:56 CET 2009


Laurence Marks wrote:
> NFS can be very buggy.

Not if configured correctly. ;) It works as advertised in my experience.

> If you have an old OS, there is a definite
> possibility that you may have a broken NFS.

CentOS 5. It is old but not *that* old. And NFS is not broken afaict.
BTW, when I run an 8-cpu k-point-parallel WIEN job, it uses NFS too.

> There is a delay
> parameter set in parallel_options and if you increase this it might
> help or at least give some ideas.

Are there any docs on the parallel_options settings?
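
In case it helps to be concrete, my understanding is that
parallel_options is just a csh fragment under $WIENROOT that the *para
scripts source. Something along these lines is what I'd expect to find
there (I'm guessing at the exact variable names, so treat this as a
sketch rather than my actual file), with the delay bumped up as you
suggest:

  # $WIENROOT/parallel_options -- sketch only, names may differ per install
  setenv USE_REMOTE 1                # spawn jobs on remote nodes
  setenv WIEN_GRANULARITY 1
  setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
  setenv delay 2                     # seconds to sleep between launching parallel jobs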

> (I know of one computer cluster at a
> US national lab which I stopped using 12 months ago because its NFS
> was completely broken.)

Byte-range locking won't work in NFSv3 (the version I'm using). You
can't run a database over NFS. NFS is not a "normal" filesystem. If you
have any specific configuration recommendations I'm all ears. Does WIEN
do byte-range locking?

I can also use a local scratch disk if the WIEN devs prefer; just give
me some pointers on how to configure it. Is there a --tmp=/foo
setting somewhere? Do I have to enable USE_REMOTE? I didn't want to
resort to using ssh (because I'm using OMPI with tight integration),
but I'll set it up to test at least.
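
For the local-scratch route, here is what I'd try on my end, assuming
WIEN honors the usual SCRATCH environment variable for its vector files
and that /scratch/$USER exists on every node (both are assumptions
about my setup, so correct me if that's not the right knob):

  # ~/.cshrc on every compute node -- sketch only
  setenv SCRATCH /scratch/$USER    # hypothetical local disk path, must exist on each node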

> Do you have root access,

Yep.

> so you can use something more advanced than openmpi

Heh. OMPI is arguably the industry standard these days. It is tightly
integrated with SGE, is recommended by our hardware vendor (QLogic), is
interconnect-independent (i.e. compile once, run anywhere), supports
mpiexec, has supported our hardware since 2007, has devs who fix
reported bugs within hours, etc. I have no interest in switching based
solely on a statement that it isn't advanced enough. I'd need proof.

It might also help to mention that both the lapw0 and lapw1 steps work
fine on >4 cpus (I can actually see the remote jobs running via top).
Also, mpi-benchmark works fine on up to 32 cpus (4 nodes). We've run
numerous other codes (VASP, GROMACS, ABINIT, LAMMPS, Gaussian, etc.)
with this MPI layer; all work fine on >4 cpus. The problem I'm seeing is
with lapw2_mpi _only_.
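
For reference, the .machines file driving these mpi runs is roughly the
following (node names made up, core counts match our 4-core boxes; I
can post my actual file if it would help):

  lapw0: node01:4 node02:4
  1: node01:4 node02:4
  granularity:1
  extrafine:1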

> Also, was your mpif90 compiled with the same version
> of the compiler as you are currently using?

Yep.

Scott

