[Wien] Compiling lapw1_mpi with HP-mpi and MKL

Lombardi, Enrico Lombaeb at unisa.ac.za
Tue Jul 7 16:02:01 CEST 2009



Dear Wien2k users and authors

We are trying to compile the MPI-parallel Wien2k lapw1/lapw2 on an InfiniBand system, but have not been successful so far.

We would appreciate an indication of which combinations of MPI library, math library and compiler are known to work on InfiniBand systems. Also, what scaling has been achieved on such systems so far?

Currently we are trying the following compile scenarios:

1. HP-MPI v2.3.1, Intel Fortran v11.0 and MKL: the code compiles without error messages, but lapw1_mpi crashes immediately with numerous segfaults.

2. HP-MPI and Intel Fortran v11.0 as above, but with a self-compiled ScaLAPACK+BLAS in addition to the Intel MKL. This also compiles smoothly, but the runtime behaviour of lapw1_mpi depends on how the parallelization is set up (mix of MPI and k-point parallelization). Some combinations give seemingly smooth lapw1 runs but then crash in lapw2, with dnlapw2_XX.error files containing "'l2main' - QTL-B.GT.15., Ghostbands, check scf files". Other combinations of k-point vs. MPI parallelization leave lapw1_mpi jobs hanging (0% CPU usage) which never complete and later segfault. The link lines we tried are sketched right after this list.
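For reference, the link lines we are experimenting with have roughly the following form (a sketch only: the MKL library directory, the lp64 interface and in particular which BLACS variant is correct for HP-MPI are assumptions on our side, the paths are hypothetical, and the self-compiled library names are those produced by the standard netlib builds):

# Scenario 1: ScaLAPACK/BLACS taken from MKL (RP_LIBS line of our OPTIONS file, set via siteconfig)
RP_LIBS = -L$(MKLROOT)/lib/em64t -lmkl_scalapack_lp64 -lmkl_blacs_lp64 \
          -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

# Scenario 2: self-compiled ScaLAPACK/BLACS/BLAS first, MKL kept for the remaining LAPACK/BLAS calls
RP_LIBS = -L/opt/scalapack/lib -lscalapack -lblacsF77init -lblacs -lblacsF77init \
          -L/opt/blas/lib -lblas \
          -L$(MKLROOT)/lib/em64t -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread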

Note that 'serial' Wien2k (k-point parallelization only) always works smoothly.

It would be appreciated if we could obtain known working compile/link options for MPI-parallel lapwX on InfiniBand systems (our current mpirun setup is sketched after these questions):
1. Which MPI libraries were used?
2. Which ScaLAPACK/BLAS, and version?
3. Which Compiler and version?
4. Linking options and mpirun options?
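For completeness, the mpirun setup we currently use is roughly the following (a sketch: WIEN_MPIRUN is the setting from our parallel_options file, the placeholders _NP_, _HOSTS_ and _EXEC_ are substituted by the Wien2k parallel scripts, and the HP-MPI -IBV and -hostfile options are what we understand from the HP-MPI documentation to select the InfiniBand verbs interconnect and the host list - please correct us if this is already wrong):

setenv WIEN_MPIRUN "mpirun -IBV -np _NP_ -hostfile _HOSTS_ _EXEC_"
# at run time this expands to something like
# mpirun -IBV -np 32 -hostfile .machine1 $WIENROOT/lapw1_mpi lapw1_1.def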

Please let me know if there are any additional details which are needed.

Any assistance would be appreciated.

Thank you
Regards
Enrico Lombardi

NOTES ON INPUT:
In all cases the tests are based on the standard MPI-parallel benchmark, but with the number of k-points increased to match the number of nodes (and with the calculation first initialized in the usual way, so that complete SCF cycles, not just lapw1, can be run).
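Concretely, each test case is prepared and run with the standard Wien2k commands, roughly as follows (the number of k-points entered in kgen is chosen to match the nodes available):

init_lapw       # usual initialization of the benchmark case
x kgen          # regenerate the k-mesh, entering as many k-points as nodes
run_lapw -p     # SCF cycle; -p picks up the .machines file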

.machines files used:
K-point parallelization only:
1:node1
1:node1
...
1:node2
1:node2
...

mpi-parallelization only:
1:node1:8 node2:8 node3:8  node4:8 .....

mixture of mpi and k-point parallelization:
1:node1:8 node2:8 node3:8 .....
1:node9:8 node10:8 node11:8 ....
....
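For completeness, a full .machines file of the mixed type would look roughly as follows (hypothetical node names; the granularity, extrafine and lapw0 lines follow the usual Wien2k conventions and are included here only for clarity):

granularity:1
1:node1:8 node2:8 node3:8 node4:8
1:node9:8 node10:8 node11:8 node12:8
lapw0:node1:8 node2:8
extrafine:1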

--
Dr E B Lombardi
Physics Department
University of South Africa
P.O. Box 392
UNISA 0003
Pretoria
South Africa

Tel: 012 429 8654 / 8027
Fax: 012 429 3643
E-mail: lombaeb at unisa.ac.za
