<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

</head>

<body bgcolor="#ffffff" text="#000000">

Dear Wien2k users and authors,<br>

<br>

We are trying to compile mpi-parallel Wien2k lapw1/2 on an InfiniBand
system, but have not been successful so far.<br>

<br>

We would appreciate an indication of which combinations of MPI library,
math library and compiler are known to work on InfiniBand systems.
Also, what scaling has been achieved on such systems so far?<br>

<br>

So far we have tried two build configurations:<br>

<br>

1. HP-MPI v2.3.1, Intel Fortran v11.0 and MKL: in this case the code
compiles without error messages, but lapw1 crashes immediately with
numerous segfaults.<br>

<br>

2. HP-MPI and Intel Fortran v11.0 as above, but with self-compiled
ScaLAPACK and BLAS in addition to the Intel MKL: this also compiles
without errors. However, the runtime behaviour of lapw1_mpi depends on how
the parallelization is set up [mix of mpi and k-point parallelization].
In some cases lapw1_mpi appears to run smoothly, but lapw2 then crashes,
with the dnlapw2_XX.error files containing <span
 style="font-family: &quot;Courier New&quot;;">'l2main'
- QTL-B.GT.15., Ghostbands, check scf files</span>. Other
combinations of k-point and mpi parallelization result in lapw1_mpi jobs
that hang at 0% CPU usage, never complete, and later segfault.<br>

<br>

Note that 'serial' Wien2k (k-point parallelization) always works

smoothly.<br>

<br>

We would appreciate known working compile and link options for
mpi-parallel lapwX on InfiniBand systems, in particular:<br>

1. Which MPI libraries were used?<br>

2. Which ScaLAPACK/BLAS, and version?<br>

3. Which compiler and version?<br>

4. Linking options and mpirun options?<br>

<br>

Please let me know if any additional details are needed.<br>

<br>

Any assistance would be appreciated.<br>

<br>

Thank you<br>

Regards<br>

Enrico Lombardi<br>

<br>

NOTES ON INPUT:<br>

In all cases the tests are based on the standard mpi-parallel
benchmark, with the number of k-points increased to match the number of
nodes. The calculation was first initialized in the usual way so that
complete SCF cycles can be run, not just lapw1.<br>

<br>

.machines files used:<br>

K-point parallelization only:<br>

1:node1<br>

1:node1<br>

...<br>

1:node2<br>

1:node2<br>

...<br>

<br>

mpi-parallelization only:<br>

1:node1:8 node2:8 node3:8 node4:8 .....<br>

<br>

mixture of mpi and k-point parallelization:<br>

1:node1:8 node2:8 node3:8 .....<br>

1:node9:8 node10:8 node11:8 ....<br>

....<br>

<br>

<pre class="moz-signature" cols="72">-- 

Dr E B Lombardi

Physics Department

University of South Africa

P.O. Box 392

UNISA 0003

Pretoria

South Africa

Tel: +27 (0)12 429 8027

Fax: +27 (0)12 429 3643

E-mail: <a class="moz-txt-link-abbreviated" href="mailto:lombaeb@science.unisa.ac.za">lombaeb@science.unisa.ac.za</a>

</pre>

</body>

</html>