[Wien] lapw2 QTL-B crash with MPI, but not with k-parallel

Laurence Marks L-marks at northwestern.edu
Tue Jun 17 13:30:27 CEST 2008


I cannot answer all the questions, but will do my best and maybe
others can chip in.

On Tue, Jun 17, 2008 at 5:12 AM, Johan Eriksson <joher at ifm.liu.se> wrote:
>  From a code design point of view, should there be any difference
> between lapwX and lapwX_mpi due to different parameters such as NMATMAX,
> which could give rise to these QTL-B errors? I should mention that
> NMATMAX is not limiting in my case.

I don't think it is related to NMATMAX etc. There might be an issue
with the eigenvectors from lapwX_mpi not being quite orthogonal, which
I suspect (but cannot prove) might have an effect upon a later -it
calculation.
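
To make that concrete: lapw1 solves the generalized problem H c = E S c,
and the eigenvectors of a good solve should come out orthonormal with
respect to the overlap S. A minimal numpy sketch of the kind of check I
mean (the toy H and S below only stand in for what lapw1 sets up; this
is not WIEN2k code):

  import numpy as np
  from scipy.linalg import eigh

  def orthogonality_defect(C, S):
      # max |C^H S C - I|; exactly S-orthonormal eigenvectors give ~0
      G = C.conj().T @ S @ C
      return np.max(np.abs(G - np.eye(G.shape[1])))

  # toy stand-ins for the H (symmetric) and S (positive definite) of lapw1
  rng = np.random.default_rng(0)
  n = 200
  A = rng.standard_normal((n, n)); H = A + A.T
  B = rng.standard_normal((n, n)); S = B @ B.T + n * np.eye(n)

  E, C = eigh(H, S)                  # LAPACK-based generalized solver
  print(orthogonality_defect(C, S))  # near machine precision for a clean solve

Running the same test on the vectors written by lapw1_mpi (read back
from case.vector) would show whether the scalapack solve really loses
orthogonality; I have not done that, so treat it only as a suggestion.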

> You mention the iterative scheme uses a subset of eigenvectors from a
> previous iteration. Is this subset of old eigenvectors smaller or
> different when using MPI compared to k-parallel?

The subset of eigenvectors is the same, and is set at the bottom of
case.in1, so in principle there should be no difference. However, I
don't think the scalapack routines (_mpi) are as accurate as the lapack
ones; I suspect (but cannot prove) that they are slightly less
numerically stable in some cases. It may also be that the eigenvectors
used are not quite orthogonal, in which case errors could accumulate.
There is also (99% certain) a bug in the current web code for iterative
lapw1c_mpi.
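
If someone wants to quantify the accuracy rather than guess, the number
to compare between a lapack (k-parallel) and a scalapack (_mpi) run of
the same case is the per-eigenpair residual ||H c - E S c||. A minimal
sketch with the same kind of toy matrices as above (not the real vector
files):

  import numpy as np
  from scipy.linalg import eigh

  def eigenpair_residuals(H, S, E, C):
      # ||H c_i - E_i S c_i|| for each eigenpair; larger means less accurate
      R = H @ C - (S @ C) * E        # E broadcasts over the columns of C
      return np.linalg.norm(R, axis=0)

  # toy symmetric H and positive-definite S, as in the previous sketch
  rng = np.random.default_rng(1)
  n = 200
  A = rng.standard_normal((n, n)); H = A + A.T
  B = rng.standard_normal((n, n)); S = B @ B.T + n * np.eye(n)

  E, C = eigh(H, S)
  print(eigenpair_residuals(H, S, E, C).max())   # small for an accurate solve

Feeding the actual matrices and eigenpairs from the two runs into such a
check would settle whether the _mpi eigenvectors (and thus the input to
the next -it step) really are worse; again, this is only a sketch.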
>
> To be on the safe side I will run using MPI but without the '-it' switch
> in the future.
>
> /Johan
>
> Laurence Marks wrote:
>> This is one of many issues.
>>
>> 1) For mkl 10, make sure that you are using version 10.0.3; the earlier
>> 10.x versions had some bugs.
>>
>> 2) Make sure that you do not have a problem in your network software.
>> I have a new cluster on which the "official" version of mvapich was
>> installed, and it had a scalapack bug; their current version (via
>> their equivalent of cvs) works well. In your case, check the openmpi
>> webpage.
>>
>> 3) For mkl 10 there are some issues with the size of buffer arrays; in
>> essence, unless one uses sizes at least as large as those the Intel code
>> "likes" (obtained via a workspace query call), problems can occur. I
>> think this is an Intel bug; they would probably call it a "feature".
>> Because of some code changes this is probably not a problem for real
>> cases or non-iterative calculations, but it may still be present in the
>> current version on the web for complex iterative cases. (A sketch of the
>> workspace-query pattern follows right after this list.)
>>
>> 4) In the iterative versions only a subset of the eigenvectors from a
>> previous iteration is used. If the span of these old eigenvectors does
>> not include a good approximation to a new eigenvector, you may get
>> ghost bands (QTL-B errors); see the second sketch after this list. One
>> workaround is to use more old eigenvectors, i.e. increase nband at the
>> bottom of case.in1 or case.in1c.
>>
>> 5) If 4) does not work (it does not always help), consider using LAPW
>> for some of the states. For instance, with relatively large RMTs (2.0)
>> for d-electron transition elements (e.g. Ni), switching from APW+lo to
>> LAPW for the d states stabilized the iterative mode in some
>> calculations.
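
To expand on 3): the safe pattern is to ask the library how much
workspace it wants before the real call, and never hand it less. A
small Python sketch of that workspace-query pattern, assuming a recent
scipy that provides the zheev/zheev_lwork wrappers (lapw1c itself does
the equivalent query in Fortran):

  import numpy as np
  from scipy.linalg import lapack

  # toy Hermitian matrix standing in for a lapw1c Hamiltonian block
  n = 300
  rng = np.random.default_rng(2)
  A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
  H = A + A.conj().T

  # workspace query: ask LAPACK/MKL how big a buffer it "likes"
  lwork, info = lapack.zheev_lwork(n)

  # pass at least that much; an undersized buffer is where trouble starts
  w, v, info = lapack.zheev(H, lwork=int(lwork.real))
  print(int(lwork.real), w[:3])

If the Fortran side allocates less than the queried size with mkl 10,
you can get exactly the kind of problems described in 3).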
>>
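And to make 4) concrete: the reused eigenvectors span a subspace, and
the new problem is effectively solved inside that subspace
(Rayleigh-Ritz), so whatever the subspace cannot represent comes out
wrong. A toy numpy illustration (dense matrices, not the actual WIEN2k
iterative solver):

  import numpy as np
  from scipy.linalg import eigh

  rng = np.random.default_rng(3)
  n, nband = 400, 30

  # "old" Hamiltonian and the nband lowest eigenvectors kept for -it
  A = rng.standard_normal((n, n)); H_old = A + A.T
  V = eigh(H_old)[1][:, :nband]

  # "new" Hamiltonian after the density update (here just a perturbation)
  P = rng.standard_normal((n, n))
  H_new = H_old + 0.5 * (P + P.T)

  # Rayleigh-Ritz in the old subspace vs. a full diagonalization
  E_sub   = eigh(V.T @ H_new @ V, eigvals_only=True)
  E_exact = eigh(H_new, eigvals_only=True)[:nband]

  # states the old subspace cannot represent come out badly wrong;
  # increasing nband (the workaround in 4) shrinks this error
  print(np.max(E_sub - E_exact))
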
>> On Fri, Jun 13, 2008 at 2:47 AM, Johan Eriksson <joher at ifm.liu.se> wrote:
>>
>>> Dear Wien community,
>>> I'm running the latest Wien2k release on a Linux cluster (IFORT 10.1,
>>> cmkl 9.1, openmpi 1.2.5).
>>> The cases are running fine with k-point parallelization + MPI lapw0.
>>> However, since there are many more CPUs than k-points and we have
>>> infiniband interconnects, I want to use full MPI parallelization. First
>>> I ran my case k-point parallel for a few cycles, stopped, ran clean_lapw
>>> and then switched to MPI. After a few iterations I started getting QTL-B
>>> warnings and it crashed. If I switch back to k-point parallel it runs
>>> just fine again.
>>> What am I doing wrong here? Could it be that I'm using the iterative
>>> diagonalization scheme (-it switch)? Should I try some other mkl or MPI
>>> implementation?
>>>
>>> Also, why is the serial benchmark 'x lapw1 -c' so unstable with mkl 10
>>> when using OMP_NUM_THREADS>=4? With cmkl 9.1 it works fine with 1, 2, 4
>>> and 8 threads. When mkl 10 works, however, it is faster than cmkl 9.1.
>>>
>>>
>>>
>>> /Johan Eriksson
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Commission on Electron Diffraction of IUCR
www.numis.northwestern.edu/IUCR_CED

