[Wien] Large memory consumption of MPI k-point parallel version

Oleg Rubel rubel at physik.uni-marburg.de
Mon Apr 7 11:49:01 CEST 2008


Dear Wien2k Community,

I have a question about the memory consumption of the MPI version of 
LAPW1: when I add k-point parallelization on top of the MPI 
parallelization, the memory requirement becomes much larger than 
without k-point parallelization.

I am calculating a GaAs surface passivated with pseudohydrogen on one side. 
The supercell contains 40 nonequivalent atoms. I have 12 k-points in the 
irreducible BZ; RKMAX is 2.10 (because of the small RMT of hydrogen); the 
cell has no inversion symmetry; the matrix size is approximately 14200.
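
Just to put that matrix size in perspective, here is a back-of-the-envelope 
estimate (Python used purely as a calculator; the numbers are my own, not 
WIEN2k output):

    # Size of one full H or S matrix in lapw1 for this case
    n = 14201                # matrix size reported by lapw1
    bytes_per_element = 16   # complex*16, since there is no inversion symmetry
    size_gib = n * n * bytes_per_element / 2**30
    print(f"{size_gib:.1f} GiB")  # ~3.0 GiB each for H and S

so distributing H and S over several MPI processes is essential here.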

I run WIEN2k_08.1 (Release 14/12/2007) using the command

    min -i 100 -s 10 -j 'run_lapw -p -I -i 40 -fc 0.5 -ec 0.0001 -cc 0.001'

first with the following .machines file:

    marc-hn:~/wien_work/GaAsBeta2_2x4> cat .machines
    granularity:1
    1:node009 node008 node126 node128
    lapw0:node009:1 node008:1 node126:1 node128:1

With this setup the program needs about 3 GB of memory but takes a lot of 
time, since the 12 k-points are processed one after another. So I decided 
to add k-point parallelization and switched to this new .machines file:

    marc-hn:~/wien_work/GaAsBeta2_2x4> cat .machines
    granularity:1
    1:node009 node008 node126 node128
    1:node130 node127 node131 node118
    1:node124 node136 node132 node135
    1:node120 node119 node134 node125
    lapw0:node009:1 node008:1 node126:1 node128:1 node130:1 node127:1 node131:1 node118:1 node124:1 node136:1 node132:1 node135:1 node120:1 node119:1 node134:1 node125:1
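
If I read the .machines syntax correctly, each "1:" line defines one 
k-point job that runs as a 4-process lapw1_mpi job over the listed nodes, 
so every node should again host exactly one lapw1 process (a quick check, 
Python again as a scratch pad; the layout is copied from the file above):

    # Each "1:" line in .machines = one 4-process lapw1_mpi job (my reading)
    jobs = [
        ["node009", "node008", "node126", "node128"],
        ["node130", "node127", "node131", "node118"],
        ["node124", "node136", "node132", "node135"],
        ["node120", "node119", "node134", "node125"],
    ]
    nodes = [host for job in jobs for host in job]
    # No node is listed twice, so the per-node load should be the same
    # as with the single-job .machines file above.
    assert len(nodes) == len(set(nodes)) == 16
    print(len(jobs), "simultaneous k-point jobs on", len(nodes), "nodes")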

My naive thought was that 4 GB would be enough, but it turns out that even 
9 GB is not. Why?

With a 9 GB limit the job is killed by the queuing system, because LAPW1 
wants more than 9 GB, although the sequential (non-k-point-parallel) 
version runs fine with a 4 GB limit. Here is the tail of the 
GaAsBeta2_2x4.output1_1 file:

    Matrix size        14201
    scalapack processors array (row,col):   2   2
              allocate H       782.5 MB          dimensions  7161  7161
              allocate S       782.5 MB          dimensions  7161  7161
         allocate spanel        14.0 MB          dimensions  7161   128
         allocate hpanel        14.0 MB          dimensions  7161   128
       allocate spanelus        14.0 MB          dimensions  7161   128
           allocate slen         7.0 MB          dimensions  7161   128
             allocate x2         7.0 MB          dimensions  7161   128
       allocate legendre        90.9 MB          dimensions  7161    13   128
    allocate al,bl (row)         2.4 MB          dimensions  7161    11
    allocate al,bl (col)         0.0 MB          dimensions   128    11
             allocate YL         3.5 MB          dimensions    15  7161     2
    Time for al,bl    (hamilt) :         12.4
    Time for legendre (hamilt) :          6.4
    Time for phase    (hamilt) :        124.7
    Time for us       (hamilt) :         79.9
    Time for overlaps (hamilt) :        275.4
    Time for distrib  (hamilt) :          2.0
    Time for iouter   (hamilt) :        504.4
     number of local orbitals, nlo (hamilt)      744
    Time for los      (hamilt) :          8.7
    Time for alm         (hns) :          5.3
    Time for vector      (hns) :         38.7
    Time for vector2     (hns) :         37.0
    Time for VxV         (hns) :        811.1
    Wall Time for VxV    (hns) :        885.8
    ********* end of GaAsBeta2_2x4.output1_1 ************
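
Incidentally, the reported dimensions are exactly what I would expect from 
ScaLAPACK's block-cyclic distribution on the 2 x 2 grid (a minimal check, 
assuming the usual NUMROC logic with block size 128):

    # Local rows/cols owned by one process in a block-cyclic distribution
    # (simplified ScaLAPACK NUMROC, source process offset = 0)
    def numroc(n, nb, iproc, nprocs):
        nfull = n // nb                 # number of complete blocks
        local = (nfull // nprocs) * nb
        extra = nfull % nprocs
        if iproc < extra:
            local += nb                 # one extra full block
        elif iproc == extra:
            local += n % nb             # the trailing partial block
        return local

    n, nb = 14201, 128
    loc = numroc(n, nb, 0, 2)                   # process row/column 0
    print(loc)                                  # 7161, as in output1_1
    print(f"{loc * loc * 16 / 2**20:.1f} MB")   # 782.5 MB for H (same for S)

Summing everything lapw1 reports (H, S, panels, legendre, ...) gives 
roughly 1.7 GB per lapw1_mpi process, i.e. per node in this layout.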

I do not see where the 9 GB comes from, or why the memory requirement of 
the k-point parallel version is so different from that of the sequential one.

I would be thankful for any pointers.

Oleg Rubel


P.S.
Some system details:
CPU(s): Dual Opteron 270 (DualCore 2.0GHz)
Operating System: Debian GNU/Linux v4.0 ("etch")
Queuing System: SUN GridEngine 6.0u9
Compiler: ifort 10.0
Libraries: ScaLAPACK-1.8.0 from netlib; the rest is MKL 10.0

