[Wien] Large memory consumption of MPI k-point parallel version
Oleg Rubel
rubel at physik.uni-marburg.de
Mon Apr 7 11:49:01 CEST 2008
Dear WIEN2k Community,
I have a question about the memory consumption of the MPI version of
LAPW1, which is much larger with k-point parallelization than
without it.
I am calculating a GaAs surface passivated by pseudohydrogen on one side.
The supercell contains 40 nonequivalent atoms. I have 12
k-points in the irreducible BZ; RKMAX is 2.10 (because of the small Rmt of
hydrogen); there is no inversion; the matrix size is approx. 14200.
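As a rough sanity check on these numbers (my own back-of-the-envelope
estimate, not WIEN2k output): without inversion the Hamiltonian and
overlap matrices are complex, so a single full matrix of this size
already costs about 3 GiB, and the 2x2 ScaLAPACK grid quoted below
stores roughly a quarter of it per process:

    N = 14201                    # matrix size from the output below
    BYTES = 16                   # complex*16 elements (no inversion)
    print(f"full matrix:  {N * N * BYTES / 2**30:.1f} GiB")        # ~3.0 GiB
    print(f"local block:  {7161 * 7161 * BYTES / 2**20:.1f} MiB")  # 782.5, as logged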
I run WIEN2k_08.1 (Release 14/12/2007) using the command
min -i 100 -s 10 -j 'run_lapw -p -I -i 40 -fc 0.5 -ec 0.0001 -cc 0.001'
first using the following .machines file:
marc-hn:~/wien_work/GaAsBeta2_2x4> cat .machines
granularity:1
1:node009 node008 node126 node128
lapw0:node009:1 node008:1 node126:1 node128:1
With this file the program needs about 3 GB of memory but takes a lot of
time, since the 12 k-points are processed one after another. So I decided
to switch on k-point parallelization and used this new .machines file:
marc-hn:~/wien_work/GaAsBeta2_2x4> cat .machines
granularity:1
1:node009 node008 node126 node128
1:node130 node127 node131 node118
1:node124 node136 node132 node135
1:node120 node119 node134 node125
lapw0:node009:1 node008:1 node126:1 node128:1 node130:1 node127:1 node131:1 node118:1 node124:1 node136:1 node132:1 node135:1 node120:1 node119:1 node134:1 node125:1
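As an aside: on clusters like ours, where SUN GridEngine lists the
allocated hosts in $PE_HOSTFILE, such a file can be generated
automatically. A minimal Python sketch, assuming 4 nodes per MPI job;
the helper itself is hypothetical, not part of WIEN2k:

    #!/usr/bin/env python
    # Hypothetical helper (not part of WIEN2k): build a k-point-parallel
    # .machines file from the host list that SUN GridEngine puts into
    # $PE_HOSTFILE (one line per host: "hostname slots queue ...").
    import os

    NODES_PER_MPI_JOB = 4  # assumption: 4 nodes per lapw1_mpi instance

    with open(os.environ["PE_HOSTFILE"]) as f:
        hosts = [line.split()[0] for line in f if line.strip()]

    with open(".machines", "w") as f:
        f.write("granularity:1\n")
        # one "1:" line per concurrently running k-point job
        for i in range(0, len(hosts), NODES_PER_MPI_JOB):
            f.write("1:" + " ".join(hosts[i:i + NODES_PER_MPI_JOB]) + "\n")
        # lapw0 is parallelized over all hosts, one process each
        f.write("lapw0:" + " ".join(h + ":1" for h in hosts) + "\n")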
My naive thought was that 4 GB would be enough. But it turns out that even
9 GB is not enough. Why?
With a 9 GB limit the job is killed by the queuing system, because
LAPW1 wants more than 9 GB, although the run without k-point
parallelization works fine under a 4 GB limit. This is the tail of the
GaAsBeta2_2x4.output1_1 file:
Matrix size                            14201
scalapack processors array (row,col):  2  2
allocate H             782.5 MB   dimensions  7161 7161
allocate S             782.5 MB   dimensions  7161 7161
allocate spanel         14.0 MB   dimensions  7161  128
allocate hpanel         14.0 MB   dimensions  7161  128
allocate spanelus       14.0 MB   dimensions  7161  128
allocate slen            7.0 MB   dimensions  7161  128
allocate x2              7.0 MB   dimensions  7161  128
allocate legendre       90.9 MB   dimensions  7161   13  128
allocate al,bl (row)     2.4 MB   dimensions  7161   11
allocate al,bl (col)     0.0 MB   dimensions   128   11
allocate YL              3.5 MB   dimensions    15 7161    2
Time for al,bl (hamilt) : 12.4
Time for legendre (hamilt) : 6.4
Time for phase (hamilt) : 124.7
Time for us (hamilt) : 79.9
Time for overlaps (hamilt) : 275.4
Time for distrib (hamilt) : 2.0
Time for iouter (hamilt) : 504.4
number of local orbitals, nlo (hamilt) 744
Time for los (hamilt) : 8.7
Time for alm (hns) : 5.3
Time for vector (hns) : 38.7
Time for vector2 (hns) : 37.0
Time for VxV (hns) : 811.1
Wall Time for VxV (hns) : 885.8
********* end of GaAsBeta2_2x4.output1_1 ************
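Incidentally, the local dimension 7161 in the log is exactly what
ScaLAPACK's NUMROC formula gives for N = 14201 distributed
block-cyclically over 2 process rows, assuming the block size NB = 128
suggested by the 7161 x 128 panel allocations. A small Python sketch of
that formula:

    # Sketch: reproduce the "dimensions 7161 7161" from the log with
    # ScaLAPACK's NUMROC (local rows/cols in a block-cyclic distribution).
    def numroc(n, nb, iproc, nprocs, isrcproc=0):
        dist = (nprocs + iproc - isrcproc) % nprocs
        nblocks = n // nb              # number of full blocks
        num = (nblocks // nprocs) * nb # full blocks owned by every process
        extra = nblocks % nprocs
        if dist < extra:
            num += nb                  # one more full block
        elif dist == extra:
            num += n % nb              # the trailing partial block
        return num

    n, nb, nprow = 14201, 128, 2
    local = numroc(n, nb, 0, nprow)
    print(local)                                   # 7161
    print(f"{local * local * 16 / 2**20:.1f} MB")  # 782.5, matches 'allocate H'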
I do not see where the 9 GB comes from, or why the memory requirement of
the k-point-parallel version is so different from that of the sequential one.
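For reference, tallying the per-process allocations that lapw1_mpi
prints above (only what is reported explicitly; eigensolver workspace
comes on top):

    # Sum of the allocations reported in the log above, per MPI process (MB).
    allocs = [782.5, 782.5, 14.0, 14.0, 14.0, 7.0, 7.0, 90.9, 2.4, 0.0, 3.5]
    per_process = sum(allocs)                                    # ~1717.8 MB
    print(f"per MPI process: {per_process:.1f} MB")
    print(f"16 processes:    {per_process * 16 / 1024:.1f} GB")  # ~26.8 GB

So each process already reports about 1.7 GB; if the queuing system
accounts memory over all 16 MPI processes of the job rather than per
process, the aggregate would far exceed 9 GB, but I do not know whether
that is what happens here.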
I would be grateful for any pointers.
Oleg Rubel
P.S.
Some system details:
CPU(s): Dual Opteron 270 (dual-core, 2.0 GHz)
Operating System: Debian GNU/Linux v4.0 ("etch")
Queuing System: SUN GridEngine 6.0u9
Compiler: ifort 10.0
Libraries: ScaLAPACK-1.8.0 from netlib; the rest is MKL 10.0