[Wien] Fwd: Re: MPI stuck at lapw0
Peter Blaha
pblaha at theochem.tuwien.ac.at
Tue Nov 7 16:58:12 CET 2017
The different runtime fractions for small and large systems are due to the
different scaling of the programs: lapw0 scales essentially linearly with the
number of atoms, but lapw1 scales cubically with the size of the basis set.
And here is the second problem: for your nanowire you get a matrix size of
about 130000 x 130000, and this for just 97 atoms.
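To put this into numbers (a rough estimate, taking 16 bytes per complex
matrix element, consistent with the allocations in your output1_1 below): a
136632 x 136632 matrix needs about 136632^2 * 16 bytes, i.e. roughly 300 GB
for H alone, and the same again for S, before eigenvectors and the ScaLAPACK
workspace are even counted.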
It is not the number of atoms that determines the memory, but the plane-wave
basis set. This info is printed in the :RKM line of the scf file, and you can
even get it using
x lapw1 -nmat_only
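For example, a quick way to check the current matrix size is to grep the last
:RKM line from the scf file (assuming your case is simply called "case"):

   grep :RKM case.scf | tail -1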
So your cell dimensions / RMT settings must be very bad. Remember: also
"vacuum" costs a lot in plane-wave methods. You have to optimize your RMTs
and reduce the cell parameters (vacuum).
lapw2: you can add a line to .machines:
lapw2_vector_split:4 (or 8 or 16)
which will reduce the memory consumption of lapw2.
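A minimal .machines sketch for such an MPI run (the node names n001/n002 and
the core counts are only placeholders; the exact layout should follow the
parallelization section of the WIEN2k user's guide):

1:n001:20 n002:20
lapw0:n001:20 n002:20
granularity:1
lapw2_vector_split:4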
On 11/07/2017 04:09 PM, Luigi Maduro - TNW wrote:
>>>There are 2 different things:
>
>>>lapw0para executes:
>
>>>    $remote $machine "cd $PWD;$t $exe $def.def"
>
>>>where $remote is either ssh or rsh (depending on your configuration setup).
>
>>>Once this is defined, it goes to the remote node and executes $exe,
>>>which usually refers to mpirun.
>
>>>mpirun is a script on your system, and it may acknowledge this
>>>I_MPI_HYDRA_BOOTSTRAP=rsh variable, while by default it seems to use ssh
>>>(even if your system does not support this). WIEN2k does not know about
>>>such a variable and assumes that a plain mpirun will do the correct thing.
>>>The sysadmin should set up the system such that rsh is used by default
>>>with mpirun, or should tell people which mpi commands/variables they
>>>should set.
>
>>>PS: I do not quite understand how it can happen that you get rsh in
>>>lapw1para, but ssh in lapw0para??
>
> I do not understand either, because when I check the lapw2para script I
> see that “set remote = rsh”
>
> I have a couple of questions concerning the parallel version of WIEN2k,
> one concerning insufficient virtual memory and the other concerning lapw1.
>
> I’ve been trying to do simulations of MoS2 in two types of
> configurations. One is a monolayer calculation (4x4x1 unit cells) with
> 48 atoms,
>
> and another calculation deals with a “nanowire” (13x2x1 unit cells) with
> 97 atoms.
>
> For the 4x4x1 unit cell I have an rkmax of 6.0 and a 10 k-point mesh.
> For the calculation I used 2 nodes and 20 processors per node (so 40 in
> total).
> The command run is: run_lapw -p -nlvdw -ec 0.0001.
>
> What I noticed is that both lapw1 and nlvdw take a long time to run.
> Lapw0 takes about a minute, as does lapw2. Lapw1 and nlvdw take about
> 16-19 minutes to run.
> When I log into the nodes and use the 'top' command to check the CPU%, I
> see that all processors are at 100%; however, I've been notified that
> only 2% of the requested CPU time is actually used.
>
> I don't really understand why there is such a big discrepancy in
> computation time between lapw1 and lapw2. In smaller calculations lapw1
> and lapw2 take computation times of the same order of magnitude.
>
> For the nanowire calculation I chose an rkmax of 6.0 and a single
> k-point, and only used LDA because I want to compare LDA with NLVDW later
> on. I always get a "forrtl: severe (41): insufficient virtual memory"
> error at lapw1 or lapw2 in the first SCF cycle, no matter how many nodes
> I request, from 1 node to 20 nodes.
>
> Each time I requested 20 processors per node. Only with 20 nodes and
> 20 processors per node did the SCF cycle make it to lapw2, but it crashed
> not long after reaching lapw2. Each node is equipped with 128 GB of memory,
> and the end of output1_1 looks like this:
>
> MPI-parallel calculation using 400 processors
> Scalapack processors array (row,col): 20 20
> Matrix size 136632
> Nice Optimum Blocksize 112 Excess % 0.000D+00
> allocate H 712.2 MB dimensions 6832 6832
> allocate S 712.2 MB dimensions 6832 6832
> allocate spanel 11.7 MB dimensions 6832 112
> allocate hpanel 11.7 MB dimensions 6832 112
> allocate spanelus 11.7 MB dimensions 6832 112
> allocate slen 5.8 MB dimensions 6832 112
> allocate x2 5.8 MB dimensions 6832 112
> allocate legendre 75.9 MB dimensions 6832 13 112
> allocate al,bl (row) 2.3 MB dimensions 6832 11
> allocate al,bl (col) 0.0 MB dimensions 112 11
> allocate YL 1.7 MB dimensions 15 6832 1
> Time for al,bl (hamilt, cpu/wall) : 14.7 14.7
> Time for legendre (hamilt, cpu/wall) : 4.1 4.1
> Time for phase (hamilt, cpu/wall) : 29.7 30.2
> Time for us (hamilt, cpu/wall) : 38.8 39.2
> Time for overlaps (hamilt, cpu/wall) : 115.6 116.3
> Time for distrib (hamilt, cpu/wall) : 0.3 0.3
> Time sum iouter (hamilt, cpu/wall) : 203.5 205.7
> number of local orbitals, nlo (hamilt) 749
> allocate YL 33.4 MB dimensions 15 136632 1
> allocate phsc 2.1 MB dimensions 136632
> Time for los (hamilt, cpu/wall) : 0.4 0.4
> Time for alm (hns) : 1.0
> Time for vector (hns) : 7.2
> Time for vector2 (hns) : 6.8
> Time for VxV (hns) : 114.8
> Wall Time for VxV (hns) : 1.2
> Scalapack Workspace size 100.38 and 804.35 Mb
>
> Any help is appreciated.
> Kind regards,
> Luigi
>
--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------