[Wien] [Extern] Re: RAM issues in lapw1 -bands
Peter Blaha
pblaha at theochem.tuwien.ac.at
Thu Nov 29 14:16:50 CET 2018
This problem is easily solvable (it again means: you MUST READ the UG
(parallelization section), otherwise you will run VERY badly).
For such a small problem (14 atoms, matrix size 2600) it is NOT
necessary (in fact probably even quite bad) to use mpi-parallelization.
Instead use k-point parallelization (and maybe export OMP_NUM_THREADS=2).
Simply put e.g. 24 lines like:
1:localhost
into the .machines file, and you will run 24 parallel lapw1 jobs (each
using 2 cores when OMP_NUM_THREADS=2 is set).
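A minimal sketch of that setup (bash syntax; use setenv in tcsh; the
job and thread counts are choices you should adapt to your machine):
--------------
# write 24 k-parallel lines plus a granularity line into .machines
for i in $(seq 1 24); do echo "1:localhost"; done > .machines
echo "granularity:1" >> .machines
export OMP_NUM_THREADS=2   # each lapw1 job then uses 2 cores
x lapw1 -p                 # or run_lapw -p for the whole cycle
--------------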
--------------
With respect to your other questions:
I don't know what 'lapw1para_mpi -p -band' is.
lapw1 should always be invoked using:
x lapw1 -p     or     x lapw1 -p -band
The difference is just that you are using either case.klist or
case.klist_band. Check how many k-points are in these 2 files (250
was just an "input"; it seems to have made a 13x13x1 mesh and then still
applied symmetry, so you may have just ~ 30 k-points in case.klist ...).
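A quick way to count them (assuming the usual terminating END line in
klist files):
--------------
awk '/^END/{exit} {n++} END{print n, "k-points"}' case.klist
awk '/^END/{exit} {n++} END{print n, "k-points"}' case.klist_band
--------------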
-----------------
Another question: do you have 48 "physical cores", or only 24 ???
Do you have 2 or 4 Xeons (with 12 cores each) in your computer ??
If you have only 24 "real" cores:
The "virtual cores" which Intel gives you "for free" due to their
"hyperthreading" are usually not very effective. You can at most expect
an improvement of 10-20% when using 48 instead of 24 cores, but
sometimes this can also degrade performance by 30% because the
memory bus gets overloaded. So test it ....
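A simple test could look like this (a bash sketch; it just times one
lapw1 cycle with 24 vs. 48 k-parallel jobs so you can compare):
--------------
for n in 24 48; do
   for i in $(seq 1 $n); do echo "1:localhost"; done > .machines
   echo "granularity:1" >> .machines
   echo "=== $n parallel jobs ==="
   time x lapw1 -p
done
--------------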
On 11/29/18 1:10 PM, Coriolan TIUSAN wrote:
> Thanks for the suggestion of dividing the band calculation.
>
> Actually, I would like to make a 'zoom' around the Gamma point (for
> the X-G-X direction) with a resolution of about 0.001 Bohr^-1 (to get
> enough accuracy for small Rashba splittings, k_0 < 0.01 Bohr^-1). I
> guess I could simply make the 'zoom' calculation, as sketched below?
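>
> Something like this awk sketch is what I have in mind for generating
> the fine case.klist_band (it assumes the fixed klist format
> A10,4I10,F5.1 and an END terminator; please compare against a file
> written by XCrysDen and adjust the field widths if needed; with
> a = 5.73 Bohr, a step of 1/1000 of b1 is about 0.0011 Bohr^-1):
>
> --------------
> # 201 k-points along X-G-X around Gamma, divisor 1000
> awk 'BEGIN { n=100; div=1000;
>   for (i=-n; i<=n; i++)
>     printf "%-10s%10d%10d%10d%10d%5.1f\n",
>            (i==0 ? "GAMMA" : " "), i, 0, 0, div, 1.0;
>   print "END" }' > case.klist_band
> --------------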
>
> The .machines file, given that I have only one node (computer)
> with 48 available CPUs, is:
>
> -------------------------------------
>
> 1:localhost:48
> granularity:1
> extrafine:1
> lapw0:localhost:48
> dstart:localhost:48
> nlvdw:localhost:48
>
> --------------------------------------
>
> For the supercell attached here, I was trying to make a band-structure
> calculation along the X-G-X direction with at least 200 points, which
> corresponds to a step of only 0.005 Bohr^-1, still not fine enough for
> Rashba splittings of the same order of magnitude.
>
> For my calculations I get: MATRIX SIZE 2606, LOs: 138, RKM= 6.99, and
> the RAM of 64 GB is 100% filled, plus about 100 GB of swap...
>
> Beyond all these aspects, what I would also like to understand is why
> in the scf calculation I have no memory 'overload' for 250 k-points
> (13 13 1), while when running 'lapw1para_mpi -p -band' the memory
> issue seems much more dramatic?
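>
> (A rough sanity check of my confusion: one 2606 x 2606 complex*16
> matrix takes 2606^2 * 16 bytes, i.e. about 0.1 GB, so even a few dozen
> simultaneous H and S matrices should fit easily into 64 GB.)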
>
> If necessary, my struct file is:
>
> ------------------
>
> VFeMgO-vid s-o calc. M|| 1.00 0.00 0.00
> P 14
> RELA
> 5.725872 5.725872 61.131153 90.000000 90.000000 90.000000
> ATOM -1: X=0.50000000 Y=0.50000000 Z=0.01215444
> MULT= 1 ISPLIT= 8
> V 1 NPT= 781 R0=.000050000 RMT= 2.18000 Z: 23.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -2: X=0.00000000 Y=0.00000000 Z=0.05174176
> MULT= 1 ISPLIT= 8
> V 2 NPT= 781 R0=.000050000 RMT= 2.18000 Z: 23.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -3: X=0.50000000 Y=0.50000000 Z=0.09885823
> MULT= 1 ISPLIT= 8
> V 3 NPT= 781 R0=.000050000 RMT= 2.18000 Z: 23.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -4: X=0.00000000 Y=0.00000000 Z=0.13971867
> MULT= 1 ISPLIT= 8
> Fe1 NPT= 781 R0=.000050000 RMT= 1.95000 Z: 26.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -5: X=0.50000000 Y=0.50000000 Z=0.18164479
> MULT= 1 ISPLIT= 8
> Fe2 NPT= 781 R0=.000050000 RMT= 1.95000 Z: 26.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -6: X=0.00000000 Y=0.00000000 Z=0.22284885
> MULT= 1 ISPLIT= 8
> Fe3 NPT= 781 R0=.000050000 RMT= 1.95000 Z: 26.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -7: X=0.50000000 Y=0.50000000 Z=0.26533335
> MULT= 1 ISPLIT= 8
> Fe4 NPT= 781 R0=.000050000 RMT= 1.95000 Z: 26.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -8: X=0.00000000 Y=0.00000000 Z=0.30245527
> MULT= 1 ISPLIT= 8
> Fe5 NPT= 781 R0=.000050000 RMT= 1.95000 Z: 26.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -9: X=0.00000000 Y=0.00000000 Z=0.36627712
> MULT= 1 ISPLIT= 8
> O 1 NPT= 781 R0=.000100000 RMT= 1.68000 Z: 8.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -10: X=0.50000000 Y=0.50000000 Z=0.36416415
> MULT= 1 ISPLIT= 8
> Mg1 NPT= 781 R0=.000100000 RMT= 1.87000 Z: 12.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -11: X=0.50000000 Y=0.50000000 Z=0.43034285
> MULT= 1 ISPLIT= 8
> O 2 NPT= 781 R0=.000100000 RMT= 1.68000 Z: 8.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -12: X=0.00000000 Y=0.00000000 Z=0.43127365
> MULT= 1 ISPLIT= 8
> Mg2 NPT= 781 R0=.000100000 RMT= 1.87000 Z: 12.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -13: X=0.00000000 Y=0.00000000 Z=0.49684798
> MULT= 1 ISPLIT= 8
> O 3 NPT= 781 R0=.000100000 RMT= 1.68000 Z: 8.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> ATOM -14: X=0.50000000 Y=0.50000000 Z=0.49541730
> MULT= 1 ISPLIT= 8
> Mg3 NPT= 781 R0=.000100000 RMT= 1.87000 Z: 12.00000
> LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
> 0.0000000 1.0000000 0.0000000
> 0.0000000 0.0000000 1.0000000
> 4 NUMBER OF SYMMETRY OPERATIONS
> -1 0 0 0.00000000
> 0 1 0 0.00000000
> 0 0 1 0.00000000
> 1 A 1 so. oper. type orig. index
> 1 0 0 0.00000000
> 0 1 0 0.00000000
> 0 0 1 0.00000000
> 2 A 2
> -1 0 0 0.00000000
> 0-1 0 0.00000000
> 0 0 1 0.00000000
> 3 B 3
> 1 0 0 0.00000000
> 0-1 0 0.00000000
> 0 0 1 0.00000000
> 4 B 4
> ---------------------------
>
>
> On 29/11/2018 13:05, Peter Blaha wrote:
>> You never listed your .machines file, nor do we know how many k-points
>> are in the scf and the bandstructure cases and what the matrix size
>> (:RKM) / real / complex details are.
>>
>> The memory leakage of Intel's MPI seems to be very version dependent,
>> but there's nothing we can do about it from the WIEN2k side.
>>
>> Besides installing a different mpi version, one could more easily run
>> the bandstructure in pieces. Simply divide your klist_band file into
>> several pieces and calculate one after the other.
>>
>> The resulting case.outputso_1,2,3.. files can simply be concatenated
>> (cat file1 file2 file3 > file) together.
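>>
>> A rough sketch of such a loop (bash; it assumes pieces of 50 k-points,
>> a trailing END line in case.klist_band, and a spin-orbit run that
>> produces case.outputso; adapt the piece size and lapw steps):
>>
>> --------------
>> cp case.klist_band klist_band.orig       # keep the original
>> head -n -1 klist_band.orig > klist.tmp   # strip the END line
>> split -l 50 -d klist.tmp piece_          # piece_00, piece_01, ...
>> for p in piece_*; do
>>    cp $p case.klist_band
>>    echo "END" >> case.klist_band
>>    x lapw1 -p -band
>>    x lapwso -p
>>    cp case.outputso case.outputso_${p#piece_}
>> done
>> cat case.outputso_* > case.outputso      # concatenate the pieces
>> --------------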
>>
>>
>>
>> On 11/28/18 1:41 PM, Coriolan TIUSAN wrote:
>>> Dear wien2k users,
>>>
>>> I am running WIEN2k 18.2 on Ubuntu 18.04, installed on an HP
>>> workstation: 64 GB RAM, Intel Xeon(R) Gold 5118 CPU @ 2.30GHz × 48.
>>>
>>> The Fortran compiler and math library are Intel's ifc and the Intel
>>> MKL. For parallel execution I have MPI + SCALAPACK and FFTW.
>>>
>>> For parallel execution (-p option + .machines), I have dimensioned
>>> NMATMAX/NUME according to the user guide. Therefore, standard
>>> calculations in the SCF loop run well, without any memory paging
>>> issues, with about 90% of the physical RAM being used.
>>>
>>> However, in supercells, once the case.vector files are obtained, when
>>> calculating bands (lapw1c -band -p) with a fine k-point mesh (e.g.
>>> above 150-200 k-points on the line X-G-X), necessary because I am
>>> looking for small Rashba shifts at metal-insulator interfaces, all
>>> available physical memory plus a huge amount of swap (>100 GB) is
>>> filled/used...
>>>
>>> Any suggestion/idea for overcoming this issue... without adding
>>> more RAM?
>>>
>>> Why does memory look sufficient in lapw1 -p for the self-consistency,
>>> while with the -band switch it overloads?
>>>
>>> With thanks in advance,
>>>
>>> C. Tiusan
>>>
>>>
>>>
>>
--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
WWW: http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------