[Wien] RAM usage for large k-list of a big slab
pluto
pluto at physics.ucdavis.edu
Mon Jun 10 13:49:37 CEST 2024
Dear Prof. Blaha,
Thank you for the response and suggestion!
I will test with granularity; it will take a while.
My i9 machine is a standalone Linux desktop, so I assume there is no
issue with global scratch, because everything will be saved to the
"case" directory?
Best,
Lukasz
PS: I noticed that on these standalone consumer Intel machines faster
RAM makes quite a difference in speed. There seems to be a noticeable
speed increase between DDR5-5600 and DDR5-7200 (something like 44 sec
vs. 37 sec in the test run). I think this has to do with consumer
desktop machines having only 2 memory channels, so RAM speed is often
the bottleneck despite the CPU having 8 performance cores. A WIEN2k
test run that took around 60 sec on an i7 13700K with DDR4-3600 takes
around 37 sec on an i9 14900K with DDR5-7200 (same WIEN2k compilation).
Typically 4x localhost with 2x OMP is the fastest combination; in
.machines terms that is the sketch below. And I think thermal
throttling perhaps also plays some role in longer heavy runs.
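
For concreteness, the combination I mean in .machines terms (a sketch;
the localhost lines are just this machine's setup, and I use omp_global
here, assuming a recent WIEN2k where that directive is available):

omp_global:2
1:localhost
1:localhost
1:localhost
1:localhost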
On 2024-06-10 09:54, Peter Blaha wrote:
> Yes, I can confirm that RAM increases from k-point to k-point in lapw1
> using ifort+mkl.
>
> However, this happens ONLY with ifort-mkl, not with
> gfortran+openblas.
>
> Thus I conclude it is an "mkl-problem".
>
> However, switching to gfortran+openblas is not the best solution,
> because it seems that on the latest Intel i9 cores, mkl is
> fundamentally faster than openblas (below with omp_lapw:8, but the
> same happens without omp):
> gfortran:
> TIME HAMILT (CPU) = 23.2, HNS = 31.9, HORB = 0.0, DIAG = 144.3, SYNC = 0.0
> TIME HAMILT (WALL) = 3.1, HNS = 5.8, HORB = 0.0, DIAG = 47.8, SYNC = 0.0
> ifort:
> TIME HAMILT (CPU) = 29.4, HNS = 43.6, HORB = 0.0, DIAG = 89.4, SYNC = 0.0
> TIME HAMILT (WALL) = 3.7, HNS = 5.5, HORB = 0.0, DIAG = 20.5, SYNC = 0.0
>
> As you can see from these numbers, gfortran is as good as (or even
> better than) ifort for hamilt and hns, but the diagonalization with
> openblas is much slower.
>
> The solution to your problem is, however, quite simple: use
> granularity:2 (or 3) in your .machines file (you have to use a
> global scratch directory, i.e. SCRATCH=./).
> This will not spawn 4 lapw1 jobs with 650 k-points each, but will
> decompose the k-list further, so that each lapw1 run calculates fewer
> k-points (use testpara to check, but note that there will still be at
> most 4 jobs running at the same time). This way, the memory increase
> can be limited.
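>
> For example, a minimal .machines along these lines (a sketch; the
> localhost lines are illustrative and the granularity value is just a
> starting point to experiment with):
>
> granularity:2
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
>
> together with SCRATCH=./ set in the environment, so that all lapw1
> jobs write their vector files into the case directory.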
>
> Best regards
> Peter Blaha
>
> Am 07.06.2024 um 09:27 schrieb pluto via Wien:
>> Dear All,
>>
>> I would appreciate it if you could comment on the RAM usage during
>> the band calculation.
>>
>> I attach a graph of RAM use over several hours (this is now on an i9
>> 14900K with approx. 110 GB of RAM). This is during the calculation of
>> 51x51=2601 k-points for a very large slab (60 non-equivalent atoms).
>> This is running x lapw1 -band -up -p with 4x localhost and without
>> omp:
>>
>> Wed Jun 5 01:25:27 PM CEST 2024> (x) lapw1 -band -up -p
>>
>> You can see that at some point swap activates, and then after some
>> time one of the localhost runs crashes.
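>>
>> For reference, one way to collect such a RAM-over-time trace with
>> standard Linux tools (a sketch; the interval and log name are
>> arbitrary):
>>
>> while sleep 60; do date; free -m; done >> ram.log &
>>
>> left running in the background for the duration of lapw1.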
>>
>> Is this behavior normal? Can something be changed by adjusting some
>> settings? Could it be a problem with the WIEN2k compilation or with
>> this particular calculation?
>>
>> Best,
>> Lukasz
>>