[Wien] RAM usage for large k-list of a big slab
pluto
pluto at physics.ucdavis.edu
Mon Jun 10 13:49:37 CEST 2024
Dear Prof. Blaha,
Thank you for the response and suggestion!
I will test with granularity; it will take a while.
My i9 machine is a standalone Linux desktop, so I assume there is no
issue with global scratch, because everything will be saved to the
"case" directory?
Best,
Lukasz
PS: I noticed that on these standalone consumer Intel machines faster
RAM makes quite a difference in speed. There seems to be a noticeable
speed increase between DDR5-5600 and DDR5-7200 (something like 44 sec
vs. 37 sec in the test run). I think this has to do with consumer
desktop machines having only 2 memory channels, so RAM speed is often
the bottleneck despite the CPU having 8 performance cores. A WIEN2k
test run that took around 60 sec on an i7 13700K with DDR4-3600 takes
around 37 sec on an i9 14900K with DDR5-7200 (same WIEN2k compilation).
Typically 4x localhost with 2x OMP is the fastest combination; in
.machines terms that is the sketch below. And I think thermal
throttling perhaps also plays some role in longer heavy runs.
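
For concreteness, the combination I mean in .machines terms (a sketch;
the localhost lines are just this machine's setup, and I use omp_global
here, assuming a recent WIEN2k where that directive is available):

omp_global:2
1:localhost
1:localhost
1:localhost
1:localhost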
On 2024-06-10 09:54, Peter Blaha wrote:
> Yes, I can confirm that RAM increases from k-point to k-point in lapw1
> using ifort+mkl.
>
> However, this happens ONLY with ifort-mkl, not with
> gfortran+openblas.
>
> Thus I conclude it is an "mkl-problem".
>
> However, switching to gfortran+openblas is not the best solution,
> because it seems that on the latest Intel i9 cores, mkl is
> fundamentally faster than openblas (below with omp_lapw:8, but the
> same happens without omp):
> gfortran:
> TIME HAMILT (CPU) = 23.2, HNS = 31.9, HORB = 0.0, DIAG = 144.3, SYNC = 0.0
> TIME HAMILT (WALL) = 3.1, HNS = 5.8, HORB = 0.0, DIAG = 47.8, SYNC = 0.0
> ifort:
> TIME HAMILT (CPU) = 29.4, HNS = 43.6, HORB = 0.0, DIAG = 89.4, SYNC = 0.0
> TIME HAMILT (WALL) = 3.7, HNS = 5.5, HORB = 0.0, DIAG = 20.5, SYNC = 0.0
>
> As you can see from these numbers, gfortran is as good as (or even
> better than) ifort for hamilt and hns, but the diagonalization with
> openblas is much slower.
>
> The solution to your problem is, however, quite simple: use
> granularity:2 (or 3) in your .machines file (you have to use a
> global scratch directory, i.e. SCRATCH=./).
> This will not spawn 4 lapw1 jobs with 650 k-points each, but will
> decompose the k-list further, so that each lapw1 run calculates fewer
> k-points (use testpara to check, but note that there will still be at
> most 4 jobs running at the same time). This way, the memory increase
> can be limited.
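>
> For example, a minimal .machines along these lines (a sketch; the
> localhost lines are illustrative and the granularity value is just a
> starting point to experiment with):
>
> granularity:2
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
>
> together with SCRATCH=./ set in the environment, so that all lapw1
> jobs write their vector files into the case directory.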
>
> Best regards
> Peter Blaha
>
> Am 07.06.2024 um 09:27 schrieb pluto via Wien:
>> Dear All,
>>
>> I would appreciate it if you could comment on the RAM usage during
>> the band calculation.
>>
>> I attach a graph of RAM use over several hours (this is now on an i9
>> 14900K with approx. 110 GB of RAM). This is during the calculation of
>> 51x51=2601 k-points for a very large slab (60 non-equivalent atoms).
>> This is running x lapw1 -band -up -p with 4x localhost and without
>> omp:
>>
>> Wed Jun 5 01:25:27 PM CEST 2024> (x) lapw1 -band -up -p
>>
>> You can see that at some point swap activates, and then after some
>> time one of the localhost runs crashes.
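>>
>> For reference, one way to collect such a RAM-over-time trace with
>> standard Linux tools (a sketch; the interval and log name are
>> arbitrary):
>>
>> while sleep 60; do date; free -m; done >> ram.log &
>>
>> left running in the background for the duration of lapw1.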
>>
>> Is this behavior normal? Can something be changed by adjusting some
>> settings? Could it be a problem with the WIEN2k compilation or with
>> this particular calculation?
>>
>> Best,
>> Lukasz
>>