[Wien] Wien post from pascal.boulet

Peter Blaha peter.blaha at tuwien.ac.at
Tue Mar 21 08:49:15 CET 2023


> Actually, I only issued the commands:
> x dstart
> optimize.job
> in the submission script for slurm. No other changes…
> 
> I did not modify the optimize.job script.

optimize.job contains by default this run_lapw line:

  run_lapw -ec 0.0001   # -p -it -cc 0.01 -fc 1 -min

This means you are NOT running in parallel, but only in sequential mode 
(on one core only? - depending on your OMP setup).
Is this the reason for the long CPU time?
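
To run in parallel you would add -p to that line (and provide a 
.machines file in the case directory); a minimal sketch, keeping the 
convergence criterion from above:

  run_lapw -p -ec 0.0001

The commented-out switches (-it, -cc, -fc, -min) remain optional as before.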

> Yes, I did. But one thing that is not clear to me: can we do it (-str 
> 0.1) after a first complete SCF cycle has been done? I mean that I ran a 
> SCF and then restarted by including the -str 0.1 -I -i 1 options. Maybe 
> it is wrong.

You can do it, but without -i 1! As I said: similar to the forces, it will 
check over 3 iterations in the scf file whether the stress criterion is 
fulfilled, then switch the case.in2 file and do one more cycle with 
the full stress tensor.
Of course, you could also check the "partial stress" yourself, change 
the case.in2 file by hand and run just ONE iteration.
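
In other words, after the converged run, something like this should be 
enough (a sketch; the convergence switch is the one from optimize.job):

  run_lapw -p -ec 0.0001 -str 0.1

i.e. keep -str 0.1, but leave out the -i 1 so the cycle can continue 
until the stress criterion is fulfilled.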

> But the problem is the way the computers hours used on my project are 
> counted.
> 
> I have tried with a serial calculation, it takes 4 hours. By the way I 
> use RKM=7.0, Gmax=16, 26 k-points, 11400 PW.
> If I use 128 cores (=1 node) it takes 40 minutes, so 85 core-hours. But, 
> whether I use 1 core or 128, the system counts 1 node (or 2 if I use 
> between 129 and 256 cores, etc.). So 4 hours on 1 core cost 512 h on my 
> account!
> So whether time is wasted depends on which side you look at it from. But 
> perhaps the scaling can be improved…
See the comments above. Are you really running in parallel?

Of course I was not suggesting to run in serial on one core. In 
most computing centers they will charge per node and not per core.

My suggestion is to use more k-parallelism and fewer mpi-cores per job.
For 26 k-points you could try e.g. 13 k-parallel jobs with either 8 (or 
12) mpi-cores per job. This would mean some under- or overloading of a 
node, but you have to find out what is the fastest way to run for a 
particular system.
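
A minimal .machines sketch for such a layout (node001 is just a 
placeholder hostname; in your slurm script you would generate these 
lines from the node list of the job):

  # 13 identical lines = 13 k-parallel jobs with 8 mpi cores each
  1:node001:8
  1:node001:8
  1:node001:8
  1:node001:8
  1:node001:8
  1:node001:8
  1:node001:8
  1:node001:8
  1:node001:8
  1:node001:8
  1:node001:8
  1:node001:8
  1:node001:8
  granularity:1
  extrafine:1

With such a file, run_lapw -p distributes the 26 k-points over the 13 
jobs (2 k-points each), and each lapw1/lapw2 runs on 8 mpi cores.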

At the moment you seem to have a matrix size of 11400 and run in 2 
k-parallel jobs with 64 mpi-cores each. Does your structure have 
inversion symmetry?

PS: I have a case with NMAT=12400 and 32 k-points. I run it on 4 old 
6-core PCs (which cost 1500 Euro each 8 years ago) - so on "24 cores" - 
with 4x4 jobs and omp=2 (some overloading), and it takes 10 minutes for 
lapw1 to complete.
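
The omp=2 above refers to 2 OpenMP threads per job for the threaded 
parts; in a batch script this is typically set via the standard OpenMP 
environment variable, e.g.

  export OMP_NUM_THREADS=2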

And don't forget to add the -p switch to the run_lapw command.

-- 
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300
Email: peter.blaha at tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at
-------------------------------------------------------------------------

