[Wien] optimize_abc continuation run

Peter Blaha pblaha at theochem.tuwien.ac.at
Fri Feb 25 23:03:07 CET 2022


Of course, the syntax is the same for .machines and .machines_x

optimize_abc uses the following strategy:

scf cycle for a0b0c0   (uses .machines)

parallel calculation of 9 cases, namely
a0+-delta,b0,c0   (2 sequential scf cycles each)
a0,b0+-delta,c0
.....

Each of these 9 cases uses its own .machines_x file (because they run in 
parallel, these .machines_x files should use different cores).

So if you have 256 cores, use in

.machines:   as many cores as is efficient (this depends on the k-list and 
CPU time; note that more cores do NOT always mean shorter run time. 
You have to find the optimum from case.dayfile). In many cases 
you should NOT use all cores.

.machines_x:   distribute the cores among these 9 files.

Some more general hints:
How many atoms do you have in the unit cell? A first guess is to use

lapw0:node1:XX node2:XX     where 2*XX is equal to or a bit larger than NAT
                             (why only 4 cores in your example?)
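As a hedged illustration only (the node names and atom count below are hypothetical, not from this thread): for a cell with, say, NAT = 60 atoms, XX = 32 satisfies 2*XX >= NAT:

```
# sketch of the lapw0 MPI line in .machines
# hypothetical node names; assumes NAT ~ 60 atoms, so 2*32 = 64 >= NAT
lapw0: node1:32 node2:32
```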

lapw1 was running almost 18 hours? If this is true, massive 
parallelization should be possible. So how many k-points are you using?
What does :RKM say (the matrix size)?
Depending on this, 16-fold k-parallelization with 16 mpi-cores each (as in 
your example) might be possible in .machines, but consider: if you have 
17 k-points, a 16-fold k-parallel job would be VERY inefficient....

For the 9 .machines_x files, you have to split the 256 cores.
One possibility is to use 2 k-parallel lines with 16 mpi-cores each. This 
would lead to a slight overload (18*16 = 288 cores necessary); if your system 
allows it, it is ok, but some queuing systems may complain!!! Or use 
only 1 k-parallel job per file, but with e.g. 25 mpi-cores (using 9*25 = 225 cores), .....
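As a sketch of that second option (one k-parallel job with 25 mpi-cores per file; the node names are hypothetical, and the job is split across two nodes by listing both on one line), .machines_1 might look like:

```
# .machines_1 -- sketch only, hypothetical node names
# one k-parallel line with 25 mpi-cores: 16 on node1 + 9 on node2
1:node1:16 node2:9
granularity:1
extrafine:1
```

.machines_2 through .machines_9 would then list the next blocks of 25 cores, so that no two files share cores (9*25 = 225 of the 256 cores in use).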

If the sequential lapw1 took 18 h, good parallelization on an efficient 
256-core machine should bring a factor of 150-200, i.e. 5-10 minutes for 
lapw1 in the a0b0c0 case, and about 9 times longer for the other cases, 
since those run concurrently and each uses only a ninth of the cores.

I hope you have ELPA !!!!

Hope this helps and you can do it in 24 h (otherwise reduce the number 
of k-points and use more mpi-cores ....)

Peter

On 25.02.2022 at 21:23, pboulet wrote:
> Dear Peter,
> 
> Well, now with the command:
> optimize_abc_lapw -t 3 -n 1 -p -j "run_lapw -p -ec 0.0001 -cc 0.001"
> 
> I have the weird behaviour that each program seems to be executed on one 
> core only! I asked for 2 nodes/256 cores. In the case.dayfile I can read:
> lapw0   -p  (21:53:37) running lapw0 in single mode
> lapw1  -p           (22:02:30) running lapw1 in single mode
> lapw2 -p            (15:47:14) running in single mode
> 
> I guess the problem is because I have not created the .machines_1..9 files.
> But if so, what should these files contain? The same as .machines (see 
> post scriptum below)?
> 
> Best
> Pascal
> 
> PS. Here is the content of the .machines file:
> # OMP parallelization
> omp_global:1
> #omp_lapw1:1
> #omp_lapw2:1
> #omp_lapwso:1
> #omp_dstart:1
> #omp_sumpara:1
> #omp_nlvdw:1
> 
> # k-point parallelization for lapw1/2 hf lapwso qtl irrep  nmr  optic
> 1:irene4274:16
> 1:irene4274:16
> 1:irene4274:16
> 1:irene4274:16
> 1:irene4274:16
> 1:irene4274:16
> 1:irene4274:16
> 1:irene4274:16
> 1:irene4305:16
> 1:irene4305:16
> 1:irene4305:16
> 1:irene4305:16
> 1:irene4305:16
> 1:irene4305:16
> 1:irene4305:16
> 1:irene4305:16
> 
> # MPI parallelization for dstart lapw0 nlvdw
> dstart: irene4274:4
> lapw0: irene4274:4
> nlvdw: irene4274:4
> 
> granularity:1
> extrafine:1
> 
> 
>> On 24 Feb 2022 at 18:20, Peter Blaha <pblaha at theochem.tuwien.ac.at> wrote:
>>
>> What you need is always to finish a complete "step" (19 scf cycles, 
>> which can be run highly in parallel).
>>
>> optimize_abc  -n 1  .....
>>
>> would do this. This command can be repeated until you find convergence.
>> (If it crashes after more than 1 step, you can still continue, but all 
>> calculations after the last full step will be lost.)
>> If you "see" that a step has finished (parabol_fit done) but it is 
>> clear that another one will not finish within the time limit, you can kill 
>> the whole job and submit another one.
>>
>> Regards
>> Peter Blaha
>>
>> On 24.02.2022 at 14:08, pboulet wrote:
>>> Dear all,
>>> I am optimizing an orthorhombic structure with optimize_abc 
>>> (wien2k_21). As the structure is big I suspect the job will not 
>>> finish before the queue reaches the CPU time limit of 24 hours.
>>> The command I use is:
>>> optimize_abc_lapw -t 3 -p -j "run_lapw -p -ec 0.0001 -cc 0.001 -fc 
>>> 1.0 -min"
>>> Can I continue the run properly with optimize_abc_lapw when the queue 
>>> will stop the job, for instance with the command:
>>> optimize_abc_lapw -t 3 -p -j "run_lapw -p -ec 0.0001 -cc 0.001 -fc 
>>> 1.0 -min"  ?
>>> If not, is there a way?
>>> Thank you for your hints,
>>> Best regards
>>> Pascal
>>> Pascal Boulet
>>> /Professor in computational materials chemistry - DEPARTMENT OF 
>>> CHEMISTRY/
>>> University of Aix-Marseille - Avenue Escadrille Normandie Niemen - 
>>> F-13013 Marseille - FRANCE
>>> Tel: +33(0)4 13 55 18 10 - Fax: +33(0)4 13 55 18 50
>>> Email: pascal.boulet at univ-amu.fr
>>> _______________________________________________
>>> Wien mailing list
>>> Wien at zeus.theochem.tuwien.ac.at
>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>> SEARCH the MAILING-LIST at: 
>>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>>
>> --
>> --------------------------------------------------------------------------
>> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
>> Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
>> Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
>> WWW:   http://www.imc.tuwien.ac.at
>> -------------------------------------------------------------------------
> 

-- 
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at
-------------------------------------------------------------------------

