<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

<meta name="Generator" content="Microsoft Exchange Server">

<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>

</head>

<body>


<style type="text/css" style="">

<!--

p

        {margin-top:0;

        margin-bottom:0}

-->

</style>

<div dir="ltr">

<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">

<p>Thanks for the suggestion regarding the .processes file; this will probably come in handy at a later stage. Regarding the qtl program, my end goal is to calculate an ELNES spectrum for the structures I am investigating. To this end, is there any difference

 between running 'x lapw2 -p -qtl' and 'x qtl -p -telnes' (assuming case.innes is present)? Specifically, will the workflow:</p>

<p><br>

</p>

<p>script 1:<br>

</p>

<p>run_lapw -p</p>

<p>x qtl -p -telnes<br>

</p>

<p>x telnes3<br>

</p>

<p><br>

</p>

<p>perform the same task as this:</p>

<p><br>

</p>

<p>script 1:<br>

</p>

<p>run_lapw</p>

<p><br>

</p>

<p>followed by script 2:<br>

</p>

<p>x lapw1 -p -d >&/dev/null</p>

<p>x lapw2 -p -qtl</p>

<div>x telnes3</div>

<div><br>

</div>

<div>My question relates both to the parallelization schemes (assuming I use the same setup of nodes and generate the .machines file in the same way) and to the behaviour of the programs (will lapw2 -p -qtl calculate suitable input files for telnes3,

 assuming case.innes is present)? I realize that telnes3 can probably be run locally, but I included the command for completeness.</div>

<div><br>

</div>

<div>Best regards</div>

<div>Christian<br>

</div>

</div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Wien &lt;wien-bounces@zeus.theochem.tuwien.ac.at&gt; on behalf of Peter Blaha &lt;pblaha@theochem.tuwien.ac.at&gt;<br>

<b>Sent:</b> 12 October 2020 11:58:22<br>

<b>To:</b> wien@zeus.theochem.tuwien.ac.at<br>

<b>Subject:</b> Re: [Wien] .machines for several nodes</font>

<div> </div>

</div>

</div>

<font size="2"><span style="font-size:10pt;">

<div class="PlainText">Yes, this is OK when you have nodes with 16 cores!<br>

<br>

(Only the lapw0 line could use :16 instead of 8 if you have 96 atoms, <br>

but most likely this is fairly negligible).<br>

<br>

Yes, the QTL calculation in lapw2 is also affected by the <br>

parallelization, but it reads from a .processes file, which is created <br>

by lapw1.<br>

<br>

If you run     x lapw2 -p -qtl   in an extra job, you should add the <br>

following line to create a "correct" .processes file:<br>

<br>

x lapw1 -p -d >&/dev/null  # Create .processes (necessary for <br>

standalone-lapw2)<br>

<br>

On 10/12/20 11:45 AM, Christian Søndergaard Pedersen wrote:<br>

> This went a long way towards clearing up my confusion, thanks again. I <br>

> will try starting an MPI-parallel calculation for 4 nodes with 16 cores <br>

> each using the following .machines-file:<br>

> <br>

> 1:g008:16<br>

> 1:g021:16<br>

> 1:g025:16<br>

> 1:g028:16<br>

> lapw0: g008:8 g021:8 g025:8 g028:8<br>

> <br>

> dstart: g008:8 g021:8 g025:8 g028:8<br>

> <br>

> <br>

> ... and see how it performs. If the matrix sizes are small, I understand <br>

> that I could also have each node work on 2 (or more) k-points at the <br>

> same time, by specifying:<br>

> <br>

> <br>

> 1:g008:8<br>

> 1:g008:8<br>

> 1:g021:8<br>

> 1:g021:8<br>

> 1:g025:8<br>

> 1:g025:8<br>

> 1:g028:8<br>

> 1:g028:8<br>

> <br>

> so that, for instance, g008 will work on 2 k-points using 8 cores for each <br>

> k-point, am I right? And a (hopefully) final question: since qtl, <br>

> according to the manual, runs k-point parallel, is it also affected by <br>

> the parallelization scheme specified for lapw1 and lapw2 (unless I <br>

> deliberately change it)?<br>

> <br>

> <br>

> <br>

> ------------------------------------------------------------------------<br>

> *From:* Wien &lt;wien-bounces@zeus.theochem.tuwien.ac.at&gt; on behalf of Ruh, <br>

> Thomas &lt;thomas.ruh@tuwien.ac.at&gt;<br>

> *Sent:* 12 October 2020 10:59:09<br>

> *To:* A Mailing list for WIEN2k users<br>

> *Subject:* Re: [Wien] .machines for several nodes<br>

> <br>

> I am afraid there is still some confusion.<br>

> <br>

> <br>

> First about /lapw1/:<br>

> <br>

> Sorry for my unclear statement - I meant that you need one line per <br>

> k-parallel job in the sense that #lines k-points are run simultaneously, <br>

> i.e. if you specify this part of the .machines file like this:<br>

> <br>

> <br>

> 1:g008:16<br>

> <br>

> 1:g021:16<br>

> <br>

> 1:g025:16<br>

> <br>

> 1:g028:16<br>

> <br>

> <br>

> your k-point list will be split into 4 parts of 56 k-points each [1] , <br>

> which will be processed step-by-step. Node g008 will work on its first <br>

> k-point, while node g021 will do the same for its first k-point, and so on.<br>

> <br>

> You need the ":16" after the name of the node. Otherwise, on every node <br>

> only *one* core would be used. Whether it is useful to use 16 MPI-parallel <br>

> jobs per k-point (meaning that the matrices will be distributed over 16 cores <br>

> with each core getting only 1/16 of the matrix elements) depends on your <br>

> matrix sizes (which in turn depend on your rkmax). You should check that <br>

> by grepping :rkm in your case.scf file. If the matrix size there is <br>

> small, using OMP_NUM_THREADS 16 might be much faster (since MPI adds <br>

> overhead to your calculation).<br>
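<br>
A small sketch of that check, assuming the scf file is called case.scf as above. The function name last_rkm is made up for illustration and is not a WIEN2k command; it matches the tag case-insensitively and prints only the most recent matching line:<br>
<pre>
```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of WIEN2k): print the last line in an scf
# file that carries the :RKM tag, matched case-insensitively.
last_rkm() {
    grep -i ':rkm' "$1" | tail -n 1
}

# Usage (the file name is a placeholder for your own scf file):
# last_rkm case.scf
```
</pre>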

> <br>

> <br>

> <br>

> Regarding /lapw0/dstart/:<br>

> <br>

> The way you set the calculation up could lead to (possibly severe) <br>

> overloading of your nodes: WIEN2k will start 24 jobs on each node (so <br>

> 1.5 times the number of cores) at the same time doing the calculation <br>

> for 1 atom each.<br>

> <br>

> As one possible alternative, you could specify only 8 cores per node (for <br>

> example "lapw0: g008:8" and so on), i.e. 8 jobs per node, which would lead to <br>

> step-by-step calculations for 3 atoms per core.<br>

> <br>

> Which option is faster is hard to tell and depends a lot on your hardware.<br>
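<br>
For reference, a .machines file of the kind discussed here (one "1:" line per k-parallel job, plus atom-parallel lapw0 and dstart lines) could be written by a small shell function. This is only a sketch: make_machines and its argument order are hypothetical, and the node names and core counts are simply the ones used in this thread.<br>
<pre>
```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of WIEN2k): print a .machines file with one
# "1:node:cores" line per node, followed by lapw0 and dstart lines.
# Usage: make_machines CORES_PER_KPOINT_JOB CORES_FOR_LAPW0 NODE...
make_machines() {
    local kcores=$1 acores=$2
    shift 2
    local node atom_list=""
    for node in "$@"; do
        # one k-parallel job per node, MPI-parallel over kcores cores
        printf '1:%s:%s\n' "$node" "$kcores"
        atom_list="$atom_list $node:$acores"
    done
    # lapw0 and dstart parallelize over atoms
    printf 'lapw0:%s\n' "$atom_list"
    printf 'dstart:%s\n' "$atom_list"
}

# Example: 4 nodes, 16 cores per k-point job, 8 cores per node for lapw0/dstart
make_machines 16 8 g008 g021 g025 g028
```
</pre>
Redirecting the output to .machines before calling run_lapw -p would allow different core counts to be compared in the first SCF cycles.<br>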

> <br>

> <br>

> So what you could do - in principle - is to test multiple configurations <br>

> (you can modify your .machines file on the fly during a SCF run) in the <br>

> first cycles, compare the times (in case.dayfile), and use the faster <br>

> one for the rest of the run.<br>

> <br>

> <br>

> <br>

> Regards,<br>

> Thomas<br>

> <br>

> <br>

> [1] Sidenote: This splitting is controlled by the first number - in this <br>

> case 4 equal sublists will be set up - you could also specify different <br>

> "weights", for instance, if your nodes are of different speeds, the <br>

> .machines file could then read for example:<br>

> <br>

> <br>

> 3:g008:16<br>

> <br>

> 2:g021:16<br>

> <br>

> 2:g025:16<br>

> <br>

> 1:g028:16<br>

> <br>

> <br>

> In this case, the first node would "get" 3/8 of the k-points (84), nodes <br>

> g021 and g025 would get 2/8 each (56), and the last one (because it is <br>

> very slow) would get only 28 k-points.<br>
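<br>
The arithmetic behind these shares can be sketched as follows. The helper split_kpoints is hypothetical; it only illustrates the proportional split for cases where the k-point count divides evenly, and it does not claim to reproduce how WIEN2k itself handles remainders or the granularity setting.<br>
<pre>
```shell
#!/usr/bin/env bash
# Hypothetical helper: share NK k-points among nodes in proportion to
# integer weights. Assumes NK * weight is divisible by the weight sum;
# remainders are not handled in this sketch.
# Usage: split_kpoints NK WEIGHT...
split_kpoints() {
    local nk=$1
    shift
    local w total=0 out=""
    for w in "$@"; do
        total=$((total + w))
    done
    for w in "$@"; do
        out="$out $((nk * w / total))"
    done
    # strip the leading space before printing
    echo "${out# }"
}

split_kpoints 224 3 2 2 1   # prints: 84 56 56 28
```
</pre>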

> <br>

> <br>

> ------------------------------------------------------------------------<br>

> *From:* Wien &lt;wien-bounces@zeus.theochem.tuwien.ac.at&gt; on behalf of <br>

> Christian Søndergaard Pedersen &lt;chrsop@dtu.dk&gt;<br>

> *Sent:* Monday, 12 October 2020 10:24<br>

> *To:* A Mailing list for WIEN2k users<br>

> *Subject:* Re: [Wien] .machines for several nodes<br>

> <br>

> Thanks a lot for your answer. After re-reading the relevant pages in the <br>

> User Guide, I am still left with some questions. Specifically, I am <br>

> working with a system containing 96 atoms (as described in the <br>

> case.struct-file) and 224 inequivalent k points; i.e. 500 kpoints <br>

> distributed as a 7x8x8 grid (448 total) reduced to 224 kpoints. Running <br>

> on 4 nodes each with 16 cores, I want each of the 4 nodes to calculate <br>

> 56 k points (224/4 = 56). Meanwhile, each node should handle 24 atoms <br>

> (96/4 = 24).<br>

> <br>

> <br>

> Part of my confusion stems from your suggestion that I repeat the line <br>

> "1:g008:4 [...]" a number of times equal to the number of k points I <br>

> want to run in parallel, and that each repetition should refer to a <br>

> different node. The reason is that the line in question already contains <br>

> the names of all four nodes that were assigned to the job. However, <br>

> combining your advice with the example on page 86, the lines should read:<br>

> <br>

> <br>

> 1:g008<br>

> <br>

> 1:g021<br>

> <br>

> 1:g025<br>

> <br>

> 1:g028 # k points distributed over 4 jobs, running on 1 node each<br>

> <br>

> extrafine:1<br>

> <br>

> <br>

> As for the parallelization over atoms for dstart and lapw0, I <br>

> understand that the numbers assigned to each individual node should sum <br>

> up to the number of atoms in the system, like this:<br>

> <br>

> <br>

> dstart:g008:24 g021:24 g025:24 g028:24<br>

> <br>

> lapw0:g008:24 g021:24 g025:24 g028:24<br>

> <br>

> <br>

> so the final .machines-file would be a combination of the above pieces. <br>

> Have I understood this correctly, or am I missing the mark? Also, is <br>

> there any difference between distributing the k points across four jobs <br>

> (1 for each node), and across 224 jobs (by repeating each of the 1:gxxx <br>

> lines 56 times)?<br>

> <br>

> <br>

> Best regards<br>

> <br>

> Christian<br>

> <br>

> ------------------------------------------------------------------------<br>

> *From:* Wien &lt;wien-bounces@zeus.theochem.tuwien.ac.at&gt; on behalf of Ruh, <br>

> Thomas &lt;thomas.ruh@tuwien.ac.at&gt;<br>

> *Sent:* 12 October 2020 09:29:37<br>

> *To:* A Mailing list for WIEN2k users<br>

> *Subject:* Re: [Wien] .machines for several nodes<br>

> <br>

> Hi,<br>

> <br>

> <br>

> your .machines is wrong.<br>

> <br>

> <br>

> The nodes for /lapw1/ are prefixed not with "lapw1:" but only with "1:". <br>

> /lapw2/ needs no line, as it uses the same nodes as lapw1.<br>

> <br>

> <br>

> So an example for your use case would be:<br>

> <br>

> <br>

> #<br>

> <br>

> dstart:g008:4 g021:4 g025:4 g028:4<br>

> <br>

> lapw0:g008:4 g021:4 g025:4 g028:4<br>

> <br>

> 1:g008:4 g021:4 g025:4 g028:4<br>

> <br>

> granularity:1<br>

> <br>

> extrafine:1<br>

> <br>

> <br>

> The line starting with "1:" has to be repeated (with different nodes, of <br>

> course) x times, if you want to run x k-points in parallel (you can find <br>

> more details about this in the User's Guide, pages 84-91).<br>

> <br>

> <br>

> Regards,<br>

> <br>

> Thomas<br>

> <br>

> <br>

> PS: As a side note, both /dstart/ and /lapw0/ parallelize over atoms, so <br>

> 16 nodes might not be the best choice for your example.<br>

> <br>

> ------------------------------------------------------------------------<br>

> *From:* Wien &lt;wien-bounces@zeus.theochem.tuwien.ac.at&gt; on behalf of <br>

> Christian Søndergaard Pedersen &lt;chrsop@dtu.dk&gt;<br>

> *Sent:* Monday, 12 October 2020 09:06<br>

> *To:* wien@zeus.theochem.tuwien.ac.at<br>

> *Subject:* [Wien] .machines for several nodes<br>

> <br>

> Hello everybody<br>

> <br>

> <br>

> I am new to WIEN2k, and am struggling with parallelizing calculations <br>

> on our HPC cluster beyond what can be achieved using OMP. In particular, <br>

> I want to execute run_lapw and/or runsp_lapw running on four identical <br>

> nodes (16 cores each), parallelizing over k-points (unless there's a <br>

> more efficient scheme). To achieve this, I try to mimic the example from <br>

> the User Guide (without the extra Alpha node), but my .machines-file <br>

> does not work the way I intended. This is what I have:<br>

> <br>

> <br>

> #<br>

> <br>

> dstart:g008:4 g021:4 g025:4 g028:4<br>

> <br>

> lapw0:g008:4 g021:4 g025:4 g028:4<br>

> <br>

> lapw1:g008:4 g021:4 g025:4 g028:4<br>

> <br>

> lapw2:g008:4 g021:4 g025:4 g028:4<br>

> <br>

> granularity:1<br>

> <br>

> extrafine:1<br>

> <br>

> <br>

> The node names gxxx are read from SLURM_JOB_NODELIST in the submit <br>

> script, and a couple of regular expressions generate the above lines. <br>

> Afterwards, my job script does the following:<br>

> <br>

> <br>

> srun hostname -s > slurm.hosts<br>

> run_lapw -p<br>

> <br>

> which results in a job that idles for the entire walltime and finishes <br>

> with a CPU efficiency of 0.00%. I would appreciate any help in figuring <br>

> out where I've gone wrong.<br>

> <br>

> <br>

> Best regards<br>

> Christian<br>

> <br>

> <br>

> _______________________________________________<br>

> Wien mailing list<br>

> Wien@zeus.theochem.tuwien.ac.at<br>

> <a href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a><br>

> SEARCH the MAILING-LIST at:  <a href="http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html">

http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html</a><br>

> <br>

<br>

-- <br>

<br>

                                       P.Blaha<br>

--------------------------------------------------------------------------<br>

Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna<br>

Phone: +43-1-58801-165300             FAX: +43-1-58801-165982<br>

Email: blaha@theochem.tuwien.ac.at    WIEN2k: <a href="http://www.wien2k.at">http://www.wien2k.at</a><br>

WWW:   <a href="http://www.imc.tuwien.ac.at/TC_Blaha">http://www.imc.tuwien.ac.at/TC_Blaha</a><br>

--------------------------------------------------------------------------<br>


</div>

</span></font>

</body>

</html>