[Wien] .machines problem

Peter Blaha pblaha at theochem.tuwien.ac.at
Fri Sep 30 09:44:03 CEST 2016


Parallelization always has its limits and you need to do it "sensibly".
In general it is NOT true that more cores always mean faster execution;
more cores can even slow down the calculations dramatically.

a) Hardware: You say your computers have "8 threads". Do they really
have 8 cores (like some Xeons), or are these 4-core machines with
hyperthreading? In the latter case 8 parallel jobs are useless, as
hyperthreading provides only a logical core, not a "real" one.
In addition, it is well known that modern multi-core CPUs are very often
"memory-bound": their memory bus is too slow to keep all cores supplied
with data simultaneously. Thus it is often "natural" that an N-core job
is NOT N times as fast as a single-core job.
Another factor is disk I/O, which on some systems (over the network or
on a single node) can become VERY slow the more jobs are running.
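
A quick way to check whether those "8 threads" are physical cores or
hyperthreads is to look at the CPU topology. A minimal sketch for a
Linux node (the exact lscpu field names may differ between
distributions):

   # physical layout: sockets, cores per socket, threads per core
   lscpu | grep -E 'Socket|Core|Thread'

   # number of logical CPUs the scheduler sees (includes hyperthreads)
   nproc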

b) Software: There is a "multithreading" option with the Intel
libraries, and setting the OpenMP variable OMP_NUM_THREADS=2 makes lapw1
nearly twice as fast as OMP_NUM_THREADS=1. Of course, when using this,
you should reduce the number of parallel jobs by a factor of 2. Check
your CPU usage with "top": when you see "200 %" for an lapw1 process,
this multithreading is active.
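
As an illustration only (WIEN2k's parallel scripts run under tcsh; in
bash use "export" instead of "setenv"), the thread count is set in the
environment before starting the parallel SCF cycle, and the CPU usage is
then monitored in a second window:

   setenv OMP_NUM_THREADS 2    # tcsh;  in bash: export OMP_NUM_THREADS=2
   run_lapw -p                 # parallel SCF cycle driven by .machines

   # in a second terminal on the same node:
   top -u $USER                # lapw1 should show ~200 % CPU with 2 threads
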
lapw1para starts the parallel processes with some "DELAY", because
starting them all at once causes problems on some systems. If, for
instance, DELAY=1, spawning 16 lapw1 jobs takes at least 16 seconds. If
your test case runs only 2 seconds per lapw1, you can imagine that you
will not get any speedup but a drastic slowdown. If it runs for 5 min,
the 16 seconds are negligible and you should see a speedup from 5 to
2.5 min (provided you have enough k-points! Check with "testpara").
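
To make the trade-off concrete, here is a small back-of-the-envelope
sketch in bash (DELAY, the job count and the two runtimes are just the
illustrative numbers from above, not measured values):

   #!/bin/bash
   # Rough lower bound on the wall time of N k-point-parallel lapw1 jobs:
   # the last job is spawned after (N-1)*DELAY seconds and then runs T seconds.
   DELAY=1          # seconds between spawning two lapw1 jobs
   N=16             # number of parallel lapw1 jobs
   for T in 2 300; do              # per-job runtime: 2 s vs. 5 min
       WALL=$(( (N-1)*DELAY + T ))
       echo "T=${T}s per lapw1  ->  wall time >= ${WALL}s (ideal: ${T}s)"
   done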

It is always good if you can "watch" your parallel job on the two nodes
with "top" (in two different windows). You should see how the processes
start, how they run (do they get nearly 100 or 200 % of a core most of
the time?), and how they stop (at nearly the same time, or very
unbalanced?).
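
For the two-machine case in the question below (note that lines starting
with # are comments, so in the quoted file all jobs go to alpha), a
.machines along the following lines would spread the k-point-parallel
jobs over both nodes. This is only a sketch: it assumes passwordless ssh
from alpha to beta, at least 8 k-points, and OMP_NUM_THREADS=2 so that
4 jobs x 2 threads fill the 8 threads of each node:

   1:alpha
   1:beta
   1:alpha
   1:beta
   1:alpha
   1:beta
   1:alpha
   1:beta
   granularity:1
   extrafine:1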


On 09/28/2016 03:21 PM, John Rundgren wrote:
> Dear W2k team,
> On my desk are two identical computers, alpha and beta, with 8 threads each.
>
> How should .machines be set up such that k-point parallelization runs
> twice as fast using alpha & beta compared with using alpha alone?
>
> Unfortunately, my tests following UG 5.5.4 end with error diagnostics.
>
> When I try the following .machines with and without #,
>   1:alpha
>   #1:beta
>   1:alpha
>   #1:beta
>   1:alpha
>   #1:beta
>   1:alpha
>   #1:beta
>   1:alpha
>   #1:beta
>   1:alpha
>   #1:beta
>   1:alpha
>   #1:beta
>   1:alpha
>   #1:beta
>   granularity:1
>   extrafine:1
> computing time comes out similar in both cases. I would like to see
> sixteen threads executing twice as fast as eight.
>
> Regards,
> John Rundgren KTH
>
>

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------

