[Wien] .machines problem
Gavin Abo
gsabo at crimson.ua.edu
Sat Oct 1 20:21:34 CEST 2016
Try pinging homer and odysvs from a terminal:
ping homer
ping odysvs
The first line of the output should look something like:
PING homer (xxx.x.x.x)
The IP addresses reported for homer and odysvs should be different. If
both names resolve to a localhost address (such as 127.0.1.1), as you
suspect, then the hosts file is probably misconfigured
[ https://en.wikipedia.org/wiki/Hosts_(file) ].
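The same check can be scripted. The following is only a minimal sketch: the function names is_loopback and check_hosts are made up for illustration, it assumes getent is available (it is part of glibc on most Linux systems), and it only recognizes IPv4 loopback addresses.

```shell
# Return success (0) if the given IPv4 address lies in the loopback range 127.0.0.0/8.
is_loopback() {
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Print how each host name resolves on this machine and flag loopback results.
# getent follows the lookup order in /etc/nsswitch.conf (usually /etc/hosts, then DNS).
check_hosts() {
    for h in "$@"; do
        ip=$(getent hosts "$h" | awk '{print $1; exit}')
        if [ -z "$ip" ]; then
            echo "$h: does not resolve"
        elif is_loopback "$ip"; then
            echo "$h: $ip (loopback, so connections to $h stay on this machine)"
        else
            echo "$h: $ip"
        fi
    done
}
```

Running `check_hosts homer odysvs` on homer should show a non-loopback address for odysvs. As a second check, `ssh odysvs hostname` should print odysvs and not homer.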
For the test you mentioned below, are you using a Gigabit Ethernet
network [
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13632.html ,
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09334.html ,
https://en.wikipedia.org/wiki/List_of_device_bit_rates#Local_area_networks
], with OMP_NUM_THREADS=1 [
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg00992.html
] and hyperthreading turned OFF [
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg05474.html ]?
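To see the current thread settings on each machine, something along these lines may help. It is a sketch, not a tested recipe: threads_per_core and report_threads are hypothetical helper names, and it assumes lscpu (from util-linux) is installed. A "Thread(s) per core" value of 2 means each physical core appears as two logical CPUs, i.e. hyperthreading is on.

```shell
# Read `lscpu` output on stdin and print the "Thread(s) per core" value.
threads_per_core() {
    awk -F: '/^Thread\(s\) per core/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Report the threads-per-core count and the current OMP_NUM_THREADS setting.
# LC_ALL=C keeps the lscpu field names in English so the pattern above matches.
report_threads() {
    tpc=$(LC_ALL=C lscpu | threads_per_core)
    echo "threads per core: ${tpc:-unknown}"
    echo "OMP_NUM_THREADS:  ${OMP_NUM_THREADS:-unset}"
}
```

Run `report_threads` on both homer and odysvs (for example via ssh) and compare the output.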
On 10/1/2016 7:19 AM, John Rundgren wrote:
> On 09/30/2016 09:44 AM, Peter Blaha wrote:
>> Parallelization always has its limits, and you need to do it
>> sensibly. In general it is NOT true that more cores always mean
>> faster execution; adding cores can even slow down the calculations
>> dramatically.
>>
>> a) Hardware: You say your computers have "8 threads". Do you mean
>> they really have 8 cores (like some Xeons), or are these 4-core
>> machines with hyperthreading? In the latter case 8 parallel jobs are
>> useless, as hyperthreading provides only a logical core, not a
>> "real" one.
>> In addition, it is well known that modern multi-core CPUs are very
>> often "memory-bound"; that is, their memory bus is too slow to
>> keep all cores busy simultaneously. Thus it is often "natural" that an
>> N-core job is NOT N times as fast as a single-core job.
>> Another factor is disk I/O, which on some systems can become VERY
>> slow (over the network or on a single node) the more jobs are running.
>>
>> b) Software: There is a "multithreading" option with Intel, and
>> setting OMP_NUM_THREADS=2 makes lapw1 nearly twice as fast as
>> OMP_NUM_THREADS=1. Of course, when using this, you should halve the
>> number of parallel jobs. Check your CPU usage with "top". When
>> you see "200 %" for an lapw1 process, it is this multithreading.
>> lapw1para: it starts the parallel processes with some "DELAY",
>> otherwise this leads to problems on some systems. If for instance
>> DELAY=1, it means that spawning 16 lapw1 will take at least 16
>> seconds. If your testcase runs only for 2 seconds/lapw1, you can
>> imagine that you will not get any speedup, but a drastic slowdown. If
>> it runs for 5 min, the 16 seconds are negligible and you should see a
>> speedup from 5 to 2.5 min (provided you have enough k-points; check
>> with "testpara").
>>
>> It is always good if you can "watch" your parallel job on the two
>> nodes with "top" (in two different windows). You should see how they
>> start, how they run (do they get nearly 100 or 200% of the cores most
>> of the time?), and how they stop (at nearly the same time, or very
>> unbalanced).
>>
>>
>> On 09/28/2016 03:21 PM, John Rundgren wrote:
>>> Dear W2k team,
>>> On my desk are two identical computers alpha and beta of 8 threads
>>> each.
>>>
>>> How is .machines set up such that k-point parallelization goes twice as
>>> fast using alpha & beta compared with using single alpha?
>>>
>>> Unfortunately, my testing UG 5.5.4 responds with error diagnostics.
>>>
>>> When I try the following .machines with and without #,
>>> 1:alpha
>>> #1:beta
>>> 1:alpha
>>> #1:beta
>>> 1:alpha
>>> #1:beta
>>> 1:alpha
>>> #1:beta
>>> 1:alpha
>>> #1:beta
>>> 1:alpha
>>> #1:beta
>>> 1:alpha
>>> #1:beta
>>> 1:alpha
>>> #1:beta
>>> granularity:1
>>> extrafine:1
>>> computing time comes out similar in both cases. I would like to see
>>> sixteen threads executing twice as fast as eight.
>>>
>>> Regards,
>>> John Rundgren KTH
>>>
>>>
>>
> Dear Peter,
> Thanks for your comments on the use of several computers. A simple
> reason for my failure seems to be that my Linux set-up is defective.
>
> My computers are,
> homer = Xeon E3-1270 v3 @ 3.50GHz, 4 cores and 8 threads,
> odysvs = i7-3770 3.40GHz, 4 cores and 8 threads,
> homer being the main computer.
>
> When the following .machines files,
> 1:homer
> 1:homer
> 1:homer
> 1:homer
> 1:homer
> 1:homer
> 1:homer
> 1:homer
> granularity:1
> extrafine:1
> and
> 1:odysvs
> 1:odysvs
> 1:odysvs
> 1:odysvs
> 1:odysvs
> 1:odysvs
> 1:odysvs
> 1:odysvs
> granularity:1
> extrafine:1
> are used separately as input on homer, the execution takes place on
> homer. In both cases the System Monitor of odysvs shows it idle,
> although in the second case the dayfile refers to odysvs.
>
> The following ssh commands were run beforehand,
> homer> ssh-keygen -t rsa
> homer> ssh-copy-id odysvs
> and as a test,
> homer> ssh odysvs pwd
> which prints /home/jru without asking for a password.
> Any computer mentioned in .machines seems to be treated as "localhost".
>
> Does this test give a clue to what fails?
> Regards, John
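Once both names resolve to different machines, a .machines file that uses both hosts could be generated along these lines. This is only a sketch: make_machines is a hypothetical helper, not part of WIEN2k, and the line format is copied from the files quoted above. Using 4 jobs per host (the physical cores) instead of 8 follows Peter's remark on hyperthreading.

```shell
# Print a k-point-parallel .machines file with N "1:host" lines per host,
# alternating between the hosts, followed by the granularity/extrafine
# lines used in the files quoted in this thread.
make_machines() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        for h in "$@"; do
            echo "1:$h"
        done
        i=$((i + 1))
    done
    echo "granularity:1"
    echo "extrafine:1"
}
```

For example, `make_machines 4 homer odysvs > .machines` writes eight job lines, four for each host.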