[Wien] .machines problem

John Rundgren jru at kth.se
Tue Oct 4 11:21:36 CEST 2016


Peter and Gavin,
Many thanks for your comments: I realize that a local net is too slow 
for two computers to make up an efficient cluster; I shall study 
multi-threading by testing on various codes and different compilers.
Regards,
John


On 10/01/2016 08:21 PM, Gavin Abo wrote:
> Maybe ping homer and odysvs.  For example, in a terminal:
>
> ping homer
> ping odysvs
>
> In the output, you will probably see something like:
>
> PING homer (xxx.x.x.x)
>
> The IP addresses reported for homer and odysvs should be different. If 
> both names resolve to the localhost address (127.0.1.1), as you 
> suspected, then the hosts file may be misconfigured 
> [ https://en.wikipedia.org/wiki/Hosts_(file) ].
>
> For the test you mentioned below, are you using a Gigabit Ethernet 
> network [ 
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13632.html 
> , 
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09334.html 
> , 
> https://en.wikipedia.org/wiki/List_of_device_bit_rates#Local_area_networks 
> ], OMP_NUM_THREADS=1 [ 
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg00992.html 
> ], and hyperthreading turned OFF [ 
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg05474.html 
> ]?
>
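The name-resolution check suggested above can be sketched in shell as follows. This is only a rough sketch: it assumes a Linux system where getent is available, and the function names is_loopback and check_hosts are hypothetical helpers, not part of WIEN2k.

```shell
#!/bin/sh
# is_loopback ADDR: succeed if ADDR lies in the IPv4 loopback range 127.x.x.x
# or is the IPv6 loopback address ::1.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# check_hosts NAME...: print the address each name resolves to and flag
# names that resolve to a loopback address (hypothetical helper).
check_hosts() {
    for name in "$@"; do
        # getent consults the sources configured in nsswitch.conf,
        # typically /etc/hosts first and then DNS
        addr=$(getent hosts "$name" | awk 'NR == 1 { print $1 }')
        if [ -z "$addr" ]; then
            echo "$name: does not resolve"
        elif is_loopback "$addr"; then
            echo "$name: $addr (loopback, check /etc/hosts)"
        else
            echo "$name: $addr"
        fi
    done
}
```

Run on homer as "check_hosts homer odysvs". On Debian-style systems a machine's own name often maps to 127.0.1.1, so it is the other machine's name resolving to a loopback address that would point to a hosts-file problem.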
> On 10/1/2016 7:19 AM, John Rundgren wrote:
>> On 09/30/2016 09:44 AM, Peter Blaha wrote:
>>> Parallelization always has its limits, and you need to do it 
>>> sensibly. It is NOT true in general that more cores always mean 
>>> faster execution; too many parallel jobs can even slow the 
>>> calculations down dramatically.
>>>
>>> a) Hardware: You say your computers have "8 threads". Do you mean 
>>> they really have 8 cores (like some Xeons), or are these 4-core 
>>> machines with hyperthreading? In the latter case 8 parallel jobs are 
>>> useless, as hyperthreading provides only a logical core, not a 
>>> "real" one.
>>> In addition, it is well known that modern multi-core CPUs are very 
>>> often "memory-bound": their memory bus is too slow to keep all 
>>> cores busy simultaneously. Thus it is often "natural" that an 
>>> N-core job is NOT N times as fast as a single-core job.
>>> Another factor is disk I/O, which on some systems can become VERY 
>>> slow (over the network or on a single node) as more jobs are running.
>>>
>>> b) Software: There is a "multithreading" option with Intel, and 
>>> setting OMP_NUM_THREADS=2 makes lapw1 nearly twice as fast as 
>>> OMP_NUM_THREADS=1. Of course, when using this, you should halve the 
>>> number of parallel jobs. Check your cpu usage with "top". When 
>>> you see "200 %" for an lapw1 process, it is this multithreading.
>>> lapw1para: it starts the parallel processes with some "DELAY", 
>>> otherwise this leads to problems on some systems. If for instance 
>>> DELAY=1, it means that spawning 16 lapw1 jobs will take at least 16 
>>> seconds. If your testcase runs only for 2 seconds/lapw1, you can 
>>> imagine that you will not get any speedup, but a drastic slowdown. 
>>> If it runs for 5 min, the 16 seconds are negligible and you should 
>>> see a speedup from 5 to 2.5 min (provided you have enough k-points; 
>>> check with "testpara").
>>>
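The DELAY arithmetic above can be written out as a small shell sketch. It is a deliberately simplified model that counts only the staggered start-up time and ignores memory, disk, and network effects; the function names launch_overhead and overhead_percent are hypothetical.

```shell
#!/bin/sh
# launch_overhead JOBS DELAY: minimum number of seconds spent starting JOBS
# processes when each start is staggered by DELAY seconds.
launch_overhead() {
    echo $(( $1 * $2 ))
}

# overhead_percent JOBS DELAY RUNTIME: start-up overhead as an integer
# percentage of the RUNTIME (in seconds) of one lapw1 job.
overhead_percent() {
    echo $(( $1 * $2 * 100 / $3 ))
}

launch_overhead 16 1        # 16 seconds to start 16 jobs with DELAY=1
overhead_percent 16 1 2     # 800: start-up dwarfs a 2-second job
overhead_percent 16 1 300   # 5: negligible for a 5-minute job
```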
>>> It is always good if you can "watch" your parallel job on the two 
>>> nodes with "top" (in two different windows). You should see how they 
>>> start, how they run (do they get nearly 100 or 200% of the cores most 
>>> of the time?), and how they stop (at nearly the same time, or very 
>>> unbalanced).
>>>
>>>
>>> On 09/28/2016 03:21 PM, John Rundgren wrote:
>>>> Dear W2k team,
>>>> On my desk are two identical computers alpha and beta of 8 threads 
>>>> each.
>>>>
>>>> How should .machines be set up so that k-point parallelization runs 
>>>> twice as fast using alpha and beta as it does using alpha alone?
>>>>
>>>> Unfortunately, my testing UG 5.5.4 responds with error diagnostics.
>>>>
>>>> When I try the following .machines with and without #,
>>>>   1:alpha
>>>>   #1:beta
>>>>   1:alpha
>>>>   #1:beta
>>>>   1:alpha
>>>>   #1:beta
>>>>   1:alpha
>>>>   #1:beta
>>>>   1:alpha
>>>>   #1:beta
>>>>   1:alpha
>>>>   #1:beta
>>>>   1:alpha
>>>>   #1:beta
>>>>   1:alpha
>>>>   #1:beta
>>>>   granularity:1
>>>>   extrafine:1
>>>> computing time comes out similar in both cases. I would like to see
>>>> sixteen threads executing twice as fast as eight.
>>>>
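A listing like the one above can be generated rather than typed by hand. The sketch below assumes only the line format shown in the message (1:HOST entries followed by granularity:1 and extrafine:1); make_machines is a hypothetical helper, not a WIEN2k tool.

```shell
#!/bin/sh
# make_machines JOBS_PER_HOST HOST...: print a .machines file for k-point
# parallelization with JOBS_PER_HOST lines of the form 1:HOST for every
# host, interleaved in the order the hosts are given.
make_machines() {
    jobs_per_host=$1
    shift
    i=0
    while [ "$i" -lt "$jobs_per_host" ]; do
        for host in "$@"; do
            echo "1:$host"
        done
        i=$(( i + 1 ))
    done
    echo "granularity:1"
    echo "extrafine:1"
}

# Example: eight entries on each of the two hosts from the message above.
# Redirect the output to .machines in the case directory to use it.
make_machines 8 alpha beta
```

The number of entries per host is a parameter, so the same helper can produce a file with fewer jobs per machine if the machines have fewer physical cores than threads.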
>>>> Regards,
>>>> John Rundgren KTH
>>>>
>>>>
>>>> _______________________________________________
>>>> Wien mailing list
>>>> Wien at zeus.theochem.tuwien.ac.at
>>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>>> SEARCH the MAILING-LIST at:
>>>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>>>
>> Dear Peter,
>> Thanks for your comments on the use of several computers. A simple 
>> reason for my failure seems to be that my Linux set-up is defective.
>>
>> My computers are,
>>   homer = Xeon E3-1270 v3 @ 3.50GHz, 4 cpus and 8 threads,
>>   odysvs = i7-3770 3.40GHz, 4 cpus and 8 threads,
>> homer being the main computer.
>>
>> When the following .machines files,
>>   1:homer
>>   1:homer
>>   1:homer
>>   1:homer
>>   1:homer
>>   1:homer
>>   1:homer
>>   1:homer
>>   granularity:1
>>   extrafine:1
>> and
>>   1:odysvs
>>   1:odysvs
>>   1:odysvs
>>   1:odysvs
>>   1:odysvs
>>   1:odysvs
>>   1:odysvs
>>   1:odysvs
>>   granularity:1
>>   extrafine:1
>> are used separately as input on homer, the execution takes place on 
>> homer. In both cases the System Monitor on odysvs shows it idle, 
>> although in the second case the dayfile refers to odysvs.
>>
>> The following ssh commands were run beforehand:
>>  homer> ssh-keygen -t rsa
>>  homer> ssh-copy-id odysvs
>> As a test,
>>  homer> ssh odysvs pwd
>> returns /home/jru without asking for a password.
>> Any computer mentioned in .machines seems to be treated as "localhost".
>>
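One way to confirm that a command sent over ssh really executes on the other machine is to compare node names. The following is a minimal sketch, assuming passwordless ssh is set up as described above; runs_remotely and the SSH_CMD override are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# runs_remotely HOST: succeed if a command sent to HOST over ssh reports a
# node name different from the local one. SSH_CMD is a hypothetical override
# hook (default: ssh in batch mode) so the check can be tried without a
# network.
runs_remotely() {
    local_name=$(uname -n)
    # BatchMode=yes makes ssh fail instead of prompting for a password
    remote_name=$(${SSH_CMD:-ssh -o BatchMode=yes} "$1" uname -n) || return 2
    [ "$remote_name" != "$local_name" ]
}

# Usage on homer (not executed here because it needs the real network):
#   runs_remotely odysvs && echo "odysvs executes its own commands"
```

If the function fails with status 1, the name given in .machines reaches the local machine itself, which would match the "localhost" behaviour described above.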
>> Does this test give a clue to what fails?
>> Regards, John


-- 
Dr John Rundgren
KTH Royal Institute of Technology, Theoretical Physics
AlbaNova, SE-10691 Stockholm, Sweden
homepage http://theophys.kth.se/~jru


