[Wien] k-point parallel job in distributed file system

L. D. Marks L-marks at northwestern.edu
Thu Aug 17 15:53:29 CEST 2006


I think you need to ask your sysadmin/expert, pointedly, whether or not 
you are using a kernel with the (known) NFS bugs in it or a more 
recent one. If it is one with the bugs, Wien2k behaves very badly and the 
best that one can hope for is to persist past the crashes. There are also 
a lot of switches in how NFS is configured that will affect Wien2k; 
some reading is probably needed (it's not simple and I don't understand 
all the details myself).
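
As a starting point for that conversation, here is a minimal sketch (Python, 
Linux only) that prints the running kernel release and the options each NFS 
mount is using, read from /proc/mounts. The function name nfs_mounts is made 
up for this example. Which options matter for Wien2k is not settled here; 
the attribute-caching options (actimeo, noac) and sync/async are the usual 
places to start reading.

```python
import platform


def nfs_mounts(mounts_text):
    """Return (mount_point, fs_type, options) for each NFS entry.

    `mounts_text` is text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] in ("nfs", "nfs4"):
            result.append((fields[1], fields[2], fields[3].split(",")))
    return result


if __name__ == "__main__":
    print("kernel:", platform.release())
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc not available
    for mount_point, fs_type, options in nfs_mounts(text):
        print(mount_point, fs_type, ",".join(options))
```

Run it on a compute node rather than the head node, since the two may mount 
the same export with different options.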

On Thu, 17 Aug 2006, XU ZUO wrote:

> Unfortunately, I am suffering from instability in the k-point
> parallelization. I understand that this problem is caused by poor NFS
> performance (read/write latency and synchronization problems) and that
> adjusting $delay and $sleepy may solve the problem. However, as the cluster
> load and traffic are dynamic, it would be better to design adaptive code that
> can handle this problem dynamically.
>
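
To make the "adaptive" idea concrete: instead of a fixed wait, poll for the 
file and double the wait after each miss, up to a cap. The sketch below is 
only an illustration in Python, not code from the Wien2k scripts; the name 
wait_for_file and all of its parameters are hypothetical.

```python
import os
import time


def wait_for_file(path, initial_delay=0.5, max_delay=30.0, timeout=300.0):
    """Poll until `path` exists and is non-empty, backing off exponentially.

    Returns the seconds waited; raises TimeoutError if `timeout` is exceeded.
    """
    start = time.monotonic()
    delay = initial_delay
    while True:
        try:
            if os.path.getsize(path) > 0:
                return time.monotonic() - start
        except OSError:
            pass  # file not visible on this node yet
        elapsed = time.monotonic() - start
        if elapsed >= timeout:
            raise TimeoutError(f"{path} not visible after {elapsed:.1f} s")
        # Sleep, but never past the deadline, then double the delay up to the cap.
        time.sleep(min(delay, timeout - elapsed))
        delay = min(delay * 2, max_delay)
```

A non-empty file is not necessarily a completely written one, so this only 
addresses the latency part of the problem, not the synchronization part.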
> -----Original Message-----
> From: wien-bounces at zeus.theochem.tuwien.ac.at
> [mailto:wien-bounces at zeus.theochem.tuwien.ac.at] On Behalf Of Stefaan
> Cottenier
> Sent: Thursday, August 17, 2006 7:57 PM
> To: wien at zeus.theochem.tuwien.ac.at
> Subject: Re: [Wien] k-point parallel job in distributed file system
>
> You say you were able to do a k-point parallel run, but it was slow.
> This means that all your nodes can access a common place (where your
> case.struct etc. are). Your problem probably is that you have put $SCRATCH
> also in that same directory, which indeed causes a lot of network traffic.
> The solution is easy: either you assign to $SCRATCH a directory that exists
> on all your nodes (often this is the case for /tmp), or -- if that is not
> possible -- you assign on-the-fly the correct workspace directory for the
> node(s) you are submitting to (like in the PBS script from the other reply).
>
> Stefaan
>
>
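
A rough sketch of the "directory that exists on all your nodes" approach, 
in Python for illustration: take the first writable node-local directory 
and create a per-job subdirectory in it. The helper pick_scratch is 
hypothetical, and relying on TMPDIR and PBS_JOBID is an assumption about 
the batch environment. How the resulting value reaches the Wien2k processes 
on each node depends on the local setup.

```python
import os
import tempfile


def pick_scratch(candidates, job_id):
    """Return a fresh per-job directory under the first writable candidate."""
    for base in candidates:
        if base and os.path.isdir(base) and os.access(base, os.W_OK):
            return tempfile.mkdtemp(prefix=f"wien_{job_id}_", dir=base)
    raise RuntimeError("no writable node-local directory found")


if __name__ == "__main__":
    # TMPDIR and PBS_JOBID are assumptions; fall back to /tmp and the PID.
    scratch = pick_scratch(
        [os.environ.get("TMPDIR"), "/tmp", tempfile.gettempdir()],
        os.environ.get("PBS_JOBID", str(os.getpid())),
    )
    # Only this process and the children it starts will see this value.
    os.environ["SCRATCH"] = scratch
    print("SCRATCH =", scratch)
```

The per-job subdirectory keeps two jobs that land on the same node from 
overwriting each other's scratch files.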
>> Hello,
>>
>> We are trying to run a k-point parallel wien2k job on a linux cluster
>> which has a distributed file system. Though we are able to do k-point
>> parallel calculations, we have a problem in assigning a common work
>> space ($SCRATCH) to read/write all input/output files. This means
>> that, for example, if we do a 10 k-point calculation on 10 nodes, all
>> 10 nodes must communicate with the common working area through ssh
>> to read/write files. This slows down the performance and also the network.
>> So far we have done k-point parallel calculations on supercomputers
>> with shared memory and hence we never had such a problem.  Is it
>> possible to do k-point parallel calculations on a distributed file
>> system without any common working area?
>>
>> I have received the following from the system expert here.
>>
>> ###
>> Hmm, I've been looking through the jungle of scripts which constitutes
>> wien2k, and it is clear to me that this way of parallelizing isn't
>> meant for distributed filesystems (local disks on nodes). Unless the
>> wien2k people have a solution, I don't think we will get around this
>> without some major reprogramming. At least it seems so to me, but I
>> must admit that I don't have the complete overview of todo tasks.
>>
>> Also, a quick google of the problem did not provide a solution.
>> This is very efficient for SMP types of machines, but is a bit ad-hoc
>> for cluster type computers.
>> On the bright side, it doesn't seem that the program does a lot of
>> disk read/write in the long run. Only 10-20 min bursts of 10 megs/sec.
>> ####
>>
>> Looking forward to your responses on how to do the computation more efficiently.
>>
>> Best regards
>> Ravi
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>

-----------------------------------------------
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
http://www.numis.northwestern.edu
-----------------------------------------------


