[Wien] k-point parallel job in distributed file system
XU ZUO
xzuo at nankai.edu.cn
Fri Aug 18 02:37:25 CEST 2006
Thank you for your help.
I agree that "... one [kernel] with the bugs Wien2k behaves very badly".
I tested the wien2k k-point parallelization on a virtual cluster, in
which every node is a virtual machine, and it works very well.
Xu Zuo
-----Original Message-----
From: wien-bounces at zeus.theochem.tuwien.ac.at
[mailto:wien-bounces at zeus.theochem.tuwien.ac.at] On Behalf Of L. D. Marks
Sent: Thursday, August 17, 2006 9:53 PM
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] k-point parallel job in distributed file system
I think you need to ask, pointedly, your sysadmin/expert whether or not
they/you are using a kernel with the (known) NFS bugs in it or a more
recent one. If it is one with the bugs, Wien2k behaves very badly and the
best that one can hope for is to persist past the crashes. There are also
a lot of switches in how NFS is configured that will affect Wien2k; some
reading is probably needed (it's not simple and I don't understand all
the details myself).
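For illustration only (the right choices depend on your kernel and NFS
version, so treat this as a sketch, not a recipe), the kind of mount
options to look at in a node's /etc/fstab are along these lines:

  # hard mounts, no attribute caching -- slower, but more consistent:
  server:/home  /home  nfs  rw,hard,intr,noac  0  0

The noac option trades performance for cache consistency, which is often
what bites scripts that poll small flag files over NFS.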
On Thu, 17 Aug 2006, XU ZUO wrote:
> Unfortunately, I am suffering from the instability of the k-point
> parallelization. I understand that this problem is caused by poor
> NFS performance (problems with read/write latency and synchronization)
> and that adjusting $delay and $sleepy may solve it. However, as the
> cluster load and traffic are dynamic, it would be better to have
> adaptive code that handles this problem dynamically.
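> For reference, a sketch of what gets adjusted (in the version here the
> variables sit inside the lapw1para script; the names are as found
> there, the values are illustrative):
>
>   set delay  = 5   # seconds between spawning successive k-point jobs
>   set sleepy = 5   # seconds between polls for finished jobs
>
> Larger values give a slow NFS more time to synchronize, at the cost of
> some idle time.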
>
> -----Original Message-----
> From: wien-bounces at zeus.theochem.tuwien.ac.at
> [mailto:wien-bounces at zeus.theochem.tuwien.ac.at] On Behalf Of Stefaan
> Cottenier
> Sent: Thursday, August 17, 2006 7:57 PM
> To: wien at zeus.theochem.tuwien.ac.at
> Subject: Re: [Wien] k-point parallel job in distributed file system
>
> You say you were able to do a k-point parallel run, but it was slow.
> This means that all your nodes can access a common place (where your
> case.struct etc. are). Your problem probably is that you have put
> $SCRATCH in that same directory as well, which indeed causes a lot of
> network traffic.
> The solution is easy: either you assign to $SCRATCH a directory that
> exists on all your nodes (often this is the case for /tmp), or -- if
> that is not possible -- you assign on the fly the correct workspace
> directory for the node(s) you are submitting to (like in the PBS
> script from the other reply).
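> For the on-the-fly variant, a minimal csh sketch for a PBS job (the
> paths and the run_lapw call are illustrative; adapt them to your queue):
>
>   setenv SCRATCH /tmp/${USER}_wien    # node-local; assumes /tmp exists on every node
>   foreach host (`sort -u $PBS_NODEFILE`)
>     ssh $host "mkdir -p $SCRATCH"     # create the scratch dir on each node
>   end
>   cd $PBS_O_WORKDIR
>   run_lapw -p
>
> The large vector/energy files then stay on the local disks, and only
> the small control files cross the network.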
>
> Stefaan
>
>
>> Hello,
>>
>> We are trying to run a k-point parallel wien2k job on a Linux cluster
>> which has a distributed file system. Though we are able to do the
>> k-point parallel calculation, we have a problem in assigning a common
>> work space ($SCRATCH) to read/write all input/output files. This
>> means that, for example, if we do a 10 k-point calculation on 10
>> nodes, all 10 nodes have to communicate with the common working area
>> through ssh to read/write files. This slows down the performance and
>> also the network.
>> So far we have done k-point parallel calculations on supercomputers
>> with shared memory, and hence we never had such a problem. Is it
>> possible to do k-point parallel calculations on a distributed file
>> system without any common working area?
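>> (For reference, the .machines file for such a run looks roughly like
>> this -- hostnames are illustrative, one line per k-point job:
>>
>>   1:node01
>>   1:node02
>>   ...
>>   1:node10
>>   granularity:1
>>
>> so each node gets one slice of the k-mesh.)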
>>
>> I have received the following from the system expert here.
>>
>> ###
>> Hmm, I've been looking through the jungle of scripts which constitutes
>> wien2k, and it is clear to me that this way of parallelizing isn't
>> meant for distributed filesystems (local disks on nodes). Unless the
>> wien2k people have a solution, I don't think we will get around this
>> without some major reprogramming. At least it seems so to me, but I
>> must admit that I don't have a complete overview of the tasks involved.
>>
>> Also, a quick Google search for the problem did not provide a solution.
>> This scheme is very efficient for SMP-type machines, but is a bit
>> ad hoc for cluster-type computers.
>> On the bright side, it doesn't seem that the program does a lot of
>> disk read/write in the long run -- only 10-20 minute bursts of about
>> 10 MB/s.
>> ###
>>
>> Looking forward to your responses on how to do the computation more
>> efficiently.
>>
>> Best regards
>> Ravi
>
-----------------------------------------------
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall, 2220 N Campus Drive
Northwestern University, Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
http://www.numis.northwestern.edu
-----------------------------------------------
_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien