[Wien] k-point parallel job in distributed file system
Stefaan Cottenier
Stefaan.Cottenier at fys.kuleuven.be
Thu Aug 17 13:57:03 CEST 2006
You say you were able to do a k-point parallel run, but it was slow.
This means that all your nodes can access a common place (where your
case.struct etc. are). Your problem is probably that $SCRATCH also
points to that same directory, which indeed causes a lot of
network traffic. The solution is easy: either you assign to $SCRATCH a
directory that exists on all your nodes (often this is the case for
/tmp), or -- if that is not possible -- you assign on-the-fly the
correct workspace directory for the node(s) you are submitting to (like
in the PBS script from the other reply).
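
As a rough illustration of the two options, here is a minimal sketch of
what the top of a job script might look like. It assumes that /tmp is a
local disk on every node, that the job runs under PBS (so $PBS_JOBID and
$PBS_NODEFILE are set), and that passwordless ssh between nodes works.
The helper name wien_scratch_dir and the wien_<user>_<jobid> naming are
made up for this example; they are not part of WIEN2k.

```shell
#!/bin/sh
# Sketch only: keep $SCRATCH on node-local disk instead of the shared
# directory that holds case.struct etc.

# Hypothetical helper: build a per-job scratch path under a local base dir.
# $1 = base directory that is local on each node (assumed: /tmp).
# Falls back to the shell PID when not running under PBS.
wien_scratch_dir() {
    printf '%s/wien_%s_%s\n' "${1:-/tmp}" "${USER:-user}" "${PBS_JOBID:-$$}"
}

# Option 1 (simplest): a directory that already exists on all nodes.
# SCRATCH=/tmp

# Option 2: a per-job directory, assigned on the fly.
SCRATCH="$(wien_scratch_dir /tmp)"
export SCRATCH
mkdir -p "$SCRATCH"

# The directory must exist on every node used by the job, not only here.
# Under PBS, $PBS_NODEFILE lists the assigned nodes (one line per slot).
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "${PBS_NODEFILE:-}" ]; then
    for node in $(sort -u "$PBS_NODEFILE"); do
        ssh "$node" mkdir -p "$SCRATCH"
    done
fi

echo "SCRATCH=$SCRATCH"
```

With $SCRATCH set this way before the parallel run is started, the large
temporary files should stay on the local disks, and only the smaller
input/output files in the common case directory go over the network.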
Stefaan
> Hello,
>
> We are trying to run a k-point parallel wien2k job on a Linux cluster
> which has a distributed file system. Though we are able to do k-point
> parallel calculation, we have a problem in assigning a
> common work space ($SCRATCH) to read/write all input/output files. This
> means that, for example, if we do a 10 kpoint calculation in 10 nodes, all
> the 10 nodes should communicate to the common working area through ssh to
> read/write files. This slows down the performance and also the network.
> So far we have done k-point parallel calculations in supercomputers with
> shared memory and hence we never had such a problem. Is it
> possible to do k-point parallel calculations in distributed file system
> without any common working area?
>
> I have received the following from the system expert here.
>
> ###
> Hmm, I've been looking through the jungle of scripts which constitutes
> wien2k, and it is clear to me that
> this way of parallelizing isn't meant for distributed filesystems (local
> disks on nodes). Unless the
> wien2k people have a solution, I don't think we will get around this
> without some major reprogramming. At
> least it seems so to me, but I must admit that I don't have the complete
> overview of todo tasks.
>
> Also a quick google of the problem did not provide a solution.
> This is very efficient for SMP types of machines, but is a bit
> ad-hoc for cluster type computers.
> On the bright side, it doesn't seem that the program does a lot of disk
> read/write in the long run. Only 10-20 min bursts of 10 megs/sec.
> ####
>
> Looking forward to your responses on how to do the computation more efficiently.
>
> Best regards
> Ravi