[Wien] At least on the NSF supercomputer Trestles, the line "sleep $delay" in lapw1para_lapw should not be commented out

David Olmsted olmsted at berkeley.edu
Thu Apr 2 20:38:49 CEST 2015


I am running WIEN2k_14.2 (Release 15/10/2014).  All of the clusters I use
run Torque.

 

In lapw1para_lapw, in an initial section where options are set, there is a
line giving a default for "delay" of

set delay       = 0.1           # delay launching of processes by n seconds

 

In a parallel_options file posted by Laurence Marks for a similar
environment, this is changed to 0.25 seconds (set delay   = 0.25).
I am using a version of that file.
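
For reference, the override described above is a single csh-style line in
parallel_options.  The fragment below is only a sketch showing that one
setting; the rest of the file is site-specific and is omitted here.

```csh
# parallel_options fragment (only the delay setting is shown)
set delay   = 0.25    # seconds to pause between launching processes
```

This value can only have an effect if the "sleep $delay" line in
lapw1para_lapw is actually executed.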

 

Unfortunately for my runs on one cluster, the line that actually performs
the delay, "sleep $delay", is commented out.  It is line 559:

#    sleep $delay

 

The effect was that, frequently, some of the lapw1 processes would start
but a few would fail, causing the job to fail.  A consultant from the help
address at Trestles suggested adding a delay of some kind so that multiple
ssh connections were not attempted all at once.  When I looked at
lapw1para_lapw, it turned out that all I had to do was uncomment that
line.  So far, at least, the problem has not recurred, so I think it has
made a difference.
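
To illustrate the consultant's suggestion, here is a minimal csh sketch of
a staggered launch.  It is not the code in lapw1para_lapw: the host names
and the remote command ("hostname") are placeholders I made up for
illustration, and a fractional argument to sleep assumes a sleep that
accepts one, such as GNU sleep.

```csh
#!/bin/csh -f
# Hypothetical sketch only; NOT the actual lapw1para_lapw code.
set delay = 0.1                  # default quoted earlier in this message
set hosts = (node1 node2 node3)  # placeholder host names (assumption)

foreach host ($hosts)
    # Start one remote command in the background (placeholder command).
    ssh $host hostname &
    # Pause before the next launch so that the ssh connections
    # are not all attempted at the same moment.
    sleep $delay
end

# Wait for all background ssh processes to finish.
wait
```

Without the sleep line, all of the ssh connections are started back to
back, which is the situation the consultant advised avoiding.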

 

I would suggest that the delay be put back in.  The current default of 0.1
seconds seems small enough to me, but even if the default were smaller, the
user could still set it in parallel_options.

 

Best,

David


David Olmsted
Assistant Research Engineer
Materials Science and Engineering
210 Hearst Memorial Mining Building
University of California
Berkeley, CA 94720-1760

 
