[Wien] At least on the NSF supercomputer Trestles, the line "sleep $delay" in lapw1para_lapw should not be commented out
David Olmsted
olmsted at berkeley.edu
Thu Apr 2 20:38:49 CEST 2015
I am running WIEN2k_14.2 (Release 15/10/2014). All of the clusters I use
run Torque.
In lapw1para_lapw, in an initial section where options are set, there is a
line that sets the default for "delay":
    set delay = 0.1 # delay launching of processes by n seconds
In a parallel_options file posted by Laurence Marks for a similar
environment, this is changed to 0.25 seconds (set delay = 0.25).
I am using a version of this file.
Unfortunately for my runs on one cluster, the line that actually performs
the delay, "sleep $delay", is commented out. This is line 559:
    # sleep $delay
As a result, the delay setting had no effect: frequently some of the lapw1
processes would start but a few would fail, causing the job to fail.
A consultant from the help address at Trestles
suggested adding a delay of some kind so that multiple ssh connections were
not attempted all at once. When I looked at lapw1para_lapw, it turned out
that all I had to do was to uncomment that line. So far at least, the
problem has not recurred, so I think it has made a difference.
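
For anyone who wants to make the same change without editing by hand, here is a minimal sketch. It is written in plain sh rather than csh so that it stands alone, and the function name uncomment_delay is my own invention, not part of WIEN2k. It assumes the commented-out line has the form "# sleep $delay" as quoted above. It keeps a backup copy with the suffix .orig.

```sh
#!/bin/sh
# Hypothetical helper (not part of WIEN2k): re-enable a commented-out
# "sleep $delay" line in lapw1para_lapw.
# Usage: uncomment_delay /path/to/lapw1para_lapw
uncomment_delay() {
    script="$1"
    # Keep a backup of the original script next to it.
    cp "$script" "$script.orig" || return 1
    # Strip the leading "#" from lines of the form "# sleep $delay",
    # preserving any indentation; all other lines pass through unchanged.
    sed 's/^\([[:space:]]*\)#[[:space:]]*\(sleep \$delay\)/\1\2/' \
        "$script.orig" > "$script"
}
```

Assuming the script lives in the WIEN2k installation directory, it could be called as uncomment_delay "$WIENROOT/lapw1para_lapw". Running grep -n 'sleep $delay' on the file before and after shows whether the line was found and changed.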
I would suggest that the delay be put back in. The current 0.1 seconds
seems small enough to me, but even if the default were smaller, it could be
set by the user in parallel_options.
Best,
David
David Olmsted
Assistant Research Engineer
Materials Science and Engineering
210 Hearst Memorial Mining Building
University of California
Berkeley, CA 94720-1760