[Wien] Why is "sleep $delay" commented out in lapw1para_lapw?
Peter Blaha
pblaha at theochem.tuwien.ac.at
Tue Apr 7 07:51:26 CEST 2015
The answer to your query is not so easy.
Consider the following:
Some clusters have NO problem with filesystem timing and can handle multiple
logins (ssh) "at the same time"; others react more slowly and need a delay (of up to one second).
At the moment our clusters do not have any problem, so I commented it out (for testing),
but apparently forgot to do the same in lapw2para.
It also depends a lot on the types of jobs you submit:
If you have only a few k-points (e.g. 10) and each lapw1 step takes 10 minutes anyway, even
a delay of 1 second does not really matter.
If you have many more k-parallel jobs (100 or more) and each lapw1 step takes only 1 second,
even a delay of 0.1 sec can be unacceptable.
For your case: increase it until you get a stable situation.
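
To put rough numbers on the two cases above, here is a small sketch of the arithmetic. It assumes the launcher sleeps once per started job and that all k-parallel jobs then run concurrently. That is a simplification, not a description of what the lapw1para script does in detail, and the function names are made up for illustration.

```python
def launch_overhead(n_jobs, delay):
    """Seconds spent sleeping, assuming one sleep of `delay` per launched job."""
    return n_jobs * delay


def relative_overhead(n_jobs, delay, step_time):
    """Launch overhead as a fraction of the time one lapw1 step takes."""
    return launch_overhead(n_jobs, delay) / step_time


# (label, number of k-parallel jobs, delay in s, time per lapw1 step in s)
cases = [
    ("few k-points, long steps", 10, 1.0, 600.0),
    ("many k-points, short steps", 100, 0.1, 1.0),
]

for label, n_jobs, delay, step_time in cases:
    overhead = launch_overhead(n_jobs, delay)
    fraction = relative_overhead(n_jobs, delay, step_time)
    print(f"{label}: {overhead:.1f} s of sleeping, "
          f"{fraction:.1%} of one lapw1 step")
```

In the first case the sleeping adds about 10 s to a 600 s step (under 2%); in the second it adds about 10 s to a 1 s step, so the launch delay dominates the run time.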
On 06.04.2015 at 21:37, David Olmsted wrote:
> Laurence,
> Thank you for the response. As I mentioned in my first try at this issue last week, I have put the "sleep $delay" back in, and it does seem to have helped. I think I have fewer failures when the lapw1 processes are being started. But I still have some, so I am not certain. I am still working with the cluster's consultants on this.
>
> Nonetheless, the other scripts do have "sleep $delay", and at the top of lapw1para_lapw it does say
> #In this section use 0 to turn of an option, 1 to turn it on,
> #respectively choose a value
>
> set useremote = 1 # using remote shell to launch processes
> set mpiremote = 1 # using remote shell to launch mpi
> set delay = 0.1 # delay launching of processes by n seconds
> set sleepy = 1.0 # additional sleep before checking
> set debug = 0 # verbosity of debugging output
> set taskset0
> set taskset=no
>
> Given those two things, it seems to me that it would be more appropriate for the delay to actually exist in lapw1para_lapw. But not my call.
>
> Thank you for your help.
>
> Cheers,
> David
>
>
> -----Original Message-----
> From: wien-bounces at zeus.theochem.tuwien.ac.at [mailto:wien-bounces at zeus.theochem.tuwien.ac.at] On Behalf Of Laurence Marks
> Sent: Monday, April 06, 2015 12:14 PM
> To: A Mailing list for WIEN2k users
> Subject: Re: [Wien] Why is "sleep $delay" commented out in lapw1para_lapw?
>
> Dear David,
>
> I think the answer to your question "why" is "because".
>
> Often for things like this it is some combination of "seat of the pants" gut instinct and KISS. I am not certain why I used 0.25 in my version, and I think I have recently reduced it to 0.1. I will admit that I never tested in great detail whether 0.25 was better or worse; it really will depend heavily upon the cluster.
>
> Similarly I suspect the delay between launching ssh was probably removed as it did not seem to matter. My suggestion would be to put it back and see if it helps.
>
> I agree that it would be better to have this (and various other
> things) set in parallel_options.
>
> Not the clearest answer, sorry.
>
> On Mon, Apr 6, 2015 at 11:32 AM, David Olmsted <olmsted at berkeley.edu> wrote:
>> Hi,
>>
>> There has been no response to my suggestion that in lapw1para_lapw, the
>> line “# sleep $delay” be changed to “sleep $delay”. Perhaps I have not
>> given enough information. In the userguide there is no mention of “delay”.
>> In the archive I find nothing explaining why the line is commented
>> out. (Or even explaining that it is commented out.) In
>> lapw2para_lapw, for example, the “sleep $delay” line is actually in
>> use, rather than commented out. The same is true in some of the other
>> scripts. Why the difference in lapw1para_lapw?
>>
>>
>>
>> I am using version 14.2 on a large linux cluster with TORQUE. I was using
>> a revised version of a parallel_options file from a post by Laurence Marks
>> which included “set delay = 0.25”, and was surprised to discover this did
>> not actually take effect in lapw1para_lapw.
>>
>>
>>
>> Thanks,
>>
>> David
>>
>> David Olmsted
>> Assistant Research Engineer
>> Materials Science and Engineering
>> 210 Hearst Memorial Mining Building
>> University of California
>> Berkeley, CA 94720-1760
>>
>
>
>
> --
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu
> Corrosion in 4D: MURI4D.numis.northwestern.edu
> Co-Editor, Acta Cryst A
> "Research is to see what everybody else has seen, and to think what nobody else has thought"
> Albert Szent-Gyorgi
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>
--
-----------------------------------------
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-----------------------------------------