[Wien] e-test: subscript out of range

Stefaan Cottenier Stefaan.Cottenier at fys.kuleuven.ac.be
Thu Jul 3 13:56:58 CEST 2003


> I had the same problem for quite a while on Suse 7.x and Athlon. In our
> case, the problem seems to have originated from problems with NFS mounted
> file systems (only showing up in WIEN). To overcome it, I added a "sleep 1"
> at the beginning of the script "testconv" to get some extra time for
> synchronizing the file systems. Hardly any problems since then.
>
> Our local computer wizards told me recently that Suse 7 has become widely
> known to have some NFS troubles with Athlons. There should be fixes
> available. Upgrading to Suse 8 may also be an option.

Dear Ingo,

Thank you very much, this mysterious problem is now suddenly solved, and
much of the strange behaviour becomes understandable.

1) Although we are using Suse 8.1 on all clients of the pc-cluster, the
server (that does the file handling) still had 7.3, which is consistent
with your explanation.

2) This etest problem was mostly reproducible, but sometimes it was less
likely to occur. Now I understand why: it sometimes did not occur when 2
jobs were running simultaneously, which induced some natural extra delay!

3) I had a fix that worked, but that I did not understand so far: interchanging
two parts of the code of testconv (chosen, of course, so as not to change
the logical order). Now I see why it worked: the guilty piece of code was
executed a little bit later, allowing some extra time for the file system
to be ready.

4) After adding your extra sleep second, the problem seems to have gone.
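
For illustration only, the same idea can be sketched without a fixed delay:
poll until the file is visible and non-empty, and give up after a timeout.
This is not the actual testconv code. The helper name wait_for_file and the
file name in the usage comment are assumptions, and it is written in POSIX sh,
which may differ from the shell the WIEN scripts use. It only checks what the
NFS client currently sees, so it is a workaround in the same spirit as the
sleep, not a guarantee of a consistent file system.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN): wait until a file exists and is
# non-empty, polling once per second, and give up after a timeout in seconds.
wait_for_file() {
    file=$1
    timeout=${2:-10}   # default timeout of 10 seconds is an arbitrary choice
    waited=0
    while [ ! -s "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $file" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example use (the file name is an assumption):
#   wait_for_file case.scf 30 || exit 1
```

Compared with a plain "sleep 1", this costs no time when the file is already
there, and it fails loudly instead of silently continuing when it is not.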

We will upgrade the server to Suse 8.1 anyway, since in lapw2 and lapwso
we also had to introduce sleeps of many seconds in order to guarantee
stable behaviour. I thought this was unavoidable, but now it looks like it
could be related to the same problem.

Thanks !

Stefaan



