[Wien] NFS cache kernel bug can break Wien2k

L. D. Marks L-marks at northwestern.edu
Tue Feb 7 17:38:01 CET 2006


We've been experiencing irreproducible problems in Wien2k ever since we 
moved to Rocks 4.0 or 4.1 (RedHat kernel 2.6.9). After many months we've 
traced them to what is probably a real kernel bug 
(http://bugs.centos.org/view.php?id=1039) which, according to Trond 
Myklebust, should be fixed in 2.6.15-rc5 and newer kernels. The bug is 
masked by the default use of automount in Rocks and probably in other 
systems.

There is a very simple test you can do if you think you have it. In a 
directory which is NFS-mounted (not automounted) on a compute node c0-0, 
create a script test.sh containing:

echo 10 > Probe
ssh -x c0-0 cat Probe
echo 11 > Probe
ssh -x c0-0 cat Probe

Then "sh test.sh" will report 10 and 11 the very first time you run it, but 
afterwards probably 10 and 10, i.e. the node still sees the old contents 
of Probe after it has been overwritten.
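
If you want to repeat the check several times without reading the output 
by eye, the same test can be wrapped in a small function. This is only a 
sketch: the function name probe_nfs and the READER variable are made up 
for illustration, and it assumes the same node name (c0-0) and passwordless 
ssh as the test above.

```shell
#!/bin/sh
# Sketch only: automates the Probe test above and counts stale reads.
# READER is a hypothetical hook naming the command used to read the file
# on the compute node; the default mirrors the original test.
READER=${READER:-"ssh -x c0-0 cat"}

# Write a series of values to Probe, read each one back through $READER,
# and report every case where the node returned something else.
probe_nfs() {
    stale=0
    for value in 10 11 12 13; do
        echo "$value" > Probe
        seen=$($READER Probe)
        if [ "$seen" != "$value" ]; then
            echo "stale read: wrote $value, node saw $seen"
            stale=$((stale + 1))
        fi
    done
    echo "stale reads: $stale"
    # Exit status 0 only if every read matched what was written.
    [ "$stale" -eq 0 ]
}
```

Source the file in the NFS-mounted directory and run "probe_nfs". As with 
the original test, the first run may look clean, so run it more than once 
before concluding anything.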

Cure: none yet.
-----------------------------------------------
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60201, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
http://www.numis.northwestern.edu
-----------------------------------------------