[Wien] Suse 9.2 problem
Stefaan Cottenier
Stefaan.Cottenier at fys.kuleuven.be
Tue May 31 16:08:06 CEST 2005
Dear all,
While setting up a new PC cluster with P4 machines and Suse 9.2, we
ran into trouble with k-point parallel execution. lapw1 is executed
correctly on all machines in a parallel run (the 'fortran stop' statement
appears, all scf1 files are complete and correct, and all error files
are empty), but lapw1para nevertheless keeps running forever. The last
lines in case.dayfile are:
[1] 4479
[2] 4512
[3] 4545
[1] Done ( $remote $machine[$p] ...
waiting for all processes to complete
which shows that only one of the three machines signalled that its
process had completed, while the other two never do. In other words,
lapw1para is stuck in the 'wait' statement:
endkloop:
if ($debug > 0) echo waiting for all processes to complete
wait <========================================
if ($debug > 0) echo `date`" ->" "all processes done."
sleep $sleepy
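
To check whether the shell itself fails to collect finished background
jobs, independent of wien2k, a minimal sketch along the following lines
might help. It is written in plain sh rather than the csh used by
lapw1para, and remote_job is a hypothetical placeholder (here just a
sleep) for the real "$remote $machine[$p] ..." command. Waiting on each
PID separately, instead of one bare 'wait', should show which job never
returns.

```shell
#!/bin/sh
# Hypothetical stand-in for the real remote command; this is an
# assumption for illustration, not part of lapw1para.
remote_job() {
    sleep "$1"
}

# Start one background job per argument, then wait on each PID in turn
# so that a job which never returns can be identified by its index.
wait_each() {
    pids=""
    for delay in "$@"; do
        remote_job "$delay" &
        pids="$pids $!"
    done
    n=0
    for pid in $pids; do
        n=$((n + 1))
        wait "$pid"
        echo "job $n (pid $pid) finished with status $?"
    done
}

wait_each 0 0 0
```

If this hangs on the new cluster once remote_job is replaced by an
actual rsh/ssh call, the problem is presumably in the shell or the
remote-shell setup rather than in the wien2k scripts.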
We verified that exactly the same code works fine on another PC cluster
with Suse 9.0, and that code copied from that other cluster shows the
same problem when run on the new cluster. The NFS setup is as basic as
it could be and is identical on both clusters. We therefore suspect that
the problem is related to Suse 9.2. I'm pretty sure many of you are
running wien2k with Suse 9.2. Has anyone experienced a similar problem
and found a fix?
Thanks,
Stefaan