[Wien] Suse 9.2 problem

Stefaan Cottenier Stefaan.Cottenier at fys.kuleuven.be
Tue May 31 16:08:06 CEST 2005


Dear all,

While setting up a new PC cluster using P4 machines and Suse 9.2, we 
are having trouble with k-point parallel execution. lapw1 is executed 
correctly on all machines in a parallel run (the 'fortran stop' statement 
appears, all scf1 files are complete and correct, and all error files 
are empty), but lapw1para nevertheless keeps running forever. The last 
lines in case.dayfile are:

[1] 4479
[2] 4512
[3] 4545
[1]    Done                          ( $remote $machine[$p]  ...
waiting for all processes to complete

which shows that one of the three machines reported that its process had 
completed, while the other two never do. In other words, lapw1para is 
stuck in the 'wait' statement:


endkloop:
if ($debug > 0) echo waiting for all processes to complete
wait         <========================================

if ($debug > 0) echo `date`" ->" "all processes done."
sleep $sleepy
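
To narrow down which background jobs never return, one could replace the 
blocking 'wait' by a polling loop with a timeout. The following is only a 
minimal sketch of that idea, written in POSIX sh rather than in the csh of 
lapw1para; the function name watch_jobs and the host names in the usage 
line are hypothetical and not part of WIEN2k.

```shell
#!/bin/sh
# Sketch only (not part of WIEN2k): start several commands in the background
# and poll them with a timeout instead of blocking in a bare `wait`.
#
# watch_jobs TIMEOUT CMD...
#   Prints "done pid N" or "PENDING pid N" for each job.
#   Returns 0 if every job exited within TIMEOUT seconds, 1 otherwise.
watch_jobs() {
    timeout=$1; shift
    pids=""
    for cmd in "$@"; do
        # Each CMD is a complete command line, e.g. "ssh somehost hostname".
        sh -c "$cmd" >/dev/null 2>&1 &
        pids="$pids $!"
    done

    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        alive=0
        for pid in $pids; do
            # kill -0 sends no signal; it only tests whether the pid exists.
            kill -0 "$pid" 2>/dev/null && alive=1
        done
        [ "$alive" -eq 0 ] && break
        sleep 1
        elapsed=$((elapsed + 1))
    done

    rc=0
    for pid in $pids; do
        if kill -0 "$pid" 2>/dev/null; then
            echo "PENDING pid $pid"   # still running: candidate for the hang
            rc=1
        else
            echo "done pid $pid"
        fi
    done
    return $rc
}
```

A call such as watch_jobs 60 "ssh node1 hostname" "ssh node2 hostname" 
(with rsh instead of ssh if that is what $remote is set to) would then list 
the jobs whose remote shell never returns. Jobs reported as PENDING are 
left running, so they can still be inspected with ps.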

We verified that exactly the same code works fine on another PC cluster 
with Suse 9.0, and that code copied from that other cluster shows the same 
problem when run on this new cluster. The NFS setup is as basic as can 
be, and is identical on both clusters. We therefore suspect that the 
problem is related to Suse 9.2. I'm pretty sure many of you are running 
wien2k with Suse 9.2. Has anyone experienced a similar problem and found 
a fix?

Thanks,
Stefaan


