[Wien] problems in wien2k 10 run

saed alazar q_saed74 at yahoo.com
Thu May 19 17:37:12 CEST 2011


The code works just fine on the student cluster where we have compiled 
it using the extra '-assu buff' flag to alleviate NFS problems. I think 
we have seen that the optimisation jobs work well there.

There are still problems with the optimisation jobs running on the planck cluster though.

There seem to be two main problems:

1. One node appears to be doing nearly all the work while the others do
   little, as seen in the dayfile. However, if we log in to the job nodes
   while the job is running and run the top command, all cores seem to be
   using ~100% CPU and the load is normal. Also, 'cat /proc/meminfo' shows
   that there is plenty of free memory (as there should be, since these
   nodes each have 32GB RAM).
2. After some time it becomes impossible to log in to your home
   directory, and I cannot even log in to the job node from the console on
   the machine. I also cannot delete the job from the queues. This means I
   then have to turn the queuing system off (qterm -t quick), remove the
   job files from the jobs directory
   (rm -rf /usr/local/torque/server_priv/jobs/2627.planck.*) and then
   restart the queuing system (/usr/local/torque/sbin/pbs_server -t warm).
   I then still have to reboot the nodes that were involved with that job.
   This is a problem.
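
For reference, the recovery sequence from point 2 can be written out as a small shell sketch. The three commands are the ones quoted above. The function names (run, recover_stuck_job) and the DRY_RUN variable are only illustrative and not part of Torque. By default the sketch just prints what it would do, so it can be checked before anything is run as root on the server.

```shell
#!/bin/sh
# Sketch of the manual recovery described in point 2.
# Assumption: Torque lives under /usr/local/torque, as in the paths above.
# By default the commands are only printed; set DRY_RUN=0 to execute them.

TORQUE_HOME=/usr/local/torque

# Print the command (dry run) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Usage: recover_stuck_job <jobid>, e.g. recover_stuck_job 2627.planck
recover_stuck_job() {
    jobid=$1
    # 1. turn the queuing system off
    run qterm -t quick
    # 2. remove the stuck job's files from the server's jobs directory
    run rm -rf "$TORQUE_HOME/server_priv/jobs/$jobid".*
    # 3. restart the queuing system
    run "$TORQUE_HOME/sbin/pbs_server" -t warm
    # 4. the nodes still have to be rebooted by hand
    echo "# now reboot the nodes that were running job $jobid"
}
```

Calling recover_stuck_job 2627.planck without DRY_RUN=0 only lists the steps. The node reboot is left as a reminder because which nodes are affected depends on the job.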

The main difference between the planck cluster and the student cluster
is that the planck cluster has a GPFS parallel file system and does not
use NFS (well, actually, GPFS uses something like NFS). The problems we
were seeing on the student cluster disappeared when we recompiled with
the extra '-assu buff' flag. I have recompiled wien2k on the planck
cluster with this flag as well, but it does not fix the problems.

Other than that, both machines are running the RHEL 5.3 operating
system, and openmpi, fftw and wien2k have been compiled the same way on
both.




Thanks


Said