[Wien] Network problem caused by lapw1?

Oleg Rubel rubelo at tbh.net
Tue Dec 22 17:26:23 CET 2009


Dear Wien2k Users and Developers,

I observe the cluster network dying for about 10 minutes when performing a calculation for a relatively large case that uses 256 cores and InfiniBand. I use WIEN2k_09.2 (Release 29/9/2009) + ifort 11.0.074 + Intel MKL 10.1.0.015 + MVAPICH2 with iterative diagonalization. The network always dies at the end of the second SCF iteration (most likely at the end of lapw1). This did not occur in WIEN2k_08.3 (Release 18/9/2008) for the same case and compiler settings. I know that the iterative diagonalization has undergone some major changes between these two versions.

This does not actually interrupt the calculation and there is no sign of any error, but it causes the SGE daemon to die on the compute nodes, with all the consequences that entails.

Has anyone experienced a similar problem? What is different in the behaviour of lapw1 in the 2nd iteration that might cause this?

Thank you in advance and Happy Holidays.

Oleg Rubel

--
Thunder Bay Regional Research Institute
290 Munro St, Thunder Bay, ON, P7A 7T1, Canada
Homepage: http://www.tbrri.com/~orubel/

