[Wien] time difference among nodes

Elias Assmann elias.assmann at gmail.com
Fri Sep 25 10:01:08 CEST 2015


Sounds like a nasty problem …  In terms of strategy, I think the first
thing should be to find out if the node is really to blame.  If so,
you have to convince the admins and/or find a way to avoid it.  If
not, you can turn to figuring out whatever else (presumably in your
Wien2k setup) is causing the trouble.

On 09/24/2015 07:37 PM, Luis Ogando wrote:
> First of all, I wonder: To what extent is this problem
> reproducible? E.g., does your job always run on the same 4 nodes?
> 
> Yes.
> 
> Is it always the same node(s) that are slow?
> 
> Yes

It seems unusual that your job should always be assigned the same
nodes, but okay.  If you get your job to run on a different set it
could help establish if the node is really to blame.  In some queuing
systems, you can request specific nodes.  Or you could submit two
copies of your job at the same time, so that at least one of them
should end up on different nodes.
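
As a rough sketch of what requesting nodes can look like: the thread
does not say which queuing system is in use, so Slurm is assumed here,
and the node names are made up.  The header of the job script can keep
the job off the suspect node or pin it to a chosen set.

```shell
#!/bin/bash
# Assumption: Slurm.  node01, node02, ... are hypothetical host names.
#SBATCH --nodes=4
#SBATCH --exclude=node03     # keep the job off the suspect node
# Alternatively, force a specific set of nodes:
#   #SBATCH --nodelist=node01,node02,node05,node06
# Rough Torque/PBS equivalent:
#   #PBS -l nodes=node01+node02+node05+node06

hostname   # placeholder for the actual job
```

Whether users are allowed to select or exclude nodes depends on how
the cluster is configured.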

> The strangest part: at the beginning of this month, the same
> calculation was running properly. I had a crash for convergence
> problems and when I reduced the "mixing factor" in case.inm (it is
> now 0.04 in pre-convergence scf cycle) the problems started.
> Obviously, I do not believe that the mixing factor is the problem.
> 
> No. All the executables are running slowly in the problematic
> node.

I would try to widen the tests then -- restart the calculation from
scratch, try a different case, try other programs …
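
One way to "try other programs" is a small test that does not involve
Wien2k at all.  The following is only a sketch: bench is a made-up
helper that times a fixed CPU-bound awk loop, and $NODES is assumed to
hold a whitespace-separated list of host names reachable via ssh, with
bash and GNU date available on each of them.

```shell
#!/bin/bash
# Hypothetical helper: time a fixed, CPU-bound workload that is
# independent of Wien2k, so timings from different nodes are comparable.
bench() {
   local start end
   start=$(date +%s.%N)          # GNU date: seconds.nanoseconds
   # Fixed amount of work: sum the integers 1..2000000
   awk 'BEGIN { s = 0; for (i = 1; i <= 2000000; i++) s += i; print s > "/dev/null" }'
   end=$(date +%s.%N)
   # Print the elapsed wall time in seconds
   awk -v a="$start" -v b="$end" 'BEGIN { printf "%.3f\n", b - a }'
}

# Run the same benchmark on every node and print "node seconds".
# The function definition is piped to a bash started on the remote side,
# so it does not matter what the login shell there is.
for n in ${NODES:-}; do
   printf '%s ' "$n"
   { declare -f bench; echo bench; } | ssh "$n" bash
done
```

If one node consistently reports a much larger time than the others
for this identical workload, that points at the node rather than at
the Wien2k setup.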

> Users can do nothing. The administrator sent me the "top's" and I
> have asked him for simultaneous ones.

As I said, even if you have no direct access to the nodes, you can run
top from within a job script.  Something along these lines (in bash):

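run &                 # placeholder for the actual job command
pid=$!                # PID of the background job

# Sample every node for as long as the job is still running.
# $NODES is assumed to hold the list of host names.
while kill -0 "$pid" 2>/dev/null; do
   for n in $NODES; do
      ssh "$n" top -bn1 >>"$n.top"
      # plus whatever else you want to check
   done
   sleep 60           # pause between samples (interval is arbitrary)
done

wait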



	Elias
