[Wien] time difference among nodes

Elias Assmann elias.assmann at gmail.com
Thu Sep 24 10:24:28 CEST 2015


Luis,

First of all, I wonder: to what extent is this problem reproducible?
E.g., does your job always run on the same 4 nodes?  Is it always the
same node(s) that are slow?  Does the problem also show up in other
calculations (e.g. with just a different number of k-points, or when
restarting the same case from scratch)?  Is it only lapw1 that is slow?

Second, how did you make those ‘top’s?  As for the differences in the
‘lapw0’ and ‘lapw1’ entries between the nodes, I suspect that this is
just because the snapshots were taken at different times (note that
the CPU times of lapw0 on the two nodes are quite different, too).

I find the CPU usage on ‘n2’ very suspicious.  If, as Peter said, the
jobs are still in the initialization phase and therefore not computing
much, that may be fine; but I have to disagree with his assessment,
because the memory usage of lapw1 on the two nodes is basically the
same (if anything, the image sizes on ‘n2’ are slightly larger), which
suggests that lapw1 on ‘n2’ has already allocated its memory and is
past the initialization.  Note also that it is *not* the case that
other processes are using the CPU; the total usage is only 7.5 %.
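One way to settle whether lapw1 on ‘n2’ is computing at all is to watch how fast its accumulated CPU time grows.  On Linux, fields 14 and 15 of /proc/<pid>/stat (utime and stime) count the clock ticks a process has spent in user and kernel mode, so their increase divided by (CLK_TCK × elapsed seconds) gives the number of cores' worth of CPU it used.  Below is a minimal Python sketch, assuming a Linux node and that you know the PID of lapw1 (e.g. from ‘ps’); the function names are made up for illustration.

```python
import os
import time

def cpu_ticks(stat_text):
    """Return utime + stime (fields 14 and 15 of /proc/<pid>/stat) in clock ticks."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split only what follows the last ')'; that remainder starts at field 3.
    rest = stat_text[stat_text.rfind(")") + 2:].split()
    return int(rest[11]) + int(rest[12])  # field n is rest[n - 3]

def cpu_share(ticks_before, ticks_after, interval_s, clk_tck):
    """Cores' worth of CPU used over the interval (1.0 = one core fully busy).

    Can exceed 1.0 for a multithreaded process, since the counters
    in /proc/<pid>/stat include all threads of the process.
    """
    return (ticks_after - ticks_before) / (clk_tck * interval_s)

def measure(pid, interval_s=10.0):
    """Sample a running process twice and return its CPU share in between."""
    path = f"/proc/{pid}/stat"
    clk_tck = os.sysconf("SC_CLK_TCK")  # clock ticks per second, usually 100
    with open(path) as f:
        before = cpu_ticks(f.read())
    t0 = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_share(before, after, time.monotonic() - t0, clk_tck)
```

A value close to 1.0 (per thread) would mean lapw1 is really computing; a value near zero on ‘n2’ but not on the other nodes would confirm that it is waiting for something.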

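It would be good to clarify this with a ‘top’ snapshot taken when we
know that lapw1 has been running for a while.  To this end, top has an
‘-n’ option which sets how many frames to output, e.g. ‘top -bn 10’
(‘-b’ selects batch mode, which writes plain text suitable for a log
file, and ‘-d’ sets the delay in seconds between frames).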

I am also curious about the load averages.  ‘n2’ has larger “mid-term”
(5-minute) and “long-term” (15-minute) load averages than the other
nodes, and its “short-term” (1-minute) average is just as large.  I am
not sure what that means.
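For reference, the three load averages printed by ‘top’ and ‘uptime’ are the 1-, 5- and 15-minute values also found in /proc/loadavg.  On Linux they count not only runnable tasks but also tasks in uninterruptible sleep (state ‘D’, typically waiting for disk or network file-system I/O), so a high load combined with a low CPU usage may be worth checking for processes stuck in ‘D’; this is one possibility to rule out, not a diagnosis.  A minimal Python sketch, assuming a Linux node; the helper names are hypothetical.

```python
import os

def read_loadavg(text=None):
    """Parse /proc/loadavg: 1-, 5- and 15-minute averages plus runnable/total tasks."""
    if text is None:
        with open("/proc/loadavg") as f:
            text = f.read()
    fields = text.split()
    load1, load5, load15 = (float(x) for x in fields[:3])
    runnable, total = (int(x) for x in fields[3].split("/"))
    return load1, load5, load15, runnable, total

def task_states(proc="/proc"):
    """Count processes (not threads) by state letter.

    R = running, S = sleeping, D = uninterruptible sleep (usually I/O), ...
    """
    counts = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited in the meantime
        state = stat[stat.rfind(")") + 2]  # field 3, right after "(comm) "
        counts[state] = counts.get(state, 0) + 1
    return counts
```

Comparing ‘read_loadavg()’ with ‘os.cpu_count()’ and ‘task_states()’ on ‘n2’ and on a “normal” node would show whether the extra load comes from running processes or from waiting ones.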

On 09/23/2015 02:21 PM, Luis Ogando wrote:
> I cannot access the nodes. SSH among them is forbidden! We have
> to ask the administrators for everything!! It is hell!! Of
> course, only the PBS jobs can "travel" among the nodes.

Torque's ‘qsub’ has an option, ‘-I’, to submit an interactive job
where you get a login on a node (I believe ‘qsub -I’ works the same way
in PBS Pro, while SGE provides ‘qrsh’ and ‘qlogin’ for this).  Of
course that is only a realistic option when the queuing time is not
too long.  Otherwise, any information that a more sophisticated tool
can give you is also available from the command line (just more
painful to extract!) via ‘top’, ‘ps’, ‘/proc’, etc.  You can also put
these commands in a job script (which you apparently already did with
‘top’).
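As an illustration of the job-script route, the following minimal Python sketch appends periodic snapshots (host name, time stamp, /proc/loadavg, and the lapw processes from ‘ps’) to a log file.  The ‘ps -o’ keywords used (pid, stat, etime, time, pcpu, comm) are standard procps ones; ‘etime’ is the elapsed wall time and ‘time’ the accumulated CPU time, so a lapw1 whose CPU time lags far behind its elapsed time is mostly waiting.  The function names and the idea of starting it in the background from the job script are assumptions for illustration.

```python
import socket
import subprocess
import time

# Standard procps `ps -o` keywords: elapsed wall time (etime) vs. CPU time (time).
PS_FORMAT = "pid,stat,etime,time,pcpu,comm"

def lapw_rows(ps_text, prefix="lapw"):
    """Keep the ps header plus the rows whose command name starts with `prefix`."""
    lines = ps_text.splitlines()
    keep = [line for line in lines[1:]
            if line.split() and line.split()[-1].startswith(prefix)]
    return lines[:1] + keep

def monitor(logfile, count=10, interval=60):
    """Append `count` snapshots, `interval` seconds apart, to `logfile`."""
    host = socket.gethostname()
    with open(logfile, "a") as log:
        for i in range(count):
            with open("/proc/loadavg") as f:
                load = f.read().strip()
            ps = subprocess.run(["ps", "-eo", PS_FORMAT],
                                capture_output=True, text=True, check=True).stdout
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"=== {host} {stamp} loadavg: {load}\n")
            log.write("\n".join(lapw_rows(ps)) + "\n")
            log.flush()  # keep the log readable while the job is still running
            if i < count - 1:
                time.sleep(interval)
```

Note that such a script only sees the node it runs on; to cover ‘n2’ it has to be started there as well, and how to do that from within a job depends on the site's queuing system.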


Good luck,

	Elias

