[Wien] time difference among nodes
Luis Ogando
lcodacal at gmail.com
Mon Sep 21 18:25:45 CEST 2015
Dear Professor Blaha,
Thank you!
My .machines file is OK.
I will ask the administrator to follow your other suggestions (regular
users do not have the necessary privileges).
All the best,
Luis
2015-09-21 10:22 GMT-03:00 Peter Blaha <pblaha at theochem.tuwien.ac.at>:
> a) Check your .machines file. Does it meet your expectations, or does
> this node carry too large a load?
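>
> For reference, with four six-core nodes one would expect something
> balanced like this (hostnames node01..node04 and the one-MPI-job-per-node
> layout are only an assumed sketch):
>
>   granularity:1
>   1:node01:6
>   1:node02:6
>   1:node03:6
>   1:node04:6
>   extrafine:1
>
> Every node should appear with the same weight; a node that got more
> lines or more processes ends up with more work and will lag behind.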
>
> b) Can you log in to these nodes interactively while your job is running?
> If yes, log in on 2 nodes (in two windows) and run top.
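>
> For example (hostname is a placeholder):
>
>   ssh node01
>   top     # press "1" for per-core load; look for processes that are
>           # not yours, or lapw1/lapw2 sitting near 0% CPU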
>
> c) If nothing obvious is wrong so far, test the network by copying some
> bigger files between these nodes and your $home (or $scratch) to see if
> file I/O is killing you.
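>
> A rough comparison along these lines (file size and paths are only
> examples) is usually enough:
>
>   time dd if=/dev/zero of=$SCRATCH/iotest bs=1M count=1024   # local write
>   time cp $SCRATCH/iotest $HOME/iotest                       # copy over the network
>   rm $SCRATCH/iotest $HOME/iotest
>
> Run it on a good node and on the slow one; a large difference points
> to the disk or the interconnect.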
>
>
> On 09/21/2015 02:51 PM, Luis Ogando wrote:
>
>> Dear Prof. Marks,
>>
>> Many thanks for your help.
>> The administrators said that everything is OK and that the software is
>> the problem (the easy answer): no zombies, no other jobs on the node, ...!!
>> Let me give you more information to see if you can imagine other
>> possibilities:
>>
>> 1) Intel Xeon X5680 (six cores, 3.33 GHz)
>>
>> 2) Intel(R) Fortran/CC/OpenMPI Intel(R) 64 Compiler XE for applications
>> running on Intel(R) 64, Version 12.1.1.256 Build 20111011
>>
>> 3) OpenMPI 1.6.5
>>
>> 4) PBS Pro 11.0.2
>>
>> 5) OpenMPI built using --with-tm, because ssh among nodes is prohibited
>> (see the configure sketch after this list;
>> http://www.open-mpi.org/faq/?category=building#build-rte-tm )
>>
>> 6) Wien2k 14.2
>>
>> 7) The mystery: two weeks ago, everything was working properly!!
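>>
>> Concerning point 5, the configure step was presumably along these lines
>> (the PBS and installation paths here are placeholders):
>>
>>   ./configure --with-tm=/opt/pbs --prefix=/opt/openmpi-1.6.5
>>   make all install
>>
>> so that mpirun starts remote processes through the PBS TM interface
>> instead of ssh.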
>>
>> Many thanks again!
>> All the best,
>> Luis
>>
>> 2015-09-18 23:24 GMT-03:00 Laurence Marks <laurence.marks at gmail.com>:
>>
>> Almost certainly one or more of:
>> * Other jobs on the node
>> * Zombie process(es)
>> * Too many MPI processes
>> * Bad memory
>> * Full disk
>> * Too hot
>>
>> If you have it, use ganglia; if not, ssh in and use top/ps or whatever
>> SGI has. If you cannot sudo, get help from someone who can.
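>>
>> A quick sweep on the suspect node could be (standard commands only):
>>
>>   uptime                        # load average vs. core count
>>   ps aux --sort=-%cpu | head    # busiest processes; "Z" in STAT = zombie
>>   free -g                       # memory actually available
>>   df -h                         # any filesystem at 100%?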
>>
>> On Sep 18, 2015 8:58 PM, "Luis Ogando" <lcodacal at gmail.com> wrote:
>>
>> Dear Wien2k community,
>>
>> I am using Wien2k on an SGI cluster with 32 nodes. My
>> calculation is running on 4 nodes that have identical
>> characteristics, and only my job is running on these 4 nodes.
>> I noticed that one of these 4 nodes spends more than 20
>> times as long as the other 3 nodes on the run_lapw
>> execution.
>> Can anyone imagine a reason for this? Any advice?
>> All the best,
>> Luis
>>
>>
> --
>
> P.Blaha
> --------------------------------------------------------------------------
> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
> Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
> Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
> WWW: http://www.imc.tuwien.ac.at/staff/tc_group_e.php
> --------------------------------------------------------------------------
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>