[Wien] time difference among nodes
Luis Ogando
lcodacal at gmail.com
Mon Sep 21 18:25:45 CEST 2015
Dear Professor Blaha,
Thank you!
My .machines file is OK.
I will ask the administrator to follow your other suggestions (regular
users do not have the necessary privileges).
All the best,
Luis
2015-09-21 10:22 GMT-03:00 Peter Blaha <pblaha at theochem.tuwien.ac.at>:
> a) Check your .machines file. Does it meet your expectations, or does
> this node carry too large a load?
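>
> For reference, with four six-core nodes one would expect something
> balanced like this (hostnames node01..node04 and the one-MPI-job-per-node
> layout are only an assumed sketch):
>
>   granularity:1
>   1:node01:6
>   1:node02:6
>   1:node03:6
>   1:node04:6
>   extrafine:1
>
> Every node should appear with the same weight; a node that got more
> lines or more processes ends up with more work and will lag behind.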
>
> b) Can you log in to these nodes interactively while your job is running?
> If yes, log in on 2 nodes (in two windows) and run top.
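>
> For example (hostname is a placeholder):
>
>   ssh node01
>   top     # press "1" for per-core load; look for processes that are
>           # not yours, or lapw1/lapw2 sitting near 0% CPU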
>
> c) If nothing obvious is wrong so far, test the network by copying some
> bigger files between these nodes and your $home (or $scratch) to see if
> file I/O is killing you.
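>
> A rough comparison along these lines (file size and paths are only
> examples) is usually enough:
>
>   time dd if=/dev/zero of=$SCRATCH/iotest bs=1M count=1024   # local write
>   time cp $SCRATCH/iotest $HOME/iotest                       # copy over the network
>   rm $SCRATCH/iotest $HOME/iotest
>
> Run it on a good node and on the slow one; a large difference points
> to the disk or the interconnect.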
>
>
> On 09/21/2015 02:51 PM, Luis Ogando wrote:
>
>> Dear Prof. Marks,
>>
>> Many thanks for your help.
>> The administrators said that everything is OK and that the software is
>> the problem (the easy answer): no zombies, no other jobs on the node, ...!!
>> Let me give you more information to see if you can imagine other
>> possibilities:
>>
>> 1) Intel Xeon X5680 (six cores, 3.33 GHz)
>>
>> 2) Intel(R) Fortran/CC/OpenMPI Intel(R) 64 Compiler XE for applications
>> running on Intel(R) 64, Version 12.1.1.256 Build 20111011
>>
>> 3) OpenMPI 1.6.5
>>
>> 4) PBS Pro 11.0.2
>>
>> 5) OpenMPI built using --with-tm, because ssh among nodes is prohibited
>> (see the configure sketch after this list;
>> http://www.open-mpi.org/faq/?category=building#build-rte-tm )
>>
>> 6) Wien2k 14.2
>>
>> 7) The mystery: two weeks ago, everything was working properly!!
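>>
>> Concerning point 5, the configure step was presumably along these lines
>> (the PBS and installation paths here are placeholders):
>>
>>   ./configure --with-tm=/opt/pbs --prefix=/opt/openmpi-1.6.5
>>   make all install
>>
>> so that mpirun starts remote processes through the PBS TM interface
>> instead of ssh.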
>>
>> Many thanks again!
>> All the best,
>> Luis
>>
>> 2015-09-18 23:24 GMT-03:00 Laurence Marks <laurence.marks at gmail.com>:
>>
>> Almost certainly one or more of:
>> * Other jobs on the node
>> * Zombie process(es)
>> * Too many MPI processes
>> * Bad memory
>> * Full disk
>> * Too hot
>>
>> If you have it, use ganglia; if not, ssh in and use top/ps or whatever
>> SGI has. If you cannot sudo, get help from someone who can.
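>>
>> A quick sweep on the suspect node could be (standard commands only):
>>
>>   uptime                        # load average vs. core count
>>   ps aux --sort=-%cpu | head    # busiest processes; "Z" in STAT = zombie
>>   free -g                       # memory actually available
>>   df -h                         # any filesystem at 100%?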
>>
>> On Sep 18, 2015 8:58 PM, "Luis Ogando" <lcodacal at gmail.com> wrote:
>>
>> Dear Wien2k community,
>>
>> I am using Wien2k on an SGI cluster with 32 nodes. My
>> calculation is running on 4 nodes that have identical
>> characteristics, and only my job is running on these 4 nodes.
>> I noticed that one of these 4 nodes spends more than 20
>> times as long as the other 3 nodes on the run_lapw
>> execution.
>> Can anyone imagine a reason for this? Any advice?
>> All the best,
>> Luis
>>
>>
> --
>
> P.Blaha
> --------------------------------------------------------------------------
> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
> Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
> Email: blaha at theochem.tuwien.ac.at WIEN2k: http://www.wien2k.at
> WWW: http://www.imc.tuwien.ac.at/staff/tc_group_e.php
> --------------------------------------------------------------------------
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>