[Wien] mpd invalid port info, MPI problem

Ludwig, Christian ludwigc at uni-mainz.de
Thu Sep 25 09:42:17 CEST 2008


Hi,

I installed Intel MPI on a cluster and am now trying to run Wien on the master and one node. k-point parallelisation works. For fine-grained (MPI) parallelisation, my .machines file looks something like

1:master:2
1:node:4
granularity:1
extrafine:1

lapw0 finishes, and in lapw1 I get the following in the dayfile:
running LAPW1 in parallel mode (using .machines)
2 number_of_parallel_jobs
     snode7 snode7 snode7 snode7(1) mpdboot_snode7 (handle_mpd_output 589): from mpd on iacgu1, invalid port info:
/bin/sh: rsh: command not found

     iacgu1 iacgu1(1) Using    1 processors
scalapack processors array (row,col):   1   1
Using    1 processors
scalapack processors array (row,col):   1   1
     snode7 snode7 snode7 snode7(1) mpdboot_snode7 (handle_mpd_output 589): from mpd on iacgu1, invalid port info:
/bin/sh: rsh: command not found

**  LAPW1 crashed!
0.148u 0.300s 0:05.32 8.2%      0+0k 0+0io 0pf+0w
error: command   /usr/local/Wien2k/lapw1para lapw1.def   failed


STDOUT is:
LAPW0 END

real    0m0.408s
user    0m0.232s
sys     0m0.108s
 LAPW1 END
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE).  You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
forrtl: error (78): process killed (SIGTERM)

real    0m0.171s
user    0m0.000s
sys     0m0.004s

real    0m0.424s
user    0m0.252s
sys     0m0.096s


I started the mpd daemon on the master and on the node by hand and successfully executed commands via

mpdrun -l -n 2 command

However, this only works when I start the ring manually with

mpd &
ssh node mpd -h master -p port -d

When I try

mpdboot -n 2 -f ~/mpd.hosts

(which Wien seems to invoke), I get
mpdboot_master (handle_mpd_output 589): from mpd on node.domain.name, invalid port info:
node.domain.name: Connection refused
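
Since the dayfile above shows mpdboot failing with "rsh: command not found", one thing that can be checked on each machine is which remote-shell clients are actually on the PATH. A minimal sketch, assuming a POSIX shell (the function name check_remote_shell is only an illustration and is not part of Wien or Intel MPI):

```shell
#!/bin/sh
# Report, for each command name given, whether it is found on the PATH.
# check_remote_shell is a hypothetical helper name used for illustration.
check_remote_shell() {
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd: found"
    else
      echo "$cmd: not found"
    fi
  done
}

# The dayfile complains about rsh, while the manual ring start used ssh.
check_remote_shell rsh ssh
```

Running this on both the master and the node shows whether rsh is missing on one side only or on both.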

Any help would be greatly appreciated.

All the best,
Christian

