[Wien] mpd invalid port info, MPI problem
Ludwig, Christian
ludwigc at uni-mainz.de
Thu Sep 25 09:42:17 CEST 2008
Hi,
I installed Intel MPI on a cluster and am now trying to run WIEN2k on the master and one node. k-point parallelisation works. For fine-grained (MPI) parallelisation, my .machines file looks something like this:
1:master:2
1:node:4
granularity:1
extrafine:1
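As I understand the format, each job line of .machines reads weight:host:nprocs. The file above should therefore request two parallel lapw1 jobs, one with 2 MPI processes on master and one with 4 on node. That is consistent with "2 number_of_parallel_jobs" and with the host lists in the dayfile below, which have one entry per process. Here is a minimal Python sketch of that reading. parse_jobs is a hypothetical helper, not part of WIEN2k, and it assumes one host per job line, as in this file:

```python
# The .machines file from above. "granularity" and "extrafine" are option
# lines, not parallel jobs.
MACHINES = """\
1:master:2
1:node:4
granularity:1
extrafine:1
"""


def parse_jobs(text):
    """Return one (weight, [host, ...]) tuple per parallel-job line.

    Assumption: each job line is weight:host:nprocs with a single host,
    and nprocs copies of the host name stand for nprocs MPI processes.
    """
    jobs = []
    for line in text.splitlines():
        fields = line.strip().split(":")
        # Job lines have three fields and start with a numeric weight;
        # option lines such as "granularity:1" have only two.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        weight, host, nprocs = int(fields[0]), fields[1], int(fields[2])
        jobs.append((weight, [host] * nprocs))
    return jobs


if __name__ == "__main__":
    jobs = parse_jobs(MACHINES)
    print(len(jobs), "number_of_parallel_jobs")
    for weight, hosts in jobs:
        print(" ".join(hosts), f"(weight {weight})")
```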
lapw0 finishes, of course, but lapw1 fails with the following dayfile:
running LAPW1 in parallel mode (using .machines)
2 number_of_parallel_jobs
snode7 snode7 snode7 snode7(1) mpdboot_snode7 (handle_mpd_output 589): from mpd on iacgu1, invalid port info:
/bin/sh: rsh: command not found
iacgu1 iacgu1(1) Using 1 processors
scalapack processors array (row,col): 1 1
Using 1 processors
scalapack processors array (row,col): 1 1
snode7 snode7 snode7 snode7(1) mpdboot_snode7 (handle_mpd_output 589): from mpd on iacgu1, invalid port info:
/bin/sh: rsh: command not found
** LAPW1 crashed!
0.148u 0.300s 0:05.32 8.2% 0+0k 0+0io 0pf+0w
error: command /usr/local/Wien2k/lapw1para lapw1.def failed
STDOUT is:
LAPW0 END
real 0m0.408s
user 0m0.232s
sys 0m0.108s
LAPW1 END
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).
mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
forrtl: error (78): process killed (SIGTERM)
real 0m0.171s
user 0m0.000s
sys 0m0.004s
real 0m0.424s
user 0m0.252s
sys 0m0.096s
I started the mpd daemon on the master and on the node by hand and successfully executed commands via mpdrun -l -n 2 command.
However, this only works when I start the daemons manually (mpd & on the master, then ssh node mpd -h master -p port -d). When I try mpdboot -n 2 -f ~/mpd.hosts (which WIEN2k seems to invoke), I get:
mpdboot_master (handle_mpd_output 589): from mpd on node.domain.name, invalid port info:
node.domain.name: Connection refused
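The dayfile above also contains "/bin/sh: rsh: command not found", which suggests that the daemon startup tries rsh rather than ssh. The following minimal Python sketch checks which remote-shell programs are on the PATH of the machine where it runs. find_remote_shells is a hypothetical helper. It only looks up the names rsh and ssh and does not test whether a connection actually works:

```python
import shutil


def find_remote_shells(candidates=("rsh", "ssh")):
    """Map each remote-shell program name to its full path, or None if absent."""
    return {name: shutil.which(name) for name in candidates}


if __name__ == "__main__":
    for name, path in find_remote_shells().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Running it on both the master and the node would show whether rsh is missing on either machine.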
Any help would be greatly appreciated.
All the best,
Christian