[Wien] A trick for mpi debugging

Laurence Marks L-marks at northwestern.edu
Mon Jul 22 16:47:15 CEST 2013


A brief follow-up which may be useful (or not) for others who run into
mpi problems in the future. I have been able to work around a mysterious
impi/ssh bug on NU's supercomputer by replacing ssh with the
openmpi/mpirun launcher. The hack is gross, but very stable.

Steps:
1) Add "--bootstrap-exec=$WIENROOT/hopen" to the mpirun line in
   $WIENROOT/parallel_options.
2) Create the executable file $WIENROOT/hopen containing
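#!/bin/bash
# Strip the "-x -q" ssh flags from the arguments handed over by impi.
a=$(echo "$*" | sed -e 's/-x -q//')
# $a is deliberately unquoted so that it splits into separate words.
$OPENMPI/bin/mpirun -np 1 --host $a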

(Replace $OPENMPI with the directory where openmpi has been compiled
and installed.)
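
Before pointing parallel_options at the wrapper, it can help to see what
command it would build. The following is only a sketch of a dry run: the
function name hopen_preview, the host node01, the command /bin/hostname
and the default path /opt/openmpi are all made up for illustration, and
the arguments impi really passes may differ from the ones used here. It
applies the same sed substitution as hopen but prints the mpirun line
instead of running it.

```shell
#!/bin/bash
# Dry-run sketch: mirrors the argument rewriting done in hopen, but only
# prints the resulting command. /opt/openmpi is a placeholder default.
OPENMPI=${OPENMPI:-/opt/openmpi}

hopen_preview() {
    local a
    # Same substitution as hopen: remove the first "-x -q" from the arguments.
    a=$(printf '%s\n' "$*" | sed -e 's/-x -q//')
    # $a is left unquoted on purpose so that it splits into words; this also
    # swallows the leading space left behind by the substitution.
    echo "$OPENMPI/bin/mpirun" -np 1 --host $a
}

# Hypothetical argument list, standing in for what the launcher receives.
hopen_preview -x -q node01 /bin/hostname
```

Once the printed line looks right, the real wrapper still has to be made
executable, for example with chmod +x $WIENROOT/hopen.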

On Thu, Jul 18, 2013 at 10:38 AM, Laurence Marks
<L-marks at northwestern.edu> wrote:
> On a cluster I am using I am having a problem with ssh connections as
> part of impi/mpirun about 0.1-0.2% of the time; what happens is that
> they fail to launch and become zombies (ps shows "[ssh] <defunct>").
> Since fiddling through all the options within mpirun can be hard
> (particularly for impi which is rather fast), I found (after a comment
> from someone on the openssh list) a useful hack. I am providing it
> here as it is a nice way around things, and might be useful to others
> in the future.
>
> The "trick" is to add --bootstrap-exec ~/bin/hssh or similar to the
> mpirun line in $WIENROOT/parallel_options, then create the executable
> ~/bin/hssh with something similar to:
>
> #!/bin/bash
> a=`echo $@ | sed -e 's/-q/-v/'`
> ssh $a
>
>
> The above allows me to turn verbose output on in the ssh command since
> impi insists on setting -q (quiet). For other cases something similar
> can be done.
>
> --
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu 1-847-491-3996
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought"
> Albert Szent-Gyorgi




