[Wien] A trick for mpi debugging

Luis Ogando lcodacal at gmail.com
Mon Aug 5 14:03:47 CEST 2013


Dear Prof. Marks,

   Thank you again !
   I will do the tests and tell you what happens.
   All the best,
                Luis



2013/8/3 Laurence Marks <L-marks at northwestern.edu>

> I am not sure I can give you the right answer; my guess is to set it
> to 1, but I do not know all the details of your system and, if I
> remember right, you have an SGI system. Try both, then let us/me know
> what works (or does not).
>
> For reference, I have it working fine with USE_REMOTE 1, and I don't
> currently want to change to test (particularly as I am on travel).
>
> On Fri, Aug 2, 2013 at 8:36 AM, Luis Ogando <lcodacal at gmail.com> wrote:
> > Dear Prof. Marks,
> >
> >    Just a quick question: if the openmpi launcher replaces ssh,
> > should I change USE_REMOTE to 0 on a cluster?
> >    Thank you one more time,
> >                 Luis
> >
> >
> >
> > 2013/7/27 Laurence Marks <L-marks at northwestern.edu>
> >>
> >> WARNING 1: To be used with care, and customized as needed
> >> WARNING 2: Valid for impi and perhaps others, but not all variants
> >> WARNING 3: Please look at what these options mean...
> >>
> >> My parallel_options file with NU's supercomputer, which contains
> >> various debug and other options (some recommended by Intel, some by
> >> the local sys_admin):
> >>
> >> setenv USE_REMOTE 1
> >> setenv MPI_REMOTE 0
> >> setenv WIEN_GRANULARITY 1
> >> setenv DAPL_DBG_TYPE 0
> >> # Normal
> >> #setenv WIEN_MPIRUN "mpirun -n _NP_ -machinefile _HOSTS_ _EXEC_ "
> >>
> >> # To turn on verbose
> >> #setenv WIEN_MPIRUN "mpirun -bootstrap-exec ~/bin/hssh -n _NP_ -machinefile _HOSTS_ _EXEC_ "
> >>
> >> # To use more recent, privately compiled ssh
> >> #setenv WIEN_MPIRUN "mpirun -bootstrap-exec $HOME/local/bin/ssh -n _NP_ -machinefile _HOSTS_ _EXEC_ "
> >>
> >> # To use openmpi to launch
> >> setenv WIEN_MPIRUN "mpirun -bootstrap-exec $WIENROOT/hopen -n _NP_ -machinefile _HOSTS_ _EXEC_ "
> >>
> >> set sleepy = 0.2
> >> set delay = 0.1
> >> unset DAPL_DBG
> >> #Turn on Hydra debug on Quest
> >> #setenv I_MPI_HYDRA_DEBUG 1
> >> #Turn on MPI DEBUG
> >> #setenv I_MPI_DEBUG 1
> >> #setenv I_MPI_DEBUG_OUTPUT mpi_debug%h_%r
> >> setenv I_MPI_FABRICS_LIST dapl,tcp
> >> setenv I_MPI_FALLBACK enable
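
For readers wondering what the _NP_, _HOSTS_ and _EXEC_ placeholders in WIEN_MPIRUN turn into, here is a minimal sketch. It assumes the parallel scripts substitute the placeholders textually before running the command; the function name expand_mpirun and the sample values in the usage note are hypothetical and not taken from the WIEN2k sources.

```shell
#!/bin/sh
# Hypothetical helper: show the command line that a WIEN_MPIRUN template
# would produce once its three placeholders are filled in.
# Assumption: the values contain no "|" or "&", which sed would misread.
expand_mpirun() {
    template=$1 np=$2 hosts=$3 exec_cmd=$4
    printf '%s\n' "$template" |
        sed -e "s|_NP_|$np|" -e "s|_HOSTS_|$hosts|" -e "s|_EXEC_|$exec_cmd|"
}
```

For example, expand_mpirun "mpirun -n _NP_ -machinefile _HOSTS_ _EXEC_" 4 .machine1 "lapw1_mpi lapw1.def" prints the filled-in mpirun line, which is handy for checking a new template by eye before running a real job.
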
> >>
> >>
> >>
> >>
> >> On Sat, Jul 27, 2013 at 2:53 PM, Luis Ogando <lcodacal at gmail.com> wrote:
> >> > Dear Prof. Marks,
> >> >
> >> >    Could you please send me a template for the parallel_options file
> >> > where this implementation was done?
> >> >    I am sorry to ask, but I am really far from being an expert.
> >> >    All the best,
> >> >                     Luis
> >> >
> >> >
> >> > 2013/7/22 Laurence Marks <L-marks at northwestern.edu>
> >> >>
> >> >> A brief followup which may be useful (or not) for others in the
> >> >> future with mpi problems. I have been able to work around a
> >> >> mysterious impi/ssh bug on NU's supercomputer by replacing ssh with
> >> >> the openmpi/mpirun launcher. The hack is gross, but very stable.
> >> >>
> >> >> Steps:
> >> >> 1) Add "--bootstrap-exec=$WIENROOT/hopen" to the mpirun line in
> >> >> $WIENROOT/parallel_options.
> >> >> 2) Create the executable file $WIENROOT/hopen containing
> >> >> #!/bin/bash
> >> >> a=`echo $@ | sed -e 's/-x -q//'`
> >> >> $OPENMPI/bin/mpirun -np 1 --host $a
> >> >>
> >> >> (change $OPENMPI to where it has been compiled).
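
The hopen script above relies on shell word splitting, so an argument containing spaces would be broken apart. A minimal sketch of a variant that keeps each argument intact follows. It assumes, as the sed pattern implies, that the launcher is called as "hopen -x -q <host> <command...>", and unlike the original it strips -x and -q only when they come before the host. The function name hopen_launch is hypothetical.

```shell
#!/bin/bash
# Hypothetical rework of $WIENROOT/hopen.
# hopen_launch MPIRUN [-x] [-q] HOST COMMAND...
# Drops the leading ssh-only options, then starts COMMAND on HOST
# through Open MPI's mpirun with a single process.
hopen_launch() {
    mpirun_cmd=$1
    shift
    while [ "$#" -gt 0 ]; do
        case $1 in
            -x|-q) shift ;;   # ssh options that mpirun does not understand
            *) break ;;       # first non-option is taken to be the host
        esac
    done
    "$mpirun_cmd" -np 1 --host "$@"
}

# Run only when the launcher actually passed arguments.
if [ "$#" -gt 0 ]; then
    hopen_launch "${OPENMPI:?set OPENMPI to the Open MPI install prefix}/bin/mpirun" "$@"
fi
```

As in the original, OPENMPI must point to where Open MPI has been compiled; here the script stops with a message if it is unset.
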
> >> >>
> >> >> On Thu, Jul 18, 2013 at 10:38 AM, Laurence Marks
> >> >> <L-marks at northwestern.edu> wrote:
> >> >> > On a cluster I am using, I am having a problem with ssh connections
> >> >> > as part of impi/mpirun about 0.1-0.2% of the time: they fail to
> >> >> > launch and become zombies (ps shows "[ssh] <defunct>").
> >> >> > Since fiddling through all the options within mpirun can be hard
> >> >> > (particularly for impi which is rather fast), I found (after a
> >> >> > comment from someone on the openssh list) a useful hack. I am
> >> >> > providing it here as it is a nice way around things, and might be
> >> >> > useful to others in the future.
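
To check whether a node is affected by the same problem, the defunct ssh processes can be listed directly. This is a minimal sketch, assuming a ps that accepts "-eo stat=,pid=,comm=" (as procps on Linux does) and reports zombies with a state starting with "Z". The function name filter_defunct_ssh is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "STATE PID COMMAND" lines on stdin and print
# the PIDs of zombie (state Z) processes whose command name is ssh.
filter_defunct_ssh() {
    awk '$1 ~ /^Z/ && $3 == "ssh" { print $2 }'
}

# Typical use on a compute node:
#   ps -eo stat=,pid=,comm= | filter_defunct_ssh
```

An empty result means no defunct ssh processes were present when ps ran; since the failure is rare, it may take many jobs before one shows up.
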
> >> >> >
> >> >> > The "trick" is to add --bootstrap-exec ~/bin/hssh or similar to the
> >> >> > mpirun line in $WIENROOT/parallel_options, then create the
> >> >> > executable ~/bin/hssh with something similar to:
> >> >> >
> >> >> > #!/bin/bash
> >> >> > a=`echo $@ | sed -e 's/-q/-v/'`
> >> >> > ssh $a
> >> >> >
> >> >> >
> >> >> > The above allows me to turn verbose output on in the ssh command,
> >> >> > since impi insists on setting -q (quiet). For other cases something
> >> >> > similar can be done.
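
The sed version of hssh rewrites the first "-q" found anywhere in the command line, including inside a host name or path, and loses the quoting of arguments. A minimal sketch of a stricter variant, which replaces only an argument that is exactly "-q", is given here. The function name run_verbose is hypothetical, and the set of options that impi passes to ssh is an assumption based on the message above.

```shell
#!/bin/bash
# Hypothetical variant of ~/bin/hssh.
# run_verbose LAUNCHER ARGS...
# Runs LAUNCHER with ARGS, except that every argument that is exactly
# "-q" is replaced by "-v"; all other arguments are passed unchanged.
run_verbose() {
    launcher=$1
    shift
    for arg do
        shift
        if [ "$arg" = "-q" ]; then
            arg="-v"
        fi
        set -- "$@" "$arg"   # rebuild the argument list one item at a time
    done
    "$launcher" "$@"
}

# Run only when the launcher actually passed arguments.
if [ "$#" -gt 0 ]; then
    run_verbose ssh "$@"
fi
```

The same pattern can be adapted to add or remove other ssh options by changing the test inside the loop.
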
> >> >> >
> >> >> > --
> >> >> > Professor Laurence Marks
> >> >> > Department of Materials Science and Engineering
> >> >> > Northwestern University
> >> >> > www.numis.northwestern.edu 1-847-491-3996
> >> >> > "Research is to see what everybody else has seen, and to think what
> >> >> > nobody else has thought"
> >> >> > Albert Szent-Gyorgi
> >> >>
> >> >>
> >> >>
> >> >> _______________________________________________
> >> >> Wien mailing list
> >> >> Wien at zeus.theochem.tuwien.ac.at
> >> >> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> >> >> SEARCH the MAILING-LIST at:
> >> >> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
> >> >
> >> >
> >>
> >>
> >>
> >
> >
>
>
>
>