[Wien] A trick for mpi debugging

Laurence Marks L-marks at northwestern.edu
Sat Aug 3 06:16:51 CEST 2013


I am not sure I can give you the right answer. My guess is to keep it
at 1, but I do not know all the details of your system and, if I
remember correctly, you have an SGI system. Try both, then let us know
what works (or does not).

For reference, I have it working fine with USE_REMOTE 1, and I don't
currently want to change it to test (particularly as I am traveling).

On Fri, Aug 2, 2013 at 8:36 AM, Luis Ogando <lcodacal at gmail.com> wrote:
> Dear Prof. Marks,
>
>    Just a quick question: if the openmpi launcher replaces ssh,
> should I set USE_REMOTE to 0 on a cluster?
>    Thank you one more time,
>                 Luis
>
>
>
> 2013/7/27 Laurence Marks <L-marks at northwestern.edu>
>>
>> WARNING 1: To be used with care, and customized as needed
>> WARNING 2: Valid for impi and perhaps others, but not all MPI variants
>> WARNING 3: Please look at what these options mean...
>>
>> My parallel_options file for NU's supercomputer, which contains
>> various debug and other options (some recommended by Intel, some by
>> the local sys_admin):
>>
>> setenv USE_REMOTE 1
>> setenv MPI_REMOTE 0
>> setenv WIEN_GRANULARITY 1
>> setenv DAPL_DBG_TYPE 0
>> # Normal
>> #setenv WIEN_MPIRUN "mpirun -n _NP_ -machinefile _HOSTS_ _EXEC_ "
>>
>> # To turn on verbose
>> #setenv WIEN_MPIRUN "mpirun -bootstrap-exec ~/bin/hssh -n _NP_ -machinefile _HOSTS_ _EXEC_ "
>>
>> # To use more recent, privately compiled ssh
>> #setenv WIEN_MPIRUN "mpirun -bootstrap-exec $HOME/local/bin/ssh -n _NP_ -machinefile _HOSTS_ _EXEC_ "
>>
>> # To use openmpi to launch
>> setenv WIEN_MPIRUN "mpirun -bootstrap-exec $WIENROOT/hopen -n _NP_ -machinefile _HOSTS_ _EXEC_ "
>>
>> set sleepy = 0.2
>> set delay = 0.1
>> unset DAPL_DBG
>> #Turn on Hydra debug on Quest
>> #setenv I_MPI_HYDRA_DEBUG 1
>> #Turn on MPI DEBUG
>> #setenv I_MPI_DEBUG 1
>> #setenv I_MPI_DEBUG_OUTPUT mpi_debug%h_%r
>> setenv I_MPI_FABRICS_LIST dapl,tcp
>> setenv I_MPI_FALLBACK enable
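The WIEN_MPIRUN lines above are templates: _NP_, _HOSTS_ and _EXEC_ are
placeholders that get filled in before mpirun is run. The following is
only a rough bash sketch of that kind of substitution, not the actual
WIEN2k script. The function name expand_template and the sample values
(4 processes, .machine1, lapw1_mpi lapw1.def) are made up for
illustration.

```shell
#!/bin/bash
# Illustration only: expand the placeholders of a WIEN_MPIRUN-style template.
# The template text mirrors the "Normal" line above; the values passed in
# at the bottom are hypothetical examples.
template='mpirun -n _NP_ -machinefile _HOSTS_ _EXEC_'

expand_template() {
    local np=$1 hosts=$2 exec=$3
    # "|" is used as the sed delimiter so that paths containing "/" are safe.
    printf '%s\n' "$template" |
        sed -e "s|_NP_|$np|" -e "s|_HOSTS_|$hosts|" -e "s|_EXEC_|$exec|"
}

expand_template 4 .machine1 'lapw1_mpi lapw1.def'
```

Printing the expanded line like this can be a quick way to check what a
modified template (for example one with -bootstrap-exec added) would
turn into before using it in a real run.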
>>
>>
>>
>>
>> On Sat, Jul 27, 2013 at 2:53 PM, Luis Ogando <lcodacal at gmail.com> wrote:
>> > Dear Prof. Marks,
>> >
>> >    Could you please send me a template for the parallel_options file
>> > in which this implementation was done?
>> >    I am sorry for that, but I am really far from being an expert.
>> >    All the best,
>> >                     Luis
>> >
>> >
>> > 2013/7/22 Laurence Marks <L-marks at northwestern.edu>
>> >>
>> >> A brief followup which may be useful (or not) for others in the future
>> >> with mpi problems. I have been able to work around a mysterious
>> >> impi/ssh bug on NU's supercomputer by replacing ssh with the
>> >> openmpi/mpirun launcher. The hack is gross, but very stable.
>> >>
>> >> Steps:
>> >> 1) Add "--bootstrap-exec=$WIENROOT/hopen" to the mpirun line in
>> >> $WIENROOT/parallel_options.
>> >> 2) Create the executable file $WIENROOT/hopen containing:
>> >> #!/bin/bash
>> >> a=`echo $@ | sed -e 's/-x -q//'`
>> >> $OPENMPI/bin/mpirun -np 1 --host $a
>> >>
>> >> (change $OPENMPI to where it has been compiled).
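To see what the hopen wrapper above would run without launching
anything, a dry-run variant can print the command instead. This is a
sketch under assumptions: the function name build_command, the default
/opt/openmpi path and the sample arguments are placeholders, and it
assumes (as the sed pattern in hopen suggests) that the launcher passes
"-x -q" in front of the host name.

```shell
#!/bin/bash
# Dry-run sketch of hopen: print the command instead of executing it.
# OPENMPI and the sample arguments below are placeholders.
OPENMPI=${OPENMPI:-/opt/openmpi}

build_command() {
    local rest
    # Drop the "-x -q" ssh flags; what remains is "host command ...".
    rest=$(printf '%s\n' "$*" | sed -e 's/-x -q//')
    # Word splitting of $rest is intentional, as in the original script.
    echo "$OPENMPI/bin/mpirun" -np 1 --host $rest
}

build_command -x -q node01 /path/to/pmi_proxy --port 1234
```

If the printed line does not start with "mpirun -np 1 --host" followed
by the host name, the flags passed by the launcher differ from what the
sed pattern expects and the pattern needs adjusting.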
>> >>
>> >> On Thu, Jul 18, 2013 at 10:38 AM, Laurence Marks
>> >> <L-marks at northwestern.edu> wrote:
>> >> > On a cluster I am using, ssh connections launched as part of
>> >> > impi/mpirun fail about 0.1-0.2% of the time: they fail to launch
>> >> > and become zombies (ps shows "[ssh] <defunct>"). Since fiddling
>> >> > through all the options within mpirun can be hard (particularly
>> >> > for impi, which is rather fast), I found (after a comment from
>> >> > someone on the openssh list) a useful hack. I am providing it here
>> >> > as it is a nice way around things, and might be useful to others
>> >> > in the future.
>> >> >
>> >> > The "trick" is to add --bootstrap-exec ~/bin/hssh or similar to the
>> >> > mpirun line in $WIENROOT/parallel_options, then create the executable
>> >> > ~/bin/hssh with something similar to:
>> >> >
>> >> > #!/bin/bash
>> >> > a=`echo $@ | sed -e 's/-q/-v/'`
>> >> > ssh $a
>> >> >
>> >> >
>> >> > The above allows me to turn on verbose output in the ssh command,
>> >> > since impi insists on setting -q (quiet). Something similar can be
>> >> > done for other cases.
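The argument rewriting done by hssh can be tried on its own, without
calling ssh. This is a minimal sketch: the function name rewrite_args
and the sample argument list are hypothetical, not captured from impi.

```shell
#!/bin/bash
# Sketch only: show how the hssh wrapper rewrites its arguments.
# The argument list at the bottom is a made-up example.
rewrite_args() {
    # Replace the first "-q" (quiet) with "-v" (verbose), as hssh does.
    # printf is used instead of echo so a leading "-n" or "-e" is not
    # swallowed as an echo option.
    printf '%s\n' "$*" | sed -e 's/-q/-v/'
}

rewrite_args -x -q node01 /path/to/pmi_proxy
# prints: -x -v node01 /path/to/pmi_proxy
```

Note that the sed pattern replaces the first "-q" anywhere in the
argument string, so it relies on the flag appearing before any host
name or path that happens to contain "-q".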
>> >> >
>> >> > --
>> >> > Professor Laurence Marks
>> >> > Department of Materials Science and Engineering
>> >> > Northwestern University
>> >> > www.numis.northwestern.edu 1-847-491-3996
>> >> > "Research is to see what everybody else has seen, and to think what
>> >> > nobody else has thought"
>> >> > Albert Szent-Györgyi
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Wien mailing list
>> >> Wien at zeus.theochem.tuwien.ac.at
>> >> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> >> SEARCH the MAILING-LIST at:
>> >> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>> >
>> >
>>
>>
>>
>
>



-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
Albert Szent-Györgyi

