[Wien] A trick for mpi debugging

Luis Ogando lcodacal at gmail.com
Wed Aug 21 20:32:26 CEST 2013


Dear Prof. Marks,

   First of all, thank you very much for your help!
   Unfortunately, your suggestions did not work on my SGI system. Despite
this, I now have WIEN2k running in parallel, even when more than one node
is used. My solution was to install OpenMPI with ifort and icc on the SGI
machine and use them to compile and run WIEN2k.
   We saw that mpiexec-mpt does not allow the use of a user-built
"machinefile" (at least, a beginner like me cannot do it). As Intel MPI
was installed by the vendor (the SGI team), I believe that it is
configured in a similar way. As a result, when I tried compiling and
running with Intel MPI, I got error messages complaining about the
-machinefile option. When I tried your suggestion of compiling with
Intel MPI but using the hopen file to launch the job with OpenMPI, the
error messages complained about the -bootstrap-exec option.
   Well, it looks like the best option is to use compilers and MPI
software that have not been optimized for a specific system by others.
   Thank you again!
   All the best,
                   Luis
PS: in the parallel_options file, I had to set the complete path to the
OpenMPI mpirun, despite defining it in my .bashrc
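
A note on the PS above: parallel_options uses csh syntax (setenv), and csh does not read .bashrc, which may be one reason (an assumption, not confirmed in this thread) why a PATH set only in .bashrc was not enough. Below is a minimal sketch that resolves a launcher to its absolute path and prints a matching WIEN_MPIRUN line. The function name wien_mpirun_line is hypothetical, and the option string is copied from the "Normal" line of the parallel_options file quoted further down.

```shell
#!/bin/bash
# Sketch: print a WIEN_MPIRUN line that uses an absolute launcher path.
# wien_mpirun_line is a hypothetical helper, not part of WIEN2k.
wien_mpirun_line() {
  # command -v prints the full path of an external command found on PATH
  wml_path=$(command -v "$1") || {
    echo "launcher not found: $1" >&2
    return 1
  }
  # Reject shell builtins/functions, which resolve to a bare name
  case $wml_path in
    /*) ;;
    *) echo "not an absolute path: $wml_path" >&2; return 1 ;;
  esac
  printf 'setenv WIEN_MPIRUN "%s -n _NP_ -machinefile _HOSTS_ _EXEC_"\n' "$wml_path"
}

# Only act when a launcher name is given on the command line
if [ "$#" -gt 0 ]; then
  wien_mpirun_line "$1"
fi
```

Saved under any file name and run with "mpirun" as its argument, it prints the line to paste into parallel_options, and it fails if the launcher is not on the current PATH.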


2013/8/3 Laurence Marks <L-marks at northwestern.edu>

> I am not sure I can give you the right answer; my guess is to set it
> to 1, but I do not know all the details of your system and, if I
> remember right, you have an SGI system. Try both, then let us/me know
> what works (or does not).
>
> For reference, I have it working fine with USE_REMOTE 1, and I don't
> currently want to change to test (particularly as I am on travel).
>
> On Fri, Aug 2, 2013 at 8:36 AM, Luis Ogando <lcodacal at gmail.com> wrote:
> > Dear Prof. Marks,
> >
> >    Just a quick question: if the openmpi launcher replaces ssh,
> > should I change USE_REMOTE to 0 on a cluster?
> >    Thank you one more time,
> >                 Luis
> >
> >
> >
> > 2013/7/27 Laurence Marks <L-marks at northwestern.edu>
> >>
> >> WARNING 1: To be used with care, and customized as needed
> >> WARNING 2: Valid for impi and perhaps others, but not all MPI variants
> >> WARNING 3: Please look at what these options mean...
> >>
> >> My parallel_options file for NU's supercomputer, which contains
> >> various debug and other options (some recommended by Intel, some by
> >> the local sys_admin):
> >>
> >> setenv USE_REMOTE 1
> >> setenv MPI_REMOTE 0
> >> setenv WIEN_GRANULARITY 1
> >> setenv DAPL_DBG_TYPE 0
> >> # Normal
> >> #setenv WIEN_MPIRUN "mpirun -n _NP_ -machinefile _HOSTS_ _EXEC_ "
> >>
> >> # To turn on verbose
> >> #setenv WIEN_MPIRUN "mpirun -bootstrap-exec ~/bin/hssh -n _NP_ -machinefile _HOSTS_ _EXEC_ "
> >>
> >> # To use more recent, privately compiled ssh
> >> #setenv WIEN_MPIRUN "mpirun -bootstrap-exec $HOME/local/bin/ssh -n _NP_ -machinefile _HOSTS_ _EXEC_ "
> >>
> >> # To use openmpi to launch
> >> setenv WIEN_MPIRUN "mpirun -bootstrap-exec $WIENROOT/hopen -n _NP_ -machinefile _HOSTS_ _EXEC_ "
> >>
> >> set sleepy = 0.2
> >> set delay = 0.1
> >> unset DAPL_DBG
> >> #Turn on Hydra debug on Quest
> >> #setenv I_MPI_HYDRA_DEBUG 1
> >> #Turn on MPI DEBUG
> >> #setenv I_MPI_DEBUG 1
> >> #setenv I_MPI_DEBUG_OUTPUT mpi_debug%h_%r
> >> setenv I_MPI_FABRICS_LIST dapl,tcp
> >> setenv I_MPI_FALLBACK enable
> >>
> >>
> >>
> >>
> >> On Sat, Jul 27, 2013 at 2:53 PM, Luis Ogando <lcodacal at gmail.com> wrote:
> >> > Dear Prof. Marks,
> >> >
> >> >    Could you please send me a template for the parallel_options file
> >> > where this implementation was done?
> >> >    I am sorry to ask, but I am really far from being an expert.
> >> >    All the best,
> >> >                     Luis
> >> >
> >> >
> >> > 2013/7/22 Laurence Marks <L-marks at northwestern.edu>
> >> >>
> >> >> A brief followup which may be useful (or not) for others in the
> >> >> future with mpi problems. I have been able to work around a
> >> >> mysterious impi/ssh bug on NU's supercomputer by replacing ssh with
> >> >> the openmpi/mpirun launcher. The hack is gross, but very stable.
> >> >>
> >> >> Steps:
> >> >> 1) Add "--bootstrap-exec=$WIENROOT/hopen" to the mpirun line in
> >> >> $WIENROOT/parallel_options.
> >> >> 2) Create the executable file $WIENROOT/hopen containing
> >> >> #!/bin/bash
> >> >> a=`echo $@ | sed -e 's/-x -q//'`
> >> >> $OPENMPI/bin/mpirun -np 1 --host $a
> >> >>
> >> >> (change $OPENMPI to where it has been compiled).
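
The hopen wrapper above can also be written without passing the arguments through echo and sed, so that arguments containing spaces survive intact. A minimal sketch, assuming the launcher is invoked as "hopen -x -q <host> <command...>" (inferred from the sed pattern above, not verified against Intel MPI). HOPEN_MPIRUN is a hypothetical override used only for dry runs, and the sketch strips leading -x/-q options rather than the first "-x -q" substring.

```shell
#!/bin/bash
# Sketch of the hopen wrapper: drop the leading -x/-q options, then let
# OpenMPI's mpirun start the remaining command on the given host.
# HOPEN_MPIRUN is a hypothetical override (for dry runs); by default the
# launcher is $OPENMPI/bin/mpirun, as in the original script.
hopen() {
  # Skip only the options that precede the host name
  while [ "$#" -gt 0 ]; do
    case $1 in
      -x|-q) shift ;;
      *) break ;;
    esac
  done
  if [ "$#" -lt 1 ]; then
    echo "usage: hopen [-x] [-q] host command..." >&2
    return 2
  fi
  # "$@" is now: host command [args...]; quoting keeps each argument whole
  "${HOPEN_MPIRUN:-$OPENMPI/bin/mpirun}" -np 1 --host "$@"
}

# Only act when called with arguments, as the MPI launcher would do
if [ "$#" -gt 0 ]; then
  hopen "$@"
fi
```

Setting HOPEN_MPIRUN to a command that just prints its arguments shows what would be executed without starting anything.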
> >> >>
> >> >> On Thu, Jul 18, 2013 at 10:38 AM, Laurence Marks
> >> >> <L-marks at northwestern.edu> wrote:
> >> >> > On a cluster I am using, I am having a problem with ssh connections
> >> >> > launched by impi/mpirun about 0.1-0.2% of the time: they fail to
> >> >> > launch and become zombies (ps shows "[ssh] <defunct>").
> >> >> > Since fiddling through all the options within mpirun can be hard
> >> >> > (particularly for impi which is rather fast), I found (after a
> >> >> > comment from someone on the openssh list) a useful hack. I am
> >> >> > providing it here as it is a nice way around things, and might be
> >> >> > useful to others in the future.
> >> >> >
> >> >> > The "trick" is to add --bootstrap-exec ~/bin/hssh or similar to the
> >> >> > mpirun line in $WIENROOT/parallel_options, then create the
> >> >> > executable ~/bin/hssh with something similar to:
> >> >> >
> >> >> > #!/bin/bash
> >> >> > a=`echo $@ | sed -e 's/-q/-v/'`
> >> >> > ssh $a
> >> >> >
> >> >> >
> >> >> > The above allows me to turn verbose output on in the ssh command,
> >> >> > since impi insists on setting -q (quiet). For other cases something
> >> >> > similar can be done.
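
The hssh idea above can be sketched with per-argument rewriting instead of sed on the joined string, which avoids touching a "-q" that happens to appear inside a host name or a quoted command. This is only a sketch under the assumption that -q arrives as a separate argument. HSSH_SSH is a hypothetical override for dry runs, and the real ssh is used by default.

```shell
#!/bin/bash
# Sketch of the hssh wrapper: replace the first stand-alone -q with -v,
# keep every other argument unchanged, then call ssh.
# HSSH_SSH is a hypothetical override (for dry runs); default is ssh.
hssh() {
  hssh_n=$#
  hssh_done=0
  # Rotate through the argument list once, rewriting as we go
  while [ "$hssh_n" -gt 0 ]; do
    hssh_arg=$1
    shift
    if [ "$hssh_done" -eq 0 ] && [ "$hssh_arg" = "-q" ]; then
      hssh_arg=-v
      hssh_done=1
    fi
    set -- "$@" "$hssh_arg"
    hssh_n=$((hssh_n - 1))
  done
  "${HSSH_SSH:-ssh}" "$@"
}

# Only act when called with arguments, as the MPI launcher would do
if [ "$#" -gt 0 ]; then
  hssh "$@"
fi
```

The same loop can map other options, for example dropping an argument by not appending it back to the list.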
> >> >> >
> >> >> > --
> >> >> > Professor Laurence Marks
> >> >> > Department of Materials Science and Engineering
> >> >> > Northwestern University
> >> >> > www.numis.northwestern.edu 1-847-491-3996
> >> >> > "Research is to see what everybody else has seen, and to think what
> >> >> > nobody else has thought"
> >> >> > Albert Szent-Gyorgi
> >> >> _______________________________________________
> >> >> Wien mailing list
> >> >> Wien at zeus.theochem.tuwien.ac.at
> >> >> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> >> >> SEARCH the MAILING-LIST at:
> >> >> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
> >> >
> >> >
> >
> >