[Wien] One problem at a time

Laurence Marks L-marks at northwestern.edu
Mon Jul 15 20:50:42 CEST 2013


Alas, you have in fact fallen into the trap I was warning about:
trusting sys_admins and how they expect jobs to run. If you leave
things with $PBS_NODEFILE, then you will only ever be able to run a
single mpi task at a time, using all the nodes you have available. You
will not be able to run mixed k-point/mpi parallelization, where one
set of nodes (e.g. 1-16) works on k-point 1, another (e.g. 17-32) on the
next, and so on.
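
To make this concrete, here is a rough sketch of a .machines file for mixed
k-point/mpi parallelization (the hostnames n001-n004 and the core counts are
placeholders, not taken from your system). Each line starting with "1:"
defines one k-point group that runs as its own mpi job on the listed cores:

   # k-point group 1: one mpi job spanning n001 and n002 (16 cores)
   1:n001:8 n002:8
   # k-point group 2: a second, independent mpi job on n003 and n004
   1:n003:8 n004:8
   # lapw0 can run mpi-parallel over all nodes
   lapw0: n001:8 n002:8 n003:8 n004:8
   granularity:1

For each such "1:" line the WIEN2k scripts write a separate machine file
(.machine1, .machine2, ...) and hand it to the mpi launcher; a launcher that
only reads $PBS_NODEFILE never sees that information.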

If you do try to run mixed k-point/mpi in that setup, you will find that
you are starting multiple mpi tasks all on the same cores, and there
will be chaos.

If all you want to do is run single k-point large calculations, then
you do not need to do anything more. However, if you want to do
general calculations, you have to find out how to redefine where
mpiexec_mpt looks for its host list, or use another launcher such as
mpirun. Please believe me, and don't just believe your sys_admin people!
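
As a hedged illustration (the exact option names depend on your mpi), the
mpirun-based definition of WIEN_MPIRUN in $WIENROOT/parallel_options looks
roughly like this:

   setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"

Here _NP_, _HOSTS_ and _EXEC_ are placeholders that the WIEN2k parallel
scripts substitute for each mpi job; _HOSTS_ is the per-job machine file
generated from .machines, which is precisely the information a launcher
that only reads $PBS_NODEFILE ignores.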

On Mon, Jul 15, 2013 at 12:51 PM, Luis Ogando <lcodacal at gmail.com> wrote:
> Dear Prof. Blaha, Marks, Rubel and Abo,
>
>    First of all, I would like to thank you for your attention to my
> mpiexec_mpt problem. It is now solved. The hint was in the documentation
> sent by Profs. Marks and Abo (
> http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=man&fname=/usr/share/catman/man1/mpiexec_mpt.1.html
> ). At its end it is written:
>
> " The mpiexec_mpt  command reads the node list from the $PBS_NODEFILE file.
> "
>
> which means that the -machinefile option must be omitted from the " setenv
> WIEN_MPIRUN " line of the parallel_options file when one is using mpiexec_mpt.
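>
>    As a concrete sketch (assuming SGI MPT accepts the same -np global option
> as its mpirun, as the man page above suggests), the line then becomes
> something like
>
>    setenv WIEN_MPIRUN "mpiexec_mpt -np _NP_ _EXEC_"
>
> with the host list taken from $PBS_NODEFILE rather than from the _HOSTS_ file
> that WIEN2k generates from .machines, which is the behaviour Prof. Marks
> warns about above.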
>
>    I would like to ask another question: is it dangerous to use " extrafine "
> and " -it " simultaneously in a parallel calculation?
>    I have some indications (using the SGI cluster and a DELL workstation)
> that:
>
> 1) " extrafine " WITHOUT " -it " is fine
> 2) " -it " WITHOUT " extrafine " is fine
> 3) " extrafine " WITH " -it " does not succeed, giving rise to the following
> error message in the SGI cluster (scratch is the working directory) :
>
> forrtl: severe (41): insufficient virtual memory
> Image              PC                Routine            Line        Source
> lapw1c             000000000052E04A  Unknown               Unknown  Unknown
> lapw1c             000000000052CB46  Unknown               Unknown  Unknown
> lapw1c             00000000004D6B50  Unknown               Unknown  Unknown
> lapw1c             00000000004895CF  Unknown               Unknown  Unknown
> lapw1c             00000000004BA106  Unknown               Unknown  Unknown
> lapw1c             0000000000478D9A  jacdavblock_              240  jacdavblock_tmp_.F
> lapw1c             0000000000470690  seclr5_                   277  seclr5_tmp_.F
> lapw1c             000000000040FA16  calkpt_                   241  calkpt_tmp_.F
> lapw1c             0000000000449EB3  MAIN__                     61  lapw1_tmp_.F
> lapw1c             000000000040515C  Unknown               Unknown  Unknown
> libc.so.6          00002ABBA7930BC6  Unknown               Unknown  Unknown
> lapw1c             0000000000405059  Unknown               Unknown  Unknown
>
>
> only for some processors, not all of them. This is a little strange,
> considering that all the nodes in the cluster are identical. Could this be
> related to the (number of k-points)/(number of processors) ratio?
>
>    Well, many thanks again.
>    All the best,
>                     Luis
>
>
>
>
>
>
> 2013/7/11 Laurence Marks <L-marks at northwestern.edu>
>>
>> I 99.9% agree with what Peter just said.
>>
>> According to the man page at
>>
>> http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=man&fname=/usr/share/catman/man1/mpiexec_mpt.1.html
>> (which may be wrong for you), the same global options that mpirun
>> accepts will work. Therefore just use "mpirun --help" and look for
>> whichever option on your system maps processes to machines via a file,
>> then change WIEN_MPIRUN in parallel_options accordingly.
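>>
>> A quick way to locate that option (a hedged sketch; the flag name varies
>> between mpi flavours, e.g. -machinefile vs. -hostfile) is
>>
>>    mpirun --help |& grep -i -e machinefile -e hostfile
>>
>> and then to put whatever your mpi uses into the WIEN_MPIRUN template.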
>>
>> A word of further advice concerning talking to the sys_admins at your
>> center. I have found, without exception, that they expect people to
>> launch just one mpi task which runs for hours to days. All the
>> schedulers that I have come across expect this. Wien2k is much smarter
>> than that and can exploit the cores much better. Therefore you will
>> have to "filter" (i.e. in some cases ignore) what you are told if it
>> is not appropriate. Sometimes this takes more time than anything else!
>>
>> On Thu, Jul 11, 2013 at 9:41 AM, Peter Blaha
>> <pblaha at theochem.tuwien.ac.at> wrote:
>> > But I'm afraid only YOU have access to the specific documentation of
>> > your system.
>> >
>> > As was mentioned before, I would   ALWAYS   recommend using   mpirun,
>> > which should be a "standardized wrapper" around the specific mpi scheduler.
>> >
>> > Only when your mpi does not have mpirun should you use the more specific calls.
>> >
>> > For your case it seems "trivial":
>> >
>> >  >         *mpiexec_mpt error: -machinefile option not supported.*
>> >
>> > the option   -machinefile     does not exist for mpiexec_mpt
>> >
>> > sometimes it is called    -hostfile
>> >
>> > but you should easily find it out by
>> >
>> > man mpiexec_mpt
>> >
>> > or    mpiexec_mpt --help
>> >
>> >
>> > On 07/11/2013 04:30 PM, Luis Ogando wrote:
>> >> Dear Oleg Rubel,
>> >>
>> >>     I agree with you! That is why I asked for hints from someone
>> >> who uses WIEN with mpiexec_mpt (to save effort and time).
>> >>     Thank you again !
>> >>     All the best,
>> >>                   Luis
>> >>
>> >>
>> >>
>> >> 2013/7/11 Oleg Rubel <orubel at lakeheadu.ca>
>> >>
>> >>     Dear Luis,
>> >>
>> >>     It looks like the problem is not in Wien2k. I would recommend
>> >>     making sure that you can get the list of host names correctly
>> >>     before proceeding with wien. There are slight differences between
>> >>     the various mpi implementations in the way the host name list is
>> >>     passed.
>> >>
>> >>     Oleg
>> >>
>> >>     On 2013-07-11 9:52 AM, "Luis Ogando" <lcodacal at gmail.com> wrote:
>> >>
>> >>         Dear Prof. Marks and Rubel,
>> >>
>> >>             Many thanks for your kind responses.
>> >>             I am forwarding your messages to the computation center. As
>> >>         soon as I have any reply, I will contact you.
>> >>
>> >>             I know that they have other wrappers (Intel MPI, for
>> >>         example), but they argue that mpiexec_mpt is the optimized
>> >>         option.
>> >>             I really doubt that this option will succeed, because I am
>> >>         getting the following error message in case.dayfile (marked
>> >>         with asterisks below):
>> >>
>> >>
>> >> ================================================================================
>> >>         Calculating InPwurt15InPzb3 in
>> >>
>> >> /home/ice/proj/proj546/ogando/Wien/Calculos/InP/InPwurtInPzb/15camadasWZ+3ZB/InPwurt15InPzb3
>> >>         on r1i0n8 with PID 6433
>> >>         using WIEN2k_12.1 (Release 22/7/2012) in
>> >>         /home/ice/proj/proj546/ogando/RICARDO2/wien/src
>> >>
>> >>
>> >>              start (Wed Jul 10 13:29:42 BRT 2013) with lapw0 (150/99 to go)
>> >>
>> >>              cycle 1 (Wed Jul 10 13:29:42 BRT 2013) (150/99 to go)
>> >>
>> >>          >   lapw0 -grr -p(13:29:42) starting parallel lapw0 at Wed Jul 10 13:29:42 BRT 2013
>> >>         -------- .machine0 : 12 processors
>> >>         *mpiexec_mpt error: -machinefile option not supported.*
>> >>         0.016u 0.008s 0:00.40 2.5%0+0k 0+176io 0pf+0w
>> >>         error: command
>> >>         /home/ice/proj/proj546/ogando/RICARDO2/wien/src/lapw0para -c
>> >>         lapw0.def   failed
>> >>
>> >>          >   stop error
>> >>
>> >> ================================================================================
>> >>
>> >>             Regarding the -sgi option: I am using the -pbs option
>> >>         because PBS is the queueing system. As I said, it works well
>> >>         for parallel execution that uses just one node.
>> >>             Many thanks again,
>> >>                           Luis
>> >>
>> >>
>> >>
>> >>         2013/7/11 Oleg Rubel <orubel at lakeheadu.ca>
>> >>
>> >>             Dear Luis,
>> >>
>> >>             Can you run other MPI codes under the SGI scheduler on your
>> >>             cluster? In any case, I would suggest first trying the
>> >>             simplest check
>> >>
>> >>             mpiexec -n $NSLOTS hostname
>> >>
>> >>             this is what we use for Wien2k
>> >>
>> >>             mpiexec -machinefile _HOSTS_ -n _NP_ _EXEC_
>> >>
>> >>             the next line is also useful to ensure a proper CPU load
>> >>
>> >>             setenv MV2_ENABLE_AFFINITY 0
>> >>
>> >>
>> >>             I hope this will help
>> >>             Oleg
>> >>
>> >>
>> >>             On 13-07-11 8:32 AM, Luis Ogando wrote:
>> >>
>> >>                 Dear WIEN2k community,
>> >>
>> >>                      I am trying to use WIEN2k 12.1 on an SGI cluster.
>> >>                 When I perform parallel calculations using just "one"
>> >>                 node, I can use mpirun and everything goes fine (many
>> >>                 thanks to Prof. Marks and his SRC_mpiutil directory).
>> >>                      On the other hand, when I want to use more than
>> >>                 one node, I have to use mpiexec_mpt and the calculation
>> >>                 fails. I also tried mpirun for more than one node, but
>> >>                 this is not the proper way on an SGI system and I did
>> >>                 not succeed.
>> >>                      Well, I would like to know if anyone has
>> >>                 experience in using WIEN2k with mpiexec_mpt and could
>> >>                 give me any hint. I can give more information; this is
>> >>                 only an initial request for help.
>> >>                      All the best,
>> >>                                         Luis
>> >>
>> >
>> > --
>> >
>> >                                        P.Blaha
>> >
>> > --------------------------------------------------------------------------
>> > Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
>> > Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
>> > Email: blaha at theochem.tuwien.ac.at    WWW:
>> > http://info.tuwien.ac.at/theochem/
>> >
>> > --------------------------------------------------------------------------
>>
>>
>>
>> --
>> Professor Laurence Marks
>> Department of Materials Science and Engineering
>> Northwestern University
>> www.numis.northwestern.edu 1-847-491-3996
>> "Research is to see what everybody else has seen, and to think what
>> nobody else has thought"
>> Albert Szent-Gyorgi
>
>



-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
Albert Szent-Gyorgi

