[Wien] Benchmark

Fri Nov 18 13:57:56 CET 2005

OK, great.  For the sake of others: no recompile is needed. Just add 
export OMP_NUM_THREADS=2  to .bashrc (or  setenv OMP_NUM_THREADS 2  to 
.cshrc, if you use tcsh), then log out and log in again for the new 
setting to take effect.
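
As a small sketch for bash users: the export line is what would go at the end of ~/.bashrc, and the echo line is only an illustrative check to run by hand in a fresh login shell (it is better not to leave an echo in .bashrc itself).

```shell
# Ask threaded BLAS/LAPACK libraries (MKL, GOTO) to use two threads.
export OMP_NUM_THREADS=2

# Check by hand after logging in again: print the value the shell sees.
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

If the echo prints nothing after the equals sign, the shell did not read the file that was edited.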

Lapw0 does not use MKL/GOTO as far as I know, so OMP_NUM_THREADS should 
not have an effect. 

I have not tried using parallel k-points and OMP_NUM_THREADS 
simultaneously. This will probably slow down the calculation relative to 
only using parallel k-points.
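
For anyone who wants to try the k-point-parallel route on its own, here is a minimal sketch of a .machines file for two k-point jobs on one dual-CPU machine. The host name localhost and the choice of OMP_NUM_THREADS=1 are assumptions for illustration, not something tested in this thread.

```shell
# Hypothetical host name; replace localhost with the name of the dual-CPU node.
host=localhost

# One "1:<host>" line per k-point job, so two lines run two k-points at once.
cat > .machines <<EOF
granularity:1
1:${host}
1:${host}
EOF

# Assumption: keep each lapw1 job single-threaded so the two jobs do not
# compete for the same two CPUs.
export OMP_NUM_THREADS=1

# Then, in the case directory (not executed here): x lapw1 -p
```

The file has to sit in the case directory, since that is where the parallel scripts look for it.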

Regards

Enrico

--
Dr E B Lombardi
Physics Department
University of South Africa
P.O. Box 392
0003 UNISA
South Africa
Tel: +27 (0)12 429-8027
Fax: +27 (0)12 429-3643
e-mail: lombaeb at science.unisa.ac.za

On Fri, 18 Nov 2005, Michael Gurnett wrote:

> Thank you for the answer. Just a few questions. Does this require a
> recompile of the code, or is it enough to just add it to .bashrc? Is there
> any speed-up in lapw0? And finally, have you noticed any problems when using
> this while spreading several k-points over CPUs? (It would be nice if it
> worked for lapw0 without causing problems in lapw1.)
> 
> Michael
> 
> -----Original Message-----
> From: lombaeb at science.unisa.ac.za
> To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
> Date: Fri, 18 Nov 2005 11:45:53 +0200 (SAST)
> Subject: Re: [Wien] Benchmark
> 
> > Using   export OMP_NUM_THREADS=2  in .bashrc should speed up 
> > the calculations.
> > 
> > For a 32-bit dual-processor 3.0 GHz Xeon machine I get (CPU seconds, 
> > with real time in parentheses):
> > 1 kpoint,  1 processor:        247s (4:08)
> > 2 kpoints, 2 processors:       261s (4:22)
> > 1 kpoint,  OMP_NUM_THREADS=2:  275s (3:09 = 189s)
> > 
> > Note in the last line that the 'CPU seconds' does not equal the 'Real 
> > time', since the former takes into account that two CPUs are used. The 
> > difference of 1s in the first two lines is due to CPU usage never 
> > being exactly 100.0%.
> > 
> > So using OMP_NUM_THREADS=2 cuts the real time from 248s to 189s, a 
> > speed-up of roughly 1.3x, which unfortunately implies well under 50% 
> > efficiency for the 2nd CPU.
> > 
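
Taking the real times from the table above (4:08 = 248 s and 3:09 = 189 s), the wall-clock gain can be checked with a few lines of shell. This is only a sketch of the arithmetic; the variable names are made up for illustration.

```shell
# Timings copied from the table above (real time converted to seconds).
serial_real=248      # 1 k-point, 1 processor: 4:08
omp_real=189         # 1 k-point, OMP_NUM_THREADS=2: 3:09
omp_cpu=275          # CPU seconds reported for the OMP_NUM_THREADS=2 run

# Wall-clock speed-up, and the average number of CPUs kept busy.
summary=$(awk -v s="$serial_real" -v r="$omp_real" -v c="$omp_cpu" \
  'BEGIN { printf "speedup=%.2f busy_cpus=%.2f", s / r, c / r }')
echo "$summary"
```

The first number is what matters for time to solution. The second only says how much of the two CPUs was in use while the job ran.
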
> > Note that both MKL and GOTO use the OMP_NUM_THREADS environment 
> > variable.
> > 
> > Regards
> > 
> > Enrico
> > 
> > 
> > On Thu, 17 Nov 2005, Michael Gurnett wrote:
> > 
> > > > Yes. Using both processors by spreading k-points over them basically
> > > > doubles the speed (and this is also the more typical setup we use). What I
> > > > really want to do is just get an idea of how fast this system is on the test
> > > > case, which most likely means MPI, which I believe is just not that good. It
> > > > would be nice if MKL would use both processors. I will be setting up MPI for
> > > > lapw0 (I tried long ago but never got it working). So if someone has a
> > > > "Getting Intel MPI to work with ifort for dummies" book, I would be most
> > > > grateful.
> > > 
> > > Michael
> > > ----- Original Message ----- 
> > > From: <lombaeb at science.unisa.ac.za>
> > > > To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
> > > Sent: Wednesday, November 16, 2005 7:13 PM
> > > Subject: Re: [Wien] Benchmark
> > > 
> > > 
> > > > Dear Wien users
> > > >
> > > > > On an x86_64 machine with a 3.2 GHz (P4-640) CPU I got the following
> > > > > benchmark times, using ifort 9.0 and mkl 8.0 (OPTIONS used are given at
> > > > > the end of the e-mail) on a system with an Intel motherboard (945G
> > > > > chipset) and DDR-II 533MHz RAM (dual channel configuration).
> > > >
> > > > HT disabled:  163s
> > > > HT enabled:  176s
> > > > HT enabled, with OMP_NUM_THREADS = 2:  194s
> > > > > HT enabled, 2 k-points in parallel: 386 s  (DIV 2 = 193s) (x lapw1 -c -p)
> > > >
> > > >
> > > > To Michael:
> > > > > It may be possible to speed up the "throughput" time on a dual-core
> > > > > machine by using "normal" MKL and running 2 k-points simultaneously
> > > > > (using .machines).  The time for 1 k-point may be slower, but the time
> > > > > for 2 k-points in parallel will probably be faster.
> > > >
> > > >
> > > >
> > > > OPTIONS used (thanks to Gerhard Fecher for the useful e-mail about
> > > > compiling Wien2k with ifort 9.0 at the end of August):
> > > >
> > > > current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -xP
> > > > current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
> > > > > current:LDFLAGS:-L/opt/intel/fce/9.0/lib -L/opt/intel/mkl/8.0/lib/em64t -lsvml
> > > > > current:DPARALLEL:'-DParallel'
> > > > > current:R_LIBS:-lmkl_lapack -lmkl_em64t -lguide -lguide_stats -lpthread
> > > > > current:RP_LIBS:-L /usr/local/SCALAPACK -L /usr/local/BLACS/LIB -lpblas -lredist -ltools -lscalapack -lfblacs -lblacs -lmpi
> > > >
> > > > Notes:
> > > > 1.  The speed difference between the new GOTO 1.00 library and MKL
> > > > 8.0 was negligible.
> > > > 2.  Omitting the -xP option for ifort (P4 only) slows down the
> > > > calculations by about 4s (there is a change of +- 1 in the last
> > > > significant digit of the output of lapw1 if -xP is included).
> > > > > 3.  The paths to the compiler and mkl libraries will be installation
> > > > > dependent.
> > > > 4.  These paths must also be included in .bashrc (or .cshrc) in the
> > > > LD_LIBRARY_PATH environment variable.
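
As a sketch of note 4 for bash users, the line below adds the two library directories from the LDFLAGS line above to LD_LIBRARY_PATH. The paths are the ones quoted in this e-mail and will differ on other installations.

```shell
# Directories taken from the LDFLAGS line above; adjust to the local install.
intel_libs=/opt/intel/fce/9.0/lib:/opt/intel/mkl/8.0/lib/em64t

# Prepend them, keeping any existing value and avoiding a trailing colon.
export LD_LIBRARY_PATH="${intel_libs}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```

For tcsh the same directories would go into a setenv LD_LIBRARY_PATH line in .cshrc.
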
> > > >
> > > > Regards,
> > > >
> > > > Enrico
> > > >
> > > >
> > > >
> > > >
> > > > > On Mon, 14 Nov 2005, Michael Gurnett wrote:
> > > >
> > > >>
> > > > >> PIV dual-core 820 at 3.3 GHz, ifort 9 and libgoto_prescott64p-r1.00
> > > >>
> > > >> 171 seconds
> > > >>
> > > >>
> > > >> The compiler options used were as follows:
> > > >>
> > > >> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
> > > > >> current:LDFLAGS:-L/opt -L/opt/intel/cmkl/8.0/lib/em64t -Vaxlib -static-libcxa -pthread
> > > > >> current:DPARALLEL:'-DParallel'
> > > > >> current:R_LIBS:-lgoto_prescott64p-r1.00 -lmkl_lapack64 -lmkl_em64t -lguide
> > > >>
> > > >>
> > > > >> If anyone has some recommendations to increase speed I would appreciate
> > > > >> it.
> > > >>
> > > >> Michael
> > > >>
> > > >>
> > > >> _______________________________________________
> > > >> Wien mailing list
> > > >> Wien at zeus.theochem.tuwien.ac.at
> > > >> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> > > >>
> > > >
> > > > _______________________________________________
> > > > Wien mailing list
> > > > Wien at zeus.theochem.tuwien.ac.at
> > > > http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> > > >
> > > > 
> > > 
> > > 
> > > _______________________________________________
> > > Wien mailing list
> > > Wien at zeus.theochem.tuwien.ac.at
> > > http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> > > 
> > 
> > _______________________________________________
> > Wien mailing list
> > Wien at zeus.theochem.tuwien.ac.at
> > http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> 
> 
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>