[Wien] Benchmark

Michael Gurnett michael.gurnett at kau.se
Fri Nov 18 11:08:07 CET 2005


Thank you for the answer. Just a few questions. Does this require a
recompile of the code, or is it enough to just add it to .bashrc? Is there
any speedup in lapw0? And finally, have you noticed any problems when using
this while spreading several k-points over the CPUs? (It would be nice if it
worked for lapw0 without causing problems in lapw1.)

Michael

-----Original Message-----
From: lombaeb at science.unisa.ac.za
To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
Date: Fri, 18 Nov 2005 11:45:53 +0200 (SAST)
Subject: Re: [Wien] Benchmark

> Using   export OMP_NUM_THREADS=2  in .bashrc should speed up 
> the calculations.
> 
> For a 32 bit dual processor 3.0 GHz Xeon machine I get (CPU seconds,
> with real time in brackets):
> 1 kpoint,  1 processor:        247s (4:08)
> 2 kpoints, 2 processors:       261s (4:22)
> 1 kpoint,  OMP_NUM_THREADS=2:  275s (3:09 = 189s)
> 
> Note in the last line that the 'CPU seconds' does not equal the 'Real 
> time', since the former takes into account that two CPUs are used. The 
> difference of 1s in the first two lines is due to CPU usage never 
> being exactly 100.0%.
> 
> So using OMP_NUM_THREADS=2 cuts the real time from 248s to 189s, a
> speedup of about 30%, which unfortunately means the 2nd CPU is far from
> fully used (275 CPU seconds over 189s of real time is about 1.5 CPUs).
> 
> Note that both MKL and GOTO use the OMP_NUM_THREADS environment 
> variable.
> 
> Regards
> 
> Enrico
> 
> --
> Dr E B Lombardi
> Physics Department
> University of South Africa
> P.O. Box 392
> 0003 UNISA
> South Africa
> Tel: +27 (0)12 429-8027
> Fax: +27 (0)12 429-3643
> e-mail: lombaeb at science.unisa.ac.za
> 
> 
> On Thu, 17 Nov 2005, Michael Gurnett wrote:
> 
> > Yes. Using both processors by spreading k-points over them basically
> > doubles the speed (and this is also the more typical setup we use).
> > What I really want to do is just get an idea of how fast this system is
> > on the test case, which most likely means MPI, which I believe is just
> > not that good. It would be nice if MKL would use both processors. I will
> > be setting up MPI for lapw0 (I tried long ago but never got it working).
> > So if someone has a "Getting Intel MPI to work with ifort for dummies"
> > book I would be most grateful.
> > 
> > Michael
> > ----- Original Message ----- 
> > From: <lombaeb at science.unisa.ac.za>
> > To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
> > Sent: Wednesday, November 16, 2005 7:13 PM
> > Subject: Re: [Wien] Benchmark
> > 
> > 
> > > Dear Wien users
> > >
> > > On an x86_64 machine with a 3.2 GHz (P4-640) CPU I got the following
> > > benchmark times, using ifort 9.0 and mkl 8.0 (OPTIONS used are given at
> > > the end of the e-mail) on a system with an Intel motherboard (945G
> > > chipset) and DDR-II 533MHz RAM (dual channel configuration).
> > >
> > > HT disabled:  163s
> > > HT enabled:  176s
> > > HT enabled, with OMP_NUM_THREADS = 2:  194s
> > > HT enabled, 2 k-points in parallel (x lapw1 -c -p):  386s  (DIV 2 = 193s)
> > >
> > >
> > > To Michael:
> > > It may be possible to speed up the "throughput" time on a dual core
> > > machine by using "normal" MKL and running 2 k-points simultaneously
> > > (using .machines).  The time for 1 k-point may be slower, but the time
> > > for 2 k-points in parallel will probably be faster.
> > >
> > >
> > >
> > > OPTIONS used (thanks to Gerhard Fecher for the useful e-mail about
> > > compiling Wien2k with ifort 9.0 at the end of August):
> > >
> > > current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -xP
> > > current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
> > > current:LDFLAGS:-L/opt/intel/fce/9.0/lib -L/opt/intel/mkl/8.0/lib/em64t -lsvml
> > > current:DPARALLEL:'-DParallel'
> > > current:R_LIBS:-lmkl_lapack -lmkl_em64t -lguide -lguide_stats -lpthread
> > > current:RP_LIBS:-L /usr/local/SCALAPACK -L /usr/local/BLACS/LIB -lpblas -lredist -ltools -lscalapack -lfblacs -lblacs -lmpi
> > >
> > > Notes:
> > > 1.  The speed difference between the new GOTO 1.00 library and MKL
> > > 8.0 was negligible.
> > > 2.  Omitting the -xP option for ifort (P4 only) slows down the
> > > calculations by about 4s (there is a change of +- 1 in the last
> > > significant digit of the output of lapw1 if -xP is included).
> > > 3.  The paths to the compiler and mkl libraries will be installation
> > > dependent.
> > > 4.  These paths must also be included in .bashrc (or .cshrc) in the
> > > LD_LIBRARY_PATH environment variable.
> > >
> > > Regards,
> > >
> > > Enrico
> > >
> > >
> > >
> > >
> > > On Mon, 14 Nov 2005, Michael Gurnett wrote:
> > >
> > >>
> > >> PIV dual-core 820 at 3.3 GHz, ifort 9 and libgoto_prescott64p-r1.00
> > >>
> > >> 171 seconds
> > >>
> > >>
> > >> The compiler options used were as follows:
> > >>
> > >> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
> > >> current:LDFLAGS:-L/opt -L/opt/intel/cmkl/8.0/lib/em64t -Vaxlib -static-libcxa -pthread
> > >> current:DPARALLEL:'-DParallel'
> > >> current:R_LIBS:-lgoto_prescott64p-r1.00 -lmkl_lapack64 -lmkl_em64t -lguide
> > >>
> > >>
> > >> If anyone has some recommendations to increase speed I would appreciate
> > >> it.
> > >>
> > >> Michael
> > >>
> > >>
> > >> _______________________________________________
> > >> Wien mailing list
> > >> Wien at zeus.theochem.tuwien.ac.at
> > >> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> > >>

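As a rough check of the OMP_NUM_THREADS timings quoted above, the sketch
below turns them into a speedup, a parallel efficiency, and an average
number of busy CPUs. The function names are illustrative only and are not
part of WIEN2k or MKL. It assumes that the m:ss values in brackets are real
(wall-clock) times and that the leading values are CPU seconds, as the
quoted message itself notes.

```python
# Timings copied from the quoted message (32 bit dual 3.0 GHz Xeon).
# Assumption: bracketed m:ss values are real (wall-clock) times.
serial_real = 4 * 60 + 8      # 1 k-point, 1 processor: 4:08
threaded_real = 3 * 60 + 9    # 1 k-point, OMP_NUM_THREADS=2: 3:09
threaded_cpu = 275            # CPU seconds reported for the threaded run
n_threads = 2                 # value of OMP_NUM_THREADS in that run


def speedup(t_serial, t_parallel):
    """Ratio of serial to parallel wall-clock time."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n):
    """Speedup divided by the number of threads (1.0 would be ideal)."""
    return speedup(t_serial, t_parallel) / n


def average_busy_cpus(cpu_seconds, real_seconds):
    """CPU time over real time: how many CPUs were busy on average."""
    return cpu_seconds / real_seconds


if __name__ == "__main__":
    s = speedup(serial_real, threaded_real)
    e = parallel_efficiency(serial_real, threaded_real, n_threads)
    b = average_busy_cpus(threaded_cpu, threaded_real)
    print(f"speedup:            {s:.2f}x")
    print(f"efficiency:         {e:.0%}")
    print(f"average busy CPUs:  {b:.2f}")
```

The same three functions can be applied to the k-point-parallel line by
substituting its real time per k-point, which makes the two ways of using
the second CPU directly comparable.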