[Wien] wien2k, gotoblas and multi threads
Peter Blaha
pblaha at theochem.tuwien.ac.at
Wed Aug 13 19:04:44 CEST 2008
These numbers make "sense".
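To put numbers on this, the elapsed times quoted below can be turned into speedups. This is only a small Python sketch: the helper names parse_elapsed and thread_speedup are made up here, and the four time strings are copied from the OMP_NUM_THREADS=1 and OMP_NUM_THREADS=8 lines of the two quoted serial benchmark runs.

```python
def parse_elapsed(text):
    """Convert a csh `time` elapsed field like '1:56.69' into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60.0 + float(part)
    return seconds


def thread_speedup(one_thread, eight_threads):
    """Ratio of wall-clock times: 1 thread over 8 threads."""
    return parse_elapsed(one_thread) / parse_elapsed(eight_threads)


# Elapsed times copied from the quoted serial benchmark runs (1 vs 8 threads).
runs = {
    "ifort+mkl": ("1:56.69", "1:01.12"),
    "gfortran+GotoBLAS": ("3:15.80", "2:46.57"),
}

for build, (t1, t8) in runs.items():
    print(f"{build}: {thread_speedup(t1, t8):.2f}x with 8 threads")
```

With these inputs the ratio comes out near 1.9 for ifort+mkl and near 1.2 for gfortran+GotoBLAS, both for a single k-point.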
k-point parallelization:
In "real" cases, one has to solve an eigenvalue problem for "many"
k-points ("Many" means typically 10-1000). In these cases, k-point
parallelism is very efficient.
The benchmark case has only ONE k-point in its *.klist file, thus
there's no k-point parallelism.
When you edit the *.klist file (e.g. repeat the first line 8 times),
you will see that the sequential run takes almost exactly 8 times
as long. With k-point parallelism, however, you will probably get a
speedup of 4-6 on your machine.
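The klist edit and the timing argument can be sketched in Python. Assumptions: the function names are invented for this illustration, the klist is treated as plain text lines with no knowledge of its column layout, and the 116.69 s figure is the one-thread wall-clock time from the quoted ifort+mkl run. The speedup of 4-6 is the estimate given above, not a measurement.

```python
def repeat_first_line(lines, copies):
    """Return the klist lines with the first line present `copies` times in total."""
    return [lines[0]] * copies + lines[1:]


def estimate_times(t_one_kpoint, n_kpoints, speedup):
    """Sequential time grows linearly with k-points; parallel time divides it."""
    sequential = t_one_kpoint * n_kpoints
    return sequential, sequential / speedup


# Placeholder content, not a real klist: one k-point line plus a trailing line.
klist = ["<first k-point line>", "<rest of file>"]
edited = repeat_first_line(klist, 8)
print(f"{len(edited)} lines after the edit")

t1 = 116.69  # seconds for one k-point (quoted ifort+mkl run, OMP_NUM_THREADS=1)
for speedup in (4.0, 6.0):
    sequential, parallel = estimate_times(t1, 8, speedup)
    print(f"speedup {speedup:.0f}: sequential {sequential:.0f} s, parallel {parallel:.0f} s")
```

In a real case one would read the lines from the *.klist file and write the edited list back; that step is left out here so the sketch does not touch any files.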
You can still find out which setup is more efficient on your specific
machine: for 8 k-points, use either 8 lines in .machines with
OMP_NUM_THREADS=1, or only 4 lines with OMP_NUM_THREADS=2.
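One way to enumerate the combinations worth timing is sketched below in Python. The helper names are hypothetical; the "1:localhost" line format is the one used elsewhere in this thread, and the script only prints the file contents rather than writing a real .machines file.

```python
def machines_text(n_jobs, host="localhost"):
    """Build .machines content with one '1:<host>' line per k-point job."""
    return "".join(f"1:{host}\n" for _ in range(n_jobs))


def candidate_setups(n_cores):
    """List (k-point jobs, threads per job) pairs that use exactly n_cores."""
    return [
        (n_cores // threads, threads)
        for threads in range(1, n_cores + 1)
        if n_cores % threads == 0
    ]


for jobs, threads in candidate_setups(8):
    print(f"# {jobs} k-point jobs, OMP_NUM_THREADS={threads}")
    print(machines_text(jobs), end="")
```

Each printed setup would then be timed with 'x lapw1 -p' after setting OMP_NUM_THREADS accordingly, and the fastest wall-clock time decides.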
I'm sure the colleagues from physics/chemistry can explain the
"k-points" to you.
Regards
Todd Pfaff wrote:
> I get much better timings for the serial benchmark using an ifort+mkl
> version of wien2k on the same machine. I'm not seeing any speedup
> with k-point parallelization yet though.
>
> - machine: dual Xeon quad-core E5430 @ 2.66GHz with 8GB 667MHz RAM
>
> 1) timings for wien2k-08.2-20080407 built with
> - ifort 10.1.017
> - mkl 10.0.3.020
>
> 1.1) wien2k serial benchmark
> - x lapw1 -c
> - varying OMP_NUM_THREADS from 1 to 8
>
> OMP_NUM_THREADS=1: 116.292u 0.386s 1:56.69 99.9% 0+0k 0+33256io 0pf+0w
> OMP_NUM_THREADS=2: 148.964u 0.963s 1:17.11 194.4% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=3: 182.932u 1.495s 1:11.11 259.3% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=4: 213.973u 1.356s 1:03.52 338.9% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=5: 251.813u 2.195s 1:03.51 399.9% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=6: 294.103u 2.429s 1:02.11 477.4% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=7: 329.413u 2.686s 1:01.91 536.4% 0+0k 0+33240io 0pf+0w
> OMP_NUM_THREADS=8: 374.467u 2.488s 1:01.12 616.7% 0+0k 0+33240io 0pf+0w
>
> 1.2) wien2k serial benchmark run with k-point parallelism
> - process started with command 'x lapw1 -p'
> - OMP_NUM_THREADS=1, GOTO_NUM_THREADS=1
> - varying .machines file with N lines, N from 1 to 8, where each line is:
>
> 1:localhost
>
> k-point parallel N=1: localhost k=1 user=116.173 wallclock=116.59
> k-point parallel N=2: localhost k=1 user=116.312 wallclock=116.79
> k-point parallel N=3: localhost k=1 user=116.254 wallclock=116.66
> k-point parallel N=4: localhost k=1 user=116.306 wallclock=116.76
> k-point parallel N=5: localhost k=1 user=116.09 wallclock=116.52
> k-point parallel N=6: localhost k=1 user=116.218 wallclock=116.66
> k-point parallel N=7: localhost k=1 user=116.251 wallclock=116.68
> k-point parallel N=8: localhost k=1 user=116.372 wallclock=116.79
>
>
> 2) timings for wien2k-08.2-20080407 built with
> - GNU Fortran (GCC) 4.2.3 (4.2.3-6mnb1)
> - GotoBLAS-1.26
>
> 2.1) wien2k serial benchmark
> - x lapw1 -c
> - varying OMP_NUM_THREADS from 1 to 8
>
> OMP_NUM_THREADS=1: 195.463u 0.307s 3:15.80 99.9% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=2: 199.565u 0.569s 2:57.40 112.8% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=3: 204.145u 0.635s 2:51.02 119.7% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=4: 211.666u 0.736s 2:49.02 125.6% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=5: 222.604u 1.032s 2:48.41 132.7% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=6: 231.258u 0.927s 2:47.54 138.5% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=7: 243.170u 0.996s 2:46.55 146.5% 0+0k 0+33264io 0pf+0w
> OMP_NUM_THREADS=8: 252.584u 0.916s 2:46.57 152.1% 0+0k 0+33264io 0pf+0w
>
>
> --
> Todd Pfaff <pfaff at mcmaster.ca>
> Research & High-Performance Computing Support
> McMaster University, Hamilton, Ontario, Canada
> http://www.rhpcs.mcmaster.ca/~pfaff
>
>
> On Tue, 12 Aug 2008, Peter Blaha wrote:
>
>> Looking at these numbers tells me that you should probably invest in
>> ifort + mkl. It does not make sense to buy expensive new hardware if,
>> with bad software, it runs slower than a 6 year old PC.
>> Compare your timing with the benchmark page to see what is possible.
>>
>> k-point parallelization: Please read the UG !!! This is fairly simple.
>>
>> 1:localhost:4 utilizes the mpi-parallel version;
>>
>> you need to put N lines
>>
>> 1:localhost
>> 1:localhost
>> ...
>>
>> to specify running N lapw1 processes in parallel.
>>
>> Todd Pfaff wrote:
>>> Peter, thanks for the response.
>>>
>>> I'm getting small speedup from multithreading in libgoto. Here are
>>> timings from the wien2k serial benchmark:
>>>
>>> OMP_NUM_THREADS=1: 195.463u 0.307s 3:15.80 99.9% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=2: 199.565u 0.569s 2:57.40 112.8% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=3: 204.145u 0.635s 2:51.02 119.7% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=4: 211.666u 0.736s 2:49.02 125.6% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=5: 222.604u 1.032s 2:48.41 132.7% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=6: 231.258u 0.927s 2:47.54 138.5% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=7: 243.170u 0.996s 2:46.55 146.5% 0+0k 0+33264io 0pf+0w
>>> OMP_NUM_THREADS=8: 252.584u 0.916s 2:46.57 152.1% 0+0k 0+33264io 0pf+0w
>>>
>>>
>>> I would like to explore the k-point parallelization. But when I run
>>> 'x lapw1 -p' it aborts with an error message about being unable to run
>>> lapw1c_mpi. This appears to me like it's trying to run the fine grained
>>> MPI parallel version. I'm not building wien2k with mpi so I don't have a
>>> lapw1c_mpi. I must be misunderstanding something. What am I doing wrong
>>> that's causing it to try to run this lapw1c_mpi executable?
>>>
>>> Which of these are appropriate .machines files to do k-point
>>> parallelization across N cpu cores on a single machine?
>>>
>>> This?
>>>
>>> 1:localhost:N
>>>
>>> Or this?
>>>
>>> N:localhost
>>>
>>> And do I need any of these lines?
>>>
>>> extrafine
>>> granularity:1
>>> residue:localhost
>>>
>>> Or do I need something else either in .machines or in some other
>>> file or on the command line?
>>>
>>> --
>>> Todd Pfaff <pfaff at mcmaster.ca>
>>> Research & High-Performance Computing Support
>>> McMaster University, Hamilton, Ontario, Canada
>>> http://www.rhpcs.mcmaster.ca/~pfaff
>>>
>>> On Mon, 11 Aug 2008, Peter Blaha wrote:
>>>
>>>> The program lapw1 spends a large fraction of its time in BLAS routines,
>>>> thus it can benefit from multithreading of GotoBLAS (or MKL).
>>>> Setting the variables you mentioned to 2 (or 4) you should see a
>>>> speedup. The improvement may depend on many factors but it will be at
>>>> most about 50%.
>>>>
>>>> Another possibility to utilize the multiple cores is to do k-point
>>>> parallelism.
>>>> Generate a .machines file with 2,4 or 8 times your machine name
>>>> and test the performance with x lapw1 -p.
>>>> On some architectures (with slow memory bus) it can be that only 4
>>>> parallel jobs give best performance (because the slow memory bus cannot
>>>> feed all 8 cpus properly), on others you can use 8 parallel jobs.
>>>> Sometimes a mixture (4 k-point parallel + OMP_NUM_THREADS=2) is best.
>>>>
>>>> Todd Pfaff wrote:
>>>>> We're using:
>>>>>
>>>>> wien2k-08.2-20080407
>>>>>
>>>>> built with:
>>>>>
>>>>> GNU Fortran (GCC) 4.2.3 (4.2.3-6mnb1)
>>>>> GotoBLAS-1.26
>>>>>
>>>>> and running on an 8 core (2 x quad core) Xeon machine.
>>>>>
>>>>> Can wien2k take advantage of multithreading inherent to GotoBLAS
>>>>> when either GOTO_NUM_THREADS or OMP_NUM_THREADS is set?
>>>>>
>>>>> If so, can someone provide, or direct me to a document about details of
>>>>> the best way to build and run wien2k for such an environment?
>>>>>
>>>>> Thank you,
>>>>> --
>>>>> Todd Pfaff <pfaff at mcmaster.ca>
>>>>> Research & High-Performance Computing Support
>>>>> McMaster University, Hamilton, Ontario, Canada
>>>>> http://www.rhpcs.mcmaster.ca/~pfaff
>>>>> _______________________________________________
>>>>> Wien mailing list
>>>>> Wien at zeus.theochem.tuwien.ac.at
>>>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>
--
-----------------------------------------
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-----------------------------------------