[Wien] wien2k, gotoblas and multi threads
Todd Pfaff
pfaff at rhpcs.mcmaster.ca
Mon Aug 11 21:35:28 CEST 2008
Peter, thanks for the response.
I'm getting a small speedup from multithreading in libgoto. Here are
timings from the wien2k serial benchmark:
OMP_NUM_THREADS=1: 195.463u 0.307s 3:15.80 99.9% 0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=2: 199.565u 0.569s 2:57.40 112.8% 0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=3: 204.145u 0.635s 2:51.02 119.7% 0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=4: 211.666u 0.736s 2:49.02 125.6% 0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=5: 222.604u 1.032s 2:48.41 132.7% 0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=6: 231.258u 0.927s 2:47.54 138.5% 0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=7: 243.170u 0.996s 2:46.55 146.5% 0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=8: 252.584u 0.916s 2:46.57 152.1% 0+0k 0+33264io 0pf+0w
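
To put numbers on "small speedup", here is a minimal Python sketch that converts the wall-clock column of the timings above into speedup and parallel efficiency relative to the single-thread run. The function names are illustrative only; the times are copied from the lines above.

```python
# Wall-clock times copied from the timings above (minutes:seconds),
# keyed by the OMP_NUM_THREADS value used for that run.
wall_times = {
    1: "3:15.80",
    2: "2:57.40",
    3: "2:51.02",
    4: "2:49.02",
    5: "2:48.41",
    6: "2:47.54",
    7: "2:46.55",
    8: "2:46.57",
}


def to_seconds(stamp):
    """Convert an 'm:ss.ss' elapsed-time string to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def speedups(times):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    base = to_seconds(times[1])
    result = {}
    for threads, stamp in sorted(times.items()):
        speedup = base / to_seconds(stamp)
        result[threads] = (speedup, speedup / threads)
    return result


if __name__ == "__main__":
    for threads, (speedup, eff) in speedups(wall_times).items():
        print(f"{threads} threads: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

By this measure the 8-thread run is only about 1.18x faster than the single-thread run, while the user CPU time grows from about 195 s to about 253 s.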
I would like to explore the k-point parallelization. But when I run
'x lapw1 -p' it aborts with an error message about being unable to run
lapw1c_mpi. It looks to me as if it's trying to run the fine-grained
MPI parallel version. I'm not building wien2k with MPI, so I don't have a
lapw1c_mpi. I must be misunderstanding something. What am I doing wrong
that's causing it to try to run this lapw1c_mpi executable?
Which of these is an appropriate .machines file for k-point
parallelization across N CPU cores on a single machine?
This?
1:localhost:N
Or this?
N:localhost
And do I need any of these lines?
extrafine
granularity:1
residue:localhost
Or do I need something else either in .machines or in some other
file or on the command line?
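
For comparison, a third candidate layout follows the suggestion quoted below of putting the machine name into .machines 2, 4 or 8 times. The Python sketch below generates such a file body. It rests on an assumption that is not confirmed in this thread: that each `1:localhost` line requests one k-point parallel job on this machine. The helper name `machines_text` is hypothetical, and the result should be checked against the WIEN2k user's guide before use.

```python
def machines_text(n_jobs, host="localhost"):
    """Build .machines content with one '1:<host>' line per k-point job.

    ASSUMPTION: one '1:<host>' line per parallel job is the intended
    layout; this is not confirmed by the thread itself.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [f"1:{host}" for _ in range(n_jobs)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a 4-job candidate; redirect the output to .machines to try it.
    print(machines_text(4), end="")
```

Printing rather than writing the file keeps the sketch from overwriting an existing .machines in the case directory.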
--
Todd Pfaff <pfaff at mcmaster.ca>
Research & High-Performance Computing Support
McMaster University, Hamilton, Ontario, Canada
http://www.rhpcs.mcmaster.ca/~pfaff
On Mon, 11 Aug 2008, Peter Blaha wrote:
> The program lapw1 spends a large fraction of its time in BLAS routines,
> so it can benefit from the multithreading of GotoBLAS (or MKL).
> If you set the variables you mentioned to 2 (or 4), you should see a
> speedup. The improvement depends on many factors, but it will be at
> most about 50%.
>
> Another way to use the multiple cores is k-point parallelism.
> Generate a .machines file that lists your machine name 2, 4 or 8 times
> and test the performance with x lapw1 -p.
> On some architectures (with a slow memory bus) 4 parallel jobs may give
> the best performance (because the slow memory bus cannot feed all 8
> CPUs properly); on others you can use 8 parallel jobs.
> Sometimes a mixture (4 k-point parallel jobs + OMP_NUM_THREADS=2) is best.
>
> Todd Pfaff wrote:
>> We're using:
>>
>> wien2k-08.2-20080407
>>
>> built with:
>>
>> GNU Fortran (GCC) 4.2.3 (4.2.3-6mnb1)
>> GotoBLAS-1.26
>>
>> and running on an 8 core (2 x quad core) Xeon machine.
>>
>> Can wien2k take advantage of multithreading inherent to GotoBLAS
>> when either GOTO_NUM_THREADS or OMP_NUM_THREADS is set?
>>
>> If so, can someone provide, or direct me to, a document describing
>> the best way to build and run wien2k for such an environment?
>>
>> Thank you,
>> --
>> Todd Pfaff <pfaff at mcmaster.ca>
>> Research & High-Performance Computing Support
>> McMaster University, Hamilton, Ontario, Canada
>> http://www.rhpcs.mcmaster.ca/~pfaff