[Wien] wien2k, gotoblas and multi threads

Todd Pfaff pfaff at rhpcs.mcmaster.ca
Mon Aug 11 21:35:28 CEST 2008


Peter, thanks for the response.

I'm getting only a small speedup from multithreading in libgoto.  Here are
timings from the wien2k serial benchmark:

OMP_NUM_THREADS=1: 195.463u 0.307s 3:15.80 99.9%        0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=2: 199.565u 0.569s 2:57.40 112.8%       0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=3: 204.145u 0.635s 2:51.02 119.7%       0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=4: 211.666u 0.736s 2:49.02 125.6%       0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=5: 222.604u 1.032s 2:48.41 132.7%       0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=6: 231.258u 0.927s 2:47.54 138.5%       0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=7: 243.170u 0.996s 2:46.55 146.5%       0+0k 0+33264io 0pf+0w
OMP_NUM_THREADS=8: 252.584u 0.916s 2:46.57 152.1%       0+0k 0+33264io 0pf+0w
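
To put numbers on that, here is a small Python sketch (not part of wien2k;
the helper names to_seconds and speedups are my own) that converts the
elapsed times above to seconds and computes the speedup relative to the
single-thread run:

```python
# Wall-clock (elapsed) times copied from the benchmark output above,
# keyed by the OMP_NUM_THREADS value used for each run.
elapsed = {
    1: "3:15.80", 2: "2:57.40", 3: "2:51.02", 4: "2:49.02",
    5: "2:48.41", 6: "2:47.54", 7: "2:46.55", 8: "2:46.57",
}

def to_seconds(stamp):
    """Convert an 'M:SS.ss' elapsed-time string to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)

def speedups(times):
    """Return {threads: speedup relative to the 1-thread run}."""
    base = to_seconds(times[1])
    return {n: base / to_seconds(t) for n, t in times.items()}

for n, s in speedups(elapsed).items():
    # Efficiency is the speedup divided by the number of threads.
    print(f"{n} threads: {s:.3f}x speedup, {s / n:.1%} efficiency")
```

Going from 1 to 8 threads works out to roughly a 1.18x speedup in
wall-clock time, while total user CPU time rises from about 195 s to
about 253 s.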


I would like to explore the k-point parallelization.  But when I run
'x lapw1 -p' it aborts with an error message about being unable to run
lapw1c_mpi.  It looks to me as if it's trying to run the fine-grained
MPI parallel version.  I'm not building wien2k with MPI, so I don't have a
lapw1c_mpi.  I must be misunderstanding something.  What am I doing wrong
that's causing it to try to run this lapw1c_mpi executable?

Which of these is the appropriate .machines file for k-point
parallelization across N cpu cores on a single machine?

This?

   1:localhost:N

Or this?

   N:localhost

And do I need any of these lines?

   extrafine
   granularity:1
   residue:localhost

Or do I need something else either in .machines or in some other
file or on the command line?

--
Todd Pfaff <pfaff at mcmaster.ca>
Research & High-Performance Computing Support
McMaster University, Hamilton, Ontario, Canada
http://www.rhpcs.mcmaster.ca/~pfaff

On Mon, 11 Aug 2008, Peter Blaha wrote:

> The program lapw1 spends a large fraction in BLAS-routines, thus it can
> benefit from multithreading of GOTOLIBS (or MKL).
> Setting the variables you mentioned to 2 (or 4) you should see a
> speedup. The improvement may depend on many factors but it will be at
> most about 50%.
>
> Another possibility to utilize the multiple cores is to do k-point
> parallelism.
> Generate a .machines file with 2,4 or 8  times your machine name
> and test the performance with     x lapw1 -p.
> On some architectures (with slow memory bus) it can be that only 4
> parallel jobs give best performance (because the slow memory bus cannot
> feed all 8 cpus properly), on others you can use 8 parallel jobs.
> Sometimes a mixture (4 k-point parallel + OMP_NUM_THREADS=2) is best.
>
> Todd Pfaff wrote:
>> We're using:
>>
>>    wien2k-08.2-20080407
>>
>> built with:
>>
>>    GNU Fortran (GCC) 4.2.3 (4.2.3-6mnb1)
>>    GotoBLAS-1.26
>>
>> and running on an 8 core (2 x quad core) Xeon machine.
>>
>> Can wien2k take advantage of multithreading inherent to GotoBLAS
>> when either GOTO_NUM_THREADS or OMP_NUM_THREADS is set?
>>
>> If so, can someone provide, or direct me to a document about details of
>> the best way to build and run wien2k for such an environment?
>>
>> Thank you,
>> --
>> Todd Pfaff <pfaff at mcmaster.ca>
>> Research & High-Performance Computing Support
>> McMaster University, Hamilton, Ontario, Canada
>> http://www.rhpcs.mcmaster.ca/~pfaff
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>
>