[Wien] "OpenBlas" package instead of default "blas"
Pavel Ondračka
pavel.ondracka at email.cz
Tue Nov 20 11:24:00 CET 2018
On Mon, 2018-11-19 at 23:54 +0530, Ashwani Kumar wrote:
> Dear Dr. Pavel Ondracka,
> In a previous thread,
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18098.html
>
> you advised using the OpenBLAS package to get the best performance
> from the processor. Since I was having problems with the WIEN2k
> installation, I went with Dr. Gavin's set of instructions (for the
> lapack devel package). Now I want to speed up WIEN2k execution (even
> simple oxides take a long time). I also noticed that at any given
> time only one thread is 100% busy, while the remaining threads show a
> load of 1-5%.
> Configuration of my PC: i7-8700 (6 cores, 12 threads), 8 GB RAM (can
> be upgraded to 16 GB), Fedora 28, graphics card (gtx...)
>
> I understand that OpenBLAS needs to be installed and R_path set to
> -lopenblas. I also want to use thread-level parallelism if it
> boosts the processor's performance further by a factor of >= 1.5.
Dear A. Kumar,
I don't fully understand your comment about the thread load. WIEN2k
does not currently spawn multiple threads (unless you use a threaded
BLAS/LAPACK). The k-point (or MPI) parallel calculations spawn multiple
processes, but those should never be at 1-5% load...
IMO there are likely two problems here:
1) If you are using only one machine and your case has a lot of
k-points (and you are not memory-bound), what you want is k-point
parallelism. This is done with the .machines file (and the -p switch).
On a single machine, your .machines file should contain one
"1:localhost" line for every core on your computer, i.e. in your
specific case a reasonable .machines file would have 6 identical lines
(maybe even 12 with hyperthreading, but you need to test what is
optimal for your setup). Please check the user's guide for more details
about k-point parallel execution and the .machines file in general.
2) Regarding OpenBLAS: what you need is the OpenBLAS devel package. To
begin with, I suggest the serial OpenBLAS: run
"dnf install openblas-devel" and set R_LIBS to just "-lopenblas". If
you want to squeeze out more speed (and you are using only a single
computer), also add "-ftree-vectorize -march=native" to your FOPT
flags. Note that -march=native tunes the binaries to the CPU they were
compiled on, so they may not run on a different machine.
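
For point 1), a .machines file for a single machine could be generated
along these lines. This is only a minimal sketch: make_machines is a
hypothetical helper name (not part of WIEN2k), and the count of 6
assumes the 6-core i7-8700 mentioned above.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN2k): print one "1:localhost"
# line per k-point-parallel process requested on this machine.
make_machines() {
    n="$1"
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "1:localhost"
        i=$((i + 1))
    done
}

# Usage, from inside the case directory (assumed: 6 physical cores):
#   make_machines 6 > .machines
# and then start the calculation with the -p switch.
```

Try 6 first and compare the timing with 12 before settling on a value.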
If you really want to go with the threaded OpenBLAS I can help you
later, but IMO this should not be needed in the beginning (as the
k-point parallelism is the most efficient one). You will also need some
further tricks to make lapw1 fast with libmvec. Either see
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg16159.html
or I can provide some new patches which do the same with OpenMP (but
first get the k-point parallelism and serial OpenBLAS working).
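
To summarize the serial OpenBLAS setup from point 2), a rough sketch
(the use of sudo and the placeholder for your existing flags are
assumptions; only the package name and the two settings come from the
text above):

```shell
# Fedora: install the serial OpenBLAS development package
sudo dnf install openblas-devel

# Settings to change in the WIEN2k compiler options (these are values
# to enter there, not shell commands):
#   R_LIBS:  -lopenblas
#   FOPT:    <your existing flags> -ftree-vectorize -march=native
```

The programs have to be recompiled afterwards for the new flags and
library to take effect.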
Hope this helps
Best regards
Pavel
>
> Waiting for your expert advice,
>
> thanks,
> A. Kumar
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html