[Wien] more on pathscale and ifort64
Stefaan Cottenier
Stefaan.Cottenier at fys.kuleuven.be
Fri Aug 5 12:10:42 CEST 2005
Following recent questions and comments about compiling wien2k on
Opteron, I add some results from recent tests with the pathscale and ifort64
compilers. Do not consider these final yet: I guess there is still some
room for optimization, and some choices may be redundant. The
only thing I can guarantee is that the listed options work for us. Of
the cases that did work, ifort64+goto is clearly the fastest one, with
215 s for the test_case.
I am highly interested to see more reports on this issue, as well as
suggestions for improvements. Has anyone tried yet how 32-bit compiled
wien2k runs on Opteron? And has anybody gotten the goto-lib to work with
pathscale?
system: MACHTYPE = x86_64-pc-linux-gnu (dual-cpu Opteron, 2.4 GHz, 2 GB
RAM)
pathscale 2.1 (pathf90) + mkl 7.2
=======================
timing for test_case: 365 s
O Compiler options: -freeform -march=opteron -mcpu=opteron
-mtune=opteron -w
L Linker Flags: -L/apps/prod/math-lib/mkl72/lib/em64t
-L../SRC_lib -L/lib64 -pthread -Wl,-rpath
/apps/prod/math-lib/mkl72/lib/em64t
P Preprocessor flags '-DParallel'
R R_LIB (LAPACK+BLAS): -lmkl -lmkl_lapack -lvml
(The /lib64 path might not be explicitly needed, depending on how your system
is configured. The -Wl,-rpath entries let the dynamically linked executables
find their libraries at run time on a cluster; they are probably not needed
on a stand-alone machine.)
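As a rough illustration of how the linker flags above are put together, here is a minimal Python sketch that pairs -L search directories with -Wl,-rpath entries. The helper name linker_flags and its parameters are hypothetical and not part of wien2k or siteconfig; the paths are the ones from our installation and will differ on yours.

```python
def linker_flags(lib_dirs, rpath_dirs, extra=("-pthread",)):
    """Build an 'L Linker Flags' string (hypothetical helper, not part of wien2k).

    lib_dirs   -- directories searched at link time (-L)
    rpath_dirs -- directories recorded for the run-time loader (-Wl,-rpath)
    extra      -- any other flags placed between the two groups
    """
    parts = [f"-L{d}" for d in lib_dirs]
    parts.extend(extra)
    for d in rpath_dirs:
        # Same "-Wl,-rpath <dir>" spelling as in the listings of this post.
        parts.extend(["-Wl,-rpath", d])
    return " ".join(parts)


# Path taken from the pathscale + mkl listing; adjust to your own system.
MKL = "/apps/prod/math-lib/mkl72/lib/em64t"
flags = linker_flags([MKL, "../SRC_lib", "/lib64"], [MKL])
print(flags)
```

On a stand-alone machine one could presumably pass an empty list for rpath_dirs and rely on the system's default library search path.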
ifort64 + mkl 7.2
============
timing for test_case: 366 s
O Compiler options: -FR -mp1 -w -prec_div -pc80 -pad -ip
-DINTEL_VML
L Linker Flags: -L/apps/prod/math-lib/mkl72/lib/em64t
-L../SRC_lib -L/lib64 -pthread -Wl,-rpath
/apps/prod/math-lib/mkl72/lib/em64t -Wl,-rpath /lib64
P Preprocessor flags '-DParallel'
R R_LIB (LAPACK+BLAS): -lmkl -lmkl_lapack -lvml
pathscale 2.1 + goto
==============
(no timing available, as this gives a segmentation fault in the test_case
(x lapw1 -c). The real version runs without problems.)
O Compiler options: -freeform -march=opteron -mcpu=opteron
-mtune=opteron -w
L Linker Flags: -L/apps/prod/math-lib/goto
-L../SRC_lib -pthread -Wl,-rpath /apps/prod/math-lib/goto
P Preprocessor flags '-DParallel'
R R_LIB (LAPACK+BLAS): -lgoto_opt64-r0.96-2 -llapack_lapw
ifort64 + goto
=========
timing for test_case: 215 s (the fastest of this series)
O Compiler options: -FR -mp1 -w -prec_div -pc80 -pad -ip
-DINTEL_VML
L Linker Flags: -L/apps/prod/math-lib/goto
-L/apps/prod/math-lib/mkl72/lib/em64t -L/lib64 -L../SRC_lib -pthread
-Wl,-rpath /apps/prod/math-lib/goto -Wl,-rpath
/apps/prod/math-lib/mkl72/lib/em64t -Wl,-rpath /lib64
P Preprocessor flags '-DParallel'
R R_LIB (LAPACK+BLAS): -lgoto_opt64-r0.96-2 -llapack_lapw -lvml
pathscale 2.1 + atlas
==============
(no timing available as this gives a segmentation fault in the test_case
(x lapw1 -c). Not tested yet whether real version runs.)
O Compiler options: -freeform -march=opteron -mcpu=opteron
-mtune=opteron -w
L Linker Flags:
-L/apps/prod/math-lib/atlas/Linux_HAMMER64SSE2_2/lib
-L/apps/prod/math-lib/mkl72/lib/em64t -L../SRC_lib -L/lib64 -Wl,-rpath
/apps/prod/math-lib/atlas/Linux_HAMMER64SSE2_2/lib -Wl,-rpath
/apps/prod/math-lib/mkl72/lib/em64t -Wl,-rpath /lib64
P Preprocessor flags '-DParallel'
R R_LIB (LAPACK+BLAS): -llapack_lapw -lf77blas -latlas
-lguide -lpthread -lvml
ifort64 + atlas
==========
timing for test_case: 270 s
O Compiler options: -FR -mp1 -w -prec_div -pc80 -pad -ip
-DINTEL_VML
L Linker Flags:
-L/apps/prod/math-lib/atlas/Linux_HAMMER64SSE2_2/lib
-L/apps/prod/math-lib/mkl72/lib/em64t -L../SRC_lib -L/lib64 -Wl,-rpath
/apps/prod/math-lib/atlas/Linux_HAMMER64SSE2_2/lib -Wl,-rpath
/apps/prod/math-lib/mkl72/lib/em64t -Wl,-rpath /lib64
P Preprocessor flags '-DParallel'
R R_LIB (LAPACK+BLAS): -llapack_lapw -lf77blas -latlas
-lguide -lpthread -lvml
pathscale 2.1 + mkl : additional optimization (-IPA)
==================================
timing for test_case: 360 s
O Compiler options: -freeform -march=opteron -mcpu=opteron
-mtune=opteron -w -IPA
L Linker Flags: -L/apps/prod/math-lib/mkl72/lib/em64t
-L../SRC_lib -L/lib64 -pthread -Wl,-rpath
/apps/prod/math-lib/mkl72/lib/em64t -IPA
P Preprocessor flags '-DParallel'
R R_LIB (LAPACK+BLAS): -lmkl -lmkl_lapack -lvml -lmkl
-lmkl_lapack -lvml
Note that the -IPA option is present both for compiler and linker, and
that the libraries are specified twice. Both are necessary.
pathscale 2.1 + mkl : additional optimization (-Ofast)
===================================
timing for test_case: 355 s
O Compiler options: -freeform -march=opteron -mcpu=opteron
-mtune=opteron -w -Ofast
L Linker Flags: -L/apps/prod/math-lib/mkl72/lib/em64t
-L../SRC_lib -L/lib64 -pthread -Wl,-rpath
/apps/prod/math-lib/mkl72/lib/em64t -IPA
P Preprocessor flags '-DParallel'
R R_LIB (LAPACK+BLAS): -lmkl -lmkl_lapack -lvml -lmkl
-lmkl_lapack -lvml
This is a more aggressive optimization, potentially harmful for accuracy.
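To compare the timings reported in this post at a glance, a small Python sketch like the following can rank them relative to the fastest build. The numbers are copied from the sections above; the dictionary keys are just labels chosen here, and the two builds that crashed in the test_case are left out because no timing exists for them.

```python
# test_case timings in seconds, as reported in this post
timings_s = {
    "pathscale 2.1 + mkl 7.2": 365,
    "ifort64 + mkl 7.2": 366,
    "ifort64 + goto": 215,
    "ifort64 + atlas": 270,
    "pathscale 2.1 + mkl (-IPA)": 360,
    "pathscale 2.1 + mkl (-Ofast)": 355,
}

# Build with the smallest wall time
fastest = min(timings_s, key=timings_s.get)

# Slowdown factor of every build relative to the fastest one
relative = {name: t / timings_s[fastest] for name, t in timings_s.items()}

# Print the builds from fastest to slowest
for name, t in sorted(timings_s.items(), key=lambda kv: kv[1]):
    print(f"{name:30s} {t:4d} s   {relative[name]:.2f}x")
```

These are single runs on one machine, so small differences (for example 365 s versus 366 s) should not be over-interpreted.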
---------------
Stefaan