[Wien] more on pathscale and ifort64

Stefaan Cottenier Stefaan.Cottenier at fys.kuleuven.be
Fri Aug 5 12:10:42 CEST 2005


Following recent questions and comments about compiling wien2k on
Opteron, here are some results from recent tests with the pathscale and
ifort64 compilers. Do not consider these final yet: I suspect there is
still some room for optimization, and some choices may be redundant. The
only thing I can guarantee is that the listed options work for us. Of
the cases that did work, ifort64+goto is clearly the fastest, with 215 s
for the test_case.

I would be very interested in more reports on this issue, as well as
suggestions for improvements. Has anyone tried yet how a 32-bit compiled
wien2k runs on Opteron? And could somebody get the goto library to work
with pathscale?

system: MACHTYPE = x86_64-pc-linux-gnu  (dual-CPU Opteron, 2.4 GHz, 2 GB RAM)

pathscale 2.1 (pathf90) + mkl 7.2
=================================

timing for test_case: 365 s

     O   Compiler options:        -freeform -march=opteron -mcpu=opteron -mtune=opteron -w
     L   Linker Flags:            -L/apps/prod/math-lib/mkl72/lib/em64t -L../SRC_lib -L/lib64 -pthread -Wl,-rpath /apps/prod/math-lib/mkl72/lib/em64t
     P   Preprocessor flags       '-DParallel'
     R   R_LIB (LAPACK+BLAS):     -lmkl -lmkl_lapack -lvml

(-L/lib64 might not be explicitly needed, depending on how your system
is configured. The -Wl,-rpath options record the library directories in
the executable, so that the dynamically linked programs still find their
libraries at run time on a cluster; they are probably not needed on a
stand-alone machine.)
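The Linker Flags in these configurations follow a fixed pattern: each directory searched at link time appears after -L, and each directory that must also be found at run time appears once more after -Wl,-rpath. A minimal Python sketch that assembles such a string from two directory lists is given below. The helper name `linker_flags` and its parameters are hypothetical (not part of WIEN2k or siteconfig_lapw); it simply reproduces the "-Wl,-rpath <dir>" form used in this message.

```python
def linker_flags(link_dirs, rpath_dirs=(), extra=("-pthread",)):
    """Assemble a "Linker Flags" string (hypothetical helper, not WIEN2k code).

    link_dirs  -- directories searched at link time, emitted as -L<dir>
    rpath_dirs -- directories recorded in the executable for run time,
                  emitted as "-Wl,-rpath <dir>" (the form used in this post)
    extra      -- other flags placed between the -L and the -rpath groups
    """
    parts = [f"-L{d}" for d in link_dirs]
    parts.extend(extra)
    for d in rpath_dirs:
        parts.extend(["-Wl,-rpath", d])
    return " ".join(parts)


MKL = "/apps/prod/math-lib/mkl72/lib/em64t"

# Cluster build: the MKL run-time path is embedded in the executable.
print(linker_flags([MKL, "../SRC_lib", "/lib64"], rpath_dirs=[MKL]))
# Stand-alone machine: the rpath part is probably unnecessary.
print(linker_flags([MKL, "../SRC_lib", "/lib64"]))
```

The first call reproduces the Linker Flags of the pathscale 2.1 + mkl 7.2 build; adding "/lib64" to rpath_dirs gives the ifort64 + mkl 7.2 variant.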

ifort64 + mkl 7.2
=================

timing for test_case: 366 s

     O   Compiler options:        -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
     L   Linker Flags:            -L/apps/prod/math-lib/mkl72/lib/em64t -L../SRC_lib -L/lib64 -pthread -Wl,-rpath /apps/prod/math-lib/mkl72/lib/em64t -Wl,-rpath /lib64
     P   Preprocessor flags       '-DParallel'
     R   R_LIB (LAPACK+BLAS):     -lmkl -lmkl_lapack -lvml

pathscale 2.1 + goto
====================

(No timing available: the complex version of the test_case (x lapw1 -c)
gives a segmentation fault. The real version runs without problems.)

     O   Compiler options:        -freeform -march=opteron -mcpu=opteron -mtune=opteron -w
     L   Linker Flags:            -L/apps/prod/math-lib/goto -L../SRC_lib -pthread -Wl,-rpath /apps/prod/math-lib/goto
     P   Preprocessor flags       '-DParallel'
     R   R_LIB (LAPACK+BLAS):     -lgoto_opt64-r0.96-2 -llapack_lapw

ifort64 + goto
==============

timing for test_case: 215 s (the fastest of this series)

     O   Compiler options:        -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
     L   Linker Flags:            -L/apps/prod/math-lib/goto -L/apps/prod/math-lib/mkl72/lib/em64t -L/lib64 -L../SRC_lib -pthread -Wl,-rpath /apps/prod/math-lib/goto -Wl,-rpath /apps/prod/math-lib/mkl72/lib/em64t -Wl,-rpath /lib64
     P   Preprocessor flags       '-DParallel'
     R   R_LIB (LAPACK+BLAS):     -lgoto_opt64-r0.96-2 -llapack_lapw -lvml

pathscale 2.1 + atlas
=====================

(No timing available: the complex version of the test_case (x lapw1 -c)
gives a segmentation fault. Not tested yet whether the real version
runs.)

     O   Compiler options:        -freeform -march=opteron -mcpu=opteron -mtune=opteron -w
     L   Linker Flags:            -L/apps/prod/math-lib/atlas/Linux_HAMMER64SSE2_2/lib -L/apps/prod/math-lib/mkl72/lib/em64t -L../SRC_lib -L/lib64 -Wl,-rpath /apps/prod/math-lib/atlas/Linux_HAMMER64SSE2_2/lib -Wl,-rpath /apps/prod/math-lib/mkl72/lib/em64t -Wl,-rpath /lib64
     P   Preprocessor flags       '-DParallel'
     R   R_LIB (LAPACK+BLAS):     -llapack_lapw -lf77blas -latlas -lguide -lpthread -lvml

ifort64 + atlas
===============

timing for test_case: 270 s

     O   Compiler options:        -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
     L   Linker Flags:            -L/apps/prod/math-lib/atlas/Linux_HAMMER64SSE2_2/lib -L/apps/prod/math-lib/mkl72/lib/em64t -L../SRC_lib -L/lib64 -Wl,-rpath /apps/prod/math-lib/atlas/Linux_HAMMER64SSE2_2/lib -Wl,-rpath /apps/prod/math-lib/mkl72/lib/em64t -Wl,-rpath /lib64
     P   Preprocessor flags       '-DParallel'
     R   R_LIB (LAPACK+BLAS):     -llapack_lapw -lf77blas -latlas -lguide -lpthread -lvml

pathscale 2.1 + mkl: additional optimization (-IPA)
===================================================

timing for test_case: 360 s

     O   Compiler options:        -freeform -march=opteron -mcpu=opteron -mtune=opteron -w -IPA
     L   Linker Flags:            -L/apps/prod/math-lib/mkl72/lib/em64t -L../SRC_lib -L/lib64 -pthread -Wl,-rpath /apps/prod/math-lib/mkl72/lib/em64t -IPA
     P   Preprocessor flags       '-DParallel'
     R   R_LIB (LAPACK+BLAS):     -lmkl -lmkl_lapack -lvml -lmkl -lmkl_lapack -lvml

Note that -IPA appears in both the compiler options and the linker
flags, and that the libraries are listed twice. Both are necessary.

pathscale 2.1 + mkl: additional optimization (-Ofast)
=====================================================

timing for test_case: 355 s

     O   Compiler options:        -freeform -march=opteron -mcpu=opteron -mtune=opteron -w -Ofast
     L   Linker Flags:            -L/apps/prod/math-lib/mkl72/lib/em64t -L../SRC_lib -L/lib64 -pthread -Wl,-rpath /apps/prod/math-lib/mkl72/lib/em64t -IPA
     P   Preprocessor flags       '-DParallel'
     R   R_LIB (LAPACK+BLAS):     -lmkl -lmkl_lapack -lvml -lmkl -lmkl_lapack -lvml

This is a more aggressive optimization and may be harmful to numerical
accuracy.
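To put the test_case timings of this message side by side, a few lines of Python suffice: they rank the builds that ran and express each one relative to the plain pathscale 2.1 + mkl 7.2 build (365 s). This is only arithmetic on the numbers reported above; the two pathscale builds that crashed in x lapw1 -c have no timing and are left out, and the short labels are my own.

```python
# test_case wall times (s) as reported in this message. The pathscale+goto
# and pathscale+atlas builds segfaulted in "x lapw1 -c" and are omitted.
timings = {
    "pathscale 2.1 + mkl 7.2": 365,
    "ifort64 + mkl 7.2": 366,
    "ifort64 + goto": 215,
    "ifort64 + atlas": 270,
    "pathscale 2.1 + mkl (-IPA)": 360,
    "pathscale 2.1 + mkl (-Ofast)": 355,
}

# Reference point: the plain pathscale + mkl build.
baseline = timings["pathscale 2.1 + mkl 7.2"]

# Fastest first; a speedup above 1 means faster than the baseline.
ranking = sorted(timings.items(), key=lambda item: item[1])
for name, seconds in ranking:
    print(f"{name:30s} {seconds:4d} s  speedup {baseline / seconds:.2f}x")
```

On these numbers ifort64 + goto is about 1.7 times faster than the pathscale + mkl baseline, while -IPA and -Ofast gain only 1 to 3 %.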

---------------
Stefaan

