[Wien] FFTW and ifx/icx issue relevant to WIEN2k

Fecher, Gerhard fecher at uni-mainz.de
Wed Sep 10 16:44:57 CEST 2025


You are right, it seems znver4 works for AMD CPUs.

Here are some benchmarks. Don't take them too seriously; WIEN2k may behave differently from such simple tests.

As Laurence said, -O3 doesn't do better than -O2.

Note that some optimization switches only make sense if the code can actually be vectorized.

On Intel Xeons it seems that ifx still performs worse than ifort for complex operations (I don't know whether this has improved in 2025.2).
On EPYCs I am not so sure about improvements; the AMD compiler does not do better than the Intel one.

AMD's hint to use -axCORE-AVX512 seems to be useless on my EPYCs (ifx reports that it doesn't vectorize the loop).

The old gfortran was really bad.

Benchmarks from MC Rutter, https://www.mjr19.org.uk/
(conjg calculates the complex conjugate, mult calculates multiplications;
I derived mult-dble from mult z (mult-cmplx) just to see what real operations do)


            Times in ns per operation
            (columns, left to right: conjg, mult-cmplx, mult-dble)

Intel Xeon(R) E5-2697 -----------
ifort 24.2
O2          0.422   1.322   0.230
O2 host     0.150   0.387   0.122
O3          0.424   1.328   0.234
O3 host     0.151   0.387   0.123

ifx 25.1
O2          0.423   0.946   0.256
O2 host     0.906   1.050   0.155
O3          0.426   0.950   0.257
O3 host     0.910   1.062   0.157

gfortran  v 7.5
O3          0.364   1.442  19.532
O3 generic  0.365   1.442  19.534
O3 avx2     0.364   0.880  10.261

AMD EPYC 9354 -------------------
ifort 24.2
O3          0.346   0.596   0.174
O3 AVX512   0.345   0.597   0.166
O3 znver4   0.346   0.600   0.173

ifx 25.1
O2          0.264   0.596   0.196
O2 znver4   0.467   0.334   0.118
O3          0.264   0.597   0.170
O3 AVX512   0.263   0.595   0.268
O3 znver4   0.467   0.334   0.117

gfortran v 14.2
O3          0.273   0.719   0.216
O3 generic  0.273   0.716   0.228
O3 avx512   0.264   0.284   0.111
O3 znver4   0.274   0.283   0.104

flang 5.0
O3          0.264   0.571   0.266
O3 znver4   0.266   0.321   0.126
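
To put numbers on the comparison, here is a small shell sketch (not part of the original benchmark; the helper name speedup is hypothetical) that divides two timings taken from the table above. A ratio above 1 means the second build is faster.

```shell
# speedup BASE TUNED: ratio of two timings in ns per operation.
# Hypothetical helper for illustration; a value above 1 means TUNED is faster.
speedup() {
    awk -v base="$1" -v tuned="$2" 'BEGIN { printf "%.2f\n", base / tuned }'
}

# EPYC 9354, ifx 25.1, mult-cmplx: O2 versus O2 znver4
speedup 0.596 0.334
# Xeon E5-2697, conjg, O2 host: ifort 24.2 versus ifx 25.1
speedup 0.150 0.906
```

With the table values this prints 1.78 and 0.17: znver4 speeds up complex multiplication with ifx on the EPYC, while ifx "O2 host" is much slower than ifort for conjg on the Xeon.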

=================================

Compiler switches
Intel ifx 2025.1 or ifort 2024.2
-xhost
-axCORE-AVX2
-axCORE-AVX512
-axCORE-AVX2,CORE-AVX512

GNU gfortran
-march=znver4
-march=core-avx2
-mavx2 -mavx512f
-mtune=generic

AMD flang
-march=znver4
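
Before choosing between the AVX2 and AVX-512 switches above, it can help to check what the CPU itself reports. The following is a minimal sketch, assuming a Linux x86 system where /proc/cpuinfo has a "flags" line; the function name has_cpu_flag is hypothetical.

```shell
# has_cpu_flag FLAG [CPUINFO]: succeed if FLAG is listed in the first
# "flags" line of CPUINFO (default /proc/cpuinfo, Linux x86 assumed).
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw -- "$1"
}

# Report the two instruction-set extensions discussed in this thread.
for f in avx2 avx512f; do
    if has_cpu_flag "$f"; then
        echo "$f: yes"
    else
        echo "$f: no"
    fi
done
```

This only tells you what the hardware supports; whether the compiler actually vectorizes a given loop is a separate question.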

Ciao
Gerhard

DEEP THOUGHT in D. Adams; Hitchhikers Guide to the Galaxy:
"I think the problem, to be quite honest with you,
is that you have never actually known what the question is."

====================================
Dr. Gerhard H. Fecher
Institut of Physics
Johannes Gutenberg - University
55099 Mainz
________________________________________
From: Wien [wien-bounces at zeus.theochem.tuwien.ac.at] on behalf of Straus, Daniel B [dstraus at tulane.edu]
Sent: Wednesday, 10 September 2025 16:08
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] FFTW and ifx/icx issue relevant to WIEN2k

You're right in that Intel doesn't officially list any of the znver# flags as supported by the LLVM/CLANG based compilers. However, it seems that the LLVM backend supports them. See, for example, https://community.intel.com/t5/Intel-Fortran-Compiler/Compilation-error-with-fast-on-AMD-Ryzen-9-9900X-using-ifx/td-p/1712241, https://stackoverflow.com/questions/79174824/why-do-gcc-icx-and-clang-not-auto-vectorize-using-avx-512-based-instructions-on.

While this is not conclusive, if you attempt to specify march=znver5 for the 2024 oneapi compilers, compilation fails because znver5 is not recognized. march=znver4 allows programs to be compiled properly in the 2024 oneapi version.

That said, AMD says to specify -axCORE-AVX512 when using oneapi, so it is not clear exactly what's going on. https://docs.amd.com/r/en-US/63857-AOCC-quick-start-guide/AMD-EPYC-9xx5-Series-Processors-Compiler-Options-Quick-Reference

All I know is that AVX512 code is generated when march=znver5 is passed as a flag to the oneapi 2025.2 compilers, and that's good enough for me. Trying to compile FFTW with AVX512 enabled fails when no march flag is passed to icx.
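
One way to check whether AVX-512 code was generated is to look for 512-bit zmm registers in the disassembly. This is a minimal sketch, assuming GNU objdump is available; the function name count_zmm and the object file name in the usage comment are hypothetical.

```shell
# count_zmm: read a disassembly on stdin and print how many lines
# reference a zmm register (these registers only exist with AVX-512).
count_zmm() {
    # grep -c exits with status 1 when the count is 0; keep the function's status at 0
    grep -c -E 'zmm[0-9]+' || true
}

# Usage (hypothetical file name):
#   objdump -d some_object.o | count_zmm
```

A count of 0 for objects built with and without the -march flag would suggest the flag is being ignored; a non-zero count only with the flag suggests it takes effect.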


Daniel Straus
Assistant Professor
Department of Chemistry
Tulane University
5088 Percival Stern Hall
6400 Freret Street
New Orleans, LA 70118
(504) 862-3585
http://straus.tulane.edu/


-----Original Message-----
From: Wien <wien-bounces at zeus.theochem.tuwien.ac.at> On Behalf Of Fecher, Gerhard
Sent: Wednesday, September 10, 2025 1:45 AM
To: A Mailing list for WIEN2k users <wien at zeus.theochem.tuwien.ac.at>
Subject: Re: [Wien] FFTW and ifx/icx issue relevant to WIEN2k

More comments:
I could not find znver5 listed as a valid CPU architecture for icx or ifx on https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2025-2/march.html
(this also applies to the other CPU-dependent compiler switches -x, -ax, -arch; there is no znverX). It seems it was just used by a "beginner", hoshi, on https://community.intel.com/t5/Intel-Fortran-Compiler/Compilation-error-with-fast-on-AMD-Ryzen-9-9900X-using-ifx/td-p/1712241
I would guess that -march=znver5 (because it can be used with the GNU compilers) is just ignored. Why should Intel be interested in writing an optimized compiler for AMD CPUs?
Did you ever test whether -march=znver5 changes anything?
As mentioned earlier, -axCORE-AVX512, -axCORE-AVX2, or a combination of both may work on AMD processors (at least they don't slow the program down seriously, and I didn't find dead electrons).

There was already a lot of discussion on FFTW3 and ELPA at the beginning of the year.

Ciao
Gerhard

________________________________________
From: Wien [wien-bounces at zeus.theochem.tuwien.ac.at] on behalf of Laurence Marks [laurence.marks at gmail.com]
Sent: Tuesday, 9 September 2025 23:20
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] FFTW and ifx/icx issue relevant to WIEN2k

Comments.

  1.  I have never seen -O3 do anything with icc/ifort except kill defenceless electrons and make the code slower. I will be happy to be proved wrong with ifx/icx.
  2.  I always use -mkl, rather than making mistakes chasing how Intel changes its libraries.
  3.  I think you might have issues with -mkl_cdft (Intel's version of FFTW) and FFTW3.

On Tue, Sep 9, 2025 at 4:10 PM Straus, Daniel B <dstraus at tulane.edu> wrote:
Sorry for the long delay in responding—I was set to receive a digest of list messages, and it only comes once every couple of weeks.

Yes, this is on a Zen 5 computer, and it is running Rocky Linux 10. I am using the Intel compiler and MKL, rather than the one AMD provides. IFX and ICX support the march=znver5 flag. All the WIEN2k 24.1 patches available as of 9/1 were installed.

To be clear, on my workstation, FFTW still will work with WIEN2k even if the autoconf script is not regenerated, but there may be a performance impact as it is not using the proper Intel libraries for Fortran calls to FFTW. However, 3ddens would then not compile, and if I also attempted to use ELPA, then parallel LAPW1 would not compile. Regenerating the autoconf script for FFTW and recompiling it solved both problems. You should check the config.log for your FFTW compilation to see if there is a line such as “ld: cannot find -loopopt=0” to see if this error is occurring. For me, the configure script continued even after this error, but it was using GNU default libraries rather than the Intel provided libraries.
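
The config.log check described above can be sketched as a small shell function. This is only an illustration: the function name check_fftw_config_log is hypothetical, and it assumes it is run from the FFTW build directory, or is given the path to config.log.

```shell
# check_fftw_config_log [FILE]: look for the linker error quoted above in
# FFTW's config.log (default: ./config.log).
# Returns 1 if the error is present, 2 if the file cannot be read, 0 otherwise.
check_fftw_config_log() {
    log="${1:-config.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # -F: fixed string, -n: show line numbers of the matching lines
    if grep -n -F -e 'cannot find -loopopt=0' "$log"; then
        echo "linker error found in $log"
        return 1
    fi
    echo "no -loopopt=0 linker error in $log"
}
```

If the function reports the error, the configure run most likely continued with the GNU default libraries, as described above.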

siteconfig_lapw is set to use the ifx and icx compilers. Under “Options” in siteconfig_lapw, I am using the following compiler options for WIEN2k with the ifx compiler:
Current settings:
  M   OpenMP switch:           -qopenmp
  O   Compiler options:        -O3 -march=znver5 -traceback -assume buffered_io -FR -I$(MKLROOT)/include
  L   Linker Flags:            $(FOPT) -L$(MKLROOT)/lib -lpthread -lm -ldl -liomp5 -Wl,-rpath,$MKLROOT/lib
  P   Preprocessor flags       '-DParallel'
  R   R_LIBS (LAPACK+BLAS):    -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
  F   FFTW options:            -DFFTW3 -DFFTW_OMP -I/home/software/fftw-3.3.10/include
      FFTW-LIBS:               -L/home/software/fftw-3.3.10/lib -lfftw3 -lfftw3_omp
  X   LIBX options:
      LIBXC-LIBS:

For Parallel Options in siteconfig_lapw, here are the flags I am using:
Your current parallel settings (options and libraries) are:
     C   Parallel Compiler:          mpiifx
     FP  Parallel Compiler Options:  -O3 -FR -march=znver5 -fc=ifx -traceback -assume buffered_io -I$(MKLROOT)/include
     MP  MPIRUN command:             mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
     O   Parallel OpenMP switch:     -qopenmp
   Additional setting for SLURM batch systems (is set to 1 otherwise):
     CN  Number of Cores:            1
   Libraries:
     Sp  SCALAPACK:                   -L$(MKLROOT)/lib
                                                     -lmkl_scalapack_lp64
                                                     -L$(MKLROOT)/lib -lmkl_blacs_intelmpi_lp64
     E   ELPA options:                -DELPA -I/home/software/elpa-2025.06.001/include/elpa-2025.06.001/elpa
                                                     -I/home/software/elpa-2025.06.001/include/elpa-2025.06.001/modules
         ELPA-LIBS:                   -lelpa -L/home/software/elpa-2025.06.001/lib -Wl,-rpath=/home/software/elpa-2025.06.001/lib
     RP  Parallel-Libs:      $(R_LIBS) -lmkl_cdft_core

In case it’s relevant, here is what I passed to the configure script for FFTW3 (after regenerating the script with autoconf):
module load oneapi/2025.2.0
./configure --prefix=/home/software/fftw-3.3.10 \
    CC="mpiicx -cc=icx" MPICC="mpiicx -cc=icx" F77="mpiifx -fc=ifx" \
    FFLAGS="-O3 -march=znver5 -I${MKLROOT}/include" \
    CFLAGS="-O3 -march=znver5 -I${MKLROOT}/include" \
    CXXFLAGS="-I${MKLROOT}/include" \
    LDFLAGS="-L${MKLROOT}/lib -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl" \
    --enable-option-checking=fatal --enable-avx512 --enable-avx2 --enable-mpi --enable-openmp --enable-threads

And for ELPA:
module load oneapi/2025.2.0
./configure --prefix=/home/software/elpa-2025.06.001 \
    CC="mpiicx -cc=icx" CXX="mpiicpx -cxx=icpx" FC="mpiifx -fc=ifx" \
    CFLAGS="-O3 -march=znver5 -I${MKLROOT}/include" \
    FCFLAGS="-O3 -march=znver5 -I${MKLROOT}/include" \
    CXXFLAGS="-O3 -march=znver5 -I${MKLROOT}/include" \
    LDFLAGS="-L${MKLROOT}/lib -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl" \
    --enable-option-checking=fatal --with-mpi=yes --enable-openmp=yes

Hopefully this is helpful.



Daniel Straus

_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


--
Emeritus Professor Laurence Marks (Laurie)
Northwestern University
Webpage <http://www.numis.northwestern.edu/> and Google Scholar link <http://scholar.google.com/citations?user=zmHhI9gAAAAJ&hl=en>
"Research is to see what everybody else has seen, and to think what nobody else has thought", Albert Szent-Györgyi