[Wien] i7-13700K benchmarks
pluto
pluto at physics.ucdavis.edu
Wed Mar 15 17:19:37 CET 2023
Dear All,
This might be useful for anyone who is building a Linux PC system.
I have some more insight into the speed using i7-13700K, which is the
current 13th gen Intel CPU. I have an Asus Z690-P D4 board and either 128
GB (4x32) or 64 GB (2x32) of Kingston FURY DDR4-3600 CL18-22-22 RAM (I can
just physically add/remove 2 DIMMs).
With 64 GB RAM the system seems to be a couple of percent faster than
with 128 GB. This is probably because the i7 has only 2 memory channels
(like any other consumer CPU), so driving 4 DIMMs probably needs extra
effort from the memory controller.
Disabling HT and/or VMX in BIOS didn't make a difference. Disabling all
efficient cores in BIOS didn't make a difference.
Current conclusion is that the bottleneck of this system is the memory
speed (RAM and probably CPU cache). My previous benchmarks were made
with DDR4 RAM running at 2400, which is the default top speed for the
DDR4 RAM. In order to get the RAM running at 3600 one needs to go into
BIOS and enable the XMP there. My board has two default XMP settings in
BIOS called XMP-I and XMP-II (one can also manipulate things manually
but I didn't try). XMP is some protocol which allows the DIMM to tell
the BIOS at which speed it should run (I think something like this is
default in DDR5, but for DDR4 it has been added at some point, so older
DDR4 boards might have a problem with this).
Our IT experts also compiled MPI. Their tests found MPI 10% slower than
OMP. Maybe there are problems with the compilation... I tried with a
20-layer Fe slab and also found MPI clearly slower than OMP. So for now I
decided not to invest time in MPI. I think very big cases are anyway not
suitable for this system, because of that memory speed bottleneck.
Perhaps the same CPU will run faster in parallel execution with DDR5.
Also, perhaps CPUs with more cache will run faster. But these things are
expensive, and e.g. the premium AMD CPUs are much more expensive than
the i7 that I have. Also, the cache structure seems to be quite complex
nowadays, so I am not sure if AMD CPUs would be better. Quite obviously,
at this point the efficient cores are useless due to the memory bottleneck.
Some OMP and k-parallel results of my current setup below. I think in
general 4x localhost and OMP=2 is the winner.
Best,
Lukasz
With XMP-I the system has been up for nearly 2 weeks now (so I call it
stable). The serial benchmark is:
XMP-I, 128 GB DDR4 RAM at 3600, system stable
OMP=1 11.65 seconds
OMP=2 6.93
OMP=3 5.49
OMP=4 4.92
OMP=6 4.09
OMP=8 3.68
OMP=9 4.53
OMP=12 4.41 - 4.85 (results vary within this range more or less)
OMP=16 4.54
In general, results can vary by maybe 1% from run to run; I have a
feeling they are quite stable. I think the OMP=12 variation might be
related to whether or not the efficient cores get used.
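To make the scaling in the XMP-I table easier to read, here is a small Python sketch that turns the timings into speedup and parallel efficiency relative to OMP=1. It is not part of the benchmark itself; the names `xmp1_seconds` and `scaling_table` are only illustrative, and OMP=12 is left out because it was reported as a range.

```python
# Serial benchmark timings in seconds, copied from the XMP-I table above.
# OMP=12 is omitted because it was reported as a range (4.41 - 4.85).
xmp1_seconds = {1: 11.65, 2: 6.93, 3: 5.49, 4: 4.92,
                6: 4.09, 8: 3.68, 9: 4.53, 16: 4.54}


def scaling_table(timings):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    t1 = timings[1]
    return {n: (t1 / t, t1 / t / n) for n, t in sorted(timings.items())}


table = scaling_table(xmp1_seconds)
# Thread count with the shortest wall time.
best_threads = min(xmp1_seconds, key=xmp1_seconds.get)

for n, (speedup, eff) in table.items():
    print(f"OMP={n:2d}  speedup {speedup:4.2f}  efficiency {eff:5.1%}")
print("fastest thread count:", best_threads)
```

With these numbers the fastest run is OMP=8, at a speedup of roughly 3.2 over OMP=1, i.e. about 40% efficiency per thread, which fits the picture of a memory-bound system.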
With XMP-II the system is fastest but unstable (PC froze after 2 hours
and needed hard reboot). The serial benchmark is:
XMP-II, 64 GB DDR4 RAM at 3600, system unstable
OMP=1 12.08
OMP=2 6.87
OMP=3 5.21
OMP=4 4.48
OMP=6 3.92
OMP=8 3.51
OMP=9 4.53
OMP=12 5.07
Previous results (email Feb 22, 2023) with 64GB DDR4 RAM at 2400:
OMP=1 12.82
OMP=2 7.65
OMP=4 5.51
OMP=6 4.87
OMP=8 4.52
OMP=12 5.54
OMP=16 5.55
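The effect of the RAM speed can be put into numbers by comparing the DDR4-2400 timings with the XMP-I (DDR4-3600) timings at the thread counts both tables share. This is a rough sketch only: the two runs also differ in the number of DIMMs (64 GB vs 128 GB), which by itself seems to be worth a couple of percent, and the names `ddr4_2400`, `ddr4_3600_xmp1` and `relative_gain` are made up for illustration.

```python
# Serial benchmark timings in seconds, copied from the tables above.
ddr4_2400 = {1: 12.82, 2: 7.65, 4: 5.51, 6: 4.87,
             8: 4.52, 12: 5.54, 16: 5.55}
# XMP-I table; OMP=12 is omitted because it was reported as a range.
ddr4_3600_xmp1 = {1: 11.65, 2: 6.93, 3: 5.49, 4: 4.92,
                  6: 4.09, 8: 3.68, 9: 4.53, 16: 4.54}


def relative_gain(slow, fast):
    """Fractional reduction in run time for thread counts present in both runs."""
    common = sorted(set(slow) & set(fast))
    return {n: (slow[n] - fast[n]) / slow[n] for n in common}


gains = relative_gain(ddr4_2400, ddr4_3600_xmp1)
for n, g in gains.items():
    print(f"OMP={n:2d}  {g:5.1%} less time at 3600 than at 2400")
```

The gain is around 9% for a single thread and grows to around 18% at OMP=8 and OMP=16, so the faster RAM helps more the more threads compete for memory.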
k-parallel results with 16 k-points (16x Gamma point)
XMP-I, 128 GB DDR4 RAM at 3600, system stable
1x localhost OMP=1 3.05.30 min.sec.
2x localhost OMP=1 1.48.28
2x localhost OMP=2 1.18.23
4x localhost OMP=2 1.03.30
8x localhost OMP=1 1.04.53
8x localhost OMP=2 1.07.19
The best I ever got for this k-parallel test was 0.58.60 with XMP-II
(system unstable) and 4x localhost OMP=2.
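For comparing the k-parallel runs it helps to convert the times to seconds first. The sketch below assumes the stamps read as minutes.seconds.hundredths (so 3.05.30 means 3 min 5.30 s); that reading, and the names `parse_min_sec` and `kpar_xmp1`, are assumptions made for illustration.

```python
def parse_min_sec(stamp):
    """Convert a 'M.SS.hh' stamp (assumed minutes.seconds.hundredths) to seconds."""
    minutes, seconds, hundredths = stamp.split(".")
    return int(minutes) * 60 + int(seconds) + int(hundredths) / 100


# (number of localhost jobs, OMP threads per job) -> time, from the XMP-I table.
kpar_xmp1 = {
    (1, 1): "3.05.30",
    (2, 1): "1.48.28",
    (2, 2): "1.18.23",
    (4, 2): "1.03.30",
    (8, 1): "1.04.53",
    (8, 2): "1.07.19",
}

seconds = {cfg: parse_min_sec(s) for cfg, s in kpar_xmp1.items()}
baseline = seconds[(1, 1)]
# Configuration with the shortest wall time.
best = min(seconds, key=seconds.get)

for (jobs, omp), t in seconds.items():
    print(f"{jobs}x localhost OMP={omp}: {t:7.2f} s, "
          f"speedup {baseline / t:4.2f} on {jobs * omp} threads")
print("fastest configuration:", best)
```

Under that reading, 4x localhost with OMP=2 is the fastest of the XMP-I runs at a bit under 3 times the speed of the single job, and going from 8 to 16 threads in total does not help any further.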