[Wien] i7-13700K benchmarks

pluto pluto at physics.ucdavis.edu
Wed Mar 15 17:19:37 CET 2023


Dear All,

This might be useful for anyone who is building a Linux PC system.

I have some more insight into the speed of the i7-13700K, which is the 
current 13th-gen Intel CPU. I have an Asus Z690-P D4 board and either 
128 GB (4x32) or 64 GB (2x32) of Kingston FURY DDR4-3600 CL18-22-22 RAM 
(I can just physically add/remove 2 DIMMs).

With 64 GB RAM the system is seemingly a couple of percent faster than 
with 128 GB. The reason is probably that the i7, like any other consumer 
CPU, has only 2 memory channels, so with 4 DIMMs there are two DIMMs per 
channel, which probably needs extra effort from the memory controller.

Disabling HT and/or VMX in the BIOS made no difference. Disabling all 
efficient cores in the BIOS made no difference either.

My current conclusion is that the bottleneck of this system is memory 
speed (RAM and probably CPU cache). My previous benchmarks were made 
with the DDR4 RAM running at 2400, which is the default top speed for 
this DDR4 RAM. To get the RAM running at 3600 one needs to go into the 
BIOS and enable XMP there. My board has two predefined XMP settings in 
the BIOS, called XMP-I and XMP-II (one can also adjust things manually, 
but I didn't try). XMP (Intel Extreme Memory Profile) lets the DIMM tell 
the BIOS at which speed it should run (I think something like this is 
standard in DDR5, but for DDR4 it was added at some point, so older 
DDR4 boards might have a problem with this).
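To put the 2400 vs 3600 difference into numbers, here is a back-of-the-envelope sketch of theoretical peak DRAM bandwidth. It assumes standard 64-bit (8-byte) DDR4 channels and the two channels mentioned above; sustained bandwidth in real codes is lower than this peak, and the function name is only illustrative.

```python
# Theoretical peak DRAM bandwidth for a dual-channel DDR4 setup.
# Assumption: each DDR4 channel is 64 bits (8 bytes) wide; real sustained
# bandwidth is lower than this peak value.

BYTES_PER_TRANSFER = 8  # 64-bit DDR4 channel
CHANNELS = 2            # i7-13700K: two memory channels


def peak_bandwidth_gbs(mega_transfers_per_s: int, channels: int = CHANNELS) -> float:
    """Peak bandwidth in GB/s (10**9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    for speed in (2400, 3600):
        print(f"DDR4-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s")
    ratio = peak_bandwidth_gbs(3600) / peak_bandwidth_gbs(2400)
    print(f"ratio 3600/2400: {ratio:.2f}")
```

This gives about 38.4 GB/s at 2400 and 57.6 GB/s at 3600, i.e. a 1.5x higher peak; the measured gains in the tables below are smaller than that.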

Our IT experts also compiled MPI. Their tests found MPI 10% slower than 
OMP. Maybe there were problems with the compilation... I tried a 
20-layer Fe slab and also found MPI clearly slower than OMP. So for now 
I decided not to invest time in MPI; I think very big cases are anyway 
not suitable for this system because of the memory speed bottleneck.

Perhaps the same CPU will run faster in parallel execution with DDR5. 
Also, perhaps CPUs with more cache will run faster. But these things are 
expensive; e.g. the premium AMD CPUs are much more expensive than the i7 
that I have. Also, the cache structure seems to be quite complex 
nowadays, so I am not sure whether AMD CPUs would be better. Quite 
obviously, at this point the efficient cores are useless due to the 
memory bottleneck.

Some OMP and k-parallel results for my current setup are below. I think 
in general 4x localhost with OMP=2 is the winner.

Best,
Lukasz


With XMP-I the system has been up for nearly 2 weeks now (so I call it 
stable). The serial benchmark is:

XMP-I, 128 GB DDR4 RAM at 3600, system stable
OMP=1 11.65 seconds
OMP=2 6.93
OMP=3 5.49
OMP=4 4.92
OMP=6 4.09
OMP=8 3.68
OMP=9 4.53
OMP=12 4.41 - 4.85 (results vary within this range more or less)
OMP=16 4.54
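A small sketch that turns the XMP-I table above into speedup S(n) = T(1)/T(n) and parallel efficiency E(n) = S(n)/n. The only assumption is that the midpoint of the reported 4.41-4.85 s range stands in for OMP=12; the variable and function names are hypothetical.

```python
# Speedup S(n) = T(1) / T(n) and parallel efficiency E(n) = S(n) / n
# for the XMP-I (128 GB, DDR4-3600) serial benchmark.
# Assumption: OMP=12 uses the midpoint of the reported 4.41-4.85 s range.

xmp1_times = {
    1: 11.65,
    2: 6.93,
    3: 5.49,
    4: 4.92,
    6: 4.09,
    8: 3.68,
    9: 4.53,
    12: (4.41 + 4.85) / 2,
    16: 4.54,
}


def speedup_table(times: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (threads, speedup, efficiency) rows sorted by thread count."""
    t1 = times[1]
    rows = []
    for n in sorted(times):
        s = t1 / times[n]
        rows.append((n, s, s / n))
    return rows


if __name__ == "__main__":
    print("OMP  speedup  efficiency")
    for n, s, e in speedup_table(xmp1_times):
        print(f"{n:>3}  {s:7.2f}  {e:10.0%}")
```

The peak is at OMP=8 (speedup of about 3.2, roughly 40% efficiency); beyond 8 threads the speedup drops.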

In general, results vary by maybe 1% from run to run; I have the feeling 
they are quite stable. I think the OMP=12 variation might be related to 
whether or not the efficient cores are used.

With XMP-II the system is fastest but unstable (the PC froze after 2 
hours and needed a hard reboot). The serial benchmark is:

XMP-II, 64 GB DDR4 RAM at 3600, system unstable
OMP=1 12.08
OMP=2 6.87
OMP=3 5.21
OMP=4 4.48
OMP=6 3.92
OMP=8 3.51
OMP=9 4.53
OMP=12 5.07

Previous results (email Feb 22, 2023) with 64 GB DDR4 RAM at 2400:
OMP=1 12.82
OMP=2 7.65
OMP=4 5.51
OMP=6 4.87
OMP=8 4.52
OMP=12 5.54
OMP=16 5.55
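To compare the three serial runs directly, a sketch computing the percent change in wall time from the DDR4-2400 results to each DDR4-3600 run, only for OMP values present in both runs. The XMP-I OMP=12 entry is left out because it was reported as a range; all names are hypothetical.

```python
# Percent change in wall time going from DDR4-2400 (Feb 22 results, 64 GB)
# to DDR4-3600 with XMP-I (128 GB) and XMP-II (64 GB). Negative = faster.
# XMP-I OMP=12 is omitted because it was reported as a range.

ddr4_2400 = {1: 12.82, 2: 7.65, 4: 5.51, 6: 4.87, 8: 4.52, 12: 5.54, 16: 5.55}
xmp1_3600 = {1: 11.65, 2: 6.93, 3: 5.49, 4: 4.92, 6: 4.09, 8: 3.68, 9: 4.53, 16: 4.54}
xmp2_3600 = {1: 12.08, 2: 6.87, 3: 5.21, 4: 4.48, 6: 3.92, 8: 3.51, 9: 4.53, 12: 5.07}


def percent_change(old: dict[int, float], new: dict[int, float]) -> dict[int, float]:
    """Percent change of new vs old for thread counts measured in both runs."""
    common = sorted(old.keys() & new.keys())
    return {n: 100.0 * (new[n] - old[n]) / old[n] for n in common}


if __name__ == "__main__":
    for label, run in (("XMP-I ", xmp1_3600), ("XMP-II", xmp2_3600)):
        changes = percent_change(ddr4_2400, run)
        print(label, "  ".join(f"OMP={n}: {c:+.1f}%" for n, c in changes.items()))
```

For XMP-I the gain grows from about 9% at OMP=1 to about 19% at OMP=8, which fits the memory-speed explanation given above.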


k-parallel results with 16 k-points (16x Gamma point)
XMP-I, 128 GB DDR4 RAM at 3600, system stable
1x localhost OMP=1 3.05.30 min.sec.
2x localhost OMP=1 1.48.28
2x localhost OMP=2 1.18.23
4x localhost OMP=2 1.03.30
8x localhost OMP=1 1.04.53
8x localhost OMP=2 1.07.19

The best I ever got for this k-parallel test was 0.58.60 with XMP-II 
(system unstable) and 4x localhost OMP=2.
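For reference, the "4x localhost, OMP=2" setup corresponds to a WIEN2k .machines file with four k-point jobs on the local host, each using two OpenMP threads. The sketch below is a hypothetical helper that writes such a file; it assumes the installed WIEN2k version accepts the omp_global keyword, so please check the user guide of your version before relying on it.

```python
# Hypothetical helper that writes a WIEN2k-style .machines file for
# k-point parallelism on one host: `jobs` lines of "1:<host>", each job
# using `omp` OpenMP threads.
# Assumption: the installed WIEN2k version understands omp_global;
# verify against your version's user guide before use.

from pathlib import Path


def machines_text(jobs: int = 4, omp: int = 2, host: str = "localhost") -> str:
    lines = [f"1:{host}" for _ in range(jobs)]  # one k-parallel job per line
    lines += ["granularity:1", "extrafine:1", f"omp_global:{omp}"]
    return "\n".join(lines) + "\n"


def write_machines(case_dir: str, jobs: int = 4, omp: int = 2) -> Path:
    """Write .machines into case_dir and return its path."""
    path = Path(case_dir) / ".machines"
    path.write_text(machines_text(jobs, omp))
    return path


if __name__ == "__main__":
    print(machines_text(jobs=4, omp=2), end="")
```

Four jobs with two threads each use 8 threads in total, the same count as the fastest serial OMP=8 setting above.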