I'm always interested in how mid-range PC components perform in games. Previously, I explored this from the standpoint of relative performance on each CPU I had available to test. I came to the conclusion that, even though some games will be CPU-limited by the more powerful mid-range GPUs I tested with, other, more graphically-demanding titles, will still benefit from the stronger GPUs.
I also noted that, in some titles, memory and PCIe bandwidth were also very important. So, today, I wanted to take a look at some of the various situations I can pull out of the data I generated back then...
Getting over it...
The test systems are as described last time, so I won't repeat that, but all the data is taken from that same testing that I performed months ago. For this article, I want to dredge up how each of these platforms actually affects the individual games in question.
Let's start with Avatar: Frontiers of Pandora
Yes, the data bandwidth bottleneck is strong in this game, especially for the RTX 3070... |
At low settings, we can see that CPU processing power and data transfer from system memory are the primary limitations. Starting with the Ryzen 5 4600G and finishing with the i5-12400 on DDR5, we see the RTX 4070, 4070 Super and RX 6800 increase and then level off; the RTX 3070, meanwhile, gains performance with every step up in processing power and memory bandwidth - the latter likely being the dominant factor.
The 12400 paired with DDR5 does show slightly higher performance for a few of the cards but, with the relative standard deviation of these stock tests sitting around 1 for both Low and Ultra settings, those results appear to be outliers within the potential performance space of the components. So, we're talking essentially equivalent performance from both DDR5 setups.
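For anyone unfamiliar with the metric, relative standard deviation is just the standard deviation expressed as a percentage of the mean across repeated runs. Here's a minimal sketch of the calculation - the run data below is made up for illustration, not taken from my logs:

```python
import statistics

# Five hypothetical benchmark runs of the same configuration (average fps).
runs_fps = [101.2, 99.8, 100.5, 100.9, 99.6]

# Relative standard deviation: stdev as a percentage of the mean.
rsd = statistics.stdev(runs_fps) / statistics.mean(runs_fps) * 100
print(f"RSD: {rsd:.2f}%")  # ~0.69% - i.e. run-to-run variation on the order of 1%
```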
Meanwhile, the RX 7800 XT is really doing its own thing - a story we will see repeated throughout this compilation of results.
There is still an effect from the data bandwidth bottleneck but, at this setting, the GPU and CPU have a stronger influence... |
At ultra settings, we see a similar story, though more limited in absolute numbers. Generally speaking, we see the incremental jumps in processing power across each CPU design (Zen 2 < Zen 3 < Alder Lake), but not up to Raptor Lake. Greater memory bandwidth doesn't help as much and we see the gap between the highest and lowest performance for each GPU shrink as the GPU limitation kicks in.
You might question how I know this. Well, it's written in the utilisation figures of the CPU/GPU:
For the Ryzen processors, the GPU is pretty much slammed the entire time, less so on the 5700X3D... |
For the Intel configurations, we can see the drop in CPU utilisation when moving to DDR5 but the GPU is still falling asleep half the time, despite performance actually increasing... |
For the R5 5600X and i5-12400 (DDR4), we see higher CPU utilisation than for the R7 5700X3D* and i5-12400 (DDR5), even when taking into account the difference in cores/threads, and the reason is quite simple: data management. The RTX 3070 is actually struggling with its 8 GB framebuffer, despite the low settings used in this test. Below, we can see the reduced effect with the 16 GB framebuffer on the RX 7800 XT - though it is still present in the R5 5600X graph, if you look closely.
What is confusing me is the GPU utilisation on the Intel chip in both configurations. The memory is faster, so the data transfer should be less of a bottleneck**. In fact, we do see a shorter time spent on all CPU operations for the stronger Alder Lake design, but we also observe that the RTX 3070 spends much longer calculating the workloads that the GPU performs.
*We must keep in mind that the 5700X3D has 2 extra cores, so %utilisation will generally be lower; that said, the Snowdrop engine is quite multithreaded and able to take advantage of those extra cores as well... The point is that it still shows overall lower utilisation.
**Hence the increased performance...
The RTX 3070 takes longer to perform everything than any other card - even ray tracing! |
Given how much worse it is than even the RX 6800 at ray tracing, it seems like the low utilisation we see on the GPU side is related to internal stalls across the chip while a) various calculations await the outcome of prior calculations, and b) data is shuttled in from system memory...
It seems clear that the Snowdrop engine isn't able to handle the RTX 3070 very well and that there is more performance in the tank for that card if it had been gifted a larger framebuffer!
If we take a look at that middle section of the benchmark (which is ostensibly stressing world data streaming), the plots show higher CPU utilisation on the 12400 when paired with the RTX 3070 than with the RX 7800 XT. Despite the more powerful card requiring more data to feed it, its 16 GB framebuffer can simply hold more - even at the Low settings at 1080p which, historically, we have assumed are not an issue for 8 GB framebuffers!
It seems that is not the case.
Here, with the RX 7800 XT, we see less of a data management bottleneck using the same Low settings as above... |
Next up: Hogwarts Legacy
Hogwarts Legacy paints a much simpler picture (thankfully!), with performance scaling nicely with increases in processing power, memory bandwidth, GPU power and, especially importantly, PCIe bandwidth.
Yes, looking at the 4600G, not only is there an issue with the on-chip cache sizes and Zen 2 cores, there's also a very clear bottleneck due to the PCIe gen 3 limitation of that CPU, bringing the pretty powerful GPUs all crashing down.
What is interesting on the 4600G is the performance of the Radeon parts being stronger than the Nvidia cards. I don't know this for certain but I wouldn't be surprised if this is related to the Nvidia driver overhead - which we haven't seen hide nor hair of for quite a while!
"BOOM! Take that, RTX 4070 Super!", Said the R5 4600G... |
What I find interesting about the above chart is the performance of the RTX 4070 Super on the i5-14600KF: it's like the GPU has been released from prison and all of this extra performance comes out of nowhere! This result is actually similar to that obtained by Hardware Unboxed in a similar benchmarking area, though in that case they were running an RTX 4090, and it just goes to show that all that extra GPU power is just going to waste!
Unfortunately, I can't see how the 4070 non-Super would react to being paired with the 14600KF since I gave it away, but the RX 7800 XT is pegged at near-enough 100 % from the Ryzen 7 5700X3D and up the chart, yet it's still increasing in performance by a little... So, while there's clearly a CPU bottleneck, the GPUs I have on hand are mostly getting tapped out in this demanding title!
The stronger CPU unleashes some more of the potential of the RTX 4070 Super... |
We also see increases going from DDR4 3200 to DDR4 3800 to DDR5 6400, with the RX 7800 XT coming closer and closer to 100% utilisation, even before accounting for the increase in CPU power (though that is a primary factor).
Looking at the increase in utilisation: the 12400 DDR4 has an average of 97.8% while the 12400 DDR5 has an average of 98.6%... |
Ratchet and Clank: Rift Apart
Here we start with the heavier of the two Insomniac Engine games. In fact, it's a little too heavy for the RTX 3070, with the game crashing when paired with the 5700X3D* and performing worse on the 5600X than on the 4600G. Yeah, I think we can safely discount the RTX 3070's results here, as the card simply isn't functioning well in this testing.
*And, in fact, in some of my more recent testing, too - it seems that a game update has made the game WAY less stable on 8 GB VRAM cards...
The game has an appetite for data bandwidth to the CPU, with the 5700X3D performing decently well but not well enough to overcome the generally higher-bandwidth memory on the Intel platforms, paired as it is there with the more performant Alder Lake cores.
Obviously, this game is very GPU-dependent, given its ray tracing chops, and the hardware with enough VRAM along with dedicated RT silicon (i.e. the RTX 4070 and Super) outclass all the other cards in the race...
The 5600X looks like it does well compared to the Intel parts in the RX 7800 XT testing, but it's only a 1-2 fps difference... |
Meanwhile, on the Radeon side of things, the RX 6800 becomes the bottleneck very quickly, outshone by the other three newer-generation cards. This is another one of those results where the 7800 XT is performing very strangely! It just doesn't appear to function that well on the X3D CPU or the Intel DDR4 platform but recovers nicely on the DDR5 platform - this seems to imply that PCIe-to-memory bandwidth is limiting the actual performance of the game*, and that this is something not helped by the extra 3D cache.
*We know Insomniac titles heavily utilise the PCIe lanes on GPUs to quickly swap data in and out of VRAM...
The fact that the 12400 DDR4 is performing worse than the 5700X3D shows that data management is vitally important to this title, as opposed to it simply being a function of CPU power... |
Taking all the factors into account, it becomes apparent that the CPU frequency is playing a big part in why the 5700X3D is dipping in performance ever so slightly compared to the 5600X. CPU compute is a bigger factor in the overall performance picture of this title than data movement is - despite both being important. The biggest question mark surrounding the Radeon cards' performance is when they are paired with the 4600G - or maybe conversely the 12400 DDR4 platform - and I am yet to fully understand the reason for that.
Meanwhile, the RTX 4070 Super's performance curve makes sense, with both greater CPU performance and higher memory bandwidth resulting in higher %GPU utilisation. Ultimately, however, the 12400 and 14600KF perform identically - most likely because of the identical memory and PCIe bandwidth - despite the 1 GHz of extra CPU frequency on show in the case of the 14600KF...
I should really explore this title with memory scaling in a future article because the drop in performance per CPU doesn't match the GPU utilisation numbers.
Spider-Man
The second of the Insomniac titles is less heavy on the GPU and is instead incredibly CPU-bottlenecked. In this instance, the primary limitation is CPU compute performance, followed by bandwidth between the CPU and the RAM. We see this play out for the 14600KF and in the difference between the i5-12400 DDR5 and DDR4 results. Additionally, the larger cache on the 5700X3D really brings that part leaps and bounds above the 5600X to match the 12400 DDR5 result.
In this test, we can see that GPU performance just does not matter, with practically all GPUs performing equally across all CPUs tested. Strangely enough, the RTX 3070 often ekes out slightly ahead of the average on each platform, and I can only assume that this is, again, a result of the Nvidia driver overhead - the smaller GPU causing less overhead.
What's impressive to me is the CPU utilisation of this title on the 5700X3D. When taking an average of the %utilisation and multiplying that figure by the number of threads on each CPU, we actually find that the 5700X3D is only using the equivalent of between 6 and 7 threads whereas the other CPUs all use the equivalent of around 9 threads. Yes, it's not "winning" in terms of absolute performance but it's pulling above its weight in terms of efficiency due to the 3D cache...
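As a quick sketch of that arithmetic (the utilisation percentages below are illustrative placeholders chosen to land in the ranges I described, not my actual logged values):

```python
def equivalent_threads(avg_utilisation_pct: float, thread_count: int) -> float:
    """Convert an aggregate CPU utilisation percentage into the
    equivalent number of fully-loaded threads."""
    return (avg_utilisation_pct / 100.0) * thread_count

# The 5700X3D has 8 cores / 16 threads; the 5600X and 12400 have 6 cores / 12 threads.
print(equivalent_threads(40.0, 16))  # 6.4 -> between 6 and 7 thread-equivalents
print(equivalent_threads(75.0, 12))  # 9.0 -> around 9 thread-equivalents
```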
This is yet another title I should return to in a future installment in order to test memory scaling, though the difference in bandwidth between the 12400 DDR4 and DDR5 configurations is 52 GB/s versus 84 GB/s (measured using Intel's MLC application). That's a 61% increase in bandwidth resulting in a 9% increase in fps on the RTX 4070 Super, and it'd be interesting to see how that plays out on the less artificially-limited 14600KF.
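The arithmetic behind those percentages, worked through below - the bandwidth figures are the MLC measurements from above, while the fps pair is hypothetical, chosen only to be consistent with the ~9% uplift:

```python
# Measured memory bandwidth on the i5-12400 (Intel MLC), in GB/s.
ddr4_bw, ddr5_bw = 52.0, 84.0
bw_gain = (ddr5_bw / ddr4_bw - 1) * 100
print(f"Bandwidth increase: {bw_gain:.1f}%")  # 61.5%

# Hypothetical fps pair consistent with the ~9% uplift seen on the RTX 4070 Super.
fps_ddr4, fps_ddr5 = 100.0, 109.0
fps_gain = (fps_ddr5 / fps_ddr4 - 1) * 100
print(f"fps increase: {fps_gain:.1f}%")       # 9.0%
```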
The 7800 XT is close to being fully utilised but that doesn't mean there aren't more frames to be had... |
These charts are another instance where I'm looking at the performance in fps compared with the GPU utilisation numbers and realising that the metric may not be a very useful indicator of a bottleneck. Sure, we all know that CPU utilisation numbers (as an aggregate) are not very useful and instead individual core utilisation numbers are a better way to determine game thread limitations, but until now, many industry commentators have always pointed to GPU utilisation as a good way to understand if you're "GPU-bound".
My data tells a different story, at least in some titles...
It would take a much stronger CPU to max out the RTX 4070 Super in this title... |
Counter-Strike 2...
There are few games as historically well-known to be CPU-bound as the various esports titles, and Counter-Strike (now "Counter-Strike 2") still remains one of them! The thing with Counter-Strike is that the game doesn't appear to be data-bound; instead, it's very compute-bound. In this scenario, peak core frequency is king. So, while the 3D cache on the 5700X3D does help it keep up with other parts, the higher-clocked 5600X gives better performance in the average fps...
There is some improvement based on memory speed, though. The 12400 DDR5 inches out ahead of its pairing with DDR4, so faster memory does appear to have an impact on proceedings.
What's confusing to me is the GPU utilisation behaviour in this title.
Looking at these graphs, there's no rhyme or reason to the %utilisation and average fps performance! The RX 7800 XT paired with the 5700X3D has some absolutely huge dips in GPU utilisation which do not correspond at all with any loss in fps performance. That period of 9% utilisation corresponds to around 430 fps, on average, pumped onto the screen! And the technically less-utilised 12400 DDR5 with the 7800 XT has better performance!
However, despite what I said above about this game being primarily CPU and memory bound, it's clear that there is SOME GPU bottlenecking going on because the RTX 4070 Super suddenly springs to life to vastly outclass all the other GPUs when paired with the 14600KF. If the CPU was the only bottleneck, we'd expect to see closer average fps across the GPUs.
The only thing I can think of is that this title's performance is also related to GPU core frequency, and that %utilisation numbers in this title have little bearing on whatever the actual bottleneck inside the GPU is. The 4070 Super has an average core frequency of 2850 MHz while the 7800 XT has 2626 MHz during this benchmark. That's an 8% difference corresponding to an 11% fps difference (436 fps vs 392 fps)... which is pretty darn close by way of an explanation of the results we're seeing...
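Worked through explicitly with the averages quoted above:

```python
# Average core clocks (MHz) and average fps during the benchmark run.
freq_4070s, freq_7800xt = 2850, 2626
fps_4070s, fps_7800xt = 436, 392

freq_delta = (freq_4070s / freq_7800xt - 1) * 100
fps_delta = (fps_4070s / fps_7800xt - 1) * 100
print(f"Core clock difference: {freq_delta:.1f}%")  # ~8.5%
print(f"fps difference: {fps_delta:.1f}%")          # ~11.2%
```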
Yet more evidence that %GPU utilisation can be a meaningless metric... |
Conclusion...
Last time, I started the post by saying that I wanted to understand if there was a cut-off point for pairing a particular GPU with a particular CPU. Today's testing rounds out the conclusion I made last time: an RX 7800 XT or RTX 4070 would be the sweet spot for a low-to-mid-range CPU, but an RTX 4070 Super would still be worthwhile if more graphically-demanding games were part of the planned usage.
Today, I think we can conclude that the RTX 4070 Super is not overkill for any of these mid-range CPUs (aside from the Ryzen 5 4600G!). Yes, you'll be losing around 20 % performance in some games by not having a CPU as powerful as an i5-14600KF, but the 4070 Super still provides a 10 - 20 % performance improvement over the other GPUs tested on the weaker CPUs in games where you're not CPU-limited... This level of performance also allows gaming at higher resolutions and quality settings and, quite frankly, the card is the best value GPU in terms of price to performance for the time being!
However, it does seem to me that pushing for a stronger GPU will result in a real wastage of performance without purchasing a more powerful CPU. Thankfully, CPUs are still cheaper than GPUs. Hell, a CPU + RAM + motherboard is still cheaper than a mid-to-high end GPU!
The last nugget that I'm taking away from this testing and analysis is that we cannot rely on the %GPU utilisation metric to determine if we are really GPU-bound in a given title. Actual on-the-ground testing needs to be done in order to definitively verify whether that's true on a game-by-game basis.
As a result, we have lost one "surefire" performance metric which many people have looked to over the years...