29 March 2024

Analyse This: Simulating the PS5 Pro...



Last time, I took a look at the PS5 Pro leaks and came to the conclusion, based on the claimed performance uplift, that the PS5 Pro will likely be heavily CPU-constrained (considering the upgrade in totality) and that many titles will not take advantage of any 'RDNA3' architectural improvements in the APU.

It still baffles me why Sony would even bother releasing this thing, as I concluded:
"Honestly, a part of me is wondering why AMD/Sony didn't go with the same 36 CU configuration, but using RDNA 3 instead. They'd get the RT bonus performance and they could have clocked the GPU frequency higher to achieve a similar level of raster performance though at a cost to power use. The die would also be cheaper - and this is doubly important if there is some sort of CPU bottleneck in play - you've got a lot of wasted die area spent without capitalising on the potential performance."
But let's not dwell on theoreticals; let's do some testing!


Of Mice and Men...


From the leaks we've had thus far, the CPU is not changed from the base PS5. That means no real clock boost to speak of*, ZERO increase in cache sizes, and no architectural upgrades such as improved IPC or cache sharing... it's still the separated cache design of Zen 1 and Zen 2, split between the two CCXs.
*At least not without an associated reduction in GPU power...
This is particularly worrisome because the key improvement (in terms of gaming) across the Zen generations has been improving data locality for the cores performing the workload. This was achieved by 1) expanding the L3 cache and 2) making that cache available to all cores on a CCD. Further improvements have taken the form of expanding that already enlarged cache structure for the X3D designs, which has been shown to be very advantageous in gaming workloads.

The Zen 3 architecture enabled a shared cache which minimises data transfer between the CCXs...

The upshot of this is that we know the approximate performance envelope of such a mobile SoC - circa 30 W of power dedicated to the CPU portion of the APU. Now, many reviewers liken the performance of the Ryzen 5 3600 to the PS5 CPU but, in reality, it's a more performant part: it has a higher power limit*, a higher operating frequency**, and a larger L3 cache***...

However, there are desktop equivalent parts from AMD that will more faithfully simulate the real potential CPU bottlenecks stemming from those almost guaranteed cache misses.
*65 W
**~4.2 GHz compared to the max. 3.5 GHz on the PS5
***2x 16 MB compared to the PS5's 2x 4 MB
To sum up, I've gone ahead and purchased an analogue part - the Ryzen 5 4600G - to be able to show how the jump from mobile Zen 2 to Zen 3 would have fared for the console ecosystem and, in the process, show how both the architectural uplift and the cache size increase would have really helped Sony's Pro console...

For comparison, I'm squaring the 4600G up against the Ryzen 5 5600X: a nice, fast Zen 3 part. Both parts will be paired with 16 GB of dual-channel DDR4-3200 - something which I've proven is perfectly fast enough not to bottleneck the system...

Yes, these are 6-core parts, rather than 8-core. However, we've seen that most games do not suffer much performance loss when dropping from eight cores to six in situations where they're not paired with the greatest of GPUs. So, I think this test can be pretty representative of the PS5's mobile Zen 2 implementation - even down to both the cache split and its reduced size...
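To keep the relevant specs straight, here's a quick printable reference - figures pulled from public spec sheets, with the PS5 values based on the known console configuration, so treat them all as approximate:

```python
# Quick spec reference for the CPUs discussed above (approximate, from public spec sheets).
cpus = {
    "PS5 APU (Zen 2)":       {"cores": 8, "l3": "2x 4 MB (split)",  "max_clock_ghz": 3.5},
    "Ryzen 5 3600 (Zen 2)":  {"cores": 6, "l3": "2x 16 MB (split)", "max_clock_ghz": 4.2},
    "Ryzen 5 4600G (Zen 2)": {"cores": 6, "l3": "2x 4 MB (split)",  "max_clock_ghz": 4.2},
    "Ryzen 5 5600X (Zen 3)": {"cores": 6, "l3": "32 MB (unified)",  "max_clock_ghz": 4.6},
}

for name, spec in cpus.items():
    print(f"{name:23s} {spec['cores']} cores, L3: {spec['l3']:17s} up to {spec['max_clock_ghz']} GHz")
```

The 4600G's split 2x 4 MB L3 is exactly why it makes such a good stand-in for the PS5's CPU, even with the two missing cores.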

Last time, I also came to the conclusion that the new and improved PS5 Pro will likely have absolute GPU performance of around that of an RX 6800 in its worst implementation (to achieve that claimed 1.45x performance uplift) - fewer GPU cores but better data management, overall. Happily, I happen to have an RX 6800 in my GPU stable.

The rumours say the PS5 Pro is an RDNA 3 implementation, though... In a stroke of dumb luck, I ALSO have an RX 7800 XT in the stable, right next door to the RX 6800!

With all of this hardware I can simulate the effect on performance of moving from RDNA 2 to RDNA 3 while also assessing the potential CPU uplift and generational difference.


RDNA 3 has several improvements over RDNA 2 - chief among them the energy efficiency of the core die...


Cranking up the Heat...


Now, it's not a fair comparison to just test these parts at stock settings. In fact, it wouldn't make any sort of logical sense! Going back to the power draw of the PS5: the whole system pulls around 160 - 200 W in total, with a claimed draw of 195 - 200 W when gaming for the original model (not including the disc drive), 200 - 210 W for the 6 nm refresh and 215 W for the PS5 Slim - which seems a bit backward to me!

Are Sony actually pushing the console harder? I was unable to find any information from reviewers saying that it performed better. The only thing that makes sense of this, to me, is that with each iteration of the design the heatsink has been shrunk - with the Slim having the smallest of all.

We know that a silicon die requires more voltage to (stably) maintain a given frequency as its temperature increases. Of course, applying more voltage to stabilise that frequency also increases the heat output of the die. This is why extreme overclockers use liquid nitrogen: cooling the chip so aggressively allows the voltage - and with it the frequency - to be cranked way up. And with that increase in voltage, power usage must also increase.
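To put that relationship into rough numbers: dynamic power scales approximately with C·V²·f, so voltage costs quadratically. A toy calculation (the voltage figures here are made up purely for illustration):

```python
def dynamic_power(voltage, freq_ghz, c_eff=1.0):
    """Dynamic power ~ C * V^2 * f (relative units; c_eff is an arbitrary constant)."""
    return c_eff * voltage**2 * freq_ghz

cool_die = dynamic_power(voltage=1.00, freq_ghz=2.23)  # well-cooled: stable at lower voltage
hot_die  = dynamic_power(voltage=1.08, freq_ghz=2.23)  # hotter die: needs extra voltage for stability

print(f"Relative power increase from +8% voltage alone: {hot_die / cool_die - 1:.1%}")
# ~16.6% more power for the exact same clock...
```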

Therefore, there is a logical argument to be made that Sony are sacrificing APU efficiency by allowing it to operate at higher temperatures - requiring more voltage/energy to operate - in exchange for a cheaper heatsink.

But, I digress...

The reason we're here, talking about this at all, is that both CPUs being analysed operate at up to 65 W and the GPUs at up to 220 - 250 W. Combine the two and you're talking over 80 - 100 W more than the console. Not to mention the fact that all of these desktop parts generally operate at frequencies above those found in the console hardware. So, we need to pare them back!
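Just to put numbers on that excess (using the stock limits quoted above; the console figure is whole-system wall draw, so this is a rough comparison at best):

```python
console_total_w = 200        # approx. whole-system draw of a PS5 while gaming
cpu_stock_w     = 65         # 4600G / 5600X stock package power limit
gpu_stock_w     = (220, 250) # approx. stock board power range of the two GPUs

for gpu_w in gpu_stock_w:
    excess = cpu_stock_w + gpu_w - console_total_w
    print(f"GPU at {gpu_w} W -> roughly {excess} W over the console's whole-system budget")
```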

To rein them in, I set a static clock of 3.5 GHz on both CPUs (and 3.8 GHz for the relevant tests). For the 4600G, no further power saving measures were taken other than to disable the integrated GPU. For the 5600X, I activated eco-mode on top of the clock frequency limit and dropped the core voltage to 1.1 V. These measures contained both processors to between 30 - 40 W of total package power, while maintaining a static frequency.

The GPUs are a bit more of a complicated situation.

Power readings differ between the two architectures, but note also the difference in how that total power budget is distributed...


First off, we are expecting 56 CUs in the Pro, not the 60 CUs of these cards. But we are limited here by what's available on the consumer shelf.

For both GPUs, I limited the operating clockspeed to approximately 2.23 GHz (as per the PS5 GPU). I also undervolted both cards and set the memory frequency on the RX 6800 to 2100 MHz (which, while not matching the 18 Gbps rumoured for the Pro, closes the gap a little). The 7800 XT's memory cannot be underclocked, so it stays as-is. I also set both cards' memory to the fast timings.

For the power limit, I wanted to match as closely as possible the theoretical numbers I worked out in the prior blogpost for the compute portion of the die. To get there, I redefined the soft power play tables for the 6800 to allow power limiting down to -50% and manually set it to -40%, while the 7800 XT was limited to its minimum of -10%. This allowed me to approximately match a 120 W budget for the core and SoC power. I did not count the contribution of the memory systems towards that total - for good reason.
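For anyone who would rather script this than dig into power play tables, here's a minimal sketch of one alternative route on Linux: the amdgpu driver exposes the board power limit through hwmon as power1_cap, in microwatts. Note that the driver clamps writes to its own allowed range (the same kind of floor I hit at -10% on the 7800 XT), and the 120 W value below is just the rough core + SoC target from above - it won't map one-to-one onto what the cap actually covers:

```python
import glob

def set_gpu_power_cap(watts, card="card0"):
    """Best-effort sketch: write the amdgpu power limit via hwmon (value is in microwatts).
    Needs root, and the driver will reject values outside its permitted min/max range."""
    nodes = glob.glob(f"/sys/class/drm/{card}/device/hwmon/hwmon*/power1_cap")
    if not nodes:
        raise RuntimeError("No amdgpu power1_cap node found - is this an AMD card on the amdgpu driver?")
    with open(nodes[0], "w") as f:
        f.write(str(int(watts * 1_000_000)))

# set_gpu_power_cap(120)  # roughly the core + SoC budget targeted in this test
```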

Memory power will be a fairly static cost on the Pro and will likely match, or nearly match, that of the RDNA 2 card. The RDNA 3 card, on the other hand, has vastly inflated power consumption from operating its chiplets, so it is not representative at all...

Of course, the same logic also applies to the SoC power, but we can estimate that the actual power used is similar to that of the RX 6800 - which gives us the ultimate takeaway that RDNA 3 is actually quite power efficient compared to RDNA 2; it's just the chiplets that have screwed everything up for AMD (in more ways than one)...

Power- and frequency-limited power consumption for the GPU core and SoC. Not 100% representative of what will be happening in the PS5 Pro APU...


With these imperfect measures in place, I then benchmarked four modern titles:
  • Dragon's Dogma 2 (High + RT)
  • Hogwarts Legacy (High + High RT)
  • Ratchet & Clank: Rift Apart (High + High RT)
  • Starfield (Optimised settings)
These titles were tested at 1080p with the above-noted settings and would represent the potential graphical requirement for upscaling to 4K. Most importantly, the titles in question run the gamut from graphics-limited (Ratchet) to CPU-limited (Dragon's Dogma) to a combination of the two (Starfield and Hogwarts).
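Before we get to the charts, a note on how each run is boiled down to numbers. A minimal sketch of the maths, assuming a plain list of per-frame times in milliseconds exported from whichever capture tool you prefer (my charts report average and minimum fps; the 1% low calculation here is just one common way of expressing the 'minimum' idea):

```python
import statistics

def summarise(frametimes_ms):
    """Return average fps and a 'low' fps based on the worst 1% of frames."""
    fps = [1000.0 / ft for ft in frametimes_ms]
    worst = sorted(frametimes_ms, reverse=True)
    one_percent = worst[: max(1, len(worst) // 100)]
    return statistics.mean(fps), 1000.0 / statistics.mean(one_percent)

# Example with made-up frame times (ms):
avg_fps, low_fps = summarise([16.7, 17.1, 15.9, 33.4, 16.8, 18.0, 16.5, 41.2])
print(f"avg: {avg_fps:.1f} fps, 1% low: {low_fps:.1f} fps")
```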


Let the games begin...


Let's start with the PS5 CPU analogue part, the Ryzen 5 4600G. When using the power- and frequency-limited GPUs, performance between the two cards is very similar, with only DD2 and R&C showing any real gains from the step up to the 7800 XT. Both Hogwarts and Starfield are quite CPU-limited in these tests.

The CPU bottleneck in Hogwarts and Starfield is readily apparent here - but it's present, to some degree, across all of the titles...


The limited 5600X unsurprisingly does better than the 4600G in all tests, and we can see how the general trend confirms the premise of each game's bottleneck from the section above. Each game unlocks more performance when the processor is upgraded, but the graphics-heavy title (Ratchet & Clank) gets a larger uplift when using the 7800 XT.

Yes, the CPU bottleneck is still there in DD2 and Starfield, but it now sits 'higher'.

The more powerful CPU allows the 7800 XT to shine more, but not completely...


If we present the data in a slightly different way, we can instead directly compare the effect of the CPU used for a particular GPU. 

Aside from delivering better average fps than the 4600G, the 5600X also displays better minimum fps when using the RX 6800.
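In data terms, that regrouping is just a pivot from 'per CPU' to 'per GPU'. A tiny sketch of the idea - the fps values in this snippet are placeholders for illustration only, not my measured results:

```python
# Hypothetical results table: (game, cpu, gpu) -> average fps. Placeholder values only.
results = {
    ("Starfield", "4600G", "RX 6800"): 48, ("Starfield", "5600X", "RX 6800"): 55,
    ("Starfield", "4600G", "RX 7800 XT"): 50, ("Starfield", "5600X", "RX 7800 XT"): 60,
}

# Regroup: for each GPU, compare the two CPUs side by side.
by_gpu = {}
for (game, cpu, gpu), fps in results.items():
    by_gpu.setdefault(gpu, {}).setdefault(game, {})[cpu] = fps

for gpu, games in by_gpu.items():
    for game, per_cpu in games.items():
        uplift = per_cpu["5600X"] / per_cpu["4600G"] - 1
        print(f"{gpu}: {game}: 5600X is {uplift:.0%} faster than the 4600G")
```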


Although fps minimums can be run-specific, in general, the 5600X does better due to the cache hierarchy...


We observe a similar, though larger, gap between the processors when pairing them with the RX 7800 XT - showing that all of these games will benefit from a stronger GPU and the RDNA 3 architecture.


The GPU bottleneck in Ratchet & Clank is relieved, and there is a slight improvement in the other three games for the 5600X as well, indicating that the limited RX 6800 was still holding it back very slightly at the tested settings...


Frequency matters...?


Moving on to the question of what possible gains could be had from the boost in CPU frequency from 3.5 GHz to 3.8 GHz, I have also repeated the above tests with the 4600G locked to 3.8 GHz, to mimic the rumours...
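Before looking at the charts, it's worth putting a ceiling on expectations: even a perfectly CPU-bound game can only scale with the clock ratio, and since memory latency doesn't scale at all, the real-world gain will be smaller still. A quick worked example (the 45 fps starting point is purely illustrative):

```python
base_clock_ghz, boosted_clock_ghz = 3.5, 3.8
max_scaling = boosted_clock_ghz / base_clock_ghz - 1
print(f"Best-case CPU-bound uplift: {max_scaling:.1%}")  # ~8.6%

cpu_bound_fps = 45  # illustrative figure for a CPU-limited title
print(f"{cpu_bound_fps} fps -> at most ~{cpu_bound_fps * (1 + max_scaling):.0f} fps")  # ~49 fps
```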

Paired with the RX 6800, the 4600G shows essentially no improvement, except a slight bump in both average and minimum fps in Starfield.


The increase in frequency has effectively no effect on performance except in Starfield...


When paired with the 7800 XT, we do see a slight, consistent bump in performance across all titles. However, I doubt it's even noticeable to the user - putting to bed any thoughts that this slight clock frequency increase might lead to better-performing games...


Again, a very slight bump on the order of a couple of fps. Not noticeable to the human eye...


One thing to note here is that there are random stutters when using the 4600G that just don't manifest when using the 5600X*. Hogwarts and Starfield are the most egregious offenders - especially when running through an area for the first time.

In contrast, Ratchet is very CPU-optimised and doesn't really display these to the user. Dragon's Dogma just has slow, stuttery performance in general on the 4600G. It is playable, but barely...

However, even with the shader caches 'warmed up', Hogwarts often displayed these long loading stutters. The fact that this game runs on the consoles at all is pretty amazing!
*This actually isn't entirely true. I am able to make them appear when using the 5600X - when leaving my system memory running at default JEDEC speeds of 2133 MT/s!

The run with the 4600G @ 3.8 GHz happened to encounter one of these stutters, but I believe it was just by chance...


The poor CPU just doesn't have enough on-die cache to manage all the data transfers between system storage, RAM and GPU - not to mention the gruelling task of computing each individual frame at the same time!



Conclusion...


Way back when, I pooh-poohed the idea of Sony updating from their Zen 2-based design to Zen 3, on the logic that it would cause more problems than it would solve in terms of existing software support - something that developers have also highlighted for a potential move to RDNA 3 (if it is not handled carefully).

The choice to include the dual-issue DCU design could cause issues with software compatibility, but this would likely be mitigated by not increasing the L1 and L2 cache sizes on the GPU since, essentially, RDNA 2 and RDNA 3 are "the same" otherwise. This would limit the actual uplift from including RDNA 3 intellectual property in the Pro (much as was done for the much-vaunted 'rapid packed math' of the PS4 Pro) but could provide situational benefits when coding 'to the metal'.

Unfortunately, something similar cannot be done with the CPU. It would have been preferable to see the anaemic L3 cache of the CPU increased to 16 MB or 32 MB (as on the desktop Zen 2 counterparts); without this improvement, any clockspeed increase will only yield single-digit fps gains.

I believe we can observe in these tests that there is a large effect from the architectural jump from Zen 2 to Zen 3, likely coming mostly from avoided cache misses. We can also see how keeping the same CPU and architecture will hinder any GPU gains outside of resolution and image quality improvements - which will almost certainly be on the cards for any Pro-enabled game titles.

A 1.45x performance increase over the RX 6700 lands us squarely in the ballpark of the RX 6800...
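As a very rough back-of-the-envelope check on that equivalence - using boost-clock FP32 throughput only, which ignores memory bandwidth and architectural differences entirely:

```python
# Public boost-clock FP32 figures (TFLOPS) - coarse proxies only, not measured performance.
rx_6700 = 11.3   # 36 CU RDNA 2
rx_6800 = 16.2   # 60 CU RDNA 2

print(f"1.45x the RX 6700: ~{1.45 * rx_6700:.1f} TFLOPS vs the RX 6800's ~{rx_6800} TFLOPS")
# ~16.4 vs ~16.2 - hence the RX 6800 standing in for the Pro's GPU here.
```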

Moving back to the GPU, it seems that RDNA3 does have the potential to improve certain games by a small margin - on the order of 5 - 10 fps. 


All-in-all, the potential improvements that are on the table seem unlikely to push many games from 30 fps to 60 fps using the same quality settings, if they weren't already sitting at around 40 - 50 fps on the base PS5 console. 


I would like to end this blogpost by pointing out that it's important not to focus too much on the absolute numbers shown here, since we are not really matching a theoretical PS5 Pro console. However, I believe we can look at the broad strokes of what such a console might bring to the table.

I hope you found this post interesting. Stay tuned, I have more analyses on the way!
