29 September 2024

Analyse This: Simulating the PS5 Pro... (Part 2)

It's the power of...


The PlayStation 5 Pro talk is really hogging the headlines, but I wasn't 100% happy with my prior look at/simulation of the device. So, I'm back here today, putting together a quick look at a simulated PS5 base versus a simulated PS5 Pro - with PC parts!

As a side benefit of this, we also get a chance to look at the effect of changing GPU architecture, plus a sly preview of an upcoming blogpost where I look at the differences between monolithic RDNA 3 and RDNA 2.

So, let's jump in...


Setting Up...


As with last time, I feel it's necessary to point out that this study has many flaws and inaccuracies. There are no equivalent pieces of hardware between the console and PC space. Many have tried to make this comparison and many have failed.

The long of it is that the console is an APU with severe power limits. The chip is monolithic, so it has some gains in intra-die latency, but it has far less cache than almost any desktop part - both for the CPU and the GPU. On the CPU side, we can replicate the CPU cache hierarchy and split-CCX design because we have desktop-equivalent APUs.

However, what we can't do is replicate the shared GDDR6 memory system and the custom cache-scrubbing and data-management silicon within the APU, which allows better management of data between the CPU and GPU - we can see the potential effect of this later. GDDR6 also has a latency disadvantage compared to desktop DDR4 memory, but an overall bandwidth "win".

On the GPU side, it lacks the defining feature of RDNA 2 - the L3 "Infinity Cache" - while avoiding the pitfalls of the chiplet architecture introduced with mid-range and high-end RDNA 3. However, this APU also lacks one of the benefits that those mid-range and high-end designs brought to RDNA 3 over RDNA 2 - i.e. a higher front-end clock.

The APU also has less memory bandwidth than the desktop RX 7800 XT but a bit more than the RX 6800 non-XT (which are the compute unit comparable parts on the desktop) and, as noted above, this is shared between all APU functions. 

Both CPU and GPU on the APU operate at lower frequencies than their desktop counterparts, which will hold them back, somewhat.

The RDNA architecture has changed over time, but mostly in the manner in which code instructions are serviced by it...

If we take a look at the PC side of things, we get the inverse of everything noted above, with some caveats!

Data from the SSD has to be transported to the CPU, then to system memory, then operated on to be decoded/decrypted/decompressed (depending on its storage state), sent back to memory and then forwarded, through the CPU, to the GPU's memory before the GPU can work on it. There is additional overhead, per frame, for any data which needs to be shared, monitored, and updated for both devices to work on over multiple frames, which requires a back-and-forth across the PCIe-to-memory interface.
In this sense, the console's APU has a massive advantage: it can more easily work on the same data, as necessary, and latency-sensitive data can be accessed faster!
The PC parts also require more energy to achieve the same performance: the distance between data and where it needs to be incurs a HUGE energy cost. Having everything on a monolithic chip, along with soldered memory and storage over shortened traces, helps reduce the energy required to push that data around! i.e. the chiplet design of mid-range and high-end RDNA 3 desktop parts is detrimental to their energy efficiency.


Moving back to the overview: 

The short of it is that we can approximate the performance of a console with dedicated PC hardware but never be 100% accurate - there will always be specific optimisations in code that developers can lean on in the console space to push performance above where it would be on otherwise architecturally identical PC parts.


Moving Out...


With all that futzing around out of the way, we can address what I actually intend to analyse in this post. Like last time, I'm going to look at a Ryzen 5 4600G system, with 16 GB DDR4 3200 and various GPUs. The CPU is clock-locked to that of the CPU in the PS5 - 3.5 GHz. 

For the GPUs, things are a little more complicated...

I couldn't get hold of an RX 6700 non-XT, which would have represented a base PS5 quite nicely - or, at least, not at a reasonable price and in reasonable condition! However, the RX 6650 XT is very close in performance if we leave it at stock settings... So, I'm planning to use that.

It's not perfect - the 8 GB of VRAM may hold it back in some titles - but it's enough...


Yes, the 6650 XT will run at higher clock speeds than the actual PS5 GPU but this will make up for the lower Compute Unit count.
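
For a rough sanity check on that claim, here's some napkin maths - treating peak FP32 throughput as a proxy for raw GPU grunt, and assuming the commonly quoted ~2.6 GHz boost clock for the 6650 XT (sustained clocks will vary):

```python
# Rough FP32 throughput comparison (napkin maths, not a benchmark).
# Peak FLOPS = CUs * 64 shaders per CU * 2 ops per clock (FMA) * clock (GHz) -> GFLOPS.
def tflops(cus, clock_ghz):
    return cus * 64 * 2 * clock_ghz / 1000  # convert GFLOPS to TFLOPS

ps5_gpu  = tflops(36, 2.23)  # PS5: 36 CUs at up to 2.23 GHz -> ~10.3 TFLOPS
rx6650xt = tflops(32, 2.64)  # RX 6650 XT: 32 CUs at ~2.6 GHz boost (assumed) -> ~10.8 TFLOPS

print(f"PS5 GPU:    {ps5_gpu:.1f} TFLOPS")
print(f"RX 6650 XT: {rx6650xt:.1f} TFLOPS")
```

On paper, the clock advantage roughly cancels out the four missing Compute Units - which is the whole point of using the 6650 XT as the stand-in.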

I've already pegged the RX 7800 XT and RX 6800 non-XT as potential PS5 Pro GPU equivalents - downclocked, of course... but we have another aspect to explore here...

The rumours and talk about the PlayStation 5 Pro being RDNA 3 or 3.5 in architecture for the GPU are rife in the tech community. So, why not introduce a PS5 base equivalent with "RDNA 3"? Yes, the RX 6650 XT is already an approximation of the GPU performance of the desktop-equivalent part (RX 6700), but there is also another part which was roundly decried as being a pointless upgrade: the RX 7600!

For this testing, I've chosen an RX 7600 XT 16 GB: (a) to avoid issues with memory usage, and (b) to match the Compute Unit count of the RX 6650 XT - which I am going to use to my advantage by allowing the card to run at default settings.

This should allow us to see some differences between the RDNA 2 and RDNA 3 architectures, both at the current GPU performance level of the PlayStation 5 and at that of the potential PlayStation 5 Pro.

Finally, I've chosen games which have a counterpart on the PS5 and have had console-equivalent settings defined by outlets in the industry.
I've also taken some of these titles, using those same settings, and run them either with RT enabled (when not previously enabled) or at a different resolution to that used on the PS5, to see the effects on, and the capabilities of, the GPUs in this analysis.


Aaand, with that - let's get into the testing... All results can be found here.


Swing Away, Swing Away...


Alan Wake 2...


Back when Alan Wake 2 released I did a performance analysis using the GPUs I had on hand at the time. While I didn't test the specific settings combinations used below, I did note that there wasn't a lot of performance uplift from resolution changes or from the settings changes at the same resolution. 

What I'm seeing in today's results is that - even though we're using a much weaker CPU (I used the i5-12400 in my original testing) and the GPU is clock-limited, both the RX 6800 and RX 7800 XT are performing WAY better! So, Remedy/AMD have done quite a lot of optimising on this title and perhaps on the graphics driver side as well. I had previously noticed the improvements in upscaling visual quality - especially when using ray tracing - but this is on another level! 

The same benchmark run back at release gave me 95 fps and 73 fps for the 7800 XT and 6800, respectively, on the low settings, without RT enabled. Now, we're getting 104 and 80 fps on a much weaker setup with the PS5 Performance settings - which are essentially the low settings with an internal 847p resolution (compared to 1080p native). Given that we know the increase in performance from increasing scaling factors has pretty harsh diminishing returns in this title, I believe that we're looking at the performance difference between the Low and High presets on "stock" PC hardware.
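
As a quick gut-check, using only the numbers quoted above, the raw change works out as follows:

```python
# Percentage change between the release-day results and today's runs
# (figures taken from the text above - nothing new is being measured here).
release = {"RX 7800 XT": 95, "RX 6800": 73}   # Low preset, no RT, at release (i5-12400 system)
today   = {"RX 7800 XT": 104, "RX 6800": 80}  # PS5 Performance settings on the clock-limited setup

for card in release:
    change = (today[card] / release[card] - 1) * 100
    print(f"{card}: {release[card]} -> {today[card]} fps ({change:+.1f}%)")
# Roughly +9-10% on both cards, despite the weaker CPU and the downclocked GPUs.
```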

I think that's pretty impressive and worthy of a shout-out!


Alan Wake 2 is very heavily reliant on GPU performance and throughput... though, as I write this, I've realised that Quality mode should have been tested at 2160p FSR Balanced and not 1440p... Still, similar scaling will apply...


Considering that Alan Wake 2 on the console does not have ray tracing enabled, what we see in the charts above is that the PS5 Performance mode running on the RX 6650 XT matches the approximate performance of that mode running on the console. However, in quality mode, if the output resolution had been kept at 1440p, instead of being raised to 4K, the PS5 GPU equivalent RX 6650 XT would have been performing better than the ~30 fps the game is locked to on the console. At upscaled 4K, both the 6650 XT and 7600 XT are managing to keep a slightly sub-30 fps presentation...

I understand that Remedy wouldn't want to present that in a 60 Hz container on a TV or monitor as it would provide a poor experience for the user. However, the less consistent actual 30 fps quality mode on the PS5, with dips into the twenties, is just as bad, in my opinion. The message is clear, though - there would have been extra performance headroom if the resolution had been dropped slightly below 4K with FSR, without notably affecting the visual quality on a per-frame basis.

In my opinion, this decision was a poor one, on Remedy's part.
There is another aspect of this game's performance: on RDNA 2 parts there are numerous, repeated frametime stutters, the cause of which is not readily apparent. Upon release, I did not observe them, but other outlets did. However, I am now experiencing these issues on both the RX 6650 XT and RX 6800, but on neither RDNA 3 card. This is very strange and, given certain recent events, I'm wondering if this is a Windows issue (though I'm on Windows 10!)...
Moving over to the RDNA 3 equivalent part, the RX 7600 XT, we see no real improvement in these two quality modes - perhaps a couple of fps, but this is just one small section of the game and that is most likely not a consistent difference. When we add ray tracing effects (low preset, of course!), the RDNA 3 architecture actually does help a little in that situation - as I have previously noted - but the performance is still sub-30 fps and I wouldn't want that experience on the base PS5 console.


Having re-inserted all the cards to test the true PS5 Quality mode equivalent, we see the limitations of the "equivalent PS5"...


The PS5 Pro equivalent setups are more interesting. First up, there's essentially a guarantee of being able to push 60 fps, even in the quality mode. Secondly, there is the suggestion that a Pro-enabled "Performance RT" mode at 30 fps could be on the cards with RX 6800-class hardware available in the PS5 Pro, with a potentially small increase in ray tracing performance if the RDNA 3-style dual-issue FP32 upgrade to the compute units is in place.

The second interesting thing, to my mind, is how the monolithic RDNA 2 to monolithic RDNA 3 jump compares to the monolithic RDNA 2 to chiplet RDNA 3 jump: a relatively minuscule 6% difference for the former but a 20-30% difference for the latter. While some of this may be explained by the memory frequency difference (17.2 Gbps on the overclocked RX 6800 versus the stock 19.5 Gbps of the RX 7800 XT), I'm still placing my bets that the decoupled, increased front-end clock frequency on the RX 7800 XT is the real performance enhancer on RDNA 3 - especially since I didn't observe any real gains in performance when adjusting memory speed on either card.
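
For reference, the bandwidth gap those memory speeds imply is fairly small, since both cards use a 256-bit bus - a quick sketch:

```python
# Effective memory bandwidth = transfer rate (Gbps per pin) * bus width (bits) / 8 bits per byte.
def bandwidth_gb_s(gbps_per_pin, bus_width_bits=256):
    return gbps_per_pin * bus_width_bits / 8

rx6800_oc = bandwidth_gb_s(17.2)  # overclocked RX 6800 -> ~550 GB/s
rx7800xt  = bandwidth_gb_s(19.5)  # stock RX 7800 XT    -> ~624 GB/s

gap = (rx7800xt / rx6800_oc - 1) * 100
print(f"RX 6800 (OC): {rx6800_oc:.0f} GB/s, RX 7800 XT: {rx7800xt:.0f} GB/s ({gap:+.0f}%)")
```

A ~13% bandwidth advantage doesn't explain a 20-30% performance gap on its own, which is why I keep coming back to that front-end clock.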

This is another reason why I specifically purchased the 6650 XT and 7600 XT - I'll be exploring the differences between RDNA 2 and 3 in a future blogpost!



Here we see a different story to that in Alan Wake 2. RDNA3 really shines over RDNA2...


Avatar...


This game has a huge focus on GPU rendering. I always thought that Alan Wake 2 had a similar focus, but looking at the above tests, it's clear that the engineers over at Massive Entertainment have really optimised around the GPU's ability to perform. Their engine makes use of extra headroom in FP32 compute and other GPU-specific resources to enable better performance, and this puts the Snowdrop engine into a position that very few game engines occupy at the low end.

In this test, all results are ray traced - since Snowdrop doesn't have a fall-back fully rasterised mode. We see a 30% difference in performance between the 6650 XT and the 7600 XT, most likely due to the extra compute performance. However, at the high-end, the 7800 XT performs only 6% better than the RX 6800 - exposing a potential CPU or data bandwidth* bottleneck.
*from storage/system memory...
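
To put that 30% into context, here's the paper compute difference between the two cards - assuming the commonly listed boost clocks (~2.6 GHz for the 6650 XT, ~2.75 GHz for the 7600 XT), and remembering that RDNA 3's dual-issue FP32 is a theoretical doubling which real shader code rarely fully exploits:

```python
# Theoretical FP32 throughput with and without RDNA 3's dual-issue capability.
# Clock values are the commonly listed boost figures (assumptions, not measurements).
def tflops(cus, clock_ghz, flops_per_lane_per_clock):
    return cus * 64 * flops_per_lane_per_clock * clock_ghz / 1000

rx6650xt         = tflops(32, 2.64, 2)  # single-issue FMA      -> ~10.8 TFLOPS
rx7600xt_paper   = tflops(32, 2.76, 4)  # dual-issue fully used -> ~22.6 TFLOPS
rx7600xt_no_dual = tflops(32, 2.76, 2)  # dual-issue never used -> ~11.3 TFLOPS

print(f"RX 6650 XT:                 {rx6650xt:.1f} TFLOPS")
print(f"RX 7600 XT (paper):         {rx7600xt_paper:.1f} TFLOPS")
print(f"RX 7600 XT (no dual-issue): {rx7600xt_no_dual:.1f} TFLOPS")
# The measured ~30% gain sits somewhere between those two extremes.
```
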
What's really interesting here is that, for this game, there was the possibility of quite large gains just from the upgrade to the RDNA 3 architecture! There was no need for the expanded CU count of the PS5 Pro, and it speaks to the fact that not all games will benefit equally when moving between the two versions of the console...

However, what's clear from the provided benchmark in the PC version is that the base console is not up to snuff in being able to provide a clean 60 fps in the performance mode, with these results confirming the problematic presentation on the base console...

What is nice, though, is that either hardware version of the "Pro" is able to reach 60 fps in both modes - which means the CPU is able to manage this feat.

So, for this game, I wouldn't be expecting any resolution upgrades for a Pro-enhanced version, only graphics settings updates.


Hogwarts Legacy is a deeply CPU-limited title...

Hogwarts...


Hogwarts Legacy is not a first-party title for Sony, so I doubt it was included in their +45% rendering performance calculation. However, it's an important game to test because it exhibits limitations from the CPU, memory bandwidth, and GPU compute - it's one of those rare games that straddles the lines of PC infrastructure design when choosing which parts you want to build with!

What we can observe here is the difference between RDNA 2 and 3 - with the extra FP32 compute of RDNA 3 giving an advantage over RDNA 2 when it comes to the 100% rasterised testing. For ray tracing, things are more complicated, with only the RX 7800 XT showing any sort of gain once it is enabled. This may be another reflection of the faster front-end clock frequency this part has...

I haven't tested the "Performance" mode settings here, since I didn't have them specified, but since we're observing fps values which are the same as those defined for the Fidelity and Fidelity RT modes on the base PS5 hardware, I'm fairly confident in saying that the CPU is the primary limiting factor at these high-end settings. It's unlikely that Hogwarts Legacy will improve on these two modes beyond their 30 fps rate, but a Pro version could perhaps improve the rendering resolution, or the Performance mode could gain more demanding graphical options at 60 fps...


Ratchet & Clank displays two issues but by far the most egregious is the VRAM limit on the 6650 XT...


Ratchet & Clank...


This title is one of those where the developers have easily observable optimisations which work specifically on the console hardware. The PS5 Performance mode achieves a locked 60 fps on the console but, here, we're VRAM limited* on the RX 6650 XT. This is compounded many times over when switching to 4K resolution... and the 16 GB framebuffer of the RX 7600 XT handles both situations with ease.
*The odd thing is that, with the monitoring software I used (HWiNFO64, MSI Afterburner, RTSS), we never approached 6 GB of application usage, let alone 7 GB, at 1440p. So, this result is a little inexplicable to me... Memory bandwidth is very similar between this card and the RX 7600 XT, so that doesn't explain the issue we're observing.
Other than that, we see that both the RX 6800 and RX 7800 XT perform identically, with the game logic apparently running into a CPU bottleneck - there's barely any performance difference between the two tested resolutions. This means that any PS5 Pro port of the game will be able to push the graphical quality very heavily, no matter the internal rendering resolution, because the CPU limitation is the primary factor for performance.


Interestingly, the RX 7600 XT draws equal with much more powerful GPUs...


Spider-Man...


Finally, we take a look at Spider-Man: Remastered. This game is also heavily CPU-limited - to the point where the improvement provided by the RX 7600 XT over the RX 6650 XT matches that of the 6800 and 7800 XT parts.

However, this PC comparison falls apart slightly because it isn't entirely able to keep up with the console port - meaning that those console-specific adaptations have been applied, once again. None of the CPU+GPU combinations are able to achieve a 60 fps minimum in any of the performance modes, even though we know this is entirely possible on the console port of the game.

Therefore, this data should be taken as an approximation: we should ignore the absolute values (i.e. lower than 60 fps average) and instead focus on the trend. And the trend is telling us that the game is heavily CPU-bound in performance.


It's like it's the same graph!


I think this title is not a good test of the hardware because of how it interacts with the software. If I hadn't already performed the testing, I wouldn't even have included it in the summary of this article!

The end result is that, like Ratchet, we could expect a lot of headroom for graphical quality upgrades in this title but very little in terms of actual framerate improvements - at least judging from the limited overlap in performance profiles between the console and PC...


Wrapping Up...


If we take an average of the results of both the core clock frequency limited RX 6800 and RX 7800 XT over the "PS5 equivalent" RX 6650 XT when paired with the 3.5 GHz limited Ryzen 5 4600G, we get +42% and +62%, respectively.
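
For transparency, here is a minimal sketch of how such an average can be computed - my assumption of the method (per-game ratios versus the 6650 XT baseline, then a simple mean); the fps values below are placeholders for illustration only, not my results (those are in the linked sheet):

```python
# Sketch of the averaging method. The fps values below are PLACEHOLDERS,
# not measured results - see the linked results sheet for the real data.
baseline_6650xt = {"Game A": 60, "Game B": 45, "Game C": 30}
candidate_gpu   = {"Game A": 84, "Game B": 63, "Game C": 42}  # e.g. a clock-limited RX 6800

uplifts = [candidate_gpu[g] / baseline_6650xt[g] - 1 for g in baseline_6650xt]
average = sum(uplifts) / len(uplifts) * 100
print(f"Average uplift vs the RX 6650 XT: {average:+.0f}%")
```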

Once again, the performance profile of the RX 6800 comes out closer to Sony's claimed performance difference. While I believe, from the claimed teraflops number, that the RDNA 3 architecture is used in the Pro, the gains we observe in the mid-range and high-end are not available to the console. This supports my position that a large percentage of the gen-on-gen improvements in performance come from that higher-clocked front-end design, which is missing in the monolithic RDNA 3 designs - including the console's.

So, despite the (rather large) claims of many commentators on the effect of moving to RDNA 3 or 3.5, the actual performance uplift from the architectural change is going to be rather moderate compared to the uplift from the sheer increase in the number of FP32 compute units.

Aside from that, it's not clear that any such gains from a change in architecture will guarantee a performance uplift, given the CPU bottlenecks which exist in many games and game engines targeted at the platform.

The PS5 Pro will likely have its work cut out for it in the many titles which do not heavily focus on GPU rendering techniques and so cannot avail themselves of any extra performance from a larger rendering pipeline...

One final thought I have from doing all this is - why increase the size of the GPU? Simply moving to a 36 CU RDNA 3 design would have brought developers a decent performance increase, with only games which have a GPU-driven rendering engine really benefitting from the increased CU count. I'm not sure how common those examples are compared to engines which will run into a CPU limit.
