15 September 2024

How Powerful is the PS5 Pro?


In part two of my Playstation 5 Pro fever articles, we take another look at the power the console will likely bring to the table. One of the most common discussions that's surrounded the announcement of the Playstation 5 Pro has been the performance of the GPU, with some likening it to an RX 7800 XT or even cards on Nvidia's side of the aisle.

The reality is simultaneously simple and a bit more complicated than just a comparison to a single desktop card.

So, let's take a look!


Previously, I've taken a look at the logic of what's happening based on the prior leaks. I also simulated the performance of the Pro using a Ryzen 5 4600G and an RX 6800, and compared that against an RX 7800 XT and against a more powerful CPU architecture in the Ryzen 5 5600X - with all parts locked to the console frequency specifications, or as close as possible. 

As PlayStation 5 Pro fever ramps up, we have many people speculating about the performance now that the +45% rendering uplift and 2-3x ray tracing performance figures have been confirmed from Sony's side of things. 

Actually, people have mostly settled on the RX 7700 XT and RX 6800 as the pure rendering equivalents - both around 45-46% faster than the RX 6700 (the closest desktop equivalent to the base PS5) - and this is something I can agree on. 

In raster performance, we're looking at the RX 6800, RX 7700 XT and RTX 3070 Ti... but these results are very dependent on the games tested, and analysis of TechPowerUp's data has shown that these summary percentages are not as accurate as their actual performance reviews... I also really like the recently much-maligned Tom's Hardware GPU Hierarchy for this sort of data...


What I'm finding myself not agreeing on is the level of performance uplift associated with the ray tracing.

Some are positing the RTX 3080 and RTX 4070 as an RT equivalent. Though there are those who are adding their own dissenting voices to the discussion.


What's a Fair Comparison?


The problem with all of this pontificating is that it's ultimately almost pointless. As with the problem of talking about CPU IPC improvements, GPU testing and relative performance will heavily depend on four things: the resolution being tested, the quality settings used, the selection of games, and the capability of the rest of the system feeding the GPU (CPU, memory, and platform).

As games industry commentators and enthusiasts, we can spend hours justifying our decisions based on the typical TechPowerUp database, individual testing, or random YouTube videos. Ultimately, none of these are 100% perfect and many have known (and oft-ignored) flaws in their - and our - methodology for comparisons.

The only truly accurate comparison is a direct capture of the output of a system, such as Digital Foundry performs. Even frametime captures from internal software monitoring tools (e.g. RTSS, FrameView, etc.) only come close to the actual end-user experience, as they can't take into account the disparity between the system output and the display.

However, if we had to be perfect, there would be almost no analysis or discussion in the world - and I think that would be very boring and trite. We need new voices in the space and new challengers to actually create a (preferably amicable) discourse. So, with that out of the way, let's address the issues with Sony's disclosure...

Tom's scaling shows that both the RX 6800 and 7700 XT are around 45% better than the RX 6700 at 1440p Ultra, but are these the resolution and quality settings that Sony is testing against?


As noted above, the main claims Sony has made that get everyone excited are a 45% increase in rendering and a 2-3x ray tracing calculation speed increase. Those sound great on paper but come with several major caveats, in that those same four principles for testing GPUs apply here as well!

If we gloss over the first three points, we find that the RX 6800 or RX 7700 XT are approximately 45% faster than the RX 6700 - ignoring the fact that the RX 6700 has a faster core clock speed than the base PlayStation 5 console does. Digital Foundry have effectively confirmed that this GPU is very similar in performance to the console hardware, even with its higher clock speeds, with the difference mitigated by the memory configuration and data locality afforded by the console's shared memory.

That still leaves the last point in contention.

We know the console APUs of both the Xbox Series X/S and PS5 are limited in their cache sizes and in their processing power, because they are frequency limited. Looking at the data I gathered during my simulation of the PS5 Pro, I found that Ratchet and Clank was only performing at 80% of what the same components achieved at stock settings. Similarly, I found that Starfield was performing at 87% of stock. Locking the frequency of the CPU and GPU and power limiting them has a big effect on performance!

This isn't the only aspect - practically all the data points we're sourcing from the web were gathered on systems with DDR5 and Intel's 12th to 14th generation or AMD's 7000 or 8000 series X3D parts - CPU, RAM, and PCIe configurations which will not cause a bottleneck to hinder the GPU's performance. 

As we can see in Tom's Hardware's testing, summarised in the table above, switching from 1080p Ultra to 1080p Medium settings changes the performance increase over the RX 6700 10GB by a pretty large margin - and I'm pretty sure that these tests are not considering ray tracing... 

And this is one of the primary concerns in all this discussion - our data points are mostly far from the truth of the consoles' performance abilities. And this says nothing of driver versions and Windows versions, which we know can have an effect!

My own testing (for an upcoming blogpost) shows the effect of CPU and subsystem capability on GPU performance - and the CPU portion of the APU in the PlayStation 5 and Xbox Series X is even more constrained than these desktop parts!

These are quite high settings on titles which can show the difference between CPU, GPU and memory system limitations... 


The RX 7800 XT is really suffering under the yoke of the very limited Ryzen 5 4600G in non-synthetic workloads... 


Architectural Differences...


Added to ALL OF THIS are the architectural differences. I found that RDNA 3 likely had a lot of its performance uplift from the front-end clock frequency scaling - not found in RDNA 2 or in the N33-based parts. Something I hope to further confirm in the near future! RDNA 3 also has vastly increased FP32 throughput and other bonuses which may or may not contribute to additional performance depending on the game engine and specific game code (some engineers optimise!).

The APU on the PS5 has neither this, nor the large L3 infinity cache, nor the higher core frequency boosting behaviour of the desktop parts. 

Plus, the Shader Engine configuration is completely different. The desktop N32 parts (RX 7700 XT and RX 7800 XT) consist of three shader engines, each comprising two shader arrays (5 WGP, 1/2 rasteriser, 2 RB+ per array), but the PlayStation 5 Pro appears to be two shader engines of two shader arrays (8 WGP, 1 rasteriser, ?? RB+* per array).
N31 and N33 are similar to each other but different from the above two configurations...

*This is an interesting thing... the RB configuration of the PS5 Pro is presumed to be 96 ROPs but this may not be true, and could result in lower performance. For sure, the rasteriser config is more optimal than that of the desktop parts, as it was in the base PS5 - I'm assuming they're the same! - but if the ROP configuration is still 64, the Pro will be struggling with the output of all those ALU operations...

So, what are we to do?

Well, my own interpretation of the answer to this question is: Let's use logic!


The Logical Argument (IMO)...


Here's the thing - it doesn't really matter what the CPU and GPU are capable of in relation to desktop parts; we can reason things out based on our (very detailed) understanding of what those desktop parts are capable of!

So, here's my reasoning - see if you agree!

Rasterisation is the pinnacle of performance. Our GPU architectures are primarily optimised to render a requested frame through rasterisation of triangles and the application of textures and lighting to those triangles. It's as simple as that. This means that the rasterisation rendering performance (and my nomenclature may not be perfect here, so please forgive me!) is the highest potential performance of any given GPU and GPU/memory architecture.

Real-time ray tracing is an additional calculation based on ray intersections and other calculations which must be performed before the frame can continue to be rendered. The important part here is that the frame is still rasterised! It's just that the lighting/shading information applied to a particular pixel/triangle will be adjusted based on the output of the ray traced calculations.

In order to do this, the GPU must first calculate the RT portion of the frame data. It's important to note that this isn't happening in parallel! So, even though Nvidia and Intel GPUs have portions of their GPU dies dedicated to hardware that can improve the calculation speed of the ray tracing, clock cycles of each frame are spent only doing that, before the GPU's ALUs/shaders can be put to work on the meat and potatoes of the frame the user is waiting to see.

Maybe I've misunderstood all of the above - I'm of the understanding that I have not! - but please correct me if I have, because it forms the premise of the whole argument here:

Using RT effects means you will always be some percentage below the theoretical 100% rasterised rendering performance - because the ray tracing operations take some amount of performance away from the GPU's ability to render the frame with traditional rasterising techniques.

What that percentage is depends on the ability of the architecture (dedicated calculation hardware, cache size, memory hierarchy, and memory hierarchy bandwidths and associations) to do that work.

This means that the pure rasterisation rendering performance of a GPU is its upper limit - a ray traced frame will only ever be able to achieve a percentage of that 100% figure.
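
To make that reasoning concrete, here's a minimal sketch in Python of how a fixed per-frame ray tracing cost eats into the pure-raster frame rate. The function and the numbers are mine and purely hypothetical - they're just an illustration of the argument above, not measured data.

```python
# Minimal sketch of the argument above: a ray traced frame still has to be
# rasterised, so its frame time is the raster time plus the RT work - meaning
# RT-on fps can never exceed pure-raster fps. Numbers are hypothetical.

def fps_with_rt(raster_fps: float, rt_overhead_ms: float) -> float:
    """Frame rate once a per-frame ray tracing cost is added to the raster time."""
    raster_ms = 1000.0 / raster_fps
    return 1000.0 / (raster_ms + rt_overhead_ms)

raster_fps = 100.0   # hypothetical pure-raster result
rt_cost_ms = 4.0     # hypothetical per-frame ray tracing cost

rt_fps = fps_with_rt(raster_fps, rt_cost_ms)
print(f"RT-on fps: {rt_fps:.1f} ({rt_fps / raster_fps:.0%} of the raster ceiling)")
# -> RT-on fps: 71.4 (71% of the raster ceiling)
```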

Using this logic, let's take a look at the potential RT performance of the Playstation 5 Pro...

Here we have the raw fps results from the raster and ray tracing performance covered by TechPowerUp. These are then compared to show a % performance loss due to ray tracing...
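
For clarity, this is the kind of calculation being done with those raw fps numbers - a small Python sketch with placeholder values standing in for TechPowerUp's actual per-title results (the titles and figures below are made up for illustration):

```python
# Sketch of the "% performance loss due to ray tracing" comparison described
# above. The fps values are placeholders, not TechPowerUp's real figures.

def rt_loss(raster_fps: float, rt_fps: float) -> float:
    """Fraction of pure-raster performance lost once ray tracing is enabled."""
    return 1.0 - rt_fps / raster_fps

# {title: (raster fps, RT-on fps)} for an RDNA 2 and an RDNA 3 card (hypothetical)
rx_6700    = {"Title A": (70.0, 35.0), "Title B": (120.0, 95.0)}
rx_7700_xt = {"Title A": (100.0, 58.0), "Title B": (170.0, 142.0)}

for title in rx_6700:
    print(f"{title}: RDNA 2 loses {rt_loss(*rx_6700[title]):.0%}, "
          f"RDNA 3 loses {rt_loss(*rx_7700_xt[title]):.0%} with RT enabled")
```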


Using the data from a review of the RX 7700 XT Sapphire Pulse, we can see the amount of performance loss per title when ray tracing is enabled. From this data, I can calculate the RT performance uplift from RDNA 2 to RDNA 3 (which we know is 50% per CU). Using this, I will make the assumption that Sony are not speaking of a 2-3x RT calculation performance uplift for the entire GPU (which would include the CU increase) and instead assume that the figure is per CU in the design*.
*This is by far the best-case scenario for the Pro's RT performance and goes against Sony's other metrics, which are APU-to-APU comparisons... if we take this metric as a per-die comparison figure, then the RT performance of the PS5 Pro is no better than a downclocked RX 6800 - or thereabouts...

Here, I work out the percent uplift from RDNA2 to RDNA3, based on the known 50% uplift - with almost all other factors constant based on N22 to N32 comparison...


Here I calculate the fps per title, taking into account the 2x and 3x RT performance uplift against the loss of performance from pure rasterised frame to the ray tracing-included frame...


We can see that in some titles the effect of ray tracing is very minimal and in others it is quite severe! Just one of the reasons this comparison is very difficult! Adding to that, we cannot separate out differences in cache size, associativity, and memory hierarchy bandwidth - but this is basically as close as it gets for this size of GPU...

Finally, I take that data and plug it into the real data for the desktop GPUs running on desktop CPUs (again, not realistic, but humour me!). I decided that I would take the average of this data and present 2.5x as the RT performance uplift of the PS5 Pro.
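
As a sketch of how that projection works - assuming, as above, that the 2x/3x figure only shrinks the ray tracing share of the frame time - here's the calculation in Python. The function and the example numbers are mine and hypothetical; plug in the per-title review data to reproduce the chart.

```python
# Sketch of the projection: take a PS5 Pro-class raster result, take a baseline
# RT loss fraction (RDNA 2-like), shrink just the RT portion of the frame time
# by the claimed uplift, and recompute fps. All numbers are hypothetical.

def project_rt_fps(raster_fps: float, baseline_rt_loss: float, rt_uplift: float) -> float:
    """Projected RT-on fps when the RT share of the frame time shrinks by rt_uplift."""
    raster_ms = 1000.0 / raster_fps
    baseline_total_ms = raster_ms / (1.0 - baseline_rt_loss)  # implied RT-on frame time
    rt_ms = (baseline_total_ms - raster_ms) / rt_uplift       # RT cost after the uplift
    return 1000.0 / (raster_ms + rt_ms)

pro_raster_fps = 87.0   # hypothetical: ~1.45x an RX 6700-class raster result
baseline_loss = 0.40    # hypothetical RDNA 2-style loss with RT enabled in this title

for uplift in (2.0, 2.5, 3.0):
    print(f"{uplift}x RT uplift -> {project_rt_fps(pro_raster_fps, baseline_loss, uplift):.1f} fps")
```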


Finally! The REAL performance of the PS5 Pro GPU (not)...


For some reason, I forgot this data in the chart above - I believe it was due to aesthetics...


What we can see from these calculations is that a theoretical PS5 Pro GPU with 2.5x RT calculation performance per CU would behave very similarly to an RX 7700 XT - winning in some scenarios/game engines but drawing equal in others. 

It's literally impossible for the ray tracing performance of the GPU to be greater than the rasterisation increase of 1.45x - which is approximately the RX 6800. What is probable is that the % loss due to the extra burden of performing the ray tracing calculations is significantly reduced, which we can see in the above chart in relation to the RX 6800's RT performance.


Conclusion...


These comparisons and calculations are fraught with errors and issues due to their very nature but, in my honest opinion, these are the most detailed determinations of the potential performance of the console out there at this point in time...

In reality, the GPU in the console will not perform as well as any of these calculations in scenarios where the CPU is the limiting factor, as the i9-13900K used by TechPowerUp is a far cry from the Ryzen 7 4700G equivalent found in the console APUs.

Let me know your thoughts in the comments!
