7 April 2020

Analyse This: The Next Gen Consoles (Part 10)


Last time I took quite a deep look into a complicated subsystem of the next gen consoles. There were some items which were either taken out of context by the press or completely misunderstood. No one is perfect - not even me - and I'm always up for a debate on whether a particular idea or understanding is correct, so I wanted to take the time to address some of the more important misconceptions about that piece, as well as some general misconceptions about the future consoles that I'm seeing in the press and on YouTube.


I did not say the PS5 would be more powerful than the Xbox Series X...


I know that maybe my words were a bit ambiguous but I specifically said that the SX would be more powerful than the PS5. I said that the PS5's GPU was faster (i.e. running at a greater frequency) and then I performed a simple (frequency x hardware unit count) calculation to work out the specific performance deltas between the CPUs and GPUs of the two consoles. It's a ratio that is indicative of the overall difference in "power", all things being equal (i.e. based on the limited information we have on specific feature sets).

This presents us with the fact that the SX's CPU is around 1.02-1.10x as powerful as the PS5's and the SX's GPU is around 1.18-1.20x as powerful as the PS5's when frequencies are taken into account. This mitigates, somewhat, the pure 36 vs 52 CU comparison - which is devoid of actual performance-per-time-unit information.
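
For anyone who wants to check the arithmetic, here's a minimal sketch of that (frequency x hardware unit count) comparison using the publicly quoted clocks and CU counts (the PS5 numbers are its "up to" boost figures):

```python
# Back-of-the-envelope (frequency x hardware unit count) comparison using the
# publicly quoted specs. The PS5 figures are its "up to" variable-frequency clocks.

sx_cpu_smt_off = 3.8e9     # Series X CPU clock with SMT disabled
sx_cpu_smt_on  = 3.6e9     # Series X CPU clock with SMT enabled
ps5_cpu        = 3.5e9     # PS5 CPU clock (with SMT)

sx_gpu  = 52 * 1.825e9     # Series X: 52 CUs at a fixed 1.825 GHz
ps5_gpu = 36 * 2.23e9      # PS5: 36 CUs at up to 2.23 GHz

print(f"CPU ratio (SMT on):  {sx_cpu_smt_on / ps5_cpu:.2f}x")    # ~1.03x
print(f"CPU ratio (SMT off): {sx_cpu_smt_off / ps5_cpu:.2f}x")   # ~1.09x
print(f"GPU ratio:           {sx_gpu / ps5_gpu:.2f}x")           # ~1.18x
```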


I did not say that the SX has only 7.5 GB RAM for games OR that there's only 10 GB total RAM...


Again, this was a complicated topic and people sometimes get lost in the minutiae of things. It might interest you to know that I was working on that post for 1.5 weeks, diving into the standards surrounding DDR and GDDR memory in order to ensure I was not making a mistake when speaking about the implementations revealed for the SX and PS5 (barring them deviating from the standards and implementing an entirely custom RAM interface!). This was not just some slap-dash effort, though I still made a couple of minor mistakes!* So I understand how people just having a quick read through the long post got a couple of details incorrect in their reporting.
* Actually, they were mistakes in the Xbox's favour, making it appear more performant than it would be in the scenarios I painted.
This particular statement concerned a potential scenario for RAM usage on the SX: that, if the system wanted to fix RAM access at 560 GB/s, it would have to locate the OS in the "fast/wide" pool of RAM, not the "slow/narrow" pool. Since 10 GB - 2.5 GB = 7.5 GB... i.e. that is what would remain for guaranteed 560 GB/s access for a game.

Now, many commentators have stated that "the OS resides in the slow RAM". Generally speaking, that could be the case... but that's not what Microsoft have said. What Andrew Goossen was quoted as saying was:
"Memory performance is asymmetrical - it's not something we could have done with the PC," explains Andrew Goossen "10 gigabytes of physical memory [runs at] 560GB/s. We call this GPU optimal memory. Six gigabytes [runs at] 336GB/s. We call this standard memory. GPU optimal and standard offer identical performance for CPU audio and file IO. The only hardware component that sees a difference in the GPU."

Digital Foundry went on to break down that 2.5 GB of RAM is reserved for the OS and 13.5 GB is for games. Of course, the way they worded it was in reverse: 13.5 GB for games across the optimal and standard memory, leaving 2.5 GB within the standard memory. But that is not an exclusive statement - there is nothing there saying that the OS can only reside in the standard memory. They even go on to clarify that the system sees the RAM as a unified pool of shared memory and they state that the performance of the memory will vary - not that it has two levels of access speed. If it were just two levels of access speed, it wouldn't be varying at all - they would both be static performance metrics (336 GB/s and 560 GB/s) and there would be no variation there!


James Stanard clarified, on Twitter, that Sampler Feedback Streaming (SFS) is essentially a sort of prefetch technique - looking at which part of a texture needs to be read before fetching the entire texture (the extraneous parts of which would then be discarded during scrubbing)... if they prefetch the wrong portion of the detailed texture, then they need to temporarily fall back to the next available highest resolution while the required texture is loaded. With the speed of the SSD, this should take no longer than 1-2 frame renders (which is very fast compared to the load-ins we observe in current games on PC and console).
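
To illustrate the concept (and only the concept - this is a toy model of my own, not Microsoft's actual Sampler Feedback API), the streaming logic boils down to: use the requested detail level if it's already resident in RAM, otherwise kick off the SSD read and temporarily sample from the next-highest-detail mip that is resident:

```python
# Toy model of feedback-driven texture streaming with a mip-level fallback.
# Purely illustrative: the class and method names are my own invention, not
# the actual DirectX Sampler Feedback API.

class StreamedTexture:
    def __init__(self, mip_count):
        self.mip_count = mip_count
        self.resident = {mip_count - 1}   # only the lowest-detail mip is always in RAM
        self.in_flight = set()            # mip levels currently streaming in from the SSD

    def sample(self, wanted_mip):
        """wanted_mip is the detail level the sampler feedback says this frame needs."""
        if wanted_mip in self.resident:
            return wanted_mip                      # ideal case: the data is already in RAM
        if wanted_mip not in self.in_flight:
            self.in_flight.add(wanted_mip)         # kick off an asynchronous SSD read
        # Temporarily fall back to the next-highest-detail mip that is resident
        # (per the above, this should only last 1-2 frames with these SSDs).
        return min(m for m in self.resident if m > wanted_mip)

    def on_load_complete(self, mip):
        self.in_flight.discard(mip)
        self.resident.add(mip)


tex = StreamedTexture(mip_count=10)
print(tex.sample(2))          # -> 9: the low-detail mip is used while mip 2 streams in
tex.on_load_complete(2)
print(tex.sample(2))          # -> 2: the full-detail data is now resident
```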

RAM access will be dependent on the internal buses of the individual on-die components, which will be directly plumbed into the fast or slow RAM pools...


This is sort of derived from the prior misconception. In concept, this statement is partially correct. The individual memory access of each component will be determined by how wide its individual bus connection is. However, the performance of the unified pool of system RAM will not. That is what has been disclosed by both parties and it is what I was analysing last entry.

The thing with how these metrics have been presented is that I believe the UMC (IMC) and DMA link directly into the memory controllers of the GDDR6. That means there's unified access to the entire bank of RAM - the whole 320-bit bus on the SX and 256-bit bus on the PS5. If individual components could chalk off chunks of the RAM directly, then there would not be such a value for the interface - it must be accessed through a unified controller in the I/O. So, I think that speaking about the buses of individual components is a bit of a moot point because the APU itself has access to the entire bandwidth of the system memory through the infinity fabric interconnect and I/O.

In this scenario the CPU/GPU and other on-die components could have access to the infinity fabric through a 512-bit wide interface (as per Zen 2/IF2 specs) - wider than either memory bus on the PS5 and SX - which would allow excess bandwidth across the die for direct memory access (DMA) for the SSD, CPU, GPU and the audio solutions of both consoles without impacting the bandwidth of an individual component (though it would, obviously, consume available bandwidth to the RAM through the I/O complex at the time of access).
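
As an aside, the headline bandwidth figures themselves are a straightforward consequence of bus width multiplied by the per-pin data rate of the 14 Gbps GDDR6 both consoles use - which is consistent with a unified controller addressing the full bus:

```python
# The headline figures fall straight out of (bus width x per-pin data rate)
# for the 14 Gbps GDDR6 used in both consoles.

def gddr6_bandwidth(bus_width_bits, gbps_per_pin=14):
    """Peak bandwidth in GB/s for a given bus width."""
    return bus_width_bits / 8 * gbps_per_pin

print(gddr6_bandwidth(320))   # 560.0 GB/s - Series X, all ten chips (the 10 GB "GPU optimal" pool)
print(gddr6_bandwidth(192))   # 336.0 GB/s - Series X, the six 2 GB chips (the 6 GB "standard" pool)
print(gddr6_bandwidth(256))   # 448.0 GB/s - PS5, the full 256-bit bus
```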


The SX is not backwards compatible with all generations of Xbox hardware...


This is something I've seen mentioned in many places and it's really sort-of-not true... but also true. The PS5 and Xbox Series X are natively backwards compatible only with the PS4 and Xbox One, respectively.

HOWEVER, because Microsoft heavily invested in their backwards compatibility programme on the One, all those games that were ported to the emulator are thus compatible with the BC mode on the SX for the One and One X. On the other hand, SONY are having to do what MS did years ago with the PS4 BC mode on the PS5 - yes, there are a lot of included hardware timings for compatibility but there's enough of a difference between the Jaguar architecture and the Zen 2 architecture that a simple down-clock isn't going to fix it. It has to be verified on a game-by-game basis.

In comparison, MS just has to verify that the One and One X emulation works on the SX and then all their original Xbox and 360 games will just work. I'm not sure about One and One X games themselves, but Microsoft has been very careful with their wording surrounding backwards compatibility - it appears to be on the same game-by-game basis as the original Xbox and 360 titles.

If SONY had invested in a BC programme on the PS4 like MS did then they would have the same level of backwards compatibility. Which is a shame since I'd love to play my PS2 & PS3 games on the PS4 or PS5 without having to re-buy them.

Yes, the backwards compatibility is as "limited" as on the One and One X... that is to say - that's a craptonne of games and Microsoft should be proud.


The PS5 being less powerful than the SX means nothing...


I mentioned very briefly last time that I believe the "pixel" is dead. What I mean by that is that features such as machine learning, DLSS, VRS and various other upscaling techniques such as temporal resolution/filtering* mean that virtually no console or discrete graphics card is pumping out a 1:1 equivalent of pixel to display resolution. You can do that, but why waste available power on something that the human eye just won't pick up on (if implemented correctly)...?
*That's dynamic resolution and temporal antialiasing for anyone not following.
These features are allowing developers (and users) to focus the power we do have in more intelligent ways. However, what it means is that the PS5 can be less powerful than the SX and still perform imperceptibly the same through utilisation of any of these techniques. Even the SX will need to use these techniques - no matter its extraordinary GPU heft, ray tracing will bring it to its knees - in fact, Minecraft DXR already brought it to its knees in the demos to the press. That burden can (and will) be alleviated through intelligent implementation of the above features.

If a cross-platform game comes along that the PS5 struggles with then it will just be rendered at a lower resolution and upscaled to whatever resolution is required. If you asked someone to judge the difference between a native 4K image from a 2080 Ti and one from a 2070 using DLSS, could they pass a blind taste test?

I'm not convinced 95% of the player base could tell the difference.

As I mentioned last time, the majority of the game playing population are not even playing on 4K screens. I, myself, have a 1080p monitor and a 720p TV (yes, it's really old but works fine and I can't justify upgrading for an incremental improvement in pixel density compared to actually spending the money on the hardware and games). Yes, some people are finicky about this, some people sit a mile from their screens, most people don't even have perfect vision - especially as they age. So why should Microsoft and SONY beat themselves up over achieving a 1:1 pixel output to 4K when most of their customers won't even play at that resolution?

What is more important to me are features such as proper HDR (not HDR below 1000 nits)... in fact, one of the reasons I've never bothered to upgrade my screens is that the industry is faffing around, selling substandard products to consumers (because they can and they're getting away with it), while the actually good quality displays are ridiculously expensive. It's like everyone decided to settle for selling expensive TN screens instead of advancing the technology to improve IPS/VA displays. I've actually seen little movement in this sector for actual high quality displays (outside of increasing pixel density and light bleed) over the last 10 years.


Regarding the scenarios for RAM access in the Xbox Series X... if you want to be technically correct, I should have written "APU" instead of "CPU". That was my mistake and I think it caused a lot of confusion for many people! Urian, over at Disruptive Ludens, helped me realise this mistake. Head over to his blog for a good explanation of the overall architecture of the APUs found in consoles.

I compared the RAM setups of the SX and PS5 as equal, but the systems and their underlying requirements are not...


This was a point that came up in the comments of the last article: when you look at the theoretical amount of data that can be pushed to and from RAM in both systems, it comes out at around the same amount.

That's around 7.2 GB per 16 ms frame time for the PS5 and between 5.4 and 9.0 GB for the SX, depending on a 0 to 100% split between the narrow and wide pools of RAM over the period of rendering a 60 fps frame. Unfortunately, things aren't this clean in real life - there are overheads that affect the amount of time the RAM would actually be able to receive and send data: the CPU and GPU must process data and return it to the RAM for operation by other components (e.g. the intersected ray information for the current scene/frame might be passed to the lighting engine and audio engine for processing, positional data for elements in a scene might be passed to the GPU for rendering or physics processing, etc.).
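
Here's the simple arithmetic behind those per-frame figures (a best-case sketch that ignores refresh, command overhead and contention):

```python
# Theoretical data moved per 16 ms (~60 fps) frame, ignoring all overheads.

frame_s = 0.016

ps5       = 448 * frame_s   # ~7.17 GB per frame (single unified pool)
sx_wide   = 560 * frame_s   # ~8.96 GB if the whole frame hits the 10 GB "GPU optimal" pool
sx_narrow = 336 * frame_s   # ~5.38 GB if the whole frame hits the 6 GB "standard" pool

print(f"PS5: {ps5:.2f} GB | SX: {sx_narrow:.2f} - {sx_wide:.2f} GB per 16 ms frame")
```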

Normally, these elements would be handled mostly within the large caches available on the CPU or GPU. However, we know for a fact that the APU of the SX has only 76 MB of SRAM (very fast cache memory) across the whole SoC.

Going through the numbers, that actually sounds like quite a large cache supply... I re-checked my calculations from the comments section last time and found I had over-estimated the graphics cache sizes by an order of magnitude, so let's lay it all out: for comparison, the 3700X has a total of 37 MB cache (512 KB L1, 4 MB L2 & 32 MB L3), the 4800H has a total of 12 MB cache (512 KB L1, 4 MB L2 & 8 MB L3) and an RX 5700 has a total of 5.2 MB cache (576 KB L0, 512 KB L1 & 4 MB L2).

Adding those up gives us 42 MB of cache for a PS5 and 44 MB of cache for an Xbox Series X if we just combine a 3700X with an RX 5700 for the PS5 and increase the graphics cache for the SX through a simple ratio of 56/40 CU (7.3 MB, total). That's actually much lower than I expected and a little surprising given Digital Foundry's supposition that the L3 cache would be cut down like it is in the mobile Zen 2 parts.
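
Laying that out as a quick calculation (using the rounded cache figures above and scaling the GPU cache by the 56/40 physical CU ratio - my rounding lands a megabyte or so away from the totals quoted in the text):

```python
# Rough cache totals behind the comparison (figures in MB, rounded as in the text).

ryzen_3700x_cache = 37.0   # 512 KB L1 + 4 MB L2 + 32 MB L3
rx_5700_cache     = 5.2    # 576 KB L0 + 512 KB L1 + 4 MB L2

ps5_estimate = ryzen_3700x_cache + rx_5700_cache             # ~42 MB
sx_estimate  = ryzen_3700x_cache + rx_5700_cache * 56 / 40   # GPU cache scaled by physical CU count -> ~44 MB

print(f"PS5 estimate: {ps5_estimate:.1f} MB, SX estimate: {sx_estimate:.1f} MB")
print(f"Unaccounted for on the 76 MB SX SoC: {76 - sx_estimate:.0f} MB")   # ~32 MB
```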

In fact, I, myself, had considered that the SoCs for both consoles would be the equivalent of mobile Zen 2 cores... but perhaps this isn't the case? These numbers would give us another 31 MB of cache to spread around the SoC for other engines such as I/O and DMA... or even extra CPU and GPU cache. That's really huge and could be possible due to the supposed 7 nm+ process node that these SoCs are rumoured to be manufactured on.

This was a bottleneck I had thought was going to be impacting RAM usage and latency but it looks like it will not be a problem - at least for the Series X. We don't know any numbers for PS5, yet!

You can see in this graph, showing data transferred over a 16 ms window, that as the ratio of that period is shifted towards the narrower RAM pool, the overall amount of data that can be moved around drops significantly. The SX equals the performance of the PS5's memory system at a 1:1 ratio between using the narrow and wide pools (i.e. 8 ms each during this period)...
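
The data behind that graph can be recreated in a few lines - the total data moved in a 16 ms frame as a function of the fraction of that frame spent accessing the wide pool:

```python
# Recreating the data behind the graph: total data moved in a 16 ms frame when a
# fraction f of that time is spent on the wide (560 GB/s) pool and the rest on
# the narrow (336 GB/s) pool.

frame_s = 0.016

def sx_data_per_frame(f_wide):
    return 560 * frame_s * f_wide + 336 * frame_s * (1 - f_wide)

ps5_data_per_frame = 448 * frame_s   # ~7.17 GB, for comparison

for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{f:>4.0%} wide: {sx_data_per_frame(f):.2f} GB per frame")
# At the 50:50 split the SX moves 0.5 * (560 + 336) * 0.016 = 7.17 GB - exactly the PS5 figure.
```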

Discussion of the available cache aside, the requirements of the two systems are quite different because the PS5 requires around 0.84x the amount of data to feed the 36 CU of its GPU compared to the SX's 52 CU, and around 0.98x the amount of data for the CPU. So, from my completely naive perspective, the SX looks like it would reach a bottleneck more quickly in memory-intensive games than the PS5 would - the larger GPU and the faster CPU need more data/bandwidth to feed them, approximately 8.4 GB in total per 16 ms time frame.

If we look at the above graph, we can see that (7.168 GB * 1/0.84 * 10/16) gives us a requirement of 5.4 GB per 16 ms frame time from the faster memory (working from the assumption that the PS5 has 100% utilisation, which it won't, and that both GPU architectures will be similar, which they might be...), along with 3.0 GB per 16 ms frame time from the slower RAM (7.168 GB * 1/0.98 * 6/16). That is achieved at around 83-84% utilisation of the 16 ms time frame transferring data to/from the wide pool of RAM (if my calculation is correct), which equates to 7.3 GB from the 560 GB/s address space and 1.0 GB from the 336 GB/s address space for that total of 8.3-8.4 GB.
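
Here's a rough reproduction of that calculation, solving for the wide-pool share of the frame directly from the ~8.4 GB estimate (the rounding is mine, so the outputs won't line up exactly with the 7.3 GB and 1.0 GB figures above):

```python
# A rough reproduction of the figures above, solving for the wide-pool share of
# the frame needed to hit the article's ~8.4 GB per-frame estimate for the SX.

frame_s   = 0.016
wide_gb   = 560 * frame_s              # ~8.96 GB if the whole frame uses the wide pool
narrow_gb = 336 * frame_s              # ~5.38 GB if the whole frame uses the narrow pool

requirement = 8.4                      # GB per 16 ms frame (the estimate above)

# Solve: requirement = wide_gb * f + narrow_gb * (1 - f) for the wide-pool fraction f
f = (requirement - narrow_gb) / (wide_gb - narrow_gb)

print(f"Wide-pool share of the frame time: {f:.0%}")               # ~84%
print(f"Moved via the wide pool:   {wide_gb * f:.2f} GB")           # ~7.6 GB
print(f"Moved via the narrow pool: {narrow_gb * (1 - f):.2f} GB")   # ~0.8 GB

# For reference, a 70:30 split only moves ~7.9 GB - short of the 8.4 GB estimate.
print(f"70:30 split total: {wide_gb * 0.7 + narrow_gb * 0.3:.2f} GB")
```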

What is interesting - and it could just be a complete coincidence - is that this is the point at which the SX's 560 GB/s memory access per 60 fps frame matches (or is just above) that of the PS5... It's also very interesting that a pure 50:50 split results in exactly the memory access of the PS5 - which could point to the two pools of memory being accessed in a cycle-by-cycle manner, though I think that this scenario would not be able to fully feed the GPU.

If we dial that back a bit from the 83/17 split, we're more realistically looking at around a 70:30 split for that 1.19x data requirement once we take into account various overheads. This would put the Series X's memory system in a RAM-starved state, impacting performance from a pure data-moving point of view.

The rescinded interview with Ali Salehi puts another spin on the whole situation, where the engineer said that it's actually quite difficult for programmers to fill the wide GPU of the SX with data to process. Not that it can't happen but that it's likely going to be one of those situations where it's hard to utilise the parallelism of the architecture - similar to how the PS3's Cell processor was difficult to utilise fully.

Going off of Killzone: Shadow Fall's demo GPU/CPU memory ratio, then, that 70:30 split isn't that far off from potential real-world game engines (KZ was 65/32/2.7 between GPU, CPU and shared), meaning that my estimation that the GPU might be data-starved at these ratios of RAM access might not be far from the truth. Next time, I'll probably go into those numbers and why they're not necessarily representative of current generation console and game requirements - at least not as broken down by psorcerer.

However, despite my concern that the SX might be more easily bottlenecked, it doesn't appear that it's going to be much of an issue as long as access to the narrow/slow pool of RAM doesn't take up more than 20-30% of the frame time in RAM-heavy games. Above that percentage, the GPU might start to become starved of data.

Like I said above for the PS5, the pixel is dead - developers will work around this limitation on the SX by dropping rendered resolution and using more intelligent ways of managing the required amounts of data and processing. The games will run fine, despite this part of the system being less optimal than the PS5's... in the same way the PS4 Pro runs games fine compared to the Xbox One X.

Conclusion

I'm really looking forward to the eventual PS5 deep-reveal as I'd love to put some proper numbers to the SONY side of this equation and see how close to correct these predictions I'm putting out there actually are!

Yes, the arrangement of the memory on the SX is sub-optimal, but how close is it operating to the point of data starvation for the GPU? And does it really matter?









14 comments:

  1. your posts are always very interesting. thanks for that

    1. Thanks, Ger! I'll try and keep them up.

  2. This comment has been removed by the author.

  3. One question, and I appreciate your time in advance.
    As far as I see it, GDDR6 is accessed over dual 16-bit channels. So each 64-bit controller accesses each memory module over a 2x 16-bit bus, with two modules per controller.
    So if more than 10 GB is used, regardless of bandwidth need, whenever that memory is accessed we need to divert at least six 16-bit channels, removing 96 bits from the bus, or 168 GB/s from the fast memory, giving it 392 GB/s.
    So, even with the extra 6 GB being used for stuff with low access rates, every time an access is made, bandwidth drops to 392 GB/s on the fast memory, regardless of whether you pull 168 GB/s from the slow memory or not!
    And if you exceed 168 GB/s pulled from that 6 GB memory, you will need to divert the second 16-bit channel to that memory, leaving 224 GB/s on the fast memory.
    But if a simple access will divert the bus channels, then you cannot ever rely on always having 560 GB/s on the fast memory. 392 GB/s seems a much more secure number.
    For 560 GB/s you would need to use only 10 GB, or pull from both memories at once, but with only 2.5 GB available in those 6 GB I find it difficult to manage to pull the extra 168 GB/s.
    Am I wrong?
    PS: Sorry I had to pull the previous post, due to some typos.

  4. Hey Duoae,

    Great article. Looking forward to more deep dives when Sony or someone drops more info.

    Another interesting metric is if you calculate (best case) bandwidth per GPU TeraFlop for each system.

    XSX = 560 / 12.15 = 46.1
    PS5 = 448 / 10.28 = 43.6

    Surprisingly close ;-)

    However, the PS5 does have the advantage that this holds over the full 16 GB, whereas it's only over the 10 GB for the XSX.

    Not sure how useful this metric is, but I feel like it mirrors your thoughts that the two systems seems quite close from bandwidth point of view if you look at it holistically and not just at the spec sheet.

    1. This Resetera user by the name of Lady Gaia measured both systems in terms of GB/s/TF. Turns out if the CPU needs 48 GB/s, the GB/s/TF is the same for both systems so you're pretty dead on. For the XSX, if you need to free up X GB/s bandwidth of the CPU, the theoretical peak GPU bandwidth actually takes a larger penalty than X GB/s because of its asymmetrical RAM design.

      https://i.imgur.com/KzQH8Wc.png

  5. @Late to the party
    That is my question above but, as I see it, you have to lose 168 GB/s, not 80! I can't understand those 80!
    That's because accessing 48 GB/s in the upper 6 GB means you have to direct six 16-bit lanes to the upper RAM to make it available. That's 96 bits, and a 168 GB/s loss, regardless of whether you use all of it or just 48 GB/s.
    Maybe Duoae can help here.

  6. "If SONY had invested in a BC programme on the PS4 like MS did then they would have the same level of backwards compatibility. Which is a shame since I'd love to play my PS2 & PS3 games on the PS4 or PS5 without having to re-buy them."

    This simply couldn't be done as easily as with the X360, thanks to the SPEs' behaviour. I believe this factor was the biggest problem for backwards compatibility. Sony also always does BC a different way: the PS2 had PS1 hardware inside, the PS3 had PS2 hardware inside and the PS5 has PS4 hardware inside (customised Zen cores to match the Jaguar timings). MS does it the software way. Xbox One BC was a side project - they didn't invest in it at the beginning. Once it had been proven as a concept, they decided to expand it to more games and invest in it. I believe it was possible mainly thanks to the much simpler CPU and the much higher level of abstraction of DirectX, as opposed to the PS APIs tailored for each PlayStation console.

  7. About HDR TVs: the revolution has already begun. If you set OLEDs aside, you have to look to the higher-end LCDs.

    I purchased an excellent Sony XE9305 two years ago, which has a peak luminance in HDR of around 1500 nits. The following year's model, the XF9005, wasn't bad either, but it was simply worse :-).

    So, from maybe two years back, you could buy a great HDR TV with over 1000 nits for around 1200 USD on sale, in sizes of 55" and greater.

    QLEDs from Samsung are also good, but for better FALD you need to reach for the best (usually the 6 and 7 lines are crap; the 8 and 9 are good enough).

    Believe me, the picture on that Sony is gorgeous! That's when I realised HDR is much more revolutionary than 4K. I would be completely happy with great HDR on Full HD (on a 55").

    I acknowledge that most HDR TVs on the market are only HDR-ready and can't do HDR well. But you can already buy a great TV. If you bought that Sony (which also supports Dolby Atmos), there's no need to buy a TV for another X years, because 95% of the TVs manufactured even now simply have a worse picture.

    1. Yeah, it's a two-fold problem. Anything larger than 32-40" just wouldn't fit in my home and the prices for the good sets (and by good sets I mean ones with low latency for gaming) are still astronomical. Pretty much anything less than €500-600 is not worth buying due to latency, colour or a crappy HDR implementation.

      ... and monitors are even worse!

    2. My problem, based on my age, my need for trifocals, and the fact that I game from about 15 feet from the screen, is that I need at least a 70".

  8. Duoae

    Any chance of a reply to my question?
    Thanks?

  9. Hi MetalSpirit,

    Of course. I just didn't have time for a proper, in-depth answer - quick answers tend to bite me in the ass :) Sorry for that, I'm not ignoring you guys.

    In fact, I'm writing a follow-up blogpost to cover these questions because the answer has ballooned a bit beyond a simple comment. :)

