22 March 2020

Analyse This: The Next Gen Consoles (Part 8)... How the predictions stacked up...

So, yesterday, SONY surprise-revealed some of the PS5 specifications - a couple of days after Microsoft surprise-revealed a tonne of information about the Xbox Series X. While I won't summarise the information (you can find that elsewhere), I am going to do some navel-gazing: looking back at my predictions to see how they square against the released information, and then, in the next article, thinking about how this all comes together when comparing the two consoles. So feel free to skip this blogpost!

This is some seriously sexy engineering...

The Predictions:

In my first article on the subject back in October 2019, I attempted to anticipate what would make sense for a next gen console and how that could come together within a single die. From the various project codenames and architecture nomenclature, along with the predicted release window of the consoles, I thought that it made sense for Zen 3 to be utilised. This was because I predicted that the die sizes of the newer consoles would be much larger than even the current largest SoCs in the market - even with a shift to the 7 nm+ process node. I also predicted that I/O would be hugely important and, knowing that only server architectures (AMD's EPYC) provide this increased bandwidth and data management, I extrapolated that to being the "ideal" system for the next gen consoles. Also important (from my perspective) was the unified L3 cache in Milan's CCX design, which would be a desirable aspect for a very optimised system like a console.

I also predicted that a console APU with more than 8-12 graphics compute units (CUs) would not fit within the Ryzen die/socket size - given that a Zen 2 APU was not going to be made (AMD instead cut the L2 & L3 caches and moved to a custom SoC in order to achieve a Zen 2 APU with 8 CUs) and given that the actual CPU cores are quite small compared to a CU.

I looked at the historic precedents for direct access to high bandwidth memory (HBM) and an on-die RAM cache and predicted that a next gen console would have 8-16 GB of HBM on-die for fast access and wide bandwidth, coupled with 8 GB of off-die GDDR6 running at a slower rate and narrower bandwidth. This combination, I predicted, would result in improved streaming of data and access for the GPU/CPU - data could be pre-loaded from the SSD or off-loaded from the HBM into the GDDR6 for quick transfer (far quicker than even the 5.5 GB/s rate of the PS5 SSD) back to the 8 GB of HBM for use. I predicted that these would operate at 614 GB/s (HBM) and 448 GB/s (GDDR6). Finally, I re-confirmed that 16 GB RAM would be the minimum for a next gen console.
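These bandwidth figures fall straight out of bus width and per-pin data rate. As a sanity check, here's a small sketch of that arithmetic - note that the specific memory configurations behind the predicted numbers (two HBM2 stacks at 2.4 Gbps, a 256-bit GDDR6 bus at 14 Gbps) are my assumptions, not anything either company has stated:

```python
# Memory bandwidth (GB/s) = (bus width in bits / 8) x per-pin data rate in Gbps.
def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits / 8 * gbps_per_pin

# 448 GB/s GDDR6 corresponds to a 256-bit bus at 14 Gbps:
print(bandwidth_gbs(256, 14.0))                # 448.0
# 614 GB/s HBM would be e.g. two HBM2 stacks (2 x 1024-bit) at 2.4 Gbps:
print(round(bandwidth_gbs(2 * 1024, 2.4), 1))  # 614.4
# For comparison, the confirmed Xbox SX split pool at 14 Gbps:
print(bandwidth_gbs(320, 14.0))                # 560.0 (fast 10 GB, 320-bit)
print(bandwidth_gbs(192, 14.0))                # 336.0 (slower 6 GB, 192-bit)
```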

I looked at the possibility of NVMe PCIe 4.0 M.2 drives and came to the conclusion that integrating a drive onto the motherboard would be bad - from a consumer experience point of view and from a system reliability point of view (at some point, the drive will fail, especially when being used as a data cache to load 100s of GB of data). I couldn't see drive performance being above the theoretical 3.9 GB/s read and 2.0 GB/s write of the then-current technology. I speculated that the NVMe drive couldn't be the sole silicon contributing to faster I/O from the storage and that there would be something more going on.

The PS5 downclocks the silicon to match the prior two consoles from SONY and has compatible silicon implemented in the updated designs in order to not break game compatibility. This also means that new games are not forced to utilise the new feature sets of the console...
In my second article in November 2019, I thought that the SoC would be a chiplet design and that CPU frequency would be reduced in order to control temperatures, going off of the then-current state of the art in processor/GPU thermal working ranges. I looked at the performance of "Flute" and found that it was around the same level of performance as a Ryzen 5 1600 @ 3.6 GHz, and determined that final silicon could get the same performance at much lower clocks due to the performance increases from Zen to Zen 2 and Zen 3. I estimated that such a processor could operate at a locked 2.6 GHz and achieve that level of performance.

I predicted that infinity fabric would be of paramount importance in the next gen consoles in order to provide wide and fast data I/O between the CPU, GPU and off-die elements and that AMD's work in the server segment towards this integration would provide a vast benefit in terms of knowledge that could be transferred to the next gen consoles.

My third article in January 2020 covered the release of the Renoir Zen 2-based processors and how that information fed into my prior assumptions about the next generation of consoles. My assumptions were thus updated and my prediction that the console CPUs would not be desktop Zen 2 was reinforced - the Renoir parts are not desktop Zen 2 cores either; they have vastly smaller on-die caches.

I didn't write it in the main text but I predicted that SmartShift would be important for power management in the next gen consoles - allowing parts of the die to be powered up/down in order to improve efficiency. (I think I forgot to include that part, but the gist of it is there under the image.)

I predicted that, based on the specs of the Renoir chips, high clock speeds for the GPU silicon would not be an issue and that, due to those efficiency and architectural improvements, 2.0 GHz would not be that difficult to achieve within a reasonable power budget.

I estimated from the die size of Renoir the relative sizes of the CPU cores and graphics Compute Units and then took the estimated die size of the SX die and divided it by those (along with an estimated I/O area from the Zen 2 desktop chiplet) and found that you could fit 8 cores and 52 CUs in that space using the 7 nm process node. Going from there, I found you could get 11.74 TFLOPS* @ 1.8 GHz.
*After going through my data after the disastrous Part 7 of this analysis, I now realise there was an error in my spreadsheet where I was overestimating TFLOPS based on frequency. I had opted for a pure ratio between frequencies but hadn't realised that the TFLOP numbers were based on the unrealistic boost frequencies of desktop graphics cards instead of the game clocks. At any rate, you would get 11.1 TFLOPS with the correction at these numbers.
I didn't believe that 56 CUs was possible unless there was a significant reduction in CU size (relative to process node) as I thought that the CUs would increase in relative size due to their increased complexity - especially because Vega CUs should be smaller (in transistor count) than Navi 2.0 CUs.
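Rather than scaling by a frequency ratio, the TFLOPS figures can be computed directly from the standard GCN/RDNA relationship: each CU contains 64 shader units, each capable of 2 FP32 operations per clock. A quick sketch (the console figures quoted in the comments are the confirmed official specs, included for comparison):

```python
# FP32 TFLOPS = CUs x 64 shaders x 2 ops/clock x clock (GHz) / 1000
def tflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz / 1000.0

# The Part 7 estimate: 48 CUs @ 1.84 GHz
print(round(tflops(48, 1.84), 1))   # 11.3
# The confirmed consoles: SX 52 CUs @ 1.825 GHz, PS5 36 CUs @ 2.23 GHz
print(round(tflops(52, 1.825), 2))  # 12.15
print(round(tflops(36, 2.23), 2))   # 10.28
```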

The 16 GB (10 GB @ 560 GB/s + 6 GB @ 336 GB/s) GDDR6 configuration on the Xbox SX...
In the fourth part of this analysis, I looked at upcoming NVMe controllers and figured, like everyone else, that PS5 was using SMI controllers and the SX was using Phison's - which would peg the Xbox SX's interface to the SSD at a much faster transfer rate.

I pooh-poohed a rumour that put the SX die size at slightly smaller than the Xbox One X and calculated that a 13% smaller die size for the PS5 would fit the rumoured 40 CUs (4 disabled). I then went on to predict that TrueAudio Next would be utilised in a next gen console - using graphics compute units to perform the audio calculations instead of the CPU, which could be tied into the raytracing power of the CUs themselves.

I noted that, if my die area calculations were off by a small margin, there would be extra space on the SoC for another 4-6 CUs (which would allow for 56 CUs on the Xbox SX!) and then applied those extra theoretical CUs to an audio engine implementation.

I pondered on the exact implementation of this audio solution, stating that CUs might not be the best way to do this, instead opting for specialised silicon in order to process the audio (like the PhysX solution for in-game physics calculations in the early 2000s). I specifically mentioned more specialised ways to perform these sorts of calculations - such as the PS3's Cell processor - as being more optimised and efficient.

Finally, I predicted that Microsoft was going for more GPGPU power (with increased CU numbers) whereas SONY would opt for a smaller GPU and instead have dedicated, specialised hardware for each type of operation. On the back of this, I predicted that SONY might be able to edge out Microsoft in terms of absolute performance by offloading specific functions to specialised silicon hardware on the die, bringing both consoles on par with one another, despite differences in specific metrics such as TFLOPS.

In part 5 of my analysis, I focussed on updated information regarding the SSD in the SX - I was a bit disappointed with its potential, with theoretical throughput of 1.8 GB/s reads and 2.0 GB/s writes. This had a pretty significant potential impact on the ability to record gameplay sessions - something that was a big deal in the current generation of consoles.

I also predicted that cooling the SSD would be a big challenge for the next gen consoles - with the actual temperatures of the SoC being quite reasonable by comparison, the SSD's performance would fall off a cliff at raised temperatures.

In part 6, I covered the released information surrounding the Xbox SX: the mandate from Microsoft that their developers support the current generation of consoles with all their releases (it appears to only apply to internal studios at this point in time but there is a big push for 3rd party developers to do so as well) - which seems to be a ridiculous handicap from my perspective; and the way that raytraced audio might be implemented on both consoles - noting that there is a big difference between how SONY and Microsoft talk about their audio implementations. I predicted (again) that SONY would have dedicated silicon, whereas Microsoft would utilise their increased number of CUs in order to provide support for their audio calculations through the DirectX API.

Finally, in part 7, as you're aware, I predicted (erroneously) that the SX CPU would be as performant as a Ryzen 5 1600 AF based on a faulty assumption due to poor wording (or a typo) on MS's part. I also continued my incorrect ratio-based TFLOP calculation: 48 CUs @ 1.84 GHz = 11.3 TFLOPS. But I predicted that the TDP of the SX would be around 173 - 194 W and the PS5 would be around 165 W.

The PS5 has custom silicon on the SoC to handle I/O between the SSD and RAM, unloading this burden from the CPU... to be fair, so does the Xbox SX.

The Reality:

I'm going to do a bullet point list with break-outs for explanations, caveats and digressions:

  • Partially correct - The SoCs are manufactured on the 7 nm+ process node, but the cores are Zen 2-derived: neither full desktop Zen 2 nor Zen 3.
I actually think the lack of shared cache hurts the multicore performance of the CPUs and a Zen 3 style shared L3 cache would have helped bring them more in line with the desktop parts.
  • I was correct that a desktop Zen 2 CPU would not be used and that the die size would be larger than could be accommodated on a Ryzen die.
Though this isn't necessarily particularly relevant! :)
  • Partially correct - The Xbox SX uses 10 GB of GDDR6 memory with full access across the memory bus and 6 GB of less easily accessed GDDR6 memory.
Though I wasn't accurate in my anticipation of a really wide memory bus for the fast memory (and it's not on-die) the split pool of fast/slow memory was correct for the SX. The PS5 has a slightly slower but completely consistent, unified pool of memory, something which I feel is better balanced due to the much faster SSD interface which will have no penalty for accessing the full amount of RAM.
  • I was correct that the I/O is extremely important and that the SSD couldn't be the only silicon contributing to the I/O speed and capabilities on both consoles. Both consoles have serious technical investments into custom I/O in order to improve upon and remove (as much as possible) the traditional bottlenecks in a system.
  • While both consoles bring their own unique silicon into the fray, I had presumed this would be more server-like in implementation - which is not the case.
  • Neither correct nor incorrect - I anticipated that an integrated, non-user-replaceable NVMe drive would be bad for the consumer experience.
Microsoft have gone with a non-user replaceable SSD with a proprietary expansion standard while SONY have gone with a user-replaceable SSD but only using validated drive models. We'll have to see which decision is best for the consumer, however it really depends on the lifetime of those locked SSDs in the SX... at some point they WILL require replacement.
  • I was incorrect in presuming that the design would be a chiplet SoC - both consoles are monolithic in design.
  • I was incorrect that processor frequencies would be lowered in order to manage the thermal output of the SoC.
At that point in time, Renoir had not been revealed, meaning that I was extrapolating from desktop parts. The optimisations in terms of process node and on-die have resulted in much lower TDP compared to the desktop Ryzen 3000 series parts.
  • I incorrectly predicted that the SX would be equivalent in terms of CPU power to the R5 1600 AF. (I covered why, here.)
  • Pending - I predicted that infinity fabric would be essential for the data transfer on-die between CPU/GPU and other elements. (I presume this is still true but I haven't seen any confirmation as yet!)
  • I was correct that AMD's SmartShift would be important for the next gen consoles. It's been confirmed that SONY is using this technology in order to load-balance the CPU and GPU.
  • I was correct that obtaining high clockspeeds on the GPU would not be an issue for either console.
  • I incorrectly estimated the TFLOPS processing power of the GPUs due to a silly mistake on my part.
  • Partially correct - I was incorrect in my estimation of CU die area, overestimating it by a few percent. This was due to my not realising that the N7P process node would be utilised instead of the base 7 nm process node, and so not accounting for that extra reduction in die area.
I had noted that if I was off by that much then the actual number of CUs could increase by 2-4 - putting us at the magical 56 CU number of the SX's GPU (4 disabled).
  • I was incorrect in the anticipation that the SX's SSD memory controller would be faster than the PS5's.
  • Partially correct - I was sort-of incorrect about TrueAudio and utilisation of the CUs to render audio. 
We don't have full details on the Series X's audio solution outside of using Dolby for output but we know that the raytracing information from the CUs can be utilised for both audio setups on SX and PS5. Additionally, the PS5 uses additional, modified CUs, separate from the GPU, to perform the audio processing. I'll go more into this next time...
  • I was correct that un-altered CUs would be suboptimal for audio processing.
  • Pending - I predicted that the wide GPU on the SX would be utilised for GPGPU operations whereas the narrow GPU on the PS5 would be coupled with discrete, optimised silicon on-die that would result in specific, more performant operations. 
I'll have more on this next time but it appears that audio, RAM access and SSD access are all more performant on the PS5... whereas total raytracing capability is much more performant on the SX.
  • Pending - I predicted that the slower SSD access of the SX would result in problems with session recording.
The SX's actual confirmed SSD access is faster than presumed from the engineer's LinkedIn CV but the actual proper specs have not been released. I'll delve into this more next time...
  • I was correct that heat dissipation for the SSDs would be an issue for both consoles. Microsoft even included the design of the heatsink into the expandable storage unit. SONY have not revealed the "shape" of their hardware but note that compatible SSDs will need to fit their specific form factor - which probably includes some sort of heat dissipation mechanism.
  • Pending - It remains to be seen how much of a handicap and how much support from third party developers and publishers there is for Xbox One and One X in their games going forward.
  • I was partly correct that SONY would have dedicated silicon on-die for their audio solution. They utilise the raytracing components of the GPU for path tracing and feed that data into custom CUs re-engineered for audio processing. It remains to be seen what Microsoft's solution is.
  • I (again) incorrectly predicted that the SX would be equivalent in terms of CPU power to the R5 1600 AF. (I covered why, here.)
  • Pending - Awaiting on system TDPs of the Series X and PS5... my predictions: 165 W (PS5) and 173 - 194 W (SX).
And when you start from the right place, you get correct results... SX is 4x the CPU processing power of the One X. The results in red are overestimations due to the non-linearity of UserBenchmark's scoring system...


Out of 21 predictions (I can't count the incorrect CPU performance prediction twice, no matter how much you may like! :) ), I have 6 correct predictions, 5 partially correct predictions and 5 incorrect predictions along with 5 pending predictions. That's actually a pretty even spread across all the possible outcomes... I would have expected more completely incorrect predictions.
Okay, we can quibble about whether being very close to the CU count via die area is partially correct or not but that would just change the numbers to 6/4/6/5 - not a big difference...
That puts me at essentially a 50/50 split of being right and wrong on these things.

Considering that I don't have access to insider information, I feel pretty good about that, and if MS hadn't cocked-up the wording in their press release and if I'd read down the article more carefully, I'd have gotten the CPU power correct as well.

So what's next?

I've been looking into the released specs of the SX and PS5 and will be detailing how they affect each console. So that's coming next.


DavidB said...

As a tech nerd into console gaming, loving this series.

Duoae said...

Thanks! I appreciate it. I think you'll really love the next entry. There's a lot of interesting things going on with the consoles...

DavidB said...

Yes, two very different strategies derived from AMD CPU/GPU technologies. Will be very interested to see an analysis of the differences, and commonalities, of the approaches (with what we know so far....).