Hole in my head: Analyse This: The Next Gen Consoles (Part 2)...

Originally, I wanted to update the previous post with further supporting observations but during that research and thinking, along with newly released information, it ended up ballooning into a whole new post.

An Overview...

I've been looking into the leaked codenames for these products and, in the process, have revised some conclusions based on the "decoder" that everyone was using to analyse the packages. So, let's summarise the pertinent information. You can find a decent summary in the Gizmodo article I linked last time. Below is a timeline of the leaks and who they came from, followed by the literary origins of the codenames and the relevant AMD products and process nodes:

Unknown '18 - Early PS5 prototypes at developers reported by Gizmodo tipster
Jan '19 - Gonzalo APU reported by TUM_APISAK
Jan '19 - Ariel Navi 10 GPU reported by Komachi_Ensaka
April '19 - Gonzalo APU reported by TUM_APISAK
Jun '19 - Prospero console reported by Gizmodo tipster
Jun '19 - Navi variants (10, 12, 14, & 21) reported by TUM_APISAK
Jun '19 - AMD blogpost stating that Scarlett "builds upon" Zen 2 and Navi GPU
Jul '19 - Flute reported by Komachi_Ensaka
Aug '19 - Oberon GPU reported by Komachi_Ensaka

Gonzalo is the trusted advisor to the King of Naples in The Tempest
Ariel is the spirit that shipwrecks the King of Naples in The Tempest
Prospero is the rightful Duke of Milan in The Tempest
Oberon is the King of the fairies in A Midsummer Night's Dream

Naples is the AMD EPYC processor based on Zen cores (released June '17)
Rome is the AMD EPYC processor based on Zen 2 cores (released Aug '19)
Milan is the AMD EPYC processor based on Zen 3 cores (slated for 2H '20)
Navi (RDNA architecture) Radeon RX 5700 (XT) (released July '19)

TSMC 7 nm node was in commercial production as of May/June '18
TSMC 7nm+ node was in volume production in May '19

Looking at the leaked AMD codenames and the timeline, it's apparent to me that we're seeing some iterations on the dev kits for the next generation consoles - though it isn't clear whether we're seeing PS5 and/or Scarlett. What is clear is that AMD currently has a fondness for Shakespearean codenames. However, this particular trend appears to have started back in 2016-2018, when AMD worked with its Chinese partner, Chengdu Haiguang Integrated Circuit Design Co. Ltd., to make the Hygon Dhyana server chip. This particular chip was based on the first generation EPYC chips, utilising the Zen core design, and had the codename "Arden".

Arden, if you're not familiar, was the maiden name of Shakespeare's mother, Mary, but also happens to be the setting of his play, As You Like It. The AMD codename was spotted by notorious Twitter leaker/data delver, Komachi_Ensaka. However, Komachi appears to have confused AMD product family codes for a device code since there is no listed Family 18h in the Linux PCI device database and, according to the Wikipedia page covering AMD microarchitectures, Family 17h corresponds to both Zen-* & Zen 2-based designs. What is interesting is that, also according to that Wikipedia page, the Hygon Dhyana family are assigned the 18h nomenclature.

*As I alluded to last article, the "Zen" generation covers both Zen and Zen+ releases.

Annotated identical devices in the Linux PCI device database

What can be seen from the Linux PCI device database is that device 18h specifically corresponds to the equivalent Zen core microarchitecture found in the Dhyana CPUs, meaning Zen cores. Whilst Zen 2 is covered under Family 17h, the PCI device listing for purely AMD designed cores is "device 24", corresponding to chips that span the Zen (Fireflight/Subor Z+ & Raven), Zen+ (Raven 2) and Zen 2 (Matisse, Starship [Rome], Renoir & Ariel) microarchitectures:

Another annotated crop of identical devices in the Linux PCI device database.

You'll notice that those device 18h (decimal: device 24) lists correspond to Family 17h (Models 00h-0Fh) and these are specifically related to the Zen and Zen+ core microarchitectures developed with/through Chengdu Haiguang Integrated Circuit Design Co. Ltd., spanning the first and second Ryzen and Threadripper chips as well as the first generation EPYC chips. It's essentially the same device/silicon, just given a different nomenclature to ensure no confusion in case of updated silicon steppings.

Zen 2, however, is (currently) covered by models 30h-3Fh, which only publicly correspond to the EPYC Rome (Zen 2) products - that's models 48-63 - despite Matisse (Ryzen 3000 series) having been released. In fact, you can see that the document is at version 0.74 which indicates incomplete documentation and there are indications online that public releases of the Ryzen 3000 series have not been forthcoming (in fact, I can find nothing except for model 71h on the AMD support site at the time I have been writing this blog post). Either way, device 24 is carried over to the Zen 2 core design and none of the publicly listed PCI devices are listed in this range.
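Since the discussion keeps switching between hexadecimal labels (18h, 30h-3Fh) and decimal ones (device 24, models 48-63), a quick sanity check of the conversions may help. This is only a sketch that restates the numbers quoted above; the constant and function names are invented for illustration and do not come from any AMD or Linux tool.

```python
# Hex labels quoted in the text, written as Python hex literals.
# All names below are illustrative only.
FAMILY_ZEN = 0x17                      # "Family 17h"
DEVICE_ID = 0x18                       # "device 18h"
ZEN2_MODELS = range(0x30, 0x3F + 1)    # "models 30h-3Fh"
MATISSE_MODEL = 0x71                   # "model 71h"


def hex_label(value: int) -> str:
    """Format an integer in the 'NNh' style used in the text."""
    return f"{value:02X}h"


print(f"Family {hex_label(FAMILY_ZEN)} = {FAMILY_ZEN} decimal")
print(f"Device {hex_label(DEVICE_ID)} = {DEVICE_ID} decimal")
print(f"Models {hex_label(ZEN2_MODELS[0])}-{hex_label(ZEN2_MODELS[-1])} "
      f"= {ZEN2_MODELS[0]}-{ZEN2_MODELS[-1]} decimal")
print(f"Model {hex_label(MATISSE_MODEL)} inside that range? "
      f"{MATISSE_MODEL in ZEN2_MODELS}")
```

The last line simply confirms that model 71h sits outside the 30h-3Fh range, which fits the observation that the published range does not cover it.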

However, this hasn't stopped some commentators latching the Arden codename to the next Xbox - despite it not matching up to the AMD-designed/owned PCI IDs.

One further thing I will bring to attention is that Arden lacks many of the Zen 2 PCI devices listed for APUs but does have commonality with the server chips, specifically EPYC Rome. I realise that, last time, I related the next generation consoles with Zen 3 but specifically the EPYC architecture - I have to admit, I was predicating that more on the die size than on the AMD family. Specifically, Ryzen cannot be squeezed into an 8C/16T APU with RX 5700(XT) die areas using the Zen, Zen+ and Zen 2 dies/chiplets. It's impossible. It would have to be a dedicated SoC and/or socket. If Arden really was a next gen console SoC, I do not think it would have PCI devices listed as derived from the Chinese counterpart to AMD, without any graphics or audio processing, and with a non-transparent bridge - which is usually specifically utilised to address the PCIe devices in a multi-host environment.

Publicly released/identified PCI IDs for various Zen/Zen+ and Zen 2 chipsets with commonality indicated with a green cell.

These confusing factors make deciphering the leaks more problematic because inter-generational PCI IDs and confusion over actual release chip architectures mean that navigating the potential design space is not easy. However, the conclusion I have come to is that the codenames do not decode in the way the majority of commentators have decided.

Marvin's AMD codename decoder is well known at this point and I believe that AMD are either purposely obfuscating device codenames or are not following it 100% for the codenames they are giving to the next generation console development samples. My own personal thoughts are that the codenames are referring to the equivalent products used by AMD to "prototype" the expected performance of the final spec'ed machine.

That makes sense because the final silicon may be designed and finalised but may not be actually produced, yet. In fact, if what I think about the CPU/GPU manufacturing node and generations are true, then that most certainly is the case!

Let's get onto the "decoding". Below you'll see the decode sheet and the codenames for the identified suspected chipsets that correspond to the next gen console development kits.

This is the "decoding sheet" used by everyone. It doesn't necessarily apply to "unknown" families of product though...

Deciphering the Codenames...

2_G_1600_2C_E_8_J_A2_32/10/10_13E9 (TUM_APISAK Jan post)

Z_G_1670_2A_E_8_J_B2_32/10/18_13F8 (TUM_APISAK April post)

100-000000004-15_32/12/18_13F9 (Flute)

The first thing that everyone can agree on is that the graphics portion has evolved and that its maximum operating frequency has increased over time. Where we might disagree is the interpretation of the three frequency numbers. Normally, the first two would be associated with the CPU (boost/base), but the third one is confusing things, with some commentators thinking it might be the base clock of the GPU. My personal feeling is that only the third number corresponds to the operating frequency of the GPU.

The big difference here is that I think the 1600 and 1670 specifically correspond to the performance level of the CPU, not to the ES/QS codes you might normally associate with an engineering sample. People were reading this as a 1.60/1.67 GHz base clock of the CPU because the leaked test frequencies were at 1.6 GHz, but I think this may have been either a coincidence or purposeful obfuscation: it makes no sense to have three CPU frequencies (1.6, 3.2 and 1.0), and I do not believe that the GPU would have both the base and boost clock listed.

My other reasoning for this is that the four code numbers relating to the model ID don't match any known configuration for existing leaked codenames of released (and thus confirmed) chips. Personally, I read these as (1600) "generation 1", "High performance", "SKU 00" and (1670) as the same but with "SKU 70" which you can take from the QS/Production portion of the decoder.
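To make that reading concrete, here is a minimal sketch that splits the three leaked strings into raw tokens and then applies the interpretation proposed above to the four-digit model number. The field names (`fields`, `clocks`, `pci_id`, `performance_tier` and so on) are hypothetical labels chosen for illustration; they are not part of Marvin's decoder or any confirmed AMD scheme, and the position of the model number is assumed from the two APU strings.

```python
from typing import Dict, List, NamedTuple, Tuple


class LeakedId(NamedTuple):
    fields: List[str]        # underscore-separated tokens before the clock group
    clocks: Tuple[int, ...]  # raw values from the "32/10/10" style group
    pci_id: str              # trailing hex ID such as "13E9"


def parse_leak(code: str) -> LeakedId:
    """Split a leaked ID string into raw parts without interpreting them."""
    tokens = code.split("_")
    clocks = tuple(int(part) for part in tokens[-2].split("/"))
    return LeakedId(fields=tokens[:-2], clocks=clocks, pci_id=tokens[-1])


def read_model_number(model: str) -> Dict[str, str]:
    """Apply the article's proposed reading to a four-digit model number.

    This is an interpretation, not a documented format.
    """
    return {
        "generation": model[0],
        "performance_tier": model[1],
        "sku": model[2:],
    }


LEAKS = (
    "2_G_1600_2C_E_8_J_A2_32/10/10_13E9",   # TUM_APISAK, January
    "Z_G_1670_2A_E_8_J_B2_32/10/18_13F8",   # TUM_APISAK, April
    "100-000000004-15_32/12/18_13F9",       # Flute
)

for code in LEAKS:
    leak = parse_leak(code)
    print(leak.pci_id, leak.clocks, leak.fields)

# Assumption: the model number is the third token in the two APU strings.
for code in LEAKS[:2]:
    print(read_model_number(parse_leak(code).fields[2]))
```

Under this reading, 1600 and 1670 differ only in the SKU digits, while the third clock value moves from 10 to 18 between the January and April strings.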

This "generation 1" moniker might be simply "this is the first gaming generation of chip" or it could correspond to the expected performance ability of the CPU portion of the chip... or it could literally be a Ryzen 5 1600 with all 8 cores enabled (Ryzen 5 1600/2600/3600 have the same dies as the 1700/2700/3700 and 1800/2800/3800, respectively, but with different clock speeds and with two cores either disabled or non-functional - this is how AMD makes the most of the manufacturing process in what is termed "binning"). Given that the R5 1600 was clocked at 3.2 GHz, I think this might lend credence to the thought.

In fact, I wouldn't be surprised if the early dev kits are parts that mimic the expected performance of the final system - especially when final silicon is not yet available. This is, to my understanding, normal practice for early development kits. Certainly, from what I discern from the codenames, these are 65 W parts, which corresponds to the usually assigned TDP of the 8 core parts across all three generations. It would also explain why there are 8 cores and why the cache configuration is given a different, unique letter ("J" instead of "S").

However, as we've seen, many commentators are assuming the performance of a modern Zen 2 core design at 3.2 GHz, but this seems a little unlikely given the thermal and power restrictions placed on a small form factor such as the usual console designs. In fact, the leaked snazzy-looking design of the latest PS5 dev kit shows a lot of ventilation holes, allowing for improved cooling. That's because a 65 W part combined with whatever TDP the graphics card demands (RX 5700 has TDP listed as 180 W and RX 5700M's TDP is rumoured to be 120 W) will push the console above the cooling ability of even the Xbox One X (around 180 W total system power).

This just won't be the way the console is designed.

The PS5 dev kit shell design as pictured from the Brazilian patent office.

We normally expect lower wattage and lower operating frequencies in both laptops and consoles because they can't handle the heat generated when operating at higher performance levels. For example, the PS4 CPU ran at 1.6 GHz whereas the Athlon 5370 ran at 2.2 GHz, despite being effectively the same 4 'Jaguar' cores (okay, there was some difference but it was relatively minor).

Obfuscation and Process Nodes...

However, this ties in quite nicely to my feelings on the makeup of the embedded SoC that will feature in the next generation consoles: Zen 3. Last time, I made the arguments based on the association with "Milan" from the prototype names, the ambiguity of what is the third generation of Ryzen and the required die size for an 8 core Zen 2/RX 5700 APU. This time, I'm going to expand on this further:

Back in June, AMD's blogpost announcing the partnership with Microsoft on Scarlett also featured the same mealy-mouthed ambiguity found in the PS5 Wired article:

This processor builds upon the significant innovation of the AMD Ryzen™ "Zen 2" CPU core and a "Navi" GPU based on next-generation Radeon™ RDNA gaming architecture including hardware-accelerated raytracing.

Now, Microsoft's reveal of the Scarlett hardware at E3 this year specifically mentions Zen 2 and Navi technology. Let me type that out for you:

[...]our custom-designed processor, leveraging the latest Zen 2 and Navi technology[...]

This is quite a different meaning from "building upon". Right after this, the engineer (I suppose that's what he is) uses the same language to speak about the GDDR6 RAM. You don't say "we're building upon the GDDR6 RAM" when you're using that particular core aspect but you could say that if you were utilising Zen 3 - which does build upon all AMD have learned from the process of getting to Zen 2. In fact, right after AMD say they "build upon" Zen 2, they state that the Navi is based on the next generation RDNA architecture - that's RDNA2 for those keeping track; the RDNA2 that will be produced on the 7nm+ EUV process node that is slated for release next year and includes the ray tracing hardware. So the graphics portion of the SoC must be on the 7nm+ process otherwise it would mean that AMD would have to do development work for two different process nodes for the same architecture - which I think is not really worth it from their perspective. If the next gen consoles were going to use Zen 2 then surely it would make more sense for AMD to say "based on" or "using"? Saying the processor builds upon Zen 2 implies that it is not Zen 2... in the same way Zen+ built upon the foundations of Zen.

Given AMD's chiplet design strategy for Zen 2, it's not unlikely that 14nm (I/O chiplet), 7nm (CPU) and 7nm+ (GPU) chiplets could all be on the same SoC but given the current bottleneck surrounding the production schedule of the 7nm production lines at TSMC, it's difficult to see how TSMC could fit in production for two major console launches, currently shipping AMD CPUs and GPUs as well as all their other customers on the 7nm lines which are already at capacity. Moving to a 7nm+ process would allow AMD to get priority on the new production lines and avoid all that extra development hassle. In that scenario, since Zen 3 is already developed on the 7nm+ process, it wouldn't make sense to port the Zen 2 design to the smaller node and different manufacturing method which apparently enables a 20% increase in transistor density over the 7nm process.

It says right there - "Next Gen RDNA"

There are three other benefits to having 7nm+ and Zen 3/RDNA2: an increase in IPC (instructions per clock), power efficiency and vertical integration. Let's deal with the first two together:

Efficiency and IPC

Power efficiency goes up with decreasing transistor size, that's just a side benefit. However, according to Forrest Norrod, AMD's Senior Vice President and General Manager of the Datacenter and Embedded Solutions Business Group, Zen 2 had an "unexpected" IPC improvement over Zen+ of about 15%. If we expect Zen 3 to provide a similar or slightly larger IPC improvement, as is hinted at in the interview with him, then the overall performance gain from Zen to Zen 3 could be around 30%. This increase has the benefit of being able to achieve the same performance at lower clockspeeds. If we take the assumption that the dev kits above were running Zen and R5 1600s with 8 cores at 3.2 GHz, then you could potentially run a R5 4600 with 8 cores at around 70% of the frequency (2.25 GHz) and achieve similar performance. That would really help with the thermal output of the final system.

However, looking at the benchmarks between first gen Ryzens and third gen Ryzens, there's already a performance boost of around 20-25% across the board, including the IPC uplift. I believe that Mr Norrod was perhaps specifying the performance gains between Zen+ and Zen 2, which seem to be more around 10%, though there's some obfuscation with different CPU operating frequencies. This would mean that the actual performance could be more in line with a 35-40% increase, which could deliver the same performance at a lower frequency of around 1.92 to 2.10 GHz. That would also result in a lower TDP, perhaps around 35 W. That puts us well within the realm of the current generation of consoles, with the PS4 operating at 1.60 GHz, the Xbox One at 1.75 GHz and the PS4 Pro operating at 2.13 GHz. Famously, the Xbox One X operates at 2.30 GHz but with specialised cooling. This would allow the use of "normal" cooling systems and a "normal" sized console box (compared with that dev kit pictured above).
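The clock-speed estimates in the last two paragraphs can be reproduced with a few lines of arithmetic. The sketch below is an illustration only and its function names are invented. It shows two simplifications side by side: the shortcut that appears to be used above (subtract the gain from 100% of the clock) and the alternative of assuming performance scales as IPC times clock, in which case the matching clock is the base divided by (1 + gain). Neither accounts for boost behaviour, memory or thermals.

```python
def compound_gain(*gains: float) -> float:
    """Combine successive fractional gains, e.g. 0.15 then 0.15."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


def clock_subtractive(base_ghz: float, gain: float) -> float:
    """Shortcut estimate: a 30% gain means running at 70% of the clock."""
    return base_ghz * (1.0 - gain)


def clock_reciprocal(base_ghz: float, gain: float) -> float:
    """Estimate assuming performance = IPC x clock."""
    return base_ghz / (1.0 + gain)


BASE_GHZ = 3.2  # the dev kit clock assumed in the text

print(f"Two 15% steps compound to {compound_gain(0.15, 0.15):.1%}")
for gain in (0.30, 0.35, 0.40):
    print(f"gain {gain:.0%}: "
          f"subtractive {clock_subtractive(BASE_GHZ, gain):.2f} GHz, "
          f"reciprocal {clock_reciprocal(BASE_GHZ, gain):.2f} GHz")
```

The subtractive column gives 2.24, 2.08 and 1.92 GHz, matching the figures quoted above to rounding; the reciprocal column comes out somewhat higher, at roughly 2.3 to 2.5 GHz.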

If you take a look at the leaked "Flute" test results, you'll notice that it's performing slightly below the R7 1700X, right where a R5 1600 would be predicted to be. The R5 1600 has a single core performance of 102 pts and a 4-core performance of 392 pts compared to 93.8 and 375 pts, respectively, for Flute. You can even take a look at the R5 3550H APU which operates at a 2.1 GHz base clock with a 3.7 GHz boost clock and you'll see it also matches quite closely the single core performance of the "Flute" CPU though is less close on the 4-core score. Considering that the 3550H is a Zen+ part and only has 4 cores, that's pretty decent.

Compare the single core performance of the 3550H, which only has 4/8 cores/threads and is Zen+, 12nm-based...

However, as I mentioned above, I do not believe that this Flute CPU is working at a 1.6 GHz base clock - I believe it was obfuscation on the part of the person(s) performing the test. The performance uplift for it to achieve the scores it had would be far beyond any 35-40% increase I speculated about for Zen 3. If you look at the R5 2600, single core is 111 pts and 4-core is 422 pts at 3.4 GHz base clock; an R5 3600 scores 130 and 497 pts respectively and an R7 3700X scores 135 and 518 pts respectively - both at 3.6 GHz base clock. That's a performance gain of 17-21% for single core performance from the 2600; add another 15-20% and you're looking at between 149.5 and 162 pts single core performance on a Zen 3 part at 3.8 GHz.
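For anyone who wants to check the percentages, the single core figures above work out as follows. This is just the arithmetic from the paragraph restated; the 15% and 20% Zen 3 uplifts are the speculative values from the text, not measurements, and the variable names are illustrative.

```python
# Single core scores quoted in the text.
R5_2600 = 111
R5_3600 = 130
R7_3700X = 135


def uplift(new_score: float, old_score: float) -> float:
    """Fractional gain of new_score over old_score."""
    return new_score / old_score - 1.0


print(f"R5 3600 over R5 2600:  {uplift(R5_3600, R5_2600):.1%}")
print(f"R7 3700X over R5 2600: {uplift(R7_3700X, R5_2600):.1%}")

# Speculative Zen 3 projection: low end adds 15% to the R5 3600 score,
# high end adds 20% to the R7 3700X score.
zen3_low = R5_3600 * 1.15
zen3_high = R7_3700X * 1.20
print(f"Projected Zen 3 single core: {zen3_low:.1f} to {zen3_high:.1f} pts")
```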

I understand that performance of a core is not strictly linear with frequency (especially because of the differing boost clocks across these parts) but just doing the back-of-the-napkin calculations (assuming linearity) you'd be looking at single core performance of around 63 pts at 1.6 GHz for a Zen 3 part, not the supposed Zen 2 part which actually works out around 60 pts when performing the same calculation with the same assumptions.

There is just no way that this CPU could have those results at that clock speed. Cross the aisle and take a look at the top of the line Intel i9-9900K (basically the fastest gaming CPU at this point in time) and you're looking at 144 and 564 pts for single- and 4-core results at a base clock of 3.6 GHz - though this is massively helped by the 5.0 GHz boost clock. The same calculation I performed above gets you 64 pts in single core performance for a down-clocked i9-9900K at 1.6 GHz.
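The back-of-the-napkin scaling used in the last two paragraphs is a straight proportion: score multiplied by target clock over reference clock. A minimal sketch, assuming strict linearity (which, as noted, real CPUs do not follow) and using an invented helper name:

```python
def scale_linear(score: float, from_ghz: float, to_ghz: float) -> float:
    """Scale a benchmark score linearly with clock speed."""
    return score * to_ghz / from_ghz


TARGET_GHZ = 1.6
FLUTE_SINGLE = 93.8  # leaked Flute single core score

# (label, single core score, reference clock in GHz), all from the text.
parts = (
    ("Zen 3 projection (low end)", 149.5, 3.8),
    ("R7 3700X (Zen 2)", 135.0, 3.6),
    ("R5 3600 (Zen 2)", 130.0, 3.6),
    ("i9-9900K", 144.0, 3.6),
)

for label, score, ref_ghz in parts:
    scaled = scale_linear(score, ref_ghz, TARGET_GHZ)
    print(f"{label}: {scaled:.1f} pts at {TARGET_GHZ} GHz")

print(f"Flute reported: {FLUTE_SINGLE} pts")
```

Every scaled figure lands in the high 50s to low 60s, well short of the 93.8 pts reported for Flute.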

If you take the boost clock as the operating speed for Flute and instead do the calculation above with that in mind, the R5 3600 single core would be 115 pts and the i9-9900K would be 128 pts, which makes more sense. In effect, the Flute CPU is basically very near its max boost all the time during the test, meaning that its base clock is worthless information - any CPU could have a reported base clock of 0.5 GHz and a boost clock of 3.2 GHz but if it is operating consistently near 3.0 GHz then its base clock is effectively 3.0 GHz. In fact, all modern CPUs down-rate themselves when not in use or when lower CPU utilisation is occurring.

Finally, using that logic, the Flute CPU is maintaining a consistent average frequency of 2.6 GHz in order to achieve a single threaded performance of 93.8 pts, without taking the boost frequency into consideration. You could equally argue for a lower base clock of between 2.0 and 2.6 GHz combined with the 3.2 GHz boost clock and a longer sustained boost time.
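The same proportion can be run in both directions: scaling the reference parts down to Flute's 3.2 GHz boost clock, and working backwards from Flute's score to the clock it implies. This is a sketch under the same linearity assumption as before. It takes the R5 3600 as the reference for the implied clock, which is my assumption about how the 2.6 GHz figure was derived since the text does not say; the helper names are invented.

```python
def scale_linear(score: float, from_ghz: float, to_ghz: float) -> float:
    """Scale a benchmark score linearly with clock speed."""
    return score * to_ghz / from_ghz


def implied_clock(score: float, ref_score: float, ref_ghz: float) -> float:
    """Clock at which the reference part would match the given score."""
    return ref_ghz * score / ref_score


FLUTE_SINGLE = 93.8
FLUTE_BOOST_GHZ = 3.2

# Reference scores at a 3.6 GHz base clock, scaled to Flute's boost clock.
print(f"R5 3600 at 3.2 GHz:  {scale_linear(130, 3.6, FLUTE_BOOST_GHZ):.1f} pts")
print(f"i9-9900K at 3.2 GHz: {scale_linear(144, 3.6, FLUTE_BOOST_GHZ):.1f} pts")

# Clock an R5 3600-like core would need to score 93.8 pts.
sustained = implied_clock(FLUTE_SINGLE, 130, 3.6)
print(f"Implied sustained clock for Flute: {sustained:.2f} GHz")
```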

Vertical Integration

The last thing is "vertical integration". What I mean by this is that AMD is already going to pair Zen 3 and Vega GPUs (Radeon Instinct) via Infinity Fabric in the EPYC (Milan) ecosystem and, since they're already doing that work, it makes sense to port it to the console space as well... if I don't have this backwards and it's the console design philosophy being ported to the server space! Either way, AMD would be capitalising on their development work in different segments and saving costs.

In Conclusion...

I might actually be crazy but I think there's a good possibility that the next gen consoles will be using 7nm+ Zen 3 cores and Navi SoC and I would expect the final clock speed of the CPU to be around 2.0 GHz. I guess only time will tell and then I can either sigh with relief or send myself to the mental asylum...

22 November 2019