5 September 2021

Does AMD's Dominance Even Matter...?

 


Over the last year or so, a lot has come to light about the ways in which the current generation of AMD's graphics cards is superior to Nvidia's. While I will mention those advantages here, my intention isn't to dive into the features themselves... the point of this post is to question whether they even matter.



Note the significant dip for the 3060 Ti and 3070 in the performance trend at 4k? [Guru3D]


Memory...


I'm going to be linking a lot of Hardware Unboxed's content because their coverage of these issues has been singular in the industry and they have a very good testing methodology in place, coupled with an easy-to-digest manner of presenting their findings. They're also extremely good at arguing their point and eloquently replying to queries from their viewers. There just isn't anyone else in the industry that I follow who does as good a job as they do. So, with that out of the way...

Let's start with the simple stuff: AMD's Radeon cards generally have more RAM per tier than Nvidia's. That means they can handle more (and more detailed) textures and higher-resolution gaming than Geforce cards without running into the dreaded memory wall, whereby data must be swapped out to or moved in from main system memory. This situation results in sometimes quite severe decreases in game performance at higher resolutions like 1440p ultrawide or 4K, and/or at higher quality settings when ray tracing is enabled, and has led to many people questioning whether Geforce cards have enough VRAM, and whether VRAM (in general) is a bit too low on the lower-end cards.
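To give a rough sense of why that wall exists, here's a back-of-the-envelope sketch (Python, with purely illustrative figures - roughly 1 byte per texel for block-compressed textures and 8 bytes per pixel for a half-float render target; no specific game is being modelled):

    # Back-of-the-envelope VRAM arithmetic - illustrative figures only.

    def texture_mb(size_px, bytes_per_texel=1.0):
        """Approximate footprint of a square texture with a full mip chain.
        ~1 byte/texel roughly matches BC7-style block compression and the
        mip chain adds about a third on top of the base level."""
        base = size_px * size_px * bytes_per_texel
        return base * (4 / 3) / (1024 ** 2)

    def render_target_mb(width, height, bytes_per_pixel=8):
        """A single RGBA16F render target (8 bytes per pixel)."""
        return width * height * bytes_per_pixel / (1024 ** 2)

    print(f"4096x4096 compressed texture: ~{texture_mb(4096):.0f} MB each")
    print(f"2048x2048 compressed texture: ~{texture_mb(2048):.0f} MB each")
    print(f"RGBA16F target at 4K:     ~{render_target_mb(3840, 2160):.0f} MB")
    print(f"RGBA16F target at 1440p:  ~{render_target_mb(2560, 1440):.0f} MB")

    # ~300 unique 4K textures resident at once is already ~6.5 GB before
    # render targets, geometry and (with RT enabled) BVH data - which is how
    # an 8 GB card hits the memory wall while a 12-16 GB card doesn't.
    print(f"300 x 4K textures: ~{300 * texture_mb(4096) / 1024:.1f} GB")

Each texture-resolution step down quarters the per-texture footprint, which is why lowering texture quality "fixes" the problem so readily.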

I think Tim makes a valid point in that last video - the consoles from Sony and Microsoft only have 16 GB of memory shared between the OS and the game. On Microsoft's side, the maximum memory addressable at full speed by the GPU is the Xbox Series X's 10 GB GDDR6 pool, which is mostly dedicated to graphics. Some of the slower memory pool is available for games to utilise for non-graphics data (around 3.5 GB, since 2.5 GB is OS-reserved) but games can use the faster pool for that as well. 

The Series S has 8 GB in its fast pool, though I couldn't find any direct confirmation of how much of that is dedicated to GPU usage. Since 2 GB of the memory is reserved for OS operation, it seems that the 8 GB is split between graphics data and non-graphics data for games.

So, let's say that current consoles have a "VRAM buffer" spread of 6-10 GB. This means that games will be developed with settings to account for this range. In this light, the 12 GB of VRAM on the RX 6700 XT is more than enough at 1440p and the 16 GB on AMD's higher-end cards is basically unnecessary, even at 4K.
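To make that spread explicit, here's the arithmetic as I read it (the split of the Series S fast pool between graphics and non-graphics game data is my own assumption, as noted above):

    # Rough console memory budgets, using the figures discussed above (GB).
    series_x = {"total": 16.0, "fast_pool": 10.0, "slow_pool": 6.0, "os_reserved": 2.5}
    series_s = {"total": 10.0, "fast_pool": 8.0, "slow_pool": 2.0, "os_reserved": 2.0}

    x_game_total = series_x["total"] - series_x["os_reserved"]  # 13.5 GB for games
    x_graphics = series_x["fast_pool"]                          # ~10 GB GPU-optimal
    s_game_total = series_s["total"] - series_s["os_reserved"]  # 8 GB for games
    # Assumption: ~2 GB of the Series S fast pool goes to non-graphics game data.
    s_graphics = series_s["fast_pool"] - 2.0                    # ~6 GB for graphics

    print(f"Series X: ~{x_game_total} GB for games, ~{x_graphics} GB of that for graphics")
    print(f"Series S: ~{s_game_total} GB for games, ~{s_graphics} GB of that for graphics")
    # -> roughly the 6-10 GB "VRAM buffer" spread for this console generation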

For Nvidia, 8 GB is probably too little to match the 6700 XT at identical max settings at 1440p over the next five years, but dropping texture quality slightly will immediately remedy that issue - though you can argue it is still something the user must take into consideration when comparing two "equivalent" cards for purchase.


You can see the drop-off from the higher-end Geforce GPUs compared to the Radeon cards... [Hardware Unboxed]


Driver overhead...


The next issue is Nvidia's driver overhead. This refers to the fact that Nvidia cards do not have an onboard hardware scheduler like AMD's do, so that function requires CPU resources when using a Geforce card. In the above image, we can see that the Nvidia cards perform significantly worse when paired with a Ryzen 1600X/2600X than the AMD cards, which reach their GPU bottleneck more easily (i.e. the CPU is able to let the GPU reach its maximum possible performance in this title).
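A toy model (my own simplification, not Hardware Unboxed's methodology) illustrates why this mostly matters in CPU-limited scenarios: treat each frame as costing some CPU time (game logic plus driver/scheduling work) and some GPU time, and the frame rate is limited by whichever is larger.

    # Toy model of driver overhead: frame rate is limited by the slower of the
    # CPU side (game + driver/scheduling work) and the GPU side. The millisecond
    # figures below are invented purely for illustration.

    def fps(cpu_game_ms, driver_ms, gpu_ms):
        frame_ms = max(cpu_game_ms + driver_ms, gpu_ms)
        return 1000.0 / frame_ms

    old_cpu, fast_gpu = 8.0, 7.0  # an older 6-core CPU, a high-end GPU
    print(f"Light driver (Radeon-like):  {fps(old_cpu, 0.5, fast_gpu):.0f} fps")
    print(f"Heavy driver (Geforce-like): {fps(old_cpu, 2.0, fast_gpu):.0f} fps")

    new_cpu = 4.0  # a faster, newer CPU: both cards become GPU-limited
    print(f"Light driver, fast CPU:  {fps(new_cpu, 0.5, fast_gpu):.0f} fps")
    print(f"Heavy driver, fast CPU:  {fps(new_cpu, 2.0, fast_gpu):.0f} fps")

In this model, the "heavy driver" card loses around 15% with the older CPU and nothing at all with the newer one - roughly the shape of the results in the charts.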

This is a significant performance consideration for any potential buyers of graphics cards. You will be losing out on performance if you purchase a higher-end Geforce card and own an older or lower-end CPU.

In addition to this issue on Nvidia's side, Hardware Unboxed have made it clear through their research that the amount of L3 cache has an outsized impact on game performance when the number of cores is held consistent... up to a point: beyond 6 cores and 12 MB of shared L3 cache, the effect is minimal for high-end AMD and Nvidia products on the latest generation of Intel and AMD CPUs in a recently released game/engine.

However, just as the "how many cores do you need?" discussion is a complicated one, "how much cache do you need?" is also not a simple question, because you cannot isolate the effect on performance across generations of processor families. What is a sufficient amount of cache for one game on one processor family may not be enough on another. An i5-9600K with 9 MB of L3 performs just as well as an i5-10600K/11600K with 12 MB, and all three outperform the R5 1600X/2600X with their split 8 + 8 MB of L3 - but I don't think we can put that down solely to the cache design, because I don't have the data to prove it at this point: you just can't separate core performance when comparing across generations of processors. What does seem apparent, however, is that core-to-core latency isn't very important for games, in general.


Zen 2 performs better, with double the cache (16 + 16 MB) but how can you separate that from the difference in core architecture?


It's a shame that Hardware Unboxed aren't able to scale Intel's processors down to even less cache, but I'm pretty sure it's impossible to disable cache the way you can disable cores... so it's a scenario we won't be able to test. It would also be really interesting if they performed the same testing on Zen/Zen 2/Zen 3, as we could then see how much the cache design of Zen 1 and Zen 2 was holding those products back relative to their core architecture. It is clear from the above charts that there was a significant increase in core performance between Zen and Zen+ without altering the structure or quantity of the L3 cache (in these tests, we're seeing a 10% increase across the board).

Since the testing methodology used by Hardware Unboxed (i.e. the quality settings) is different across the two studies they performed (at least with respect to the inclusion of Zen 2), it's actually very difficult to draw a firm conclusion as to whether the scaling observed for Intel's cache would be similar for AMD's processors. All we can do is hope someone recreates this study, taking into account the other side of the processor equation.

One final consideration is that this effect on game performance can be highly game/engine dependent. Comparing the results between Shadow of the Tomb Raider and Watch Dogs Legion shows how much cache and core count affect each game: 
  • SotTR shows a performance ratio of 1.12x going from 12 MB to 20 MB of L3 at 6 cores, 1.16x at 4 cores, and a massive 1.45x for the 4-core 20 MB vs 6 MB comparison. 
  • WDL shows a ratio of 1.14x from 12 MB to 20 MB at 6 cores, 1.08x at 4 cores, and 1.09x for the 4-core 20 MB vs 6 MB comparison.
This shows us that a 12 MB shared cache is probably sufficient for most games (as I noted above), but that there is a huge absolute jump in performance of at least 25% from 4-core to 6-core CPUs, despite the relative numbers for cache discrepancies being similar between those tiers.
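Rearranging those ratios into "how much performance you give up relative to the 20 MB configuration" makes the per-game variation easier to see - only the ratios come from the charts, everything else is simple arithmetic:

    # Cache-scaling ratios quoted above: performance with 20 MB of L3 divided
    # by performance with the smaller cache, per game and core count.
    ratios = {
        "Shadow of the Tomb Raider": {("6 cores", "12 MB"): 1.12,
                                      ("4 cores", "12 MB"): 1.16,
                                      ("4 cores", "6 MB"): 1.45},
        "Watch Dogs Legion": {("6 cores", "12 MB"): 1.14,
                              ("4 cores", "12 MB"): 1.08,
                              ("4 cores", "6 MB"): 1.09},
    }

    for game, data in ratios.items():
        print(game)
        for (cores, cache), ratio in data.items():
            loss = (1 - 1 / ratio) * 100  # % lost versus 20 MB at the same core count
            print(f"  {cores}, {cache}: ~{loss:.0f}% slower than {cores} with 20 MB")

Seen that way, Watch Dogs Legion barely cares about cache at 4 cores (~7-8% lost), while Shadow of the Tomb Raider gives up roughly 30% of its performance in the 4-core, 6 MB case.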

In this instance, it's easy to see the difference those extra cores make - and this is for a game engine designed for last-gen consoles. That's why I previously said that if you want a PC to last into at least 2025, the minimum spec includes an i7-10700K: 8 cores will make a difference by that point in the same way that 6 cores do now. For the same reason, I just cannot suggest anyone buy a 4-core CPU from 2020 onward (budget allowing) - and it's one of the reasons I was disappointed in the Steam Deck.



Nvidia's driver overhead really hits CPUs in the cores/cache for certain games...



Unfortunately, the testing performed here is mostly academic - it's very useful for understanding the effect of differences in architecture and in the quantities of elements within a product, but not really reflective of the type of systems that consumers tend to buy or build themselves. The typical consensus is that "balanced" systems have parts of equal quality: not many consumers will be pairing an i3-10100 with an RTX 3090 or an R5 5600X with an RX 6900 XT; it's just generally not done.

So, in fairness, these incredibly informative results have very little real-world application - especially given that the processor manufacturers are not making ultra-fast 4-core CPUs with huge amounts of cache. While Hardware Unboxed did not like the misinformed narrative that "more cores equals more performance"... the reality is that the advice, although given from a lack of understanding of what is holding performance back, is not actually incorrect. A CPU with 8 or more cores will be clocked higher and have more cache available for the cores to work with, while a 4-core CPU will be cache-strapped as well as running at lower frequencies. You can't separate the high-end parts from the high-end quality of design/performance: something I tried to point out, previously.

*Ahem*
Back to the point... it's clear that you need more "CPU" when using a Geforce card than a Radeon card, but mostly only on 4-core or older CPU architectures; the difference is only a couple of percent between a 10-core and a 6-core 9th/10th/11th-gen Intel CPU for Radeon cards, whereas we're talking about a 15% decrease in performance on Geforce cards for the same change in processor. The 4-core effect is more along the lines of a 25% decrease from the 6-core relative performance for both product lines, with Radeon performing better in absolute numbers. 
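Composing those rough percentages (and they are only rough - this is my own simplification of the figures above, normalised per brand to a 10-core CPU) gives a feel for the gap at each CPU tier:

    # Approximate relative performance by CPU tier, per GPU brand, using the
    # rough percentages quoted above (10-core Intel CPU = 1.00 for each brand).
    radeon  = {"10-core": 1.00, "6-core": 0.98, "4-core": 0.98 * 0.75}
    geforce = {"10-core": 1.00, "6-core": 0.85, "4-core": 0.85 * 0.75}

    for tier in ("10-core", "6-core", "4-core"):
        print(f"{tier}: Radeon ~{radeon[tier]:.2f}, Geforce ~{geforce[tier]:.2f}")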

IMO, this can be a serious negative, as it seems that the actual amount of CPU performance required is consistent across performance tiers of the Geforce stack, with the same percentage losses seen on an RTX 3090 and a 3070 (presumably it continues down to the RTX 3060 as well). This all means that the consumer needs to spend more not only on the GPU but also on the rest of the system.


The R5 2600X performs like an Intel part with 6 MB of cache but the 1600X performs way worse... cache itself is not enough to explain the differences in performance between generations of CPU architecture. [Hardware Unboxed]


Performance...


This is going to be a short one, I promise! Just look at the graphs above: you can see, time and time again, that the Radeon cards grant more performance - and not only that, but more performance for less money as well*. No one in their right mind would choose an Nvidia card over its AMD equivalent.
*Not pictured in the charts...
Unfortunately for AMD, the landscape of gaming technology is slowly changing. Yes, high framerates and/or higher resolutions are now much more of a focus for high-end gamers, but ray tracing is entering the mainstream consciousness through AMD's adoption of the technology in the consoles. This means that raster performance, which was already good enough two generations of graphics card hardware ago, is no longer the sole focus of gamers (myself included) or of developers and publishers. Ray tracing, for better or worse, is now a "back of the box" feature and its presence or absence affects the critical reaction to a game. 

As a software feature it also reduces the workload of level and environment designers and artists, and it's the shiny new technology* that engine programmers love to get their fingers into and play around with... and AMD trails Nvidia in RT performance.
*New as defined by now having GPU hardware acceleration...
In fact, the lack of RT performance removes any advantage that AMD has over Nvidia due to Nvidia's driver overhead issues on the CPU.

On the other hand, RT implementations magnify the issue Nvidia has with a lack of RAM... so, it's not all a win for them.


Ultimately...


The issues I outlined above are serious and detract mightily from Geforce's attractiveness to an informed consumer... when taken in isolation. If we instead look at the market in a holistic manner, we come to a different conclusion.

Last time, I looked at the discrete GPU market and how supply and demand are affecting it... in the process of that analysis, the data from JPR showed that AMD has been shipping a median of slightly below 30% of total GPUs per year. In terms of the absolute number of today's gaming PCs, it's likely that proportion is repeated overall*. This means that, although not insignificant, AMD would be incapable of supplying the market with enough GPUs if every gamer saw the light and decided that Radeon cards are the best for the money (ignoring other features, they certainly are!).
*You can make the argument that the recent mining binge has only really affected the RTX 30 series of cards, largely bypassing AMD's RX 6000 series but the numbers shipped do not really make a dent in the 560-ish million gaming PCs out there...

AMD have been shipping around 30% of discrete GPUs per year since at least 2014...


So, ultimately, my conclusion is this - it doesn't matter that AMD has the best raster performance, the least driver overhead or the most RAM per performance tier. Their presence in the dGPU market is relatively small and only getting smaller*. All of this talk is academic masturbation (very interesting and informative but, quite frankly, pointless) because developers and engine makers will optimise their games for the largest segments of the market. First and foremost of these are the consoles - and, in comparison to those architectures, Geforce GPUs perform well. Second, we have Nvidia's discrete GPUs, and last we have Radeon discrete GPUs.

Nvidia will never be left behind because they have too large an install base, too much marketing power and too much consumer mindshare. Unless AMD starts shipping significantly more dGPUs the market cannot change and, I think, AMD cannot and does not want to do this because it would cause a price war with Nvidia - something neither company wishes for at this point in time, particularly AMD because it would take wafer allocation away from their far more profitable CPU business.
*Now, I'm not sure I agree with the specific numbers being thrown around in that article, as shipped marketshare is not installed "marketshare" - as I said above, the number of cards shipped in a single year has a negligible impact on the overall marketshare. We're talking about 7% of the total market shipped in 2020, so in absolute numbers we're looking at around a 2% dip, which is nothing.
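For what it's worth, here's my reading of the arithmetic behind that "nothing" (all the figures are the approximate ones from above):

    # Installed-base arithmetic, using the approximate figures from above.
    installed_base = 560e6                    # "560-ish million" gaming PCs
    annual_shipments = 0.07 * installed_base  # ~7% of the market shipped in 2020
    amd_share = 0.30                          # AMD's rough share of installed/shipped dGPUs

    amd_installed = amd_share * installed_base
    new_base = installed_base + annual_shipments

    # Worst case for AMD: an entire year's shipments go to Nvidia instead of
    # splitting roughly 30/70 as usual.
    share_as_usual = (amd_installed + amd_share * annual_shipments) / new_base
    share_zero_amd = amd_installed / new_base

    print(f"AMD installed share if 2020 ships as usual: {share_as_usual:.1%}")
    print(f"AMD installed share if 2020 ships no AMD:   {share_zero_amd:.1%}")
    print(f"Difference: ~{(share_as_usual - share_zero_amd) * 100:.1f} points")

Even in that extreme case, the installed share only moves by about two points.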
The bigger issue for AMD is that Intel is coming, and they're coming for the low-to-mid range. Nvidia own the high end through sheer volume of cards shipped, and Intel could ship at least as many chips as AMD ships discrete GPUs if they wanted to*.
*I actually believe Intel would prefer to ship the majority of their supply to laptops and crowd Nvidia out of that market, first.
In order to keep competing, AMD need to decouple their process node output from their product portfolio so as to prevent competition between their CPU and GPU businesses. RDNA has been a success in terms of increasing performance compared to their prior architectures, but AMD have relied heavily upon the process node shrink to get that performance. Nvidia are still competitive while using inferior chip fabrication technology and have remained at least a process node behind AMD since 2016.

Once Intel and Nvidia catch up to the level of fabrication that AMD has access to, AMD's advantages begin to fall away, as will (I believe) their marketshare, squeezed between two companies with more resources to throw at the problem.

Many people are excited about the prospect of a third discrete GPU manufacturer entering the market (myself included) but this change brings with it the possibility that after a short while, we may return to just having two dGPU manufacturers to choose from...
