Hole in my head - A blog about video games and technology and my thoughts on those things...

We Need to Talk About FPS Metrics Reporting... (Part 2) (2024-02-01)<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifUG8YLl5fZNota4EB-sbw3Dt7ja0ar-8CTmuVDuutn7JElDg48yfBjnHXkO-4amyysKkwMxKO4gw6MtDRKOcNAGNarGbJ9aRFiMpZdivJOHLAL-UqZ3LcG8PBKg4zblm8DPv5Ju6lC5r2xsZ4dog4j3DQe03exMo90HFlpJ1KFXZDEULn7xy1YknNdHA/s1920/Title.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifUG8YLl5fZNota4EB-sbw3Dt7ja0ar-8CTmuVDuutn7JElDg48yfBjnHXkO-4amyysKkwMxKO4gw6MtDRKOcNAGNarGbJ9aRFiMpZdivJOHLAL-UqZ3LcG8PBKg4zblm8DPv5Ju6lC5r2xsZ4dog4j3DQe03exMo90HFlpJ1KFXZDEULn7xy1YknNdHA/w640-h360/Title.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: left;"><div style="text-align: justify;">There's a well-known saying: "There are lies, damned lies, and statistics...". It implies that the "statistics" in question are a third, worse form of lie - one that is somehow obfuscated from the receiver of the information.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">We also have multiple well-known sayings which revolve around the concept that "you can make the statistics/data say anything you want". It seems readily apparent that people, in general, do not like or trust "the statistics".</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I experience this, in my own way, in my day-to-day work. Scientists are not currently the most trusted of individuals - for whatever reason - and one of those reasons is a lack of understanding, on the part of the consumer, of the results of data analysis, both within and outside of scientific circles.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In the same way people say "science is hard", people say "statistics is hard"... and this is for good reason - though it might not be the specific reason that immediately springs to mind!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Statistics is not that difficult once you know what you are doing (at least in my opinion). The difficult part is knowing which statistical test to apply when and where. 
Yes, the difficulty, as when designing scientific experiments, is understanding the context, limitations and biases of what and how you wish to test.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is why there are many statistical tests where the number of data points needs to be below or above a certain limit; why it is important to know the relationship between the individual data points and the set as a whole; and how the interpretation of the result of the analysis might be changed based on myriad factors.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Hence, we come to today's topic for discussion: hardware performance testing in games!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Last time, <a href="https://hole-in-my-head.blogspot.com/2023/03/we-need-to-talk-about-fps-metrics.html">I attempted to communicate</a> the shortfalls and incorrect analysis being performed in the industry at large. Admittedly, I was unsuccessful in many ways and was roundly dismissed by most parties...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Today, I will try a different tack.<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhP7-t_-bQWbSSlauUEF60uOeJdsuq-9klxVoQWNdpA1LsdnvVh01BM1t4bJkvE1QDeAeC4pa2zdor5Y68sb49ZOMqnTB3fwr0MhrL4J4mztJMkRAwUA7bMYvSYb5jSDiH4JKMJ_ovPkqy5oSep8YckaAZCB-lmgBXfXns0KX7nOekvi6usWUhfsIyDBhU/s970/Tomshardware_4k.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="546" data-original-width="970" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhP7-t_-bQWbSSlauUEF60uOeJdsuq-9klxVoQWNdpA1LsdnvVh01BM1t4bJkvE1QDeAeC4pa2zdor5Y68sb49ZOMqnTB3fwr0MhrL4J4mztJMkRAwUA7bMYvSYb5jSDiH4JKMJ_ovPkqy5oSep8YckaAZCB-lmgBXfXns0KX7nOekvi6usWUhfsIyDBhU/w640-h360/Tomshardware_4k.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Just an example of an average and percentile result... (<a href="https://www.tomshardware.com/">Tom's Hardware</a>)</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Porkies...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Let's talk about statistical analysis! Yes, I can see you falling asleep already but this is central to my point today. One of the innovations discussed last time was the application of very light statistical analysis tools to frametime data.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">We analyse the average framerate, along with the lowest framerate and/or the percentile lows (as picked by the specific outlet). I tried to point out that the method chosen to do this is quite literally wrong. However, I came at it from the perspective of a reviewer trying to pull together the data to make a story - who may or may not have any more statistical training than I do. 
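</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For anyone who hasn't seen it laid out, here is a minimal sketch of that standard calculation, assuming nothing more than a list of frametimes in milliseconds (the variable names and exact percentile choice are mine - outlets differ on the details):</div><pre>import numpy as np

# A benchmark run is just a sequence of frametimes, in milliseconds.
frametimes_ms = np.array([16.7, 16.9, 17.1, 45.2, 16.8, 16.6, 33.9, 16.7])

# Average framerate: total frames divided by total time - uncontroversial.
avg_fps = 1000.0 * len(frametimes_ms) / frametimes_ms.sum()

# The contested part: pool the frametimes, take the 99th percentile
# (the longest ~1% of frames), and convert that one value straight to fps.
p99_frametime_ms = np.percentile(frametimes_ms, 99)
one_percent_low_fps = 1000.0 / p99_frametime_ms

print(f"Average: {avg_fps:.1f} fps; '1% low': {one_percent_low_fps:.1f} fps")
</pre><div style="text-align: justify;">Note the two assumptions baked into those few lines: the percentile treats the run as an unordered bag of numbers, and the final division treats a single frametime as if it were a rate. Both are exactly what I want to pick apart.</div>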
<div style="text-align: justify;"><br /></div><div style="text-align: justify;">This time I want to come at the problem from the point of view of statistical logic. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">First, let's acknowledge what already works: average framerate. This metric is well-understood and it is being applied correctly. But it was the desire to more clearly understand the experience of playing games on specific hardware which caused the industry to also look at the lowest framerates, as these can have an outsized impact on the experience in the form of stutters or incorrect frame pacing.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, how was this approached?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The industry, as a whole (at least as far as I can tell), has approached this by doing two things:</div><div style="text-align: justify;"><ul><li>Assuming that the dataset of frametimes is normally distributed (or close to it). </li><li>Directly converting the individual frametimes into a framerate value (or fps).</li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For example, take the 1% low fps: the reviewer interpreting the data gathered during a benchmark will form a distribution of the dataset and take the 99th percentile (corresponding to the longest frametimes), then convert that value into a framerate (fps) metric.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There are two issues with this, because frametimes are temporal data:</div><div style="text-align: justify;"><ul><li>The order, or sequence, in which they sit in the data is important.</li><li>A frametime is not a framerate.</li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">By sorting the frametimes into a distribution - rearranging them so that similar values sit next to one another - you are destroying the relationship between one frame and the previous and next frames in the sequence.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For anyone who is familiar with a certain level of mathematics, this is like confusing the application of <a href="https://www.techtarget.com/whatis/definition/combination-and-permutation#:~:text=In%20a%20combination%2C%20the%20elements,a%20finite%20number%20of%20permutations.">permutation (nPr) and combination (nCr)</a> when attempting to work out the number of possible arrangements of a set of data.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now, many people will argue that it doesn't matter and that you just want to see the worst performance in any given benchmark. However, this is where we hit the second problem - a frametime IS NOT a framerate.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If we take the analogy of a car journey, the framerate is the averaged speed over a period of time. If you divide the distance travelled from home to work by the time you took to make the journey, you will get your average speed - aka the average framerate.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, this is where that analogy breaks down, because an individual frametime is not like a point speed value. 
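</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It is, at best, an extrapolation - and it's worth making that explicit with a toy calculation (my own sketch, using made-up numbers rather than anyone's published methodology):</div><pre># One 40 ms frame, converted "directly", claims 25 fps - but that figure is
# only true if every subsequent frame also takes 40 ms.
single_frametime_ms = 40.0
claimed_fps = 1000.0 / single_frametime_ms  # 25.0 "fps" from one frame

# With varying frametimes, the framerate actually realised over the
# surrounding stretch of play is something else entirely:
frametimes_ms = [40.0, 10.0, 12.0, 11.0, 10.0, 9.0, 8.0]  # made-up values
realised_fps = 1000.0 * len(frametimes_ms) / sum(frametimes_ms)  # 70.0 fps

print(f"Claimed: {claimed_fps:.1f} fps; realised: {realised_fps:.1f} fps")
</pre><div style="text-align: justify;">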
Converting a single frametime into a rate, in other words, only tells you how much distance you would cover per unit time if you continued on at that same frametime value. Unfortunately, this is almost never the case in situations where we have unlocked framerates - the frametimes will vary, potentially by quite a lot.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In this analogy, there is no true equivalent for frametimes. The closest we can get* is the difference between sequential frames, which is like the derivative of the speed value - i.e. the magnitude of your acceleration.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">*That I can conceptualise!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">A frametime is the time to deliver one frame. The only analogy I can come up with is related to the speed of light. The speed of light is a constant. However, that constant changes depending on which medium the photon* is passing through. Thus, a frametime is like the distance travelled by a photon where the medium it is passing through changes for each frame presented.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">*Let’s assume it’s a photon.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The speed of light in a vacuum is the upper limit, constrained by the game engine, but every frame has to travel through a different medium. Sometimes it’s glass, sometimes it’s a gas, sometimes it’s the human body. Occasionally, the media between sequential frames are the same or very similar - but often they are not.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">We can still work out the average distance over time, as we did in the car analogy (i.e. the framerate), but the whole is made up of individual instances (frames/photons) that each travel a different distance - each at "the speed of light", only that speed is different for every frame.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Each point speed is not an average - it is a constant (one that changes from frame to frame). Thus, you cannot recast that value from a constant into an average. I.e. 
you cannot directly convert a frametime into a framerate (fps) because (in this analogy) the photon will not be travelling the same distance per unit time.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimfD5XuVCcmhvfkmBv1IBTaLoLmjPZ-hryotQXRivBf5rPlCyjcBZ9T1leNIkk6hDhTafdB_Sz-XjyenEjM-uPRlQ73CET6nGwaMTUuNlE25m5UI9pdGbkRRXbfz1ogEVLjUqg5Jrkgt-M-whOVg6xxEhWalKNSsonn_qXc-osvACx6QA4BpoVPF3ZM0Y/s563/AMD%20_moving%20average.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="340" data-original-width="563" height="386" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimfD5XuVCcmhvfkmBv1IBTaLoLmjPZ-hryotQXRivBf5rPlCyjcBZ9T1leNIkk6hDhTafdB_Sz-XjyenEjM-uPRlQ73CET6nGwaMTUuNlE25m5UI9pdGbkRRXbfz1ogEVLjUqg5Jrkgt-M-whOVg6xxEhWalKNSsonn_qXc-osvACx6QA4BpoVPF3ZM0Y/w640-h386/AMD%20_moving%20average.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Previously, I tried to show visually that the framerate is a moving average... </b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Flawed analogies...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now we can move back to the first point - the idea that the order of frames is important.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Let’s make another analogy! (They’re fun!)</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If we collected the data on the height of all men in Sweden and we wanted to know the average height of the male population, we’d sum all the values and then divide by the number of data points.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now, if we want to find out the 1% lowest height in the population, we’d arrange that same data in a distribution and find that percentile (even if it landed between two data points) and report the value.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There is nothing wrong with this application of statistics.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The reason why there’s nothing wrong with it is because none of the data is related to the other - there are no dependencies. The height of male #1 has no relationship to the height of male #322, or #4536 other than we will measure them using the same units.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For frametimes we actually have the same situation, if we’re just looking at the data in isolation, there is no relationship between frame #1 and #8764. 
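</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In fact, the percentile calculation is completely blind to ordering - shuffle a run's frametimes into any order you like and the reported figure does not move. A quick toy sketch (my own, with made-up numbers):</div><pre>import random

# A run with a single clustered stutter in the middle (made-up numbers):
# ten 50 ms frames in a row, surrounded by steady 16.7 ms frames.
run = [16.7] * 495 + [50.0] * 10 + [16.7] * 495

# Now scatter the exact same frametimes randomly through the run.
shuffled = run.copy()
random.shuffle(shuffled)

def one_percent_low_fps(frametimes_ms):
    # Sorting is precisely the step that throws the ordering away.
    ordered = sorted(frametimes_ms)
    p99 = ordered[int(0.99 * len(ordered))]  # crude 99th percentile
    return 1000.0 / p99

# Identical results for both runs - the metric cannot tell a ten-frame
# hitch from ten long frames scattered across the whole benchmark.
print(one_percent_low_fps(run), one_percent_low_fps(shuffled))
</pre><div style="text-align: justify;">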
However, in reviewing the performance we are doing something very important - we are gauging the user experience by assessing the relative performance of the hardware in question over a period of time (the benchmark).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Thus, it becomes important which frame follows which frame, and where in the benchmark the frame is located.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is another issue with focussing on the percentile data: it does not show you how impactful that value actually is!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Let’s assume for a moment that we accept that the percentile data is an acceptable way to analyse the performance of PC gaming hardware and that directly converting frametimes into framerates is also acceptable:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If we keep all other hardware variables the same, and GPU 1 gives you a 1% low of 31.5 fps while GPU 2 gives you a 1% low of 35.6 fps, you’d say that GPU 2 is better, right?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Well, what if I show you the frametime graphs of those two benchmark runs?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiH-X93DznEni6j8fqLfudlD4AYFq4Sqf4Wyi7w7f76eRcsvfYy_MVtNetcmSigfqUmsRittfFSPyeLttTlm0Xm7NO8cb-zWkQOB58cRF_SeLGUwOoYKqERSFyhZR0nEHEu8SqjMgeLMr4Zy2yN6La5-bPrgaeKk0y1mXsaBNU4CMwW-ytf-RUzlGpyhQ8/s1067/GPU1%20vs%20GPU2.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="293" data-original-width="1067" height="176" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiH-X93DznEni6j8fqLfudlD4AYFq4Sqf4Wyi7w7f76eRcsvfYy_MVtNetcmSigfqUmsRittfFSPyeLttTlm0Xm7NO8cb-zWkQOB58cRF_SeLGUwOoYKqERSFyhZR0nEHEu8SqjMgeLMr4Zy2yN6La5-bPrgaeKk0y1mXsaBNU4CMwW-ytf-RUzlGpyhQ8/w640-h176/GPU1%20vs%20GPU2.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The percentile lows fail to capture the experience and especially the magnitude of the experience... GPU1 (left), GPU2 (right)</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br />Would you still say GPU 2 is better?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What if we have another situation, where this time we’re doing CPU comparisons? CPU 1 gives you a 1% low of 35.4 fps whereas CPU 2 gives you 34.6 fps? 
Which is the better CPU?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpSiyGWDPv1H2EM-vKlzS4Z4xqRyrihkMAnHLiem5NcPJchj2uKVXPB_YbLdn1mgcq3LyjrDuJeV5OTWUziWUgWYbocNUHOJs3SoCh0smrRdfjnLRXVvBQJWMOijSmqrAPuXt1UmjF9oabVZj5IBNkxwIDyyk16NA9Gob0Z0dfUS_b1tqt-0lvrr38FDg/s1066/CPU1%20vs%20CPU2.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="294" data-original-width="1066" height="176" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpSiyGWDPv1H2EM-vKlzS4Z4xqRyrihkMAnHLiem5NcPJchj2uKVXPB_YbLdn1mgcq3LyjrDuJeV5OTWUziWUgWYbocNUHOJs3SoCh0smrRdfjnLRXVvBQJWMOijSmqrAPuXt1UmjF9oabVZj5IBNkxwIDyyk16NA9Gob0Z0dfUS_b1tqt-0lvrr38FDg/w640-h176/CPU1%20vs%20CPU2.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>CPU1 (left) gives you a large hang when passing through a loading area. CPU2 (right) has stutters as the enemy AI engages with you during a fight...</b></td></tr></tbody></table><br /><div style="text-align: justify;">Well, that's the thing! CPU2 is stuttering less in terms of magnitude but at an important part of the gameplay - combat. CPU1 has a much worse stutter but it's during the loading of a new area - which won't kill you...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">From these simplistic graphs, I hope you can see that these percentile values are almost meaningless because the sequence of events is important when playing a game! All context has been stripped from the data by applying an incorrect statistical interpretation...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Worse still, any outlet using 0.5% and 1% percentile lows will be reporting values which will most likely not represent the real order of performance between products (ignoring the fact that what they are doing is wrong in the first place!).</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEil8KnGzlFhQ5YVfWlagO7V_v4rZLOUMraGG2NxogaU1ygmD27Rbmdl6dM6Ltifikypih3svecWmxXA6I1wwqR2Cf33QyEWgV57PICSwPWEwdY0eW4StvUQZYpHZF41_sExlceyGdFwVnF1648CediGGthX-DCQhVcxiLcTCyIm8LcDhtm9SF4h_tnf_LY/s956/Intel_ground%20truth.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="635" data-original-width="956" height="426" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEil8KnGzlFhQ5YVfWlagO7V_v4rZLOUMraGG2NxogaU1ygmD27Rbmdl6dM6Ltifikypih3svecWmxXA6I1wwqR2Cf33QyEWgV57PICSwPWEwdY0eW4StvUQZYpHZF41_sExlceyGdFwVnF1648CediGGthX-DCQhVcxiLcTCyIm8LcDhtm9SF4h_tnf_LY/w640-h426/Intel_ground%20truth.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>One of my suggestions was to look at the differential between sequential frames then define a limit to see how many excursions happened...<br /></b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: 
Finale">
#274e13;">Finale...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">And that, I think, does it for today. <a href="https://hole-in-my-head.blogspot.com/2023/03/we-need-to-talk-about-fps-metrics.html">In the previous post</a>, I went over what I thought could be an improvement in reviewing procedure. Maybe next time, I will come back to that and see if I can make it even more user-friendly.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Thanks for reading!</div></div>

Looking back at 2023 and predictions for 2024... (2023-12-29)<div style="text-align: justify;"><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMI6dn9hReNK728gcnplejlCGV2qBWPdgTjVPbxvgC3wLq1R1-m0QyFSksb7rJQ-sEWDtuahEpFvYAhWaN-FZbS4FQA7ApyXySLXHz6eC1BESRX_Dxm5-xz1fsCIoBGza9lRmM65uCMf8jqzJCbcodGpEVIAtHOQafvp0oWNa8kCRZlMqXjRYNSwBsWBU/s1420/Happy%20birthday!.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1078" data-original-width="1420" height="486" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMI6dn9hReNK728gcnplejlCGV2qBWPdgTjVPbxvgC3wLq1R1-m0QyFSksb7rJQ-sEWDtuahEpFvYAhWaN-FZbS4FQA7ApyXySLXHz6eC1BESRX_Dxm5-xz1fsCIoBGza9lRmM65uCMf8jqzJCbcodGpEVIAtHOQafvp0oWNa8kCRZlMqXjRYNSwBsWBU/w640-h486/Happy%20birthday!.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Hap-pey Burf-yay!</b><br /></td></tr></tbody></table></div><div style="text-align: justify;"> </div><div style="text-align: justify;"> </div><div style="text-align: justify;">The introduction to last year's post could almost be copy/pasted into this one. Work was even more intense than last year, and I know for a fact that 2024 will be even tougher, so my hopes of spending time writing for this blog look set to be dashed... But that doesn't mean that I won't try to address things when I feel I can dedicate the time, or perform more comparisons and benchmarks for everyone to digest. <br /></div><div style="text-align: justify;"> </div><div style="text-align: justify;">2023 was a big year for hardware releases - with most of the current generation of CPUs and GPUs being released at some point. Sure, we're getting some minor refreshes from Nvidia next year but, overall, 2024 looks set to be quite boring.</div><div style="text-align: justify;"> </div><div style="text-align: justify;">With that in mind, my predictions for this coming year were quite hard to pin down - what is left to predict? 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So let's start where we always do: the review of last year's predictions.</div><div style="text-align: justify;"></div><span><a name='more'></a></span><div style="text-align: justify;"><br /></div><div style="text-align: justify;"> </div><div style="text-align: justify;"><h3><span style="color: #274e13;"> 2023 Recap...</span><br /></h3></div><div style="text-align: justify;"> <br /></div><div style="text-align: justify;"> Last year, I managed to get 60% of the prior year's predictions correct - let's see if I can beat that.</div><div style="text-align: justify;"> </div><div style="text-align: justify;"><ul><li><b style="color: #274e13;">This graphics card generation is a lost generation. There will be ZERO cards that consumers or reviewers consider actually good value... (Look, even the 4090 is not good value!)</b> </li></ul></div><div style="text-align: justify;">The cards released this year have all disappointed with their price points and/or price-to-performance. The RTX 4060 Ti 8 GB and 16 GB were roundly destroyed in the reviewing press, while the RX 7600 and RTX 4060 were head-scratchers with their lack of performance uplift over the prior generation and similar price points. </div><div style="text-align: justify;"> </div><div style="text-align: justify;">The RX 7700 XT was poorly priced relative to the RX 7800 XT, considering the performance drop, and the latter part was <a href="https://www.techpowerup.com/review/amd-radeon-rx-7800-xt/41.html">priced "okay"</a> but <a href="https://www.techspot.com/review/2734-amd-radeon-7800-xt/">not great</a>, relative to the rest of the product stack from both Nvidia and AMD - especially when considering that it also performed the same as the prior generation's equivalently-named SKU, the RX 6800 XT, though for a cheaper price. This relatively positive reaction to the price has been eroded by the <i>actually available </i>street price of the RX 7800 XT, which has consistently been around $40 higher in this post-launch period (matching the RX 6800 XT in the open market)...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, saying all this - reviews have literally called the 7800 XT the best-priced GPU this generation... so, you could say that I was wrong in my prediction. Let's go with that:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Wrong - but SO close!</span></i></b><br /></div><div style="text-align: justify;"> </div><div style="text-align: justify;"> </div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">Nvidia video Super Resolution will be a BIG thing...</span></b> </li></ul></div><div style="text-align: justify;">This just didn't happen... Sure, Super Resolution was interesting in the browser implementation, and VLC also implemented it, but the tech actually wasn't that impressive because it was SUPER (get it?!) limited in how it worked. Quite a disappointment, and practically no one has even used this in any really useful manner. 
Plus, the power consumption increase for using this on video is quite large, rendering the point of it a bit useless, in my honest opinion...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Wrong!</span></i></b></div><div style="text-align: justify;"> </div><div style="text-align: justify;"> </div><div style="text-align: justify;"><ul><li><b style="color: #274e13;">There will be no "pro" consoles for either Playstation or Xbox this year... Xbox Series S will continue to be a thorn in developers' sides... <br /></b></li></ul></div><div style="text-align: justify;">This is 100% correct - not even a question! There have been no mid-gen refresh consoles this year and <a href="https://www.eurogamer.net/digitalfoundry-2023-call-of-duty-modern-warfare-3-runs-well-on-ps5-and-series-x-but-series-s-has-issues">Xbox Series S</a><a href="https://www.ign.com/articles/remedy-opens-up-about-challenges-developing-alan-wake-2-for-xbox-series-s"> has continued </a><a href="https://www.vgchartz.com/article/458333/quantum-error-developer-says-game-is-in-unacceptable-state-on-xbox-series-s/">to be shown to be</a> <a href="https://www.thurrott.com/forums/microsoft/uncategorized/thread/baulders-gate-iii-shows-how-the-xbox-series-s-may-have-been-a-mistake">an issue for developers</a>...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Correct!</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"> </div><div style="text-align: justify;"><ul><li><b style="color: #274e13;">DirectStorage will be a flop... again.</b></li></ul></div><div style="text-align: justify;">I know this is becoming a perennial thing with me but I have sound technical reasons for not liking this tech with the current design of the personal computer. However, as it stands (and <a href="https://www.youtube.com/live/H1f6PkFFvu0?si=Ekh5PDBt37Re0Pp1&t=1349">as confirmed by the PC World Full Nerd crew</a>) DirectStorage really has not had any real impact on PC gaming... and it's barely faster than conventional methods.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I'm still waiting and still critical of this API - if new hardware needs to be implemented to actually make this make sense, then I feel like that's my point made for me...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><div style="text-align: justify;"></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Correct!</span></i></b></div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b style="color: #274e13;">32 GB of RAM will become standard for the recommended specifications of new AAA PC games...</b></li></ul></div><div style="text-align: justify;">Unfortunately, I cannot claim this one. I did state that it was a long-shot but it turns out that (as per my tracking) only one main stream, high-end game required more than 16 GB of RAM - Forspoken, which required 24 GB, not 32 GB... 
Not a great endorsement of the prediction!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Wrong!</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><br /></span></i></b></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Summary: Not quite on the target...</span></i></b><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This year, I'm hitting an "at best" 50:50 and "at worst" a 40:60 against - depending on how you interpret the first prediction in this list. You could say it was a wash or wrong... </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Considering that I thought I was pretty conservative in my predictions, that's actually worse than I might have expected! </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Anyway, let's take a look at what might be on the horizon!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYg27y4cTnTsIlF1izo-hz6Dz-l3Vm_NvYVaXqdxpJnN8qW-RktzYYC24yI1Ewtl31nBLslwdklw2F8On-SsKmjZ7SN5aHWJwY0t-3F7_Prrn_fhtGgbYjCDh2k4VM7qyhsggvCvViawxwxZAnklJ029Ew9llG0Cv-KLnWXhJ1QSuGIzTmDTzrhF3Q66k/s1920/Header.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYg27y4cTnTsIlF1izo-hz6Dz-l3Vm_NvYVaXqdxpJnN8qW-RktzYYC24yI1Ewtl31nBLslwdklw2F8On-SsKmjZ7SN5aHWJwY0t-3F7_Prrn_fhtGgbYjCDh2k4VM7qyhsggvCvViawxwxZAnklJ029Ew9llG0Cv-KLnWXhJ1QSuGIzTmDTzrhF3Q66k/w640-h360/Header.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Last year, we really jumped the shark. Maybe this year, things will calm down....</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><h3><b style="color: #274e13;">2024 Predictions...</b></h3></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This year, I'm going to be a bit bolder, in order to up the stakes and make the highs higher and, consequently, the losses deeper...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I've also taken more time to think about the industry and where it is and where it may be going, so, hopefully, that will make things a little more interesting.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">The client (desktop) RTX 50 series from Nvidia will not release in 2024.</span></b></li></ul></div><div style="text-align: justify;">All indications point to Nvidia releasing a refreshed lineup in early 2024 and this, to me, is a very good reason for no next gen cards to appear this coming year. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The 20 series released in late 2018, with the Super refresh around 8 months later in '19, and the 30 series appearing at the two-year mark in 2020. The 40 series began appearing in late 2022, but the mid-range cards didn't appear until well into 2023 (this year)... </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Just from a company perspective, these SKUs have not been on the market long enough to allow for a new generation, and any refresh would push that period back. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Added to this, AMD are not competitive enough to force Nvidia's hand. Therefore, I just don't see any next generation from the green team in 2024.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">If Zen 5 desktop launches this year, it will launch with an X3D part in the lineup.</span></b></li></ul></div><div style="text-align: justify;">I honestly thought that this would happen for Ryzen 7000, but we now all know that was essentially a second launch. For the next Ryzen generation on desktop, I believe that the X3D parts have enough cachet that AMD will want them front and centre with the reveal - at least with one prominent SKU. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's not as if these are cheap parts that will drag the average selling price (ASP) down, so I don't think there's even an economic reason for holding them back.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">If Zen 5 launches, no new motherboard generation will launch. Prices will not drop on current lineups.</span></b></li></ul></div><div style="text-align: justify;">One of the justifications for the (in my opinion) insanely-priced AM5 motherboards has been the promise of in-place upgrades. I feel like the board partners "suffered" a little during AM4's reign because enthusiasts generally kept cheap motherboards and upgraded in situ, denying the manufacturers their usual revenue stream across multiple CPU generations. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This came, of course, along with the increased costs associated with BIOS development, extended support with each new AGESA update, and managing the intricacies of supporting certain CPUs/generations of CPUs on more motherboards and motherboard revisions than would normally be the case.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Taking this all into account, I would be surprised if any new chipset is introduced, with motherboard makers saving money on that side of things and managing to clear existing stock without potentially having to resort to price cuts on the other. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I might clarify here that there may be a numeric "uplift", as manufacturers are wont to do to prompt sales of the "new" thing, but the underlying chipset and architecture will be the same - unlike going from B350/450/550, which saw advances in PCIe and other bits and pieces. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><span style="color: #274e13;"><b>PC ports/releases will continue to get better from the current low in terms of quality. 2023 was an outlier.</b></span></li></ul></div><div style="text-align: justify;">2023 felt like a real slog when it came to the quality of games being released on PC. Sure, most of it was fixed through subsequent patches, though there are still games that run very poorly to this day and don't appear to be getting the support their users deserve, which is a big shame. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I believe that the factors that resulted in this spate of troubled high-profile releases will mostly be resolved for 2024, along with developers and publishers being much more cognizant of the backlash, negative PR and reputational effects in the near future. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Or maybe this is just wishful thinking on my part...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">No new Radeon cards will launch.</span></b> </li></ul></div><div style="text-align: justify;">I know there are rumours of an RDNA 3 refresh but I really don't see the point. I don't think there is much more performance to be squeezed out of the architecture or the current process node, and if the rumours of AMD not bringing RDNA 4 to market in the higher-end segments are true, then the implication is that they will rely on the current high-end parts to continue to flesh out the upper end of the stack...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In that scenario, faster GDDR6 memory modules and a slightly higher frequency will not really provide a very large increase - maybe 5 - 10 % at most.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">Intel will not launch Battlemage desktop GPUs this year. </span></b></li></ul></div><div style="text-align: justify;">Although Arc Alchemist launched more than a year ago now, the follow-up, Battlemage, doesn't feel like its arrival is imminent. I would be surprised if it made an appearance on desktop.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><span><span style="color: #274e13;"><b>Microsoft will charge a nominal fee for Windows 10 security updates for client devices; maybe ~$25 per year. </b></span></span></li></ul></div><div style="text-align: justify;">I read the reports regarding Microsoft extending security support to the client arena with great interest. This feels like the company trying to generate more revenue for essentially "free".</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">At the end of the day, Windows 11 is a bit of a dud, like Windows Vista and 8 were before it - there are just too many poor decisions and bloated design choices, relative to actual improvements in stability and speed*, for it to be anything else. 
</div><div style="text-align: justify;"><span style="color: #274e13;"><b><blockquote>*Unfortunately, aside from the normal early teething issues with Win 11, there appear to be constant issues with stability and OS "slowness", resulting in a less than perfect user experience...</blockquote></b></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, saying all that - what is the <a href="https://www.theregister.com/AMP/2023/10/05/win_11_penetration_still_low/">uptake of Windows 11</a> in the business sphere? </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If Win 11 is just at 8 % of devices surveyed versus 80 % for Win 10, corporate investment in the version is likely a fraction of that... </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">All of this means that Microsoft's wheel-spinning on getting this version to market in the terrible state it was, with slow, opaquely questionable updates has been a very costly enterprise, with little to no return on that investment... They need to find that return!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is where my prediction comes into play - consumers and businesses want to stay on Win 10, they want support. Windows 12 is not ready, therefore, why not make the extended support reasonably priced, so that uptake is much higher? </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">They have the ability to market this within the OS with intrusive pop-ups and other nefarious devices, so why not? </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I could put my perennial prediction regarding DirectStorage somewhere here but I'm getting a bit tired going on about it. I still think it won't be a big deal but I'm still waiting to be proven wrong...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">The Show Must Go On...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As always, I've had a bit of fun ruminating on the possibilities I've concluded on, above. I look forward to seeing how this year plays out. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Overall, it looks like it could be a quiet year from a consumer standpoint - with a greater focus on just playing the games instead of worrying about the hardware. In a sense, that will be nice... Though, I do have quite a backlog of posts which will fill the time. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Wish me luck in getting them completed and published, and I'll see you around!</div>

The Performance Uplift of RDNA 3 over RDNA 2... (2023-11-12)<div style="text-align: left;"><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikfZ79Pn7euiyB8ygrysuY91_gQ5qAm5rTkTyWLltEOmG4Jzvx0VRg_BzhTeI41UvoR4AoZyaCIEzJsgb-dldEJZf_bCPNOfePMgzxagYohy-dYe7zf-UskNpCgRh8UBtjYo57TDqr2n27afcWVnzUr7XMw56Vy7HUSBXHMVvVYEZMvd1-qxBu_WWaH2I/s1920/Title.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikfZ79Pn7euiyB8ygrysuY91_gQ5qAm5rTkTyWLltEOmG4Jzvx0VRg_BzhTeI41UvoR4AoZyaCIEzJsgb-dldEJZf_bCPNOfePMgzxagYohy-dYe7zf-UskNpCgRh8UBtjYo57TDqr2n27afcWVnzUr7XMw56Vy7HUSBXHMVvVYEZMvd1-qxBu_WWaH2I/w640-h360/Title.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Yes, this is an RDNA 2 card, I wasn't able to create a new one with RDNA 3 just yet...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Much was made about the performance uplift (<a href="https://hole-in-my-head.blogspot.com/2023/08/what-went-wrong-with-rdna-3.html">or lack thereof</a>) of the initial RDNA3 cards released by AMD. Navi 31 (the RX 7900 XTX and XT) failed to meet expectations - <a href="https://www.techpowerup.com/review/amd-radeon-rx-7900-xtx/40.html#:~:text=Averaged%20over%20our%20whole%2025,it's%20still%20a%20tremendous%20result.">seemingly both internally to AMD and externally</a> (for various reasons). This lack of performance also extended to the RX 7900 GRE which, despite lower core clocks, still underperformed relative to where it could be calculated that it <i>should</i> be...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Disappointingly, Navi 33 (the RX 7600) performed <i><a href="https://www.techspot.com/review/2686-amd-radeon-7600/">exactly the same</a></i> as its equivalent RDNA2 counterpart, the RX 6650 XT, showing that there was zero performance uplift gen-on-gen in that lower-tier part...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In the meantime, <a href="https://x.com/XpeaGPU/status/1589075249337085952?s=20">rumours swirled</a> that Navi 32 was <a href="https://www.notebookcheck.net/Dismal-AMD-RDNA-3-refresh-rumor-suggests-all-RDNA-3-RX-7000-SKUs-have-been-canned.684174.0.html">going to be 'fixed'</a>. So, what is the truth of the matter? 
I intend to investigate a little and get to the bottom of the situation, like I did with my <a href="https://hole-in-my-head.blogspot.com/2023/07/the-performance-uplift-of-ada-lovelace.html">Ampere vs Ada Lovelace performance uplift analysis</a>...<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;"><br /></span></h3><h3 style="text-align: justify;"><span style="color: #274e13;">Ours is not to reason why...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Much like the RTX 4070 vs 3070 comparison, there is a large overlap in "equivalent" hardware features between the <a href="https://www.techpowerup.com/gpu-specs/radeon-rx-7800-xt.c3839">RX 7800 XT</a> and <a href="https://www.techpowerup.com/gpu-specs/radeon-rx-6800.c3713">RX 6800</a>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The on-die cache structure is a bit different, with L0 and L1 doubling in size, and L2 remaining the same. The L3 cache is decreased in size from 96 MB on the RX 6800 to 64 MB on the RX 7800 XT, with the caveat that the latter is also a chiplet design, so the cache is not located on the primary die - which increases access latency and worsens data locality...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Last time, <a href="https://hole-in-my-head.blogspot.com/2023/08/what-went-wrong-with-rdna-3.html">I theorised</a> that the dual striped memory configuration - with the L3 cache split across multiple chiplets, feeding from multiple separate accesses to the on-board memory chips - would lead to inefficiencies, and this appears to be confirmed by my testing of the product, with increased latency occurring <i>before</i> the cache size is reached (seen below, around 48 MB). The decreased locality and increased management overhead of the data lead to delays in the delivery of, or access to, that data. Meanwhile, the RX 6800, with its on-die L3, only begins to show increased access latency once the cache buffer is exceeded.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCtBZrU8rRy0MkNYGg1z_nevT5X677EFeW5t0kxyFtem1wPx788WRZRjwulgZwXZsnJgIFh5Wa58m19Bfhkwv3eExAdEqFA8BHM0Qf7YE6jBTZ7-5I6zZ7l9d_Me3Gb4cIikz89P2q5xEdt3I7HnDoMCqzZ6chKJb5XUUcWZVeQZg3tbot4yJaDran0n8/s874/Latency%20comparison%207800%20XT%20vs%206800.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="449" data-original-width="874" height="328" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCtBZrU8rRy0MkNYGg1z_nevT5X677EFeW5t0kxyFtem1wPx788WRZRjwulgZwXZsnJgIFh5Wa58m19Bfhkwv3eExAdEqFA8BHM0Qf7YE6jBTZ7-5I6zZ7l9d_Me3Gb4cIikz89P2q5xEdt3I7HnDoMCqzZ6chKJb5XUUcWZVeQZg3tbot4yJaDran0n8/w640-h328/Latency%20comparison%207800%20XT%20vs%206800.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>I experienced some issues when testing into GDDR memory access, though I'm not sure why. However, we can see the improved latency for the 7800 XT within on-die cache... 
(<a href="https://beta.nemez.net/projects/gpuperftests/">From Nemez's GPUPerfTest tool</a>)</b></td></tr></tbody></table><br /><div style="text-align: justify;">Other than that, all other physically-present elements are similar - RAM, number of compute units, etc.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What has changed in a more significant manner is the dual-issue design in the compute units and the uncoupling of the front-end operating frequency from the compute portion of the die.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">These changes mean that, potentially, there's better throughput of instructions through the compute units in particular scenarios, but also that while those compute portions of the die operate at around 2.4 GHz, the front-end operates towards 3 GHz - cutting down on the latency of issuing those instructions.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As outlined in the introduction, we haven't seen either of these innovations <a href="https://www.techpowerup.com/review/amd-radeon-rx-7600/41.html">provide much uplift</a> in the products prior to those based on Navi 32, so let's see if that's still the case with this silicon...</div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Setting Up...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The base of my test system is the same as in my recent <a href="https://hole-in-my-head.blogspot.com/2023/11/alan-wake-2-performance-analysis.html">Alan Wake 2 technical analysis</a>:</div><div style="text-align: justify;"><ul><li>Intel i5-12400</li><li>Gigabyte B760 Gaming X AX</li><li>Corsair Vengeance DDR5 2x16 GB 6400</li><li>Sapphire Pulse RX 7800 XT</li><li>XFX Speedster SWFT 319 RX 6800</li></ul></div><div style="text-align: justify;">However, special considerations are given to the two GPUs, here. In order to level the playing field as much as possible, I've used the Adrenalin software to corral the core clocks of each GPU to be as close as possible to 2050 MHz. Unfortunately, the user's ability to take control of modern Radeon GPUs is nowhere near as easy or precise as it is for Geforce cards, which means that I have not found a way to lock to a specific frequency - only control within ± 50 MHz. These small fluctuations should not really affect the outcome of this analysis, but I wanted to make the note upfront.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Memory is another problem - AMD have the frequency of the memory so locked down that the user cannot force the memory below or above spec without the card placing itself into a safemode (circa 500 MHz core clock lock). 
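</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">To put rough numbers on that constraint, here is the back-of-the-envelope bandwidth arithmetic for the stock configurations and for the memory overclock I settled on (a sketch; the 256-bit bus widths and the RX 6800's stock 16 Gbps rating are taken from public spec listings rather than measured here):</div><pre># Rough effective-bandwidth arithmetic for the two cards (GB/s).
# Assumed figures: 256-bit buses on both; RX 6800 stock GDDR6 at 16 Gbps,
# RX 7800 XT stock at 19.5 Gbps (per public spec listings).
def bandwidth_gbps(data_rate_gbps_per_pin, bus_width_bits):
    # Effective bandwidth in GB/s = per-pin data rate x bus width / 8.
    return data_rate_gbps_per_pin * bus_width_bits / 8

rx6800_stock = bandwidth_gbps(16.0, 256)   # 512.0 GB/s
rx6800_oc    = bandwidth_gbps(17.2, 256)   # 550.4 GB/s
rx7800xt     = bandwidth_gbps(19.5, 256)   # 624.0 GB/s

print(f"Stock gap:    {rx7800xt / rx6800_stock - 1:+.1%}")  # ~+21.9%
print(f"Gap after OC: {rx7800xt / rx6800_oc - 1:+.1%}")     # ~+13.4%
</pre><div style="text-align: justify;">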
This essentially means that the closest I could match the memory between the two cards was with an overclock on the RX 6800 to 17.2 Gbps and keep the RX 7800 XT at 19.5 Gbps (stock).</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEr4EfG1XKCZMw1852HoUd_tlVxPek355AQLJtRvmQwYgRo6e3TXbjECD60LPVwTke_kZYZM2Iq6uko3lzy6H8f22Hw-jHqLeOHag5GuFfT-7Xt4cbztyrp9pcgaYmlC9iJ_sO9bqAkxQuf7A6Qw7cHrw8OYnNIbwfWwJtAQBw61olZH5vmyyvdD29sW4/s1060/Memory%20speed_Superposition.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="341" data-original-width="1060" height="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEr4EfG1XKCZMw1852HoUd_tlVxPek355AQLJtRvmQwYgRo6e3TXbjECD60LPVwTke_kZYZM2Iq6uko3lzy6H8f22Hw-jHqLeOHag5GuFfT-7Xt4cbztyrp9pcgaYmlC9iJ_sO9bqAkxQuf7A6Qw7cHrw8OYnNIbwfWwJtAQBw61olZH5vmyyvdD29sW4/w640-h206/Memory%20speed_Superposition.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Both cards increase slightly but the RX 7800 was unable to be clocked high enough to reach a plateau...</b></td></tr></tbody></table><br /><div style="text-align: justify;">With this information in hand, I took a quick look at the effect of memory scaling on the two cards. With the synthetic benchmark, Unigine's Superposition, I saw that both cards had a slight increase in performance with increasing memory frequency. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The 6800 plateaued-off after a 100MHz increase, but I wasn't able to push the memory on the 7800 XT much further, either, as it quickly became unstable. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Thinking it might be a power-related limitation, I re-performed the test with the maximum power limit to the board. 
The GPU core, itself, was positively affected by this change - showing that the 7800 XT is power-starved, but the performance due to memory scaling did not really improve (though it became more linear).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This implies that both cards are working near their optimum memory frequency, as-is.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCKAHCJtMFM42nhSTCgHNyTP7EF0Pq3ol4ug4NvNt2fm8wG6KtgmtAmIqxtaVRLDaehFGPR4AHHeAUzti9KLYHW9ZcOe35zDoxQiHS5p9MvmGbM6LgadK1H5qR-l-7A57ysyEDZdeG6B3l19NGPoPHSK43hGSoeH2Wfz3OkxQwJUz3AePvuCIwQwvaXns/s661/Memory%20speed_Metro%20Exodus%20EE.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="338" data-original-width="661" height="328" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCKAHCJtMFM42nhSTCgHNyTP7EF0Pq3ol4ug4NvNt2fm8wG6KtgmtAmIqxtaVRLDaehFGPR4AHHeAUzti9KLYHW9ZcOe35zDoxQiHS5p9MvmGbM6LgadK1H5qR-l-7A57ysyEDZdeG6B3l19NGPoPHSK43hGSoeH2Wfz3OkxQwJUz3AePvuCIwQwvaXns/w640-h328/Memory%20speed_Metro%20Exodus%20EE.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Memory speed really has no effect on application performance for the RX 7800 XT...</b></td></tr></tbody></table><br /><div style="text-align: justify;">Translating that into a "real" application, in the form of my old standby, Metro Exodus Enhanced Edition, showed zero performance increase as memory frequency was lifted. In fact, due to errors introduced at the higher frequencies the RX 7800 XT's performance started to drop off.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I didn't import the graph here but the RX 6800 saw no drop in performance - it was just flat at all memory frequencies...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Much ado about something...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Moving on, I looked at the performance of the two cards in several gaming applications, limited as mentioned above.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In the less demanding games, I tested at 1440p - moving to 1080p for the more demanding. However, really and truly, a current generation €600 card should be able to play modern games at native 1440p. 
We'll see that isn't always the case, which is a shame...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiZwjEZ85HEK8e1ecuJTGwdzsto6RISqj3UuV83ARQLwiktYZZQcVgmICBKxZinotArEX6Fy94sMMgX7aoTGHDUR7suP-kEeJ_r5Y3_6GYl5UtqXFLqhyphenhyphen89rr6bzTWd9wTvGWHD-yb9wLEoZ-c3XDCXy5WRSLRLzFTc1Qvtu2oPGZCpRD8anGsD9fl0eQ/s1272/Returnal_Spider-man.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="349" data-original-width="1272" height="176" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhiZwjEZ85HEK8e1ecuJTGwdzsto6RISqj3UuV83ARQLwiktYZZQcVgmICBKxZinotArEX6Fy94sMMgX7aoTGHDUR7suP-kEeJ_r5Y3_6GYl5UtqXFLqhyphenhyphen89rr6bzTWd9wTvGWHD-yb9wLEoZ-c3XDCXy5WRSLRLzFTc1Qvtu2oPGZCpRD8anGsD9fl0eQ/w640-h176/Returnal_Spider-man.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Returnal and Spider-man typically show low gains on the order of 6 - 9 %. However, Returnal with ray-tracing enabled shows a 15 % increase in 'iso-clock' testing. Both games are considered relatively light on hardware, so these increases, while real, are small in raw fps benefit.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">On the other hand, Cyberpunk and Hogwarts Legacy, traditionally heavy games to run, show consistent gains above 10 %, with only the broomflight test dipping below that. That test is most likely limited by a system I/O bottleneck, as areas are streamed into memory and the view distance is much greater by comparison - I find it more demanding than the typical Hogsmeade test and can see shadow and LOD pop-in on distant geometry.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJPCceWIW3D6R7DjcERXmSZcNcyaD1vpervTuYmYMbP__0TIZ1B7uB-Hm84ug_GyXyf97UMkvOhbveV0kcPQT0QvmkgQiVbdcvy7orGQd_eww7N3JG6Lv9bxjT8GChbfkGv1bT-aQzGpV280QuaegEvMnESdK4U9nVniYTyyJellCmeuoIxYfta9bZnCg/s1271/Cyberpunk_Hogwarts.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="350" data-original-width="1271" height="176" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJPCceWIW3D6R7DjcERXmSZcNcyaD1vpervTuYmYMbP__0TIZ1B7uB-Hm84ug_GyXyf97UMkvOhbveV0kcPQT0QvmkgQiVbdcvy7orGQd_eww7N3JG6Lv9bxjT8GChbfkGv1bT-aQzGpV280QuaegEvMnESdK4U9nVniYTyyJellCmeuoIxYfta9bZnCg/w640-h176/Cyberpunk_Hogwarts.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Cyberpunk, especially, seems to enjoy running on the RDNA3 architecture with around a 20 % performance increase over RDNA2 - something not shown before when testing the RX 7600.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Next, we come up against titles which are more strenuous for the hardware to run, but which aren't necessarily super performant, either.</div><div style="text-align: justify;"><br /></div><div style="text-align: 
justify;">Starfield sees another 20 % increase*, likely to do with some of the reasons <a href="https://chipsandcheese.com/2023/10/15/starfield-on-the-rx-6900-xt-rx-7600-and-rtx-2060-mobile/">outlined by Chips and Cheese regarding RDNA2's bottlenecks</a>.</div><blockquote><div style="text-align: justify;"><b><i><span style="color: #274e13;">*This was performed before the current beta update...</span></i></b></div></blockquote><div style="text-align: justify;">Alan Wake 2 also shows a good 15 % increase between the two architectures. </div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkGR728vxsi9zSLjwjW-SniHBMoiGvsYRacAwpREN8umzZTW0elPIqOGfyUquWtpEA4oBXEyPAyeCiLJWpQKE9yxr_6mhyphenhyphenaPfatcQjgI1JiyK7j4s5q-HgXZPgK0R-U3HbXHc4igXDD7ZaipPZxngi19UKBMSFvSuDK3AO1i5AcwyZO729XPCm6fLkwQo/s1270/Starfield_Alan%20Wake%202_Superposition_Metro.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="349" data-original-width="1270" height="176" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkGR728vxsi9zSLjwjW-SniHBMoiGvsYRacAwpREN8umzZTW0elPIqOGfyUquWtpEA4oBXEyPAyeCiLJWpQKE9yxr_6mhyphenhyphenaPfatcQjgI1JiyK7j4s5q-HgXZPgK0R-U3HbXHc4igXDD7ZaipPZxngi19UKBMSFvSuDK3AO1i5AcwyZO729XPCm6fLkwQo/w640-h176/Starfield_Alan%20Wake%202_Superposition_Metro.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Finally, Metro Exodus sees an above 10 % improvement, with increasing performance as the difficulty of running the game gets harder at higher settings. This potentially indicates that with heavier workloads, the gap widens between the two architectures when given the same resources.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Superposition doesn't show as big of a gap here but that does make sense, given that we've already shown that it improves with memory frequency and my RDNA2 part is overclocked to narrow the gap...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><strike>One last thing to note. In these iso-clock comparisons, the RX 6800 is consuming around 140 W whereas the RX 7800 XT consumes around 170 W (under-reported by the AMD cards). Yes, you're getting more performance but there is an energy cost associated with that performance! Unfortunately, the increase in power usage is generally higher than the actual performance increase...</strike></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">It's been pointed out to me that I was misunderstanding the power references in HWinfo, so I'm just putting this conclusion on hold until I can be more certain about what the data is actually saying. I will update here when I do!</span></i></b></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Rationalising the product stack...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">When set at the same core frequency, there's a generational difference in performance that varies, depending on the workload. 
<div style="text-align: justify;">When averaged across all the tests, we're looking at a real-world architectural uplift of ~15 %* for Navi 32 vs Navi 22 silicon from my own data.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*Minus Superposition, as I believe that the memory scaling is not typical in real world applications.</blockquote></span></i></b></div><div style="text-align: justify;">If I take a look at the <a href="https://www.techpowerup.com/review/amd-radeon-rx-7800-xt/31.html">TechPowerUp review data</a>, I find that RDNA 2 performance scaling has a logical progression (taking into account clock speed and compute unit count): we see the expected reduction in ability to make the most out of increased compute resources. On the other hand, RDNA 3 oscillates up and down in performance as it scales in compute resources. At face value, this appears confusing but I think it actually has roots in <a href="https://hole-in-my-head.blogspot.com/2023/08/what-went-wrong-with-rdna-3.html">what I posited last time</a> and alluded to at the beginning of this article.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifD0lxwQbWzEsugm4Xzwh5quYaoVnD9OqnJ6LF0WVmqlKr0HQKhc6tkP4IbJ0V06fCXg6f1ZlA4dq19SlflK4-2wMDUzWlPO1xvS7nWzmwqIZQCCIkXT-7NM8Vi1vfKuGiMTeyX3Yj5LPoY4Tv0Q1J6NcCV-IkHeMBs27-gvhXujMuAeGD87CORFvryWE/s435/Perf%20scaling.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="169" data-original-width="435" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEifD0lxwQbWzEsugm4Xzwh5quYaoVnD9OqnJ6LF0WVmqlKr0HQKhc6tkP4IbJ0V06fCXg6f1ZlA4dq19SlflK4-2wMDUzWlPO1xvS7nWzmwqIZQCCIkXT-7NM8Vi1vfKuGiMTeyX3Yj5LPoY4Tv0Q1J6NcCV-IkHeMBs27-gvhXujMuAeGD87CORFvryWE/s16000/Perf%20scaling.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Per compute unit scaling is all over the place for RDNA 3...</b></td></tr></tbody></table><br /><div style="text-align: justify;">As a result of moving the L3 "infinity" cache off-die onto the chiplets, there is a large hit to performance each time one of those chiplets is removed: removing a chiplet also removes the bandwidth it would otherwise grant, resulting in below-expected performance. There will also be lower data locality in the L3, further increasing effective latency by forcing more frequent trips out to VRAM - and the data imply that, on two of the products in the RDNA 3 stack, the hitrate isn't always high enough relative to their compute demands.</div>
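<div style="text-align: justify;"><br /></div><div style="text-align: justify;">Since the last couple of paragraphs lean on performance "per compute unit", a quick note on what I mean by that: each card's score normalised by its CU count and clock. A minimal sketch, with placeholder scores and clocks rather than TechPowerUp's actual data:</div><div style="text-align: justify;"><br /></div><pre>
# Sketch: normalising performance by compute resources. Scores and clocks
# here are placeholders, not TechPowerUp's data.
cards = {
    # name: (relative score, compute units, typical game clock in MHz)
    "RX 7900 XTX": (100.0, 96, 2500),
    "RX 7900 XT":  (85.0, 84, 2450),
    "RX 7800 XT":  (62.0, 60, 2300),
    "RX 7700 XT":  (52.0, 54, 2550),
}

for name, (score, cus, mhz) in cards.items():
    # Performance per compute unit per MHz (arbitrary units, higher is better)
    per_cu_clock = score / (cus * mhz) * 1e6
    print(f"{name}: {per_cu_clock:.0f}")
</pre>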
<div style="text-align: justify;">This is why the RX 7700 XT and RX 7900 XT have worse performance scaling relative to their compute resources than the "full" chip RX 7800 XT and RX 7900 XTX.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With that in mind, we can apply the same logic to the already released parts to understand where, in terms of performance, they should be landing if they also experienced the same level of uplift as the RDNA 2 architecture:</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJUsuDe2K9zGJqeM3B8Weh_KVgSDQ__YWgfCs_J_BdnvdzLf5MSNV36RzsKgvHxWG8uy0AEcgk0ofOrIxKys01pz_SfDSMWKPB5gdqbulnBzyCKLqZOhBK8tOyaBUYJHZtVTHMRb2flxr2TiEzulypTPdwaWajTgZY8hX7fUX47K4BZPflDUYHHdFKjgM/s537/Perf%20scaling%203.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="327" data-original-width="537" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJUsuDe2K9zGJqeM3B8Weh_KVgSDQ__YWgfCs_J_BdnvdzLf5MSNV36RzsKgvHxWG8uy0AEcgk0ofOrIxKys01pz_SfDSMWKPB5gdqbulnBzyCKLqZOhBK8tOyaBUYJHZtVTHMRb2flxr2TiEzulypTPdwaWajTgZY8hX7fUX47K4BZPflDUYHHdFKjgM/s16000/Perf%20scaling%203.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The main take-away here is that RDNA3 is severely constrained by the L3 cache access at the higher end and by the lack of increased <strike>L0 and L1</strike> front-end clock at the low-end...</b></td></tr></tbody></table><br /><div style="text-align: justify;">The RX 7900 XTX and RX 7800 XT both show a good performance uplift per CU versus their RDNA 2 "counterparts" (18% and 21% respectively). Meanwhile, the RX 7700 XT falls short - with an RDNA 2 equivalent slightly beating the 7700 XT by 3%. The RX 7900 XT does still beat its 'equivalent' part but only by 6%.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">At the lower end of the stack, the RX 7600 is nowhere near meeting any sort of generational performance uplift (as we already know) and, since this part has the dual-issue improvement as well as the L0 and L1 cache size increases*, this cannot be due only to data locality. This also implies that the generational performance uplift I'm observing in the RX 7800 XT is mostly due to the front-end clock increase and not the dual-issue FP32 compute units. But let's have another look at the architectural differences, using <a href="https://beta.nemez.net/projects/gpuperftests/">Nemez's microbenchmarking tool</a> to look at specific instructions running on the two different architectures:</div><div style="text-align: justify;"><b><i><blockquote><span style="color: #274e13;">*Incidentally, I've found some glaring errors in the <a href="https://www.techpowerup.com/gpu-specs/radeon-rx-7600.c4153">information provided</a> on <a href="https://www.techpowerup.com/gpu-specs/radeon-rx-7800-xt.c3839">TechPowerUp's GPU database</a>. It appears that they have erroneously indicated that the L0 and L1 caches were not increased on the N32 and N33 products. However, looking at <a href="https://chipsandcheese.com/2023/06/04/amds-rx-7600-small-rdna-3-appears/">Chips and Cheese's benchmarking</a> (and my own testing for N32), it seems clear from the results that the cache sizes are indeed matching N31's. 
Only the L2 and L3 caches are reduced in size for each smaller Navi die, along with the infinity cache/memory interface...</span></blockquote></i></b></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBCuMcnevqs1wHlNxNjwmHq_rB58sMkCBrInT9HPqSuRONVPtLl1Rk0eYP07p1NN1dl_zacYtgrlRtLu1I-yuKos0YVxCJxgYAbaS-9AAyc1rlHALqs9OMuyFvRGX8oXEty_zSz2AQpvznwo5dd6sHEP-Qr6uGrmY2SMM0xyl9lAuJhNp94zRJV0SjM7w/s993/Nemez_1.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="596" data-original-width="993" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBCuMcnevqs1wHlNxNjwmHq_rB58sMkCBrInT9HPqSuRONVPtLl1Rk0eYP07p1NN1dl_zacYtgrlRtLu1I-yuKos0YVxCJxgYAbaS-9AAyc1rlHALqs9OMuyFvRGX8oXEty_zSz2AQpvznwo5dd6sHEP-Qr6uGrmY2SMM0xyl9lAuJhNp94zRJV0SjM7w/w640-h384/Nemez_1.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">For these tests, I'm once again keeping the core clocks locked at 2040 - 2060 MHz on the RX 6800 and keeping the core clock 'mostly locked' between 2000 - 2060 MHz on the RX 7800 XT - AMD's software just doesn't control it that well!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Straight away, the improvement to the FP32 pipeline is immediately obvious in almost all instruction types - though some have a greater improvement than others. What is interesting to me is the Int 16 improvement in RDNA 3, which I have not seen mentioned anywhere. An additional curiosity is the lack of gains in FP64 (not that it's really that useful for gaming?) given that I've seen it said that the dual-issue FP32 units can run an FP64 instruction as long as the driver is able to identify it. So, maybe this is purely down to the way this program is written.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfEzpkvAPri2gg6zgprjWQhqqzoK2UsTmM-CLhG9Raanpf1lVB7pWtBKbYOVynnUoMbMS9scwNg3zH2AlKde20VRTZfW9yQ9cv-_XxDro2HZuGmtVB-5AjykSH8sWCQi1Aqdd3CubiuJatbbammoH6LwEo8EOEfgtUFD-l_iV5cQ9muvDRnFw4tN_w2gw/s991/Nemez_2.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="597" data-original-width="991" height="386" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfEzpkvAPri2gg6zgprjWQhqqzoK2UsTmM-CLhG9Raanpf1lVB7pWtBKbYOVynnUoMbMS9scwNg3zH2AlKde20VRTZfW9yQ9cv-_XxDro2HZuGmtVB-5AjykSH8sWCQi1Aqdd3CubiuJatbbammoH6LwEo8EOEfgtUFD-l_iV5cQ9muvDRnFw4tN_w2gw/w640-h386/Nemez_2.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Elsewhere, the RDNA 2 card matches the performance quite well and even beats the RDNA 3 card in some instruction throughput - but these are generally not useful for gaming workloads.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Moving over to looking at the available bandwidth, we again see how the RX 7800 XT outperforms the RX 6800 - storming ahead between 20 KiB and 2 MiB, and 12 to 64 MiB sizes.</div>
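<div style="text-align: justify;"><br /></div><div style="text-align: justify;">Those transition points line up with the cache hierarchy - each large drop in bandwidth is the working set spilling out of one cache level into the next. If you want to pull the breakpoints out of this kind of sweep programmatically rather than eyeballing the graph, here's a minimal sketch (the numbers are illustrative, not my measurements):</div><div style="text-align: justify;"><br /></div><pre>
# Rough sketch: locating the cache-level "cliffs" in a bandwidth-vs-working-set
# sweep like the one above. The numbers are illustrative, not measured values.
sweep = [  # (working set in KiB, bandwidth in GB/s)
    (16, 9000), (32, 8900), (64, 5200), (128, 5100), (512, 5000),
    (2048, 4800), (4096, 2100), (16384, 2050), (65536, 1900),
    (131072, 550),  # finally spilling out to VRAM
]

# Flag any step where bandwidth drops by more than 25 % - each such step is
# the working set spilling out of one cache level into the next.
for (s0, b0), (s1, b1) in zip(sweep, sweep[1:]):
    if (b0 - b1) / b0 > 0.25:
        print(f"Cliff between {s0} KiB and {s1} KiB: {b0} -> {b1} GB/s")
</pre>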
<div style="text-align: justify;">The RDNA 3 part clearly has a big advantage even at "iso"-core clocks. However, once you have the clock frequencies cranked up to stock (or higher), there's just no competition between the two cards - the RX 7800 XT matches or outperforms the RX 6800 in every metric except for Int 64, which really isn't relevant to gaming.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuXXm8Tk3uJIzv-YXfCb2EMcnAFhLAWyRg9x4eBUyOcDOLMfrlZtifMsNzXvrif3Bh_CUbPT1kcx51k-zEF7MHoFHcxqIbgVQHs7Sj4L3-HPr1yOAe6nmKhnNIjoS2x6e5pF3IKlXJkeqWQi2Nvy3f5PLwkIOJKzS5SG-MqZZauMFihfteuxYLuNxpIeQ/s873/Nemez_3.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="449" data-original-width="873" height="330" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuXXm8Tk3uJIzv-YXfCb2EMcnAFhLAWyRg9x4eBUyOcDOLMfrlZtifMsNzXvrif3Bh_CUbPT1kcx51k-zEF7MHoFHcxqIbgVQHs7Sj4L3-HPr1yOAe6nmKhnNIjoS2x6e5pF3IKlXJkeqWQi2Nvy3f5PLwkIOJKzS5SG-MqZZauMFihfteuxYLuNxpIeQ/w640-h330/Nemez_3.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Comparison between RDNA 2 and 3 bandwidths per data size at the same core (shader) clocks...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Speaking of operating frequencies, we see an interesting behaviour in the RDNA 3 part: at lower core (shader) clocks, the front-end clock is essentially equal, but as the shader frequency increases, the front-end frequency moves further ahead, so that by the time the shader clock is around 2050 MHz, the front-end is at ~2300 MHz. Additionally, though I've not shown it below, at stock, the front-end reaches ~2800 MHz when the shader clock is ~2400 MHz.</div></div><div style="text-align: justify;"><br /></div><div style="text-align: left;"><div style="text-align: justify;">This seems like a power-saving feature to my eyes - there's no benefit to raising the front-end clock when the workload is light or non-existent!</div>
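<div style="text-align: justify;"><br /></div><div style="text-align: justify;">Since the two clock domains are reported separately by monitoring tools, the relationship can be pulled straight out of a log file. A minimal sketch, assuming a CSV export with hypothetical column names - adjust them to whatever your tool actually writes:</div><div style="text-align: justify;"><br /></div><pre>
# Sketch: extracting the front-end vs shader clock relationship from a
# hardware-monitor CSV log. The file name and column names are assumptions -
# change them to match whatever your logging tool exports.
import csv

shader, frontend = [], []
with open("clock_log.csv", newline="") as f:
    for row in csv.DictReader(f):
        shader.append(float(row["shader_mhz"]))
        frontend.append(float(row["frontend_mhz"]))

# Simple least-squares line: frontend = a * shader + b
n = len(shader)
mx, my = sum(shader) / n, sum(frontend) / n
a = sum((x - mx) * (y - my) for x, y in zip(shader, frontend))
a /= sum((x - mx) ** 2 for x in shader)
b = my - a * mx
print(f"front-end ~= {a:.2f} * shader + {b:.0f} MHz")
</pre>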
<div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvoeefEdNJXCnr4UF8Oo2iINV7P2UFeQ1cLXGJghbWC7HuG-nqssSOoP9KqX6C69vGUcF_xjy7-zSAmzcpKHiMKYoV1_DAGA5RJKgR0cQVLaN4nqfi7TjgUFQC8vo_6nLUPxV5iQ-fUDXhOWGHs5qgTpCbxEZXomg9m6VadtEH2okxktMg0rsP-4B0K4k/s1063/front-end_core%20frequency.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="326" data-original-width="1063" height="196" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvoeefEdNJXCnr4UF8Oo2iINV7P2UFeQ1cLXGJghbWC7HuG-nqssSOoP9KqX6C69vGUcF_xjy7-zSAmzcpKHiMKYoV1_DAGA5RJKgR0cQVLaN4nqfi7TjgUFQC8vo_6nLUPxV5iQ-fUDXhOWGHs5qgTpCbxEZXomg9m6VadtEH2okxktMg0rsP-4B0K4k/w640-h196/front-end_core%20frequency.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The core clocks for the RX 6800 and the core vs front-end frequencies for the RX 7800 XT in Metro Exodus...</b></td></tr></tbody></table><br /><div style="text-align: justify;">What's interesting here is that <a href="https://chipsandcheese.com/2023/06/04/amds-rx-7600-small-rdna-3-appears/">Chips and Cheese documented</a> that the cards based on N31 also use this same trick, whereas the N33-based RX 7600 actually clocked the front-end consistently lower than the shader clock, whilst also having lower latency than the N22 (RDNA 2) cards it was succeeding - implying that there's some architectural improvement in how the caches are linked.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">All of this information is quite confusing when taken as a whole. For the higher-end parts, it seems that bandwidth to the L3 cache, along with L3 cache size, is vitally important for efficient operation. It also appears that the increased front-end clock frequencies help a lot. I <i>think</i> that increasing the size of the L0 and L1 caches should help with data locality, which <i>should</i> help with instruction throughput and specifically with improving the use of the dual-issue FP32 capabilities of the re-worked compute units. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, looking at the performance of N33, we see that the increase in L0 and L1 caches has had no apparent performance increase, that the card is data-starved due to the small L3 cache and that the lack of front-end core clock increase is also hamstringing the card compared to N23.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">All of these design choices negate some amount of potential performance gains through the large latency and energy penalty for more frequently accessing not only the chiplets but further out to VRAM.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Conclusion...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In this very empirical overview, it is clear that, ignoring the increase in core (shader) frequencies, the RX 7800 XT has an architectural performance increase over the RX 6800. This also extends to the full N31 product (7900 XTX) as well. However, AMD's choice in reducing the L3 cache sizes for Navi 31 and Navi 32 appears to significantly hinder their overall performance. Additionally, the choice to move that L3 cache onto chiplets has resulted in a significant increase in energy use, and an over-dependence on the bandwidth to those chiplets. It also appears to be the case that there is an overhead for fully utilising the L3 cache, with performance dropping even before that limit is reached.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I didn't mention it in this blogpost, but the RX 7600 also doesn't have N31 and N32's increased vector register file size (192 KB vs 128 KB). However, since I don't have an understanding of how I could measure the effect of this on performance, I have decided to gloss over it - especially since Chips and Cheese <a href="https://chipsandcheese.com/2023/06/04/amds-rx-7600-small-rdna-3-appears/">do not appear to be overly concerned about it</a> affecting N33's performance due to its lower CU count and on-die L3 cache.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What does appear to affect performance negatively is the choice to not clock the front-end higher on N33 and this is likely the source of a good amount of the observed performance bonus between the RDNA 2 and 3.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h4 style="text-align: justify;"><span style="color: #274e13;">So, where does this leave us?</span></h4><div style="text-align: justify;"><br /></div><div style="text-align: justify;">From my point of view, it appears that AMD made some smart choices for RDNA 3's architectural design which are then heavily negated by the inflexibility caused by going with the chiplet design and the need to bin/segregate in order to make a full product stack. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Moving to chiplets has also had the knock-on effect of increasing power draw (and likely heat), which has a negative impact on the ideal operating frequencies that each design can work at which has hindered the performance of the card. 
Just looking back at Metro Exodus, raising the RX 7800 XT's power limit to +15 % over stock increases performance by 4 % (though this is only 3 fps!), showing that the card is still power-limited as released and may see a bigger benefit from reducing operating voltage than RDNA 2 cards did.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Additionally, the RX 7600 appears hamstrung by the lack of an increased front-end clock - perhaps due to power considerations? It is the choice to decouple the front-end and shader clocks that seems to me to be the biggest contributor to RDNA 3's architectural uplift, as it is this aspect which appears to allow the other architectural improvements - to the low-level caches and FP32 throughput - to really shine.</div></div>Unknownnoreply@blogger.com10tag:blogger.com,1999:blog-7560610393342650347.post-35007633018656064512023-11-02T19:14:00.001+00:002023-11-02T19:14:23.263+00:00Alan Wake 2 Performance Analysis...<div style="text-align: left;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlnMJUWBsUZnqc0-GqeilpQ_3YZfaU2pO9OET36xcTAvt7yQUGeWLPHlVuruuj_bq05VFaFjHJGjEXDOpc5qZ-GrKQPC7OmuNgvYu7JGDYyoySWnbd-zVyM_TlQrSxKsXYKAg3QbNWvIAS3YUtq3HRgdS6KrN-hitNQEN664MN7ltuOfnxsRmQDKf_ltA/s1920/Title.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlnMJUWBsUZnqc0-GqeilpQ_3YZfaU2pO9OET36xcTAvt7yQUGeWLPHlVuruuj_bq05VFaFjHJGjEXDOpc5qZ-GrKQPC7OmuNgvYu7JGDYyoySWnbd-zVyM_TlQrSxKsXYKAg3QbNWvIAS3YUtq3HRgdS6KrN-hitNQEN664MN7ltuOfnxsRmQDKf_ltA/w640-h360/Title.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Alan Wake 2 is the new hotness in the games industry. Love it or hate it, you cannot deny the impact that it has had on the general conversation with regards to hardware, software, and game development. 
While I'm not as on-board with the near universal, unfettered praise for the title as most reviewers appear to be, I do find the discussions surrounding the hardware requirements and performance of that hardware near and dear to my heart.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, without further ado, let's take a look at how current mid-range hardware performs in this game and whether that actually lines up with the hardware requirements that the developers put out <i>just before </i>the game's release...<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">The test setup...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The test system is what many would likely call "high end" but it lies squarely in the middle of mid-to-high-mid-range for currently available hardware:</div><div style="text-align: justify;"><ul><li>Intel i5-12400</li><li>Gigabyte B760 Gaming X AX</li><li>Corsair Vengeance DDR5 2x16 GB 6400</li><li>Sapphire Pulse RX 7800 XT</li><li>MSI RTX 4070 Ventus 2X</li><li>XFX Speedster SWFT 319 RX 6800</li><li>Zotac Twin Edge OC RTX 3070</li></ul></div><div style="text-align: justify;">The AMD driver used was 23.10.2 (no newer WHQL driver had been released for this game at the time of testing).</div><div style="text-align: justify;">For Nvidia, I used the 545.92 drivers.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Regarding the settings: all testing was done at 1080p 'native' unless otherwise specified.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIrxfvN8lM4RnfMoEIkrsxevbqXn0oxI9ZbXvoV6NkY37yVOVjlB7AC7peYG_xmfQNnxB29lQ_3vJrQ4v0wekI1fcFNhBvVCo5kwKUGLwkpTfIosBABeBmbxgxqW60ZrwyMz1_zm-2-wtd2AYT9ydaYMVhlx0uFTNfrISiOL4xwc5YCxeynA7HqtYFfxM/s1920/header.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIrxfvN8lM4RnfMoEIkrsxevbqXn0oxI9ZbXvoV6NkY37yVOVjlB7AC7peYG_xmfQNnxB29lQ_3vJrQ4v0wekI1fcFNhBvVCo5kwKUGLwkpTfIosBABeBmbxgxqW60ZrwyMz1_zm-2-wtd2AYT9ydaYMVhlx0uFTNfrISiOL4xwc5YCxeynA7HqtYFfxM/w640-h360/header.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Asleep at the Wheel...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Everyone and their dog has been claiming that AW2 is one of the best-looking games ever released. From an art direction perspective, I'd say it's very high on the all-time list, but I just cannot agree from a technical art perspective.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">During the time I've played, I've felt significantly let down - partially because of the hype, but also in comparison with other games on the market. 
Aside from copious amounts of pop-in on small objects (at max settings), along with world and object mesh geometry LOD being WAY too close to the camera many times (especially noticeable in the forest during Saga's missions there), the big no-nos from my point of view were the graininess and lack of clarity in the shadows and reflections, and the softness of the anti-aliasing solutions.<br /><br /></div><div style="text-align: justify;">Addressing the last one first, I found the image to be quite 'soft' with regards to the general presentation. I guess that's a purposeful choice from the developer but it's disappointing when you increase the resolution and do not really get a clearer image. In a way, the cynic in me would say that it's like this because it masks potential softness introduced when using an upscaler - so there is not a large difference between 1440p native and Balanced or Performance upscaling at that output resolution...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a result, Alan Wake 2 looks the "same" at whatever resolution you run it at. I could not tell the difference between native 1080p and 2160p FSR/DLSS performance (1080p render with upscale) or even balanced/quality upscaling... there was not really any improvement in clarity. You can argue that if everyone's experience is the same then that's an artistic choice and a 'great leveller', but if you have more powerful hardware, you'd want to see a benefit.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxElvjngwC2YRhdQxiij19SDEVVU9wqgkUWJdvs_3a0lkv4EBeQ-JGkQGwp6zDS1NdUHAUeeUsetJOYJbJrYAoeUtGL2scup_rVhTEaUFK2L1h9jUaENHfO64Zv0B5ENcUaPMU5EEYHNNs4otACBdn2uRlRjy1X6OvFTOR1QsYFHe1VSXvi5ghBySYdtQ/s1920/AlanWake2_2023_10_29_18_49_52_574.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxElvjngwC2YRhdQxiij19SDEVVU9wqgkUWJdvs_3a0lkv4EBeQ-JGkQGwp6zDS1NdUHAUeeUsetJOYJbJrYAoeUtGL2scup_rVhTEaUFK2L1h9jUaENHfO64Zv0B5ENcUaPMU5EEYHNNs4otACBdn2uRlRjy1X6OvFTOR1QsYFHe1VSXvi5ghBySYdtQ/w640-h360/AlanWake2_2023_10_29_18_49_52_574.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>There's no ability to affect the sharpness of the image in the graphics settings menu...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The lack of clarity in the shadows and the way that transparencies are handled was really a big disappointment for me. Shadows are grainy - even at "native" resolution. It's harder to observe in the forest but the problem still persists. It <u style="font-style: italic;">does</u> go away when utilising Nvidia's ray reconstruction but that's just a terrible cop-out. 
I've seen people say that the game is using signed distance fields in order to handle some of these things, but I've played other games using that technology where both the performance and the visual presentation were better.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgN4TRKsmUU6Ak5h_gQhOwgSVLYqN650-7ud8SNKH5Tv9TAMpOLi-Ph95x8cEOKYauDBF4zSf_wp6lobCwNJdFtqLFP2xq-Kx07FhqhInkDqHkPiDijC73ipSGxOPaq0ztFvThFJVmSL_sBj1AOa82gLwInRrIDPS6hyphenhyphenSIUYATkiQCMzgqgeTH5ICmrHtQ/s3551/shadows_transparency.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="3551" height="194" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgN4TRKsmUU6Ak5h_gQhOwgSVLYqN650-7ud8SNKH5Tv9TAMpOLi-Ph95x8cEOKYauDBF4zSf_wp6lobCwNJdFtqLFP2xq-Kx07FhqhInkDqHkPiDijC73ipSGxOPaq0ztFvThFJVmSL_sBj1AOa82gLwInRrIDPS6hyphenhyphenSIUYATkiQCMzgqgeTH5ICmrHtQ/w640-h194/shadows_transparency.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Screenshots taken at high settings, no RT or PT... Predator glasses, stippled shadows, completely unrealistic shading and transparency on that water bottle (which is a blue plastic when seen head-on without a background light).</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I think a lot of the big excitement around this game from a visual perspective is from people taking screenshots at max settings, max resolution, with max path tracing, probably with ray reconstruction on and with DLSS upscaling along with frame generation. That's not a realistic playing scenario for the vast majority of players in the here and now. Yes, I agree - it does look amazing and impressive, but that's not the game most people will experience.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><div><br /></div></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6eTYKVii6CT2JRUy67jVJ3Zi7vf00CI6O9q4R6Ci3_IicAFu2FjwYH3xsuOm2PeiqBaLzv_LrNpU3c6FAF-mU5r0FzCDq0cRRoqubIHT_OPNFbggeJdt_Zo0pNcDWsJw2hJ4j208rYa4vL9f_QwvMTB38zy_2v72XSHl3oXnM-tA0u0xmOxiTV9nqtfU/s3388/relections_shadows.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1440" data-original-width="3388" height="272" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6eTYKVii6CT2JRUy67jVJ3Zi7vf00CI6O9q4R6Ci3_IicAFu2FjwYH3xsuOm2PeiqBaLzv_LrNpU3c6FAF-mU5r0FzCDq0cRRoqubIHT_OPNFbggeJdt_Zo0pNcDWsJw2hJ4j208rYa4vL9f_QwvMTB38zy_2v72XSHl3oXnM-tA0u0xmOxiTV9nqtfU/w640-h272/relections_shadows.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>More issues with reflections and shadows...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">FSR and DLSS are a bit of a disaster in this title. They accentuate the grainy, undefined look of the image presented to the screen when utilising the ray tracing modes, presenting the user with a pixellated mess. 
However, the newly implemented "ray reconstruction" technology from Nvidia clears these issues up very nicely. I presume this is solely due to the low quantity of rays traced within a scene to light and reflect it? Or possibly the default denoising algorithms just aren't very good/clean...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, I find myself questioning this state of affairs on two fronts - when the game was being developed, the RTX 4090 didn't exist, nor did ray reconstruction (at least it was not announced, and I refuse to believe that Remedy were made aware of the tech in any reasonable time in advance of the release date - i.e. more than a year out). Does this mean that the game honestly looked pretty terrible from a technical standpoint and that was how they intended to release it?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The other side of the coin is: did they just not bother trying to optimise ray tracing for AMD cards? And what role did the console side of the equation play in the release? It seems RT is adding no benefit there*.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*Of course, I don't expect AMD cards to perform as well as Geforce cards in ray tracing due to the lack of physical silicon dedicated to the feature but other studios and engines have shown that it is possible to have better performance!! The RT mode performs worse than other, equivalent titles on AMD hardware...</blockquote></span></i></b></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWCu9Qtd1DrNl39p7Z-pQ8Kz6X-I6_XKyu9NYhualY_OuVQwcDIfBr_n7_RMIMNXkadhjQfsBpsrDSTJ7Mx1vwBAp-JPOFxv1MxlaJ40zWNwOFkNmMLljJvdHe64vsmmoMU79zg6fifiyvbhN8N5rOUw0U0nk8Xm_ArmaBlWFH6xS9ZUpvI67fNTCn_Gs/s1920/ghosting_temporal%20instability.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWCu9Qtd1DrNl39p7Z-pQ8Kz6X-I6_XKyu9NYhualY_OuVQwcDIfBr_n7_RMIMNXkadhjQfsBpsrDSTJ7Mx1vwBAp-JPOFxv1MxlaJ40zWNwOFkNmMLljJvdHe64vsmmoMU79zg6fifiyvbhN8N5rOUw0U0nk8Xm_ArmaBlWFH6xS9ZUpvI67fNTCn_Gs/w640-h360/ghosting_temporal%20instability.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>It's hard to capture but, in motion, Framegen causes intense banding on the blacks in the image, along with very strong ghosting. Additionally, even with medium raytracing settings, you're getting terrible temporal instability - just look at the edges of the character on the still image of Saga...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><div>Aside from the PC requirements (above) being very high - there were commentators saying that the "low" settings on one game are not equivalent to another. Well, that is true. However, it conveniently sidesteps the fact that <i>typically</i> low settings look pretty poor. 
If 99 out of 100 games work like this, <a href="https://hole-in-my-head.blogspot.com/2023/10/in-defence-of-older-hardware.html">it is a ridiculous expectation</a> for the developers of the one game that bucks the trend to assume that gamers will read the settings nomenclature in a different manner. That's just deluded.</div><div><br /></div><div>In my opinion, Remedy had terrible communication regarding the system requirements; how good they are; the expected image quality; <i>AND</i> the requirement for mesh shaders - which took the industry by surprise and which should have been front and centre on the system requirements image. Hopefully, they will learn from this experience and improve that point going forward...</div><div><br /></div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAUrzNTvwa_TFDy3nzTy-A3qjXw38iZnESAJdKeMYG7G6RUAHrTO0vXQfGat08fm4-wiYPVeKnrAitEBoaPnAO3D6zr-in47GpECqm73aYSffDdzmveDS1Jv5Xr1D0oGstAewnjhq-kMjFqtc-3cAbGcm57UqPAs3dEnhDElOuwswds5LqxqaT2vAqXPQ/s1920/Ray%20reconstruction%20comparison.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAUrzNTvwa_TFDy3nzTy-A3qjXw38iZnESAJdKeMYG7G6RUAHrTO0vXQfGat08fm4-wiYPVeKnrAitEBoaPnAO3D6zr-in47GpECqm73aYSffDdzmveDS1Jv5Xr1D0oGstAewnjhq-kMjFqtc-3cAbGcm57UqPAs3dEnhDElOuwswds5LqxqaT2vAqXPQ/w640-h360/Ray%20reconstruction%20comparison.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Ray reconstruction (right) looks significantly better than without (left). Of course, AMD doesn't have this tech - and how did the game look before this feature was even shared with the developers...?</b></td></tr></tbody></table><br /><div><br /></div><div>Anyway, with that pretty negative look at the visual quality of the game, let's move onto a more positive (?) look at the frame presentation performance...</div><div><br /></div><div><br /></div><h3><span style="color: #274e13;">Banging on the door...</span></h3><div><br /></div><div>One of the benefits of targeting mid-range hardware in my reviewing is that a lot of these modern titles are not holding back in their visions - or, conversely, are not able to optimise enough - so the types of graphics cards and CPUs they are targeting lie <i>directly</i> within my sphere. 
</div><div><br /></div><div>Let's take a look back up there at the system requirements:</div><div><ul><li>Recommended: Medium settings, performance upscaling, 1080p60fps, R7 3700X + RTX 3070/RX 6700 XT</li><li>Ultra: High settings, performance upscaling, 2160p60fps, R7 3700X + RTX 4070/RX 7800XT</li><li>Low RT: Medium settings, RT low, Quality upscaling, 1080p30fps, R7 3700X + RTX 3070</li><li>Medium RT: Medium settings, RT medium, Quality upscaling, 1080p60fps, R7 3700X + RTX 4070</li></ul></div><div><br /></div><div><div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLoylxEJTJoL5mlwjU9FQnTlQcgsg4pxZZpKAn84UsTa-BZFABVNtNd3Yau-b28VDG4Ht_7eZMeJaUq5xVXoAtrfBUedHLQ4fQNeBIz2nePK6gRDX88TEYa3_X9cLUzdrds52OTrfqRj3eVBB4iAOauMbeotvCzKRuJWFYLfsK3Y9PPo3Fw8IDmznSCs8/s508/Requirements%20comparison.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="248" data-original-width="508" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLoylxEJTJoL5mlwjU9FQnTlQcgsg4pxZZpKAn84UsTa-BZFABVNtNd3Yau-b28VDG4Ht_7eZMeJaUq5xVXoAtrfBUedHLQ4fQNeBIz2nePK6gRDX88TEYa3_X9cLUzdrds52OTrfqRj3eVBB4iAOauMbeotvCzKRuJWFYLfsK3Y9PPo3Fw8IDmznSCs8/s16000/Requirements%20comparison.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Two of the specified settings appear to be good but for some reason the RTX 4070 never met the fps target set out in the requirements table. I am not sure why!</b></td></tr></tbody></table><br /></div><div><br /></div><div>I honestly don't know why outlets don't test the recommended requirements to corroborate what was communicated by developers. Just looking at the results above, we can see the testing for the RTX 3070 shows good results but weirdly, the settings for the RTX 4070 don't match what was indicated. 
Bear in mind that this is when using a system that is, in theory, better than what was listed in that requirements matrix.</div></div><div><br /></div><div>I don't know if this is a fault of the Nvidia driver or if there is an issue in the game for the RTX 4070 that hasn't been widely reported as yet...</div><div><br /></div><div><br /></div><h3><span style="color: #274e13;">Absolutisms...</span></h3><div><br /></div><div>With that out of the way, let's take a look at performance scaling for these four cards:</div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9FZ3Dl3b7mHSjkyC6PTa7JvzboYkXLL6UTybv5tC7sd51B-3ASnmqBDC49t54J8FzRQGWFWKApf3N0wAl3T_f82DBjMDkv1u8S2AKELlYAmyQLd0YAE0_9XisjUOhqzy6C2-IRI5fWw1-NumnOnS03e9pE7A4CrgkkGwT6yS3A2c1Cjg1Z_P906TokHY/s1199/RTX%20performance%20scaling.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="358" data-original-width="1199" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9FZ3Dl3b7mHSjkyC6PTa7JvzboYkXLL6UTybv5tC7sd51B-3ASnmqBDC49t54J8FzRQGWFWKApf3N0wAl3T_f82DBjMDkv1u8S2AKELlYAmyQLd0YAE0_9XisjUOhqzy6C2-IRI5fWw1-NumnOnS03e9pE7A4CrgkkGwT6yS3A2c1Cjg1Z_P906TokHY/w640-h192/RTX%20performance%20scaling.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>This was the toughest segment of the game I have encountered thus far but both cards can have settings which would provide 60 fps...</b></td></tr></tbody></table><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6a1QvzLKFrbCNpJXQLZpqeCbHzpP9j2k9Y7ZlZFOUU6Jf_D2ADMb6hvfJC4XWHo09ca3yC46V1a8yRP7bsXCMMSN2Np3JRFXhZ8weBDIwm6ZgWjROlDWXrBh1Sw0dE0Wqeqqxh-OrjOyTJwony3XD14tUgWHv_dIMEfs2xkqrMPX_eLXmYsvnxcqnNhg/s1203/RX%20performance%20scaling.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="358" data-original-width="1203" height="190" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6a1QvzLKFrbCNpJXQLZpqeCbHzpP9j2k9Y7ZlZFOUU6Jf_D2ADMb6hvfJC4XWHo09ca3yC46V1a8yRP7bsXCMMSN2Np3JRFXhZ8weBDIwm6ZgWjROlDWXrBh1Sw0dE0Wqeqqxh-OrjOyTJwony3XD14tUgWHv_dIMEfs2xkqrMPX_eLXmYsvnxcqnNhg/w640-h190/RX%20performance%20scaling.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Interestingly, the AMD cards perform better than their counterparts in rasterisation-only testing...</b></td></tr></tbody></table><br /><div><br /></div><div>Performance scaling with respect to settings adjustment appears to be very limited in this title. This is made worse by my observation above regarding the visual muddiness of the game: most of the settings really don't have that big of an impact on the quality of the presented image.</div><div><br /></div><div>We're looking at a maximum of around 17 fps difference between low and high settings for the RTX 4070 and RX 7800 XT - not a massive win, that's for sure! 
What <i>is</i> interesting here is that it does appear that there is a generational difference for both Nvidia and AMD cards: the current generation is performing better (relatively) than you would expect in this game compared to the <a href="https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html">average difference across a range of titles</a>.</div><div><br /></div><div>Normally, you'd be looking at a ~10 - 15 % difference between the 6800 and the 7800 XT but here we're seeing a 25 - 30 % uplift in rasterised settings. For the 3070 to 4070, we'd expect a ~ 20 - 25 % uplift, whereas we observe a 30 - 40 % uplift. This might be due to heavier use of compute resources on the cards <a href="https://x.com/Dachsjaeger/status/1719421701430116373?s=20">as postulated by Digital Foundry's Alex Battaglia</a> - which could be a fair assumption given the architectural changes in the RX 7000 series from AMD. It's possible that Nvidia's increase in the L2 cache is helping on their side of the fence given that AMD already increased their L3 cache last generation...</div><div><br /></div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdYM7FfhJ2NvmOAX-gXhMhcB9P2BwR8ijTYxLTqMNGV3VT8r0hR9vvMukesG4BBrV274XAFE2QagmyLCUadsLQBHls5J5d7RTxwuqnfCKRhw__gizusHltGpR75OSMNQKzYZwEuroP_q9enPindT9l-wFdkadglUU8DqA520sg5dWaimaRCVG38tJBHl0/s977/Frametimes_high%20settings.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="572" data-original-width="977" height="374" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdYM7FfhJ2NvmOAX-gXhMhcB9P2BwR8ijTYxLTqMNGV3VT8r0hR9vvMukesG4BBrV274XAFE2QagmyLCUadsLQBHls5J5d7RTxwuqnfCKRhw__gizusHltGpR75OSMNQKzYZwEuroP_q9enPindT9l-wFdkadglUU8DqA520sg5dWaimaRCVG38tJBHl0/w640-h374/Frametimes_high%20settings.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>There are some strange frametime spikes on the AMD side of things, <a href="https://www.hardwaretimes.com/alan-wake-2-pc-performance-nvidia-rtx-4090-is-up-to-4x-faster-than-the-amd-rx-7900-xtx/">as was noted by Hardware Times</a>, though I am experiencing them to a much lesser extent...</b></td></tr></tbody></table><br /><div><br /></div><div>Normally, I'd include frametime data in the bar charts, along with power usage but, for one, my RX 7800 XT doesn't (or won't) report power metrics to Frameview and, if you'll take a look up at the frametime graph examples above, the consistency of the sequential frame presentation is very good in this title - so I didn't particularly feel the need.</div><div><br /></div><div>I do want to point out those small spikes in the AMD graphs. I didn't find them noticeable at all and they didn't detract from the experience. The spikes are much smaller in magnitude and frequency than those reported by Hardware Times, and this could possibly be explained by two issues cropping up at the same time in their testing. <a href="https://x.com/Duoae/status/1719047982929097212?s=20">I observed similar spikes</a> to the ones they reported due to an issue where the mouse input would frequently try and override the controller input (I mostly play with a controller these days, due to RSI caused by heavy Quake 3 Arena playing in the early 2000s).</div>
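<div><br /></div><div>For anyone who'd rather quantify spikes like these than eyeball them, below is a sketch of the frametime statistics I'd pull from a FrameView/PresentMon-style export: the average framerate, a "1% low" (defined here as the mean of the worst 1 % of frametimes converted to fps - definitions vary between outlets, which was rather the point of my earlier article!), and a simple spike count. The frametime list is a placeholder:</div><div><br /></div><pre>
# Sketch: basic frametime statistics from a FrameView/PresentMon-style export.
# The frametime list is a placeholder - read yours from the CSV instead.
import statistics

frametimes_ms = [16.6, 16.8, 16.5, 41.9, 16.7, 16.6, 17.0, 16.4]

avg_fps = 1000 * len(frametimes_ms) / sum(frametimes_ms)

# One common "1% low" definition: the mean of the worst 1 % of frametimes,
# converted back to fps. Other outlets use the 99th-percentile frametime.
worst = sorted(frametimes_ms, reverse=True)
k = max(1, len(frametimes_ms) // 100)
low_1pct_fps = 1000 / (sum(worst[:k]) / k)

# Spikes: frames taking more than twice the median frametime.
median = statistics.median(frametimes_ms)
spikes = sum(1 for t in frametimes_ms if t > 2 * median)

print(f"avg: {avg_fps:.1f} fps, 1% low: {low_1pct_fps:.1f} fps, spikes: {spikes}")
</pre>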
<div><br /></div><div>If you're getting constant stuttering, try unplugging or turning your mouse over...</div><div><br /></div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgG-o1ja0WA8JE6wBdP_5fsRUxWNK5CQP0v-W-re5koa8xCokX4Yn5jsoDh8t7zzYA0IzwSoKpsWOjCR16vBNBcGESmHYSJWX_i1BNKuUAHuX_QtUOqS7TqYii0sn4lAJjQgMmdA1R-R67-ICPh7hn519wRGV0S80aT51IAKbmWV14y-8fSHR_XQjI9lck/s1043/DLSS%20scaling.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="311" data-original-width="1043" height="190" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgG-o1ja0WA8JE6wBdP_5fsRUxWNK5CQP0v-W-re5koa8xCokX4Yn5jsoDh8t7zzYA0IzwSoKpsWOjCR16vBNBcGESmHYSJWX_i1BNKuUAHuX_QtUOqS7TqYii0sn4lAJjQgMmdA1R-R67-ICPh7hn519wRGV0S80aT51IAKbmWV14y-8fSHR_XQjI9lck/w640-h190/DLSS%20scaling.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>I didn't have time to also test the RX 6800 and RX 7800 XT for FSR scaling but I presume it will be similar...</b></td></tr></tbody></table><br /><div>Finally, let's take a look at the effect of upscaling on performance.</div><div><br /></div><div>Despite the numbers being smaller, there is actually better scaling on the 30-series card than on the current-generation card - an uplift of 54 % for the 3070 compared to 41 % for the 4070.</div><div><br /></div><div>When combining all the possible tools at the user's disposal, this means that 60 fps can be achieved on the RTX 3070 at high settings with 1080p DLSS Quality or balanced, though the image will look a little softer.</div><div><br /></div><div>There is one last point I would like to raise - the cost of using the available technologies to improve performance. </div><div><br /></div><div>Ray reconstruction typically had a performance cost of around 4 fps, or around 5 - 10 %, compared to simple denoising at the frame rates I was achieving with the RTX 4070. There was similarly a cost for utilising upscaling, with a ~20 % lower framerate going from 1080p native to 1620p Quality upscaling, and a ~30 % lower framerate going up to 2160p Performance upscaling. These are, in theory, all the same render resolution, but the frametime cost for doing so is quite extreme...</div>
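<div><br /></div><div>To put those percentages into the units that actually matter, it helps to convert them into a per-frame cost in milliseconds, since the same percentage drop represents more GPU time at lower framerates. A quick sketch, with placeholder fps values standing in for my RTX 4070 numbers:</div><div><br /></div><pre>
# Sketch: converting the fps drops above into a per-frame cost in milliseconds.
# The fps values are placeholders standing in for my RTX 4070 numbers.
def per_frame_cost_ms(native_fps, upscaled_fps):
    return 1000 / upscaled_fps - 1000 / native_fps

native_1080p = 90.0                      # native render, no upscaling
quality_1620p = native_1080p * 0.80      # ~20 % slower, same 1080p render
performance_2160p = native_1080p * 0.70  # ~30 % slower, same 1080p render

print(f"Quality to 1620p: {per_frame_cost_ms(native_1080p, quality_1620p):.2f} ms/frame")
print(f"Performance to 2160p: {per_frame_cost_ms(native_1080p, performance_2160p):.2f} ms/frame")
</pre>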
<div>As a result, I think it is better to use native settings rather than rely on upscaling to a higher resolution, if you are able.</div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh48UfN6SHI_a37N6GNVf8ImirOkGW1JW2G5sLAiC6Nloo1Uu9R-XHtXoLunSWrVE1YBDmkBcKvksNG9f9oSHGQUVXUvlOXskqOQEAk5R5sh6Oe4Th9e9iUUNSkAoOyDL_MiwQX7Wo9xHTfgGRErZTbHt0awSpNnSgDp1Ivbinw8jak3b-Pw4FBcDWdJEQ/s521/DLSS%20cost.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="310" data-original-width="521" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh48UfN6SHI_a37N6GNVf8ImirOkGW1JW2G5sLAiC6Nloo1Uu9R-XHtXoLunSWrVE1YBDmkBcKvksNG9f9oSHGQUVXUvlOXskqOQEAk5R5sh6Oe4Th9e9iUUNSkAoOyDL_MiwQX7Wo9xHTfgGRErZTbHt0awSpNnSgDp1Ivbinw8jak3b-Pw4FBcDWdJEQ/s16000/DLSS%20cost.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The cost in frametime for DLSS becomes more pronounced as the output resolution increases relative to the rendered resolution...</b></td></tr></tbody></table><br /><div><br /></div><div><br /></div><div><br /></div><h3><span style="color: #274e13;">Conclusion...</span></h3><div><br /></div><div>As an experience, so far, Alan Wake 2 is up there with Control for atmosphere, world-building and visual style. Aesthetically, it's a beautiful game which is, in my opinion, marred by some choices in technical visual presentation. Perhaps I have some setting wildly wrong on my monitor, or I am doing some other random thing incorrectly, but I find the presented image very soft.</div><div><br /></div><div>Additionally, the game does not perform well on what many would consider pretty strong hardware. At low settings at native 1080p, the best I was able to achieve was 90 - 95 fps which, while acceptable from a performance standpoint, falls far short of where I'd expect these cards to land in other, similarly good-looking titles (yes, other games' high settings are as good or better than this game's low settings).</div><div><br /></div><div>Unfortunately, I did see this coming but it does disappoint me to say so - I picked up an RTX 3070 for 1080p high gaming. The same with the RX 6800. I never intended these GPUs to be targeting 1440p or 2160p gaming at all because, to be honest, I grew up playing games on low and medium graphics settings because I couldn't afford the higher-end hardware and because I couldn't upgrade as often as I would have liked. It worked out okay but now that I have more disposable income, I actually dislike that games and hardware are completely out of sync when it comes to the expectations of the consumer. </div><div><br /></div><div>On the hardware side, I expect a €500 - 700 graphics card to be able to manage 1440p60 high settings, without RT and upscaling. On the software side, I expect there to be fall-back systems in the engine to handle the cases where the best-of-the-best rendering is not able to be performed - that's the way it always was. </div><div><br /></div><div>In this instance, Alan Wake 2 sets up the future poorly and doesn't provide enough to the past. We are not expecting big gains in future GPU hardware generations because we have not received them over the last 1-2 generations (depending on SKU). 
So, sure, the developers can say they're targeting feature sets that can be enjoyed in the future but I am not so sure that will be the case. The visual quality of the game they've supplied in the here and now on mid-range hardware is not that much greater than Control (which I still think looks good!) and in some cases looks much worse, despite the much more powerful hardware required. </div><div><br />The performance scaling of the game with respect to the options is pretty poor, with a miserable 13 - 17 fps difference in averages between the low and high settings (which can be further tweaked to add around 20 - 30 fps with upscaling) on all four of the cards tested here... and those averages span 55 - 77 fps, which is not extravagant in terms of user experience.</div><div><br /></div><div>In my opinion, the upscaling looks okay - it looks similar to the native image, but that's because the developers chose to present the native image in a soft format, which I count as a negative. <a href="https://www.techpowerup.com/review/alan-wake-2-performance-benchmark/4.html">There are already .ini tweaks</a> to improve the native visual output of the game and it wouldn't surprise me if those expand over the coming months.</div><div><br /></div><div>The quality and performance of ray tracing in this game is pretty poor considering the visual upside is incredibly minor. Path tracing does look good but <i style="font-weight: bold;">absolutely tanks</i> the performance to a level where Nvidia's frame generation is required, which also introduces graphical artefacts which, in my opinion, are not worth the trade.</div></div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-3978201284989576502023-10-27T16:37:00.001+01:002023-10-27T16:43:24.560+01:00In Defence Of: Older Hardware...<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2FfYgA9DuJP_PLKvHmk1S3Gm3aPwv4cmRPMliuvVpVnoboAvx4FUks7EHrykVYjhDY3IrDUZ5oBu1JvmcqSn5pDKqAAZbvycKnCavwuNfwvBE0QQYlj2vMoQ7Drmsv93hO9PLQ3qYm6dRNOSMYWTGqLe4iIXqkWz6yNqvpQrbaz_Ek_ImsiNSpr7WsXA/s1920/header.jpg" style="margin-left: auto; margin-right: auto; text-align: center;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2FfYgA9DuJP_PLKvHmk1S3Gm3aPwv4cmRPMliuvVpVnoboAvx4FUks7EHrykVYjhDY3IrDUZ5oBu1JvmcqSn5pDKqAAZbvycKnCavwuNfwvBE0QQYlj2vMoQ7Drmsv93hO9PLQ3qYm6dRNOSMYWTGqLe4iIXqkWz6yNqvpQrbaz_Ek_ImsiNSpr7WsXA/w640-h360/header.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b><a href="https://x.com/alanwake/status/1715412623468970091?s=20">Via Twitter</a>...</b></td></tr></tbody></table><div style="text-align: left;"><br /></div><div style="text-align: left;"><div style="text-align: justify;">The release of <a href="https://www.eurogamer.net/alan-wake-2s-pc-specs-are-here-and-theyre-as-scary-as-the-game-itself">the required PC hardware specifications</a> of Alan Wake 2, along with the revelation that RX 5000 series and GTX 10 series cards <a href="https://www.pcgamer.com/alan-wake-2s-pc-requirements-may-leave-amd-rx-5000-series-and-nvidia-rtx-10-series-users-high-and-dry/#:~:text=Eagle%2Deyed%20Redditors%20have%20been,don't%20support%20mesh%20shading.">would not be supported</a>* caused quite a stir in various online fora. 
I, myself, have not been overly happy with them but it may not be for the typical reasons that proponents of <i style="font-weight: bold;">advancing technology</i>(!!) would like to paint. At the same time, despite the hyperbole on both sides of the equation, I think there is room for reasoned discourse on the topic and platforms like Twitter and Reddit don't tend to promote or facilitate that. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, here goes...</div><span style="color: #274e13;"><div style="text-align: justify;"><b><i><blockquote>*They can run the game, just not to their normal relative performance envelope compared to other cards due to the fact that they do not support DX12 Ultimate mesh shaders.</blockquote></i></b></div></span><span><a name='more'></a></span><div style="text-align: justify;"><br /></div><span style="color: #274e13;"><h3 style="text-align: justify;">The times, they are a'changing...</h3></span><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Believe it or not, I'm generally quite a proponent of people upgrading their PC hardware. I <i>want</i> new games to push the technical boundaries and adopt new technologies (<a href="https://hole-in-my-head.blogspot.com/2023/01/yearly-directstorage-rant-part-3.html">where I feel it makes sense!</a>). In fact, I have a spiritual sibling article for this piece that I never published (see below).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I do, however, believe in balance and I feel that the proponents of "<i>games pushing the technological limits</i>" are missing or ignoring some very valid points - despite some of them actually stating those points very clearly in their arguments while apparently missing their logical conclusion... 
I wish to address those and other points in this blogpost.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjd9hNCiDnr1I85oGr-MbDOOKs0AI3H-_tQiSeeMBrdfVUSYD5t8UoDTWsa-iXp2QHP_wzWNdux9x5ZaXkbtgua2pAvmklI-dW_TTovnzbaU9QJfyniEj1CexhWyZuQgZU_iIQQRMxtta6q21xa_mz5vjIX7kPfy7DonUr56zG5PvvjmJr8vPRKpiXVHSE/s657/Unpublished%20post.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="385" data-original-width="657" height="234" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjd9hNCiDnr1I85oGr-MbDOOKs0AI3H-_tQiSeeMBrdfVUSYD5t8UoDTWsa-iXp2QHP_wzWNdux9x5ZaXkbtgua2pAvmklI-dW_TTovnzbaU9QJfyniEj1CexhWyZuQgZU_iIQQRMxtta6q21xa_mz5vjIX7kPfy7DonUr56zG5PvvjmJr8vPRKpiXVHSE/w400-h234/Unpublished%20post.PNG" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>A post that I may finally get around to publishing soon...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">The Past...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Proponents of the higher technical specifications of games like Alan Wake 2 like to spout that '<i>in the past we had to walk both ways uphill to school and back, in the snow - with no shoes!</i>'</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Yes, there is certainly an element of truth to this - in the 90s we had to deal with a lot of tinkering to just get games working and, very often, games might not even work on your specific combination of hardware (either properly, or at all!). In addition to this, we had to upgrade our hardware - practically yearly - to keep up with the frenzied output of game developers who were pushing the envelope on the (back then) <u style="font-style: italic;">very small</u> PC gaming landscape. That is, of course, if we wanted to play the new games. Very often, we did without.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In the 2000s, we had to upgrade our hardware less often but, as I can certainly attest, it was still fairly frequent. You can see my personal cards from the year 2000-onward in the table below. Although many commentators state that the slowdown in game spec requirements happened in the 2010s, I actually found that the X1950 Pro (bought in 2006) lasted me through to 2010 - at which point it was beginning to become unusable on new releases. So, for me at least, the slowdown occurred even before 2010...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, I have lived and gamed through all of that period. Is this an argument that we should move back to those expectations? 
I don't think so, but then I don't know what sort of point is being made because what happened in the past is irrelevant to what is happening today for multiple reasons...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhv9Oeymf46Dh8Tl7LWxc-iRNToIaSuxoG7PqX91JwD3vToE0Yqv1xaXdLVp4ilJP_YBmFaqUA6ZgKkKMO1VDn0G3A7DXroVYX25125GMTY_L_hmfkYCuG0CbT2G10Vx10rsQ_Z5AqC2EqJo97PXxL6QcioqsFTkXoM4bQn28pyux36kZIPYJdxm18QfPo/s365/GPUs.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="351" data-original-width="365" height="308" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhv9Oeymf46Dh8Tl7LWxc-iRNToIaSuxoG7PqX91JwD3vToE0Yqv1xaXdLVp4ilJP_YBmFaqUA6ZgKkKMO1VDn0G3A7DXroVYX25125GMTY_L_hmfkYCuG0CbT2G10Vx10rsQ_Z5AqC2EqJo97PXxL6QcioqsFTkXoM4bQn28pyux36kZIPYJdxm18QfPo/w320-h308/GPUs.PNG" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The list of shame... or pride? Oh, okay - the latter cards don't count because I'm not buying them to use like I did in the past...</b></td></tr></tbody></table><br /><br /><div style="text-align: justify;">First up is that market conditions are completely different. The X1950 Pro was circa $200 - $250 on release. Similarly, this price point or below applies to almost every one of those cards up to the GTX 1060 with the notable exception of the Radeon 9800 Pro. It's also important to note that the release cadence for graphics cards was <i>insanely</i> fast! The venerable Anand was complaining about the 6 month cadence back in the early 2000s.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Of course, we know that release cadence slowed to once every year over the next decade and then to approximately every 2 years by the late 2010s and, now, it's inching towards every 2.5 years between the lower-end cards in the stack. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">On top of this, the expense of cards has just gone up and up - a $100 card is now $300, a $200 card is now $350, and their performance relative to the top end card in the stack has also decreased - so the performance increase per generation is decreasing (<a href="https://hole-in-my-head.blogspot.com/2022/02/the-rate-of-advancement-in-gaming.html">as I showed, here</a>).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqxarr9og9G92yUqTxfEAGiXXnGQxFGf1RCv2YHU_BxynlmvezW00OFCJglCqhwRmgtOzufAO8Bf_aNozFWRQjTYvC_OwjZYYPxsPoMPSkuOe1GuSSa1cv4CyuQz-7BWODy1AR08iuQRPnAtEkq70AbkS_IvbzkgJwWUy54OBMTrh7lCYYbr85S5aEJ-A/s862/6%20month%20product%20cycle.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="359" data-original-width="862" height="267" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqxarr9og9G92yUqTxfEAGiXXnGQxFGf1RCv2YHU_BxynlmvezW00OFCJglCqhwRmgtOzufAO8Bf_aNozFWRQjTYvC_OwjZYYPxsPoMPSkuOe1GuSSa1cv4CyuQz-7BWODy1AR08iuQRPnAtEkq70AbkS_IvbzkgJwWUy54OBMTrh7lCYYbr85S5aEJ-A/w640-h267/6%20month%20product%20cycle.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Anand and other reviewers were critical of the GPU manufacturers - ensuring that they held them to task if they didn't provide enough performance throughout the stack per generation... (<a href="https://www.anandtech.com/show/570">AnandTech</a>)</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In this context, time doesn't mean the same thing as it did back then. Sure, some people say that in the past tech got outdated super fast and was unable to play games... Well, guess what? Your new hardware was delivering 200% more performance compared to your old one because of the rate of improvement of the GPU generations. <a href="https://www.techspot.com/review/2686-amd-radeon-7600/">Nowadays</a>, <a href="https://www.techspot.com/review/2701-nvidia-geforce-rtx-4060/">at the low-end</a> <a href="https://www.techspot.com/review/2685-nvidia-geforce-rtx-4060-ti/">we're not getting improvements</a>! (And the price is still slowly increasing.)</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, between the fact that the gap between graphics card generations is getting longer and the fact that the performance increase at the low to mid-range is getting smaller, the only way to improve performance is to move up the GPU stack - to much more expensive cards. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In this light, I don't buy the "suck it up" and "you are the problem" mentality of some of these commentators - 7 years in the 2000s is very different to 7 years in the 2020s. 
We're talking two, <u style="font-style: italic; font-weight: bold;">yes, </u><u style="font-style: italic; font-weight: bold;">TWO</u>, GPU generations versus around five to six (at a minimum).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><blockquote class="twitter-tweet" style="text-align: left;"><p dir="ltr" lang="en">I bought an x800xt in 2004 and in 2007 it could not play any UE3 game/almost all ps360 titles on PC (dx 9.0b). Just 3 years later. As Doc has said, the long xbox360 and - to a degree - the ps4 gen broke some pc gamer's minds about PC part longevity due to the consoles low perf. <a href="https://t.co/os7PL5tb0j">https://t.co/os7PL5tb0j</a></p>— Alexander Battaglia (@Dachsjaeger) <a href="https://twitter.com/Dachsjaeger/status/1716684986303033715?ref_src=twsrc%5Etfw">October 24, 2023</a></blockquote><div style="text-align: left;"><br /></div><h4 style="text-align: left;"><span style="color: #274e13;">Perspective on resolve...</span></h4><div style="text-align: left;"><br /></div><div style="text-align: left;">There's another aspect* to this difference between the time periods: resolution.</div><div style="text-align: left;"><b><i><span style="color: #274e13;"><blockquote>*This was unintentional...</blockquote></span></i></b></div></div><div style="text-align: justify;">Mainstream gaming monitors did not really advance <i>at all</i> during the period of 1995 - 2008/2010. 99.99% of gamers were playing on 1024x768 CRTs and later 768p and 900p LCD screens. HDTVs were not appropriate for playing PC games on (for various reasons). CRT monitors are less sensitive to variable frame presentation and to odd frame rates (i.e. you can play at whatever output your card can handle without needing to implement a frame limit), and the LCD monitors of the late period of that time were a maximum of 60 Hz, meaning that graphics cards didn't have to push all that hard to achieve comfortably playable experiences.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Just looking back at the reviewsphere - GPU reviews didn't reliably start testing 1080p as a resolution <a href="https://hole-in-my-head.blogspot.com/2021/08/the-relative-value-of-gpus-over-last-10.html">until around 2010 - 2012</a> and Steve Walton (of Hardware Unboxed fame) was testing in <a href="https://www.techspot.com/review/359-nvidia-geforce-gtx-560ti/page9.html">weird 16x10 resolutions</a> (though I couldn't tell if this was on a CRT or LCD).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Similarly, during the 2010s, 1080p60Hz was the stagnant resolution to target and gamers have benefitted from that stagnation just as they did in the 2000s.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Nowadays, in 2023, 1080p60Hz is a low-end display that, ideally, we should be moving away from as a target. In fact, due to prices beginning to fall, many gamers buying new monitors are getting 1440p and 4K monitors and TVs at 60 / 120 Hz - often with variable refresh rate technology. 
Graphics cards should be targeting these resolutions going forward, with 1440p as a minimum and 4K moving into the crosshairs relatively soon.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">These are not trivial requirements and most low and mid-range graphics cards in both manufacturers' current lineups are not up to the task, especially with the recent requirements coming from game developers!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Yes, this is a GPU manufacturer problem - in the sense that they are not getting the advancements in process node that they used to get, or in architectural low-hanging fruit, but also in the sense that they are trying to cut costs and provide the minimum viable product to consumers, which means smaller dies and less VRAM and narrower memory buses for more money.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">That's not a good combination...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, again, I'm coming back to the point that time is not the same - saying that a 7 year old GPU is not a viable product any more is not as meaningful as it was in the past, now that older high-end products can <a href="https://www.techspot.com/review/2525-geforce-gtx-1080-ti-revisit/">still keep up with or outperform</a> the lower end of the latest generation's graphics cards... and this is, of course, ignoring the fact that the RX 5000 series is only 4 years old.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Pushing the boundaries...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><a href="https://www.reddit.com/r/AlanWake/comments/17g3yoa/alan_wake_2_pc_requirements_continue_a_remedy/">People are also pointing to prior Remedy titles</a> which moved the bar higher by implementing cutting-edge technical solutions, such as DLSS and ray tracing in Control, but this comparison rings hollow when we look at the actual context:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This isn't a case of <a href="https://www.eurogamer.net/digitalfoundry-2023-alan-wake-2-on-playstation-5-remedy-raises-the-bar-for-visual-accomplishment-once-again">moving the quality bar up</a> by implementing graphical or technical options at the high-end, this is lifting the floor by implementing a feature without any fall-back option for consumers with graphics cards which <i>should</i> be able to run the game at low settings.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In contrast, moving back to that example of Control, both DLSS and ray tracing had those fall-back rendering options that consumers could (and did) avail themselves of in order to experience the game. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Even to this day, only a handful of titles are exclusively ray-traced. Almost all games have a traditional raster pipeline that they fall back on. Now, I understand that <a href="https://youtu.be/M5OGZ7qiYds?si=mibhuzQ34sVH9wDd&t=2233">development resources are limited</a> and that time and money are not infinite - this isn't a case of me trying to rake Remedy over the coals, here. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What I <b style="font-style: italic;">am</b> trying to counter is this, quite frankly, ridiculous and dismissive notion that gamers are a problem because they can't play games that they would like, or that older hardware shouldn't be supported, especially given that hardware is advancing at a much slower pace now.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Concluding remarks...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Given the huge costs of high-end hardware, what's going to happen in 5 - 7 years? Are we saying that a $1500 card should be dropped because some new feature came out, despite it outperforming the low end new cards in most other titles? What about a $500 - $700 card every 3-4 years?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Is time really a good metric to be using to justify support (or not) of a product? Or is it just a convenient excuse to dismiss legitimate concerns?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Personally, although I can understand why Remedy has chosen not to optimise for cards that don't support the mesh shader tech (and let's be honest, this issue really doesn't affect me), I find the fact that they are able to release it on the Xbox Series S with its paltry GPU and memory combination but not an RX 5700 XT or a GTX 1080 Ti a little hard to swallow...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I firmly believe that we should be supporting hardware as much as possible for the reasons above as well as many more.</div></div> <script async="" charset="utf-8" src="https://platform.twitter.com/widgets.js"></script>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-62679018328320008502023-10-14T10:50:00.003+01:002023-10-14T10:50:46.090+01:00RTX 40 series - aka "Did Nvidia jump the shark"...<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_4fgrGUx6KT1MRICCn5YVPHMP9Xs6TFXRbVdh02snYkDeCAXaQXjEhB4apPhBns5hyphenhyphenoyaaJKBWZsGi3yrU8Q6e27hi8XZXgKaATRkSEDhb3sf0-w12MjdnD6RdsYjesMgn3OyfdIsN010iqJupGSNKbdtxGPGVc9tpkzjbVQFyDHsv3402pbHJxrdw9Q/s1920/Header.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_4fgrGUx6KT1MRICCn5YVPHMP9Xs6TFXRbVdh02snYkDeCAXaQXjEhB4apPhBns5hyphenhyphenoyaaJKBWZsGi3yrU8Q6e27hi8XZXgKaATRkSEDhb3sf0-w12MjdnD6RdsYjesMgn3OyfdIsN010iqJupGSNKbdtxGPGVc9tpkzjbVQFyDHsv3402pbHJxrdw9Q/w640-h360/Header.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Yes, I *splashed out*...</b></td></tr></tbody></table><br /><div style="text-align: left;"><br /><span style="text-align: justify;">Now that Nvidia have essentially completed their consumer RTX 40 series cards, it's time to look back at the releases and take stock of what's happened. 
</span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">We've seen the now usual range of responses: from cynical and jaded gamers and games media, to acceptance from those who are coming to terms with the price to performance ratio Nvidia is now asking. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Bundled up in all of this has been, for me, the question of whether Nvidia pushed too far, too fast. Let's take a look...</div><div style="text-align: left;"><span><a name='more'></a></span></div><div style="text-align: left;"><br /></div><div style="text-align: left;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaD4gluhOuFUtPmNbKRFmXiZ4M4nnycz8v-DANPaOSG_RElzXWLO-GlLMkSaXuIumlZ77cqhiYyxzCXYVArX90_ivaHQcqBEXYI2oaFcToRd51b53NSHOvMyAWnC9dhsgHRZ1rC_w35lTR2loRY4yLO_etHFyv0cf3cydxhCc-wTrEFbciEhaZxPvEwrE/s1156/Generations_summary%20perf.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1124" data-original-width="1156" height="622" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaD4gluhOuFUtPmNbKRFmXiZ4M4nnycz8v-DANPaOSG_RElzXWLO-GlLMkSaXuIumlZ77cqhiYyxzCXYVArX90_ivaHQcqBEXYI2oaFcToRd51b53NSHOvMyAWnC9dhsgHRZ1rC_w35lTR2loRY4yLO_etHFyv0cf3cydxhCc-wTrEFbciEhaZxPvEwrE/w640-h622/Generations_summary%20perf.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Performance per generation relative to the 'top' card. Data taken from <a href="https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html">Jarred Walton's excellent performance charts</a> over at Tom's Hardware...</b></td></tr></tbody></table><div style="text-align: left;"><br /></div><div style="text-align: left;"><br /></div><h3 style="text-align: left;"><span style="color: #274e13;">The Backlash...</span></h3><div style="text-align: left;"><br /></div><div style="text-align: justify;">Although there was a big shock regarding the prices of the RTX 20 series relative to their raster performance compared to the prior generation parts, there was a good portion of the user base that were excited about ray tracing and what it could bring to gaming. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With the RTX 30 series, people were happy about the general uplift in performance for all three cards announced at launch - the 3070 giving the performance of the 2080 Ti for half the price was especially interesting, if not the best price we've ever seen historically for that class of card...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, with the RTX 40 series, I think we have <a href="https://www.youtube.com/watch?v=_vke26sQK_c">seen</a> <a href="https://www.youtube.com/watch?v=qFMSgzJlzFI&t=460s">almost</a> <a href="https://www.youtube.com/watch?v=bEr9AmEkImg">the</a> <i><a href="https://www.youtube.com/watch?v=1VacJ7rHNAw">entire</a></i> <a href="https://www.youtube.com/watch?v=mGARjRBJRX8">media </a><a href="https://youtu.be/F7TRFK3lCOQ">industry</a>* and online vocal gamers shocked and a little outraged over what Nvidia have presented to them... and I believe this reaction is entirely warranted. 
Especially given the cash grabs <a href="https://www.youtube.com/watch?v=WLk8xzePDg8&ab_channel=HardwareUnboxed">lower</a> <a href="https://www.youtube.com/watch?v=Y2b0MWGwK_U&ab_channel=GamersNexus">down</a> <a href="https://www.digitaltrends.com/computing/nvidia-rtx-4060-ti-review/">the</a> <a href="https://www.tomshardware.com/reviews/nvidia-geforce-rtx-4060-ti-16gb-review">stack</a>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I don't think we've seen such a consensus on how poor the relative value of a new generation of GPUs is over the prior one for quite a while**. However, this might have been the time that Nvidia pushed too far... their built-up goodwill spent cheaply.</div><div style="text-align: left;"><blockquote><b><span style="color: #274e13;"><i>*There were some <a href="https://youtu.be/_knH7jCwcUY">less critical looks</a> at the various announcements, drawing conclusions which, I believe, will change once the products are in reviewers' hands.</i></span></b></blockquote><p><i></i></p><blockquote><i><b><span style="color: #274e13;">**People really didn't like the RTX 2080 and 2080 Ti release but driver improvements have shown that those products are better than they were at release - indicating that they were probably released before they were truly done and dusted! The RTX 2080 is <a href="https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html">now posting a 25% lead</a> over the 1080 Ti, whereas at launch it was <a href="https://www.youtube.com/watch?v=dLjQR0UFUd0&ab_channel=HardwareUnboxed">basically neck and neck</a>... Essentially, the RTX 20 series had improvements to the structure of each SM over the GTX 10 series but the RTX 40 series has no such improvements that could be expected to be optimised for over time in the software/driver!</span></b></i></blockquote><p></p></div><div style="text-align: justify;">A little while ago, <a href="https://hole-in-my-head.blogspot.com/2022/07/inflation-reality-check.html">I predicted</a> that this generation of cards would increase in cost by +20-30% per tier. That has come to pass with a couple of these cards: the RTX 4070 is 20% more expensive than the RTX 3070 and the 4070 Ti is 33% more than the prior generational equivalent - though cards with names matching those in the prior generation above and below those points scale in a random manner in either direction. I might say, if I was so inclined, that I wasn't that far from the truth...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The problem here is that those percentages understate things when you look at the actual relative performance and hardware of each product!! When you look at the actual hardware per SKU as a percentage of the '90 class card in each generation, the price increases by 50 - 100% per card tier under the 4090 - a rough sketch of that normalisation follows below. 
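</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As an illustration of the kind of normalisation behind the chart below, here's a small sketch. The shader counts and launch MSRPs are the public figures for these SKUs, but treat it as an illustrative calculation rather than my exact methodology:</div><div style="text-align: justify;"><br /></div><pre>
# Rough sketch: each SKU's shader count as a fraction of the '90 class'
# card in its generation, next to its launch MSRP (US$). Figures are
# public launch specs - double-check them before reusing this.
gen_30 = {"RTX 3090": (10496, 1499), "RTX 3080": (8704, 699), "RTX 3070": (5888, 499)}
gen_40 = {"RTX 4090": (16384, 1599), "RTX 4080": (9728, 1199), "RTX 4070": (5888, 599)}

def tiers(generation: dict) -> None:
    top = max(shaders for shaders, _ in generation.values())
    for name, (shaders, price) in generation.items():
        print(f"{name}: {shaders / top:5.1%} of the top die, ${price}")

tiers(gen_30)
tiers(gen_40)
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The same 5888 shaders went from ~56 % of the top die in one generation to ~36 % of it in the next, while the price went up by $100 - which is the tier slippage the following paragraphs (and the chart below) are getting at.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">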
Yes, it was all fun and games when people (including myself) said that the RTX 4070 should have been a 4060 Ti but it's worse than that on the technical front - most cards in the 40 series are <i style="font-weight: bold;">two</i> tiers down in performance whilst being a tier up in price.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPoKbWhzF2qeYSfVfCEWPg0iwtL3DjodbjwcPKUXFe0bQBWzn6q552eJLATYlXil9Ii2FSFr8n220d1A5wU5kJFax805FIFrCSA9mYBwtT4UhW1Ozm3z82rifAYAfOhEYqTCcUXVLr1B0cmJQa4CfWoW8oatv8JTmN9iOZDC7dVjoaReKS53A9u5gomY4/s1152/Generations_shaders.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="370" data-original-width="1152" height="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPoKbWhzF2qeYSfVfCEWPg0iwtL3DjodbjwcPKUXFe0bQBWzn6q552eJLATYlXil9Ii2FSFr8n220d1A5wU5kJFax805FIFrCSA9mYBwtT4UhW1Ozm3z82rifAYAfOhEYqTCcUXVLr1B0cmJQa4CfWoW8oatv8JTmN9iOZDC7dVjoaReKS53A9u5gomY4/w640-h206/Generations_shaders.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Relative specs of each card to the 90 class in the generation...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Sure, <a href="https://hole-in-my-head.blogspot.com/2022/02/the-rate-of-advancement-in-gaming.html">I've been saying</a> that the performance gap between the top card and the rest of the stack has been growing each generation and I was expecting that to come with an associated increase in price as well... with the lower end cards stagnating to a terrible extent. 
However, the RTX 40 series release has completely upended that trend: we are in a generation where the top-end card is around 1.60x the 3090/3090 Ti in performance at 4K resolution, but the "80 class" card (in the form of the 4080) has taken a HUGE dive to around the 70 Ti class level of performance - something that hasn't been seen for the last few generations from Nvidia*.</div><div style="text-align: justify;"><i><b><span style="color: #274e13;"><blockquote>*I've gone back to the GTX 9 series in this analysis and the RTX 40 series is the worst performance uplift per generation when looking at the whole stack.</blockquote></span></b></i></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3KVpf-EnU3W5S7ZQtV6A1M1ARQnJpWafyoFqWp7Ue_uh_Nt7J6lzciXQM8dndb9sXwPSYrmfzgibky74n37LCZKV6TwxRaHaHDtoJD4-QqPS5mt79yRn3WKgYUghHHL9GQiIvqxunqSq3si7CUELY3lcqEA8-E8YhmkdZLn2olDrTOdxpgJ9aCCwaJZ4/s1184/Class_summary%20perf.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1011" data-original-width="1184" height="546" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3KVpf-EnU3W5S7ZQtV6A1M1ARQnJpWafyoFqWp7Ue_uh_Nt7J6lzciXQM8dndb9sXwPSYrmfzgibky74n37LCZKV6TwxRaHaHDtoJD4-QqPS5mt79yRn3WKgYUghHHL9GQiIvqxunqSq3si7CUELY3lcqEA8-E8YhmkdZLn2olDrTOdxpgJ9aCCwaJZ4/w640-h546/Class_summary%20perf.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The performance uplift per Nvidia defined class of card is the lowest this generation though it has been dropping each generation since the GTX 10 series in 2016...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Unfortunately, when you look at the RTX 40 generation in this manner (and many others have before me - I'm not really doing anything new at this point), we can see that Nvidia have "downgraded" the performance of each class of card below the RTX 4090 to one or two tiers below where they should be. It's only because of the amazing performance uplift and efficiency improvement in power usage and architecture* that we have any uplift to speak of at all!</div><div style="text-align: justify;"><i><b><span style="color: #274e13;"><blockquote>*That ~1 GHz improvement in clockspeed really appears to be doing wonders, along with that larger L2 cache!</blockquote></span></b></i></div><div style="text-align: justify;">What we <i>can</i> see is that if you purchase an expensive card, you really are not getting the best uplift in performance unless you shift up the resolution to 1440p or 4K. This is disappointing from my perspective because it was <a href="https://hole-in-my-head.blogspot.com/2021/08/the-relative-value-of-gpus-over-last-10.html">only two years ago</a> that I was of the opinion that 1080p gaming <i>should</i> be dying by now and all cards within a generation should be targeting 4K as a standard. 
Now, I guess that <i>technically</i> this is true with the RTX 40 series but we're really scraping by and those are average FPS numbers, meaning that the minimum fps experienced by the player** is going to be below that half the time.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>**Bearing in mind that I mean the minimum of the fps metric and not the <a href="https://hole-in-my-head.blogspot.com/2023/03/we-need-to-talk-about-fps-metrics.html">incorrectly converted minimum frametime value</a>!</blockquote></span></i></b></div><div style="text-align: justify;">If we switch this generational performance analysis around and instead look at where each card "should be", with regards to resource tier, we see that this generation really could have been a good one, indeed! Yes, the RTX 4090 looks weak at both 1080p and 1440p but there are a couple of reasons for that:</div><div style="text-align: justify;"><ol><li>At 1080p the GPU will be heavily CPU bottlenecked, meaning that users of this SKU should be upgrading their platform for the next couple of generations and seeing performance uplifts.</li><li>The RTX 4090 doesn't scale as well with resources as it should do. What I mean by this is that, looking at the number of shaders, it should be performing another 20 - 30 % faster than it is in real-world scenarios (a back-of-the-envelope version of this calculation is sketched at the end of this section). This suggests one (or more) of three things to me:</li></ol><div><ul><ul><li>It's power-limited.</li><li>It's frequency-limited.</li><li>It's voltage-limited.</li></ul></ul></div></div><div style="text-align: justify;">I've seen various commentators <a href="https://twitter.com/MeyerRants/status/1709322314909626743">doubting that the rumoured Blackwell GB102 die could possibly improve performance by any reasonable margin</a>* but the fact is that the RTX 4090 was potentially planned to use much more power and generate more heat (hence the over-specced heatsink assemblies on the finally launched product). Originally, everyone - seemingly including AMD, since their original projections didn't match the performance of the released product - thought that the RX 7900 XTX was going to perform <i>way</i> better than it actually did and so Nvidia prepared, with their partners, a top-end GPU that would wring every last little bit of performance out of the silicon, efficiency be damned! At least, these are the rumours of the sequence of events...</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"></span><blockquote><span style="color: #274e13;">*Just one example...</span></blockquote></i></b></div><div style="text-align: justify;">However, in today's world, we know that Nvidia scaled back the performance targets of at least the 4080 and 4090 and so <a href="https://edgeup.asus.com/2023/proart-geforce-rtx-4080-and-rtx-4070-ti-graphics-cards/">the larger</a> <a href="https://wccftech.com/msi-geforce-rtx-4090-rtx-4080-gaming-slim-gpus-drop-the-weight-just-3-slots-thick/">coolers were</a> <a href="https://wccftech.com/msi-makes-geforce-rtx-40-gpus-slimmer-slim-lineup-coming-in-rtx-4090-4080-flavors/">never required</a>. Of course, it's also difficult to separate out the "heatsink inflation" caused by the AIBs from Nvidia requirements but given the fact that they routinely utilise coolers between products with different power limits and thermal output, it seems fair to say that it's a combination of both... 
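</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Coming back to that scaling point from the list above, here's the back-of-the-envelope version. The shader counts are the public specs for each card; the ~1.30x performance figure is my approximation of typical 4K results from public reviews, so treat the output as indicative only:</div><div style="text-align: justify;"><br /></div><pre>
# Rough sketch: how much of the RTX 4090's extra shader resources
# (versus the RTX 4080) actually show up as performance?
shaders_4090, shaders_4080 = 16384, 9728  # public CUDA core counts
resource_ratio = shaders_4090 / shaders_4080  # ~1.68x the resources

observed_perf_ratio = 1.30  # approx. 4K average from public reviews

# Fraction of the theoretical (shader-count) uplift actually realised,
# and how far short of linear scaling the card lands:
scaling_efficiency = (observed_perf_ratio - 1) / (resource_ratio - 1)
shortfall = 1 - observed_perf_ratio / resource_ratio

print(f"resources: {resource_ratio:.2f}x, performance: {observed_perf_ratio:.2f}x")
print(f"{scaling_efficiency:.0%} of the extra resources convert into fps")
print(f"the 4090 lands ~{shortfall:.0%} short of linear scaling")
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">That ~20 - 25 % shortfall from linear scaling is the gap I'm attributing to the power, frequency and voltage limits listed above.</div><div style="text-align: justify;">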
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwTB_vRLE9YUGdw323ccSIWpYxqa6QQUXIrk2ucypeq31Lo0pmL3FJHqsmGCHqIocRpyNS4gsGYMktE7eHY4tuVQ9yro07qVJGsv5-X-rC_yDlq_hH_5QdIhl7XoeJqABTpf_hPqblEDiDXy3OZprjPVwV2-1Tk1pSJeblqZLkpHun7WUeOs-_u973gjo/s1185/Resource_summary%20perf.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1185" data-original-width="1184" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwTB_vRLE9YUGdw323ccSIWpYxqa6QQUXIrk2ucypeq31Lo0pmL3FJHqsmGCHqIocRpyNS4gsGYMktE7eHY4tuVQ9yro07qVJGsv5-X-rC_yDlq_hH_5QdIhl7XoeJqABTpf_hPqblEDiDXy3OZprjPVwV2-1Tk1pSJeblqZLkpHun7WUeOs-_u973gjo/w640-h640/Resource_summary%20perf.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Looking at the uplift we <i>could have had</i> if the FP32 resource tier had been respected this generation - it could possibly have been the best generation ever released for lower-end gamers... for higher-end gamers, the uplift is basically as expected.</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">The third act switcharoo...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Nvidia's behaviour in this aspect is not surprising: People have been saying for a long time that Moore's Law is dead or dying and that the free performance deriving from manufacturing process improvements is decreasing with each generation of hardware released, while the cost to manufacture the chips in the first place keeps increasing.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I've also separately noted several times that the best "value" purchases are either super high-end or super low-end graphics cards (I don't mean premium models, I mean the class of card). This is because a cheap card can be replaced often and you get a relatively good bang for the buck. The expensive card, while terrible "value" in the moment, will keep that value for longer and have more scaling and resources to throw at future game titles.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now, I'm the epitome of "listen to what I say and don't watch what I do"... with the money I've spent on mid-range test systems, I could have just bought an RTX 4090 outright. But then I can't really do any analysis on a 4090-based system... and it's what <i>everyone and their dog</i> is presenting when they do performance analysis. So... 
yeah, there's that!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">But here's the kicker that I think everyone is missing - I've not seen this mentioned a single time in any mainstream press or technical circles:</div><div style="text-align: justify;"><br /></div><div style="text-align: center;"><b><u><span style="color: #274e13;">Nvidia have set up the RTX 50 series to be amazing.</span></u></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Seriously, I'm not even joking!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Unfortunately, I need to admit that Nvidia is potentially being incredibly canny here: they have taken the huge performance uplift with the Ada Lovelace designs and production silicon and parlayed that into products which still (mostly) provide a small-ish performance uplift over the prior generation whilst simultaneously resetting <i>which</i> silicon goes into which product. This helps them save money because they're now getting more usable chips per SKU* and each chip will cost less to produce - even going into the future.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*Because smaller dies are more effectively harvested from the silicon wafers they are produced from...</blockquote></span></i></b></div><div style="text-align: justify;">Realistically, Nvidia don't have much more space to further decrease the size of the chips they assign to each class of GPU, so we are going to end up with the next generation providing the "normal" uplift in performance that we are used to seeing. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In summary - Nvidia are geniuses: they have "reset" their consumer GPU price and performance expectations with the RTX 40 series, allowing them to have a typical generational uplift with the RTX 50 series whilst saving themselves a lot of money in the process.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">AMD have gone the other way - they've also moved to manufacturing smaller chips but they're focussing on combining the chips to make up their products. That adds expense and causes headaches in implementation in both hardware and software, which means that some of that cost saving is then lost.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a result, with a monolithic RTX 50 series, Nvidia are positioned to do incredibly well and, if, by some miracle, AMD pulls a rabbit out of the hat in terms of performance, Nvidia can just switch back to the more performant silicon dies they would have traditionally assigned to each class of card, granting them a doubling of the performance they would give the consumer. 
They are in a win-win situation.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Very savvy!</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-83723691571394961342023-09-17T13:00:00.009+01:002023-09-18T18:41:49.720+01:00The problem with 'dumb' metrics (An argument against using GPU busy)...<div style="text-align: left;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRYeY63x4z2ZoDt62WTmpvbFPrnnQnA28K-QVzzoTCLGx2RknnESmFkgXh-Qm5LPBi9u68YH2JkUSqSPSwQhoZ53qrDuIAUVYHW_m0BkC_XUgn-zU2HogIagK6zcvYbWQ6NqWtJz3pbIhrCRAIDs9dkyCUe5-xTSSYWWX4VZ5zqy3JTSnS9GlZUFTrFDQ/s1598/intel-arc-q3-23-update-slide-15.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="896" data-original-width="1598" height="358" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRYeY63x4z2ZoDt62WTmpvbFPrnnQnA28K-QVzzoTCLGx2RknnESmFkgXh-Qm5LPBi9u68YH2JkUSqSPSwQhoZ53qrDuIAUVYHW_m0BkC_XUgn-zU2HogIagK6zcvYbWQ6NqWtJz3pbIhrCRAIDs9dkyCUe5-xTSSYWWX4VZ5zqy3JTSnS9GlZUFTrFDQ/w640-h358/intel-arc-q3-23-update-slide-15.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">With the recent release of Intel's updated beta version of Presentmon, which includes their new metric, GPU Busy, many people have become excited about the potential for it to have a positive impact on the game review and user diagnostic testing landscapes.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, if you follow me on Twitter, you may have noticed that I've not been impressed by the execution surrounding this metric - something which I think many people might have missed in the hubbub for something new and meaningful in system performance assessments.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is the story of how dangerous undefined metrics can be in the wrong hands...<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Between Two Ferns...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">At its core, the concept of introducing GPU Busy is to provide the user with an idea of the deviation between when a frame is presented and how much time the GPU is specifically working on that frame.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Let's put aside one thing first - I do not dislike the idea of GPU busy. Knowing how much time it takes an application to render a frame can be a very useful diagnostic tool which, for people unable to use or unwilling to learn the various profiling type tools, could provide some benefit.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, the caveat here is that this raw number does not give any sort of clarity on the reason for that specific length of time being consumed. 
Is the difference due to API handling, driver issues, a particular application inefficiency, or some other bottleneck within the I/O system?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Seriously, which is it? GPU Busy doesn't know because GPU Busy is a 'dumb' metric - it is not informed by anything other than time and number of frames produced.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Thus, there is something intrinsically wrong with taking that blunt instrument and using it to infer the reasons for lengths of time in the render pipeline surrounding the actual GPU processing step. Worse, marketing it as some sort of diagnostic tool isn't great when it has very little diagnostic ability for the untrained, or for people who just want to run it on their newly-bought game to see how it performs!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Other Concerns...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There are some other concerns regarding GPU Busy as a metric for users to manage. First off is that it is thwarted by upscaling techniques, Frame Generation, and frame limiters. Two of those are pretty obvious: upscaling and frame limiters will reduce GPU load, meaning that the GPU Active time will decrease. However, there is a confounding factor here - framerate should proportionately increase when using upscalers (isn't that the point of them?!) so the difference between frame presents and GPU Active time should be reduced... but, as you can see below, it isn't always the case!</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhf_000xGQ4TaDxEaGpMv_orXM5o2TW7qv2dJ2WaWWOLld5F-U9qv_NxVzpAaDMhDvmny3hcKaFOUp5WsusmpWjaHt_mBPYhcjnGbfjsZIU3NhTkbKsTepWaar4-ClCLF52QtGdLXOmaUA4j1Rh9-X1I55NQHy5Fh07Bui-av8MZCDB21SRtMdZY4U1je0/s444/Comparison.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="312" data-original-width="444" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhf_000xGQ4TaDxEaGpMv_orXM5o2TW7qv2dJ2WaWWOLld5F-U9qv_NxVzpAaDMhDvmny3hcKaFOUp5WsusmpWjaHt_mBPYhcjnGbfjsZIU3NhTkbKsTepWaar4-ClCLF52QtGdLXOmaUA4j1Rh9-X1I55NQHy5Fh07Bui-av8MZCDB21SRtMdZY4U1je0/s16000/Comparison.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Tests performed in Spider-man at very high quality settings and draw distances using an i5 12400 and RTX 4070...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The more experienced among you will immediately know the reason - upscaling technologies have a processing overhead and Spider-man is quite CPU bound in its design. So, combining those facts, we can see that as the application becomes more CPU bottlenecked, the difference between GPU Active and the frametime presents increases - which is the expected behaviour. However, at the same internal resolution (1080p native vs 4K DLSS Performance) the GPU is <i>less</i> busy. That doesn't make sense to me - and it's less busy by a good margin! (A sketch of how to pull these numbers out of a capture yourself follows below.) 
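</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Here's a minimal sketch of the idea. Note that I'm assuming a PresentMon-style CSV capture, and the column names vary between versions of the tool, so the two names below are placeholders - check the header row of your own capture and rename accordingly:</div><div style="text-align: justify;"><br /></div><pre>
# Rough sketch: average frame present time vs average GPU busy time
# from a PresentMon-style CSV capture. The column names below are
# placeholders - match them to the header row of your own capture.
import csv
import statistics

FRAMETIME_COL = "MsBetweenPresents"  # assumed column name
GPU_BUSY_COL = "msGPUActive"         # assumed column name

def summarise(path: str) -> None:
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    frametimes = [float(row[FRAMETIME_COL]) for row in rows]
    gpu_busy = [float(row[GPU_BUSY_COL]) for row in rows]
    avg_ft = statistics.mean(frametimes)
    avg_gb = statistics.mean(gpu_busy)
    print(f"avg frametime: {avg_ft:6.2f} ms ({1000 / avg_ft:5.1f} fps)")
    print(f"avg GPU busy:  {avg_gb:6.2f} ms "
          f"(gap: {avg_ft - avg_gb:.2f} ms, ratio: {avg_gb / avg_ft:.2f}x)")

summarise("spiderman_1080p_native.csv")  # hypothetical capture file
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The gap and the ratio it prints are the two numbers that the rest of this discussion hinges on.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">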
Surely, the DLSS overhead should push that GPU Busy figure closer to the actual frametime calls?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The question here then becomes... <i>when</i> is the application GPU-bound? At native 4K, it really is. But is it at DLSS Quality, Balanced, Performance? From the relative level of improvement in framerates, I would say it is essentially CPU-bound at DLSS Quality, due to the incrementally smaller improvements thereafter. But is that actually correct? The frametime does keep decreasing at each setting and the GPU becomes less and less utilised - there's an extra 7% performance (or ~5 fps) on the table between the Quality and Performance settings. So, maybe it isn't?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Further confusing matters, Nvidia's Frame Generation doesn't rely on presents from the programme/API to deliver frames - it delivers one frame and then works on the next frame but delivers an interpolated frame before delivering the second frame. This adds <i>latency</i> but the GPU is <u style="font-style: italic; font-weight: bold;">STILL</u> working just as hard as it was before. But... somehow the GPU is no longer the bottleneck? </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If you look at those numbers above, it seems likely that GPU Active is somehow dividing the "worktime" of the GPU by the number of frames issued into the display queue (or whatever you wish to call it) - i.e. at 1080p native, we get effectively 2 frames per CPU call. At 4K, it's a bit more of a challenge and so we only get a 30% improvement. The problem is that GPU Active <i>shouldn't</i> be working like this...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Of course, this is just guesswork - unfortunately, in the same way that Frame Generation is a black box, so too is GPU Active to the average user (at least to me, since I cannot understand the source code) and this can lead to misunderstanding and incorrect conclusions...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Next up, we have the logic of the metric.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEgSnK1j35VeyTMhWqqq4tVhmHywgABZZTRSTthB6Y1YSXqpKatJThdbF7UiQ5smeAW6gVItjw2tI2KDEHh3jXUNMrt4ogZ95TAVrSq3UZCNgjzGrxujhhvvqwYBl9BDWDwDWV91oIjy4KsdX945E97rAh2FPR-ATQtwzHLPDO9HuTQMV8wHHW-d3-ibHzQ" style="margin-left: auto; margin-right: auto;"><img alt="" data-original-height="207" data-original-width="754" height="88" src="https://blogger.googleusercontent.com/img/a/AVvXsEgSnK1j35VeyTMhWqqq4tVhmHywgABZZTRSTthB6Y1YSXqpKatJThdbF7UiQ5smeAW6gVItjw2tI2KDEHh3jXUNMrt4ogZ95TAVrSq3UZCNgjzGrxujhhvvqwYBl9BDWDwDWV91oIjy4KsdX945E97rAh2FPR-ATQtwzHLPDO9HuTQMV8wHHW-d3-ibHzQ" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>A conclusion based on testing <a href="https://www.techspot.com/article/2723-intel-presentmon/">by Techspot</a> (aka Hardware Unboxed)...</b></td></tr></tbody></table><br /><br /></div><div style="text-align: justify;">Hardware Unboxed, <a href="https://www.techspot.com/article/2723-intel-presentmon/">via Techspot</a>, had a really good article going through the Presentmon beta application and GPU Busy metric... only, well... their explanation of when something is or is not CPU-bound was a little uncertain*: they confidently state that when the Frametime and GPU Busy metrics are equal then the GPU is the limiting factor at medium quality settings with FSR enabled. But with RT overdrive enabled**, the CPU is now the limiting factor... maybe. They appear to be not quite sure about that from the language they use. However, this conclusion comes when the difference between the frametime (17.8 ms) and GPU Busy (17.1 ms) is a mere 0.7 ms!</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"></span></i></b><blockquote><b><i><span style="color: #274e13;">*Most likely because without guidance, the user has to make up any and all rules regarding the interpretation of the metric!</span></i></b></blockquote><p><span style="color: #274e13;"><b><i></i></b></span></p><blockquote><span style="color: #274e13;"><b><i>**Please note that RT: Overdrive has Frame Generation enabled on the RTX 4070 Ti they were testing on... and bear in mind what we just discussed regarding how GPU Active potentially calculates itself.</i></b></span></blockquote>The problem here is that there is no guidance on what sort of difference to expect when an application is bottlenecked in any particular scenario. How MUCH of a difference is required or expected to be able to point to a conclusion either way? From my perspective, both of HUB's Cyberpunk examples are equivalent situations in terms of the hardware setup they have and thus no different conclusion can be made...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Look back at my Spider-man example: the difference between frame presents and GPU Active is 0 ms for the very certainly GPU-bound 4K native. But at an internal 1440p (DLSS Quality) it is now 0.2 ms. Is this "small, but permanent, difference" now an indication of a CPU bottleneck? Are the framegen results CPU-bottlenecked? I would most likely reply - "I don't think so... but I don't know for sure". </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Thinking about it logically, framegen at 4K is interpolating frames from a GPU-bound environment - therefore, the framegen is most likely <i>still</i> GPU-bound - the underlying environment is not *fixed* through the enabling of the feature. Rendering at 1080p is a CPU-bound environment and I do not believe that the magic of framegen will actually alleviate that. However, <u style="font-weight: bold;">BOTH</u> results have a relatively large difference between frametime presents and the GPU active value. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Which, I believe, cannot be understood to be CPU-limited in both scenarios!</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0X7rO-EQYPYVUYvJjCwczeA9mdRYU9dYaM25ihqPYtiRlR0q9h07lJbfSl6DDrMhFfWlBkdDVHY01jbJScIHh1njjGZbCnLobb-yxsXs8_mrZB1DMfgKjk5Gv3c4qYKL2yz1Q9IOV8ADRBXtVlNJwMV3kvpqrEk_2JSa1Xah4AgcmDMmeNh6pxI0CVMs/s1075/Comparison%20framegen%20native.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="656" data-original-width="1075" height="390" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0X7rO-EQYPYVUYvJjCwczeA9mdRYU9dYaM25ihqPYtiRlR0q9h07lJbfSl6DDrMhFfWlBkdDVHY01jbJScIHh1njjGZbCnLobb-yxsXs8_mrZB1DMfgKjk5Gv3c4qYKL2yz1Q9IOV8ADRBXtVlNJwMV3kvpqrEk_2JSa1Xah4AgcmDMmeNh6pxI0CVMs/w640-h390/Comparison%20framegen%20native.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Comparison of the benchmark runs in Spider-man between 1080p and 4K (native vs framegen)...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><div>Why is this such a problem? Because GPU Busy and the aspect of deciding how <i>much</i> difference is appreciable and meaningful is just like asking the tongue-in-cheek question of: "<b><i><u><span style="color: #274e13;"><a href="https://dictionary.cambridge.org/dictionary/english/how-long-is-a-piece-of-string">How long is a piece of string?</a></span></u></i></b>" </div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Just as <a href="https://hole-in-my-head.blogspot.com/2023/03/we-need-to-talk-about-fps-metrics.html">my prior discussion on FPS metrics reporting</a> tried to highlight, a stutter due to a longer frametime is relative to the average frametime surrounding it. GamersNexus defined a stutter as any deviation above 8 ms between successive frames. However, this also ran into the problem of absolute relative nature - 8 ms from a 33.33 ms (30 fps dropping to 24 fps) average is not the same as going from 6.94 ms (144 fps - dropping to 67 fps)... This is one of the reasons I use 3x the standard deviation of the frametimes when I'm not using a static/locked display framerate.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This point of view is backed up by Unwinder - <a href="https://forums.guru3d.com/threads/msi-ab-rtss-development-news-thread.412822/page-206#post-6159952">the person behind RTSS development</a>. They have decided to set a particular ratio limit to tell the user when they are or are not CPU/GPU limited. This, currently, is defaulted to GPU Active being 0.75x of the average framerate value. 
<div style="text-align: justify;">However, as Unwinder is happy to point out, that default is purely a guesstimate - a GPU-limited state is only really indicated when the ratio is as close to 1 as possible.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmLbRB7mp9F71qNWmxHvddBNxlGpLohlIGZAvtWyNHL3WsUHAWlZYGbBA2Htf6Rm5X2v0QcWlKlALsq3Uw22tHMY5AykZqYFOKtREB1DWNqOKT_COlLSLxE-8QubYe_zjmBzjW8XwUDbZAU5Au40foJNp7aLkFwNBZZGjQMVcKu3l7Y1xJC9Bk4nrHlSY/s658/RTSS.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="658" data-original-width="555" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmLbRB7mp9F71qNWmxHvddBNxlGpLohlIGZAvtWyNHL3WsUHAWlZYGbBA2Htf6Rm5X2v0QcWlKlALsq3Uw22tHMY5AykZqYFOKtREB1DWNqOKT_COlLSLxE-8QubYe_zjmBzjW8XwUDbZAU5Au40foJNp7aLkFwNBZZGjQMVcKu3l7Y1xJC9Bk4nrHlSY/s320/RTSS.PNG" width="270" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>RTSS adds a user-editable value to "define" when an application is or is not CPU/GPU limited...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The problem here is that I can prove this assertion wrong.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">When I was recently testing Starfield on two different platforms, I was able to get a GPU Busy difference of 0.4 ms on my AMD system (0.97x ratio), showing, in theory, that my GPU is the bottleneck. Unfortunately, I already have data showing that on my Intel system, which performs >5 fps faster in both minimum and average fps, the GPU <i>is also</i> the bottleneck, with a difference of 0.1 ms (0.99x ratio) - quite embarrassing to find out my GPU is cheating on me!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Am I actually GPU limited now? From the data, I'd say I'm close to it... but from <i><b><a href="https://www.tomshardware.com/features/starfield-pc-performance-how-much-gpu-do-you-need">other people's data</a></b></i> I'd have to say I am not! Tom's Hardware were getting 8 - 10 fps more than I am on an i9-13900K platform... and I find it hard to believe that performance is now <i style="font-weight: bold;">worse</i> after various patches...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, <u style="font-style: italic; font-weight: bold;">clearly</u> having a GPU Active time near to the frametime value does not give the user any real indication of whether the application is actually GPU-bound!! Especially once we factor variables like upscaling and frame generation into the equation, and once we remember that a fixed ratio means something different at different frametimes (i.e. 0.75x 33.3 ms is a much larger absolute margin than 0.75x 16.6 ms).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What you can do, though, is see from the graph that there is more variation in the frametime test data - indicating that there is a CPU/system bottleneck to performance. Unfortunately, this data was available before the introduction of GPU Active! 
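</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">And a variability check like that needs nothing more than the frametime log itself - something along these lines (a sketch; the values are made up):</div><pre>
import statistics

def spread(frametimes_ms: list[float]) -> float:
    """Coefficient of variation: standard deviation as a fraction of the
    mean, so runs at different framerates can be compared fairly."""
    return statistics.stdev(frametimes_ms) / statistics.mean(frametimes_ms)

# Two captures of the same benchmark run (illustrative values only):
run_a = [16.5, 16.8, 16.6, 17.0, 16.7, 16.9]  # tight -> likely GPU limited
run_b = [14.2, 19.8, 15.1, 21.5, 13.9, 18.0]  # ragged -> CPU/system interference

print(f"run A: {spread(run_a):.1%}, run B: {spread(run_b):.1%}")
# -> run A: ~1%, run B: ~19% - the second run's bottleneck is visible
#    without any new metric at all.
</pre>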
<div style="text-align: justify;">So the logic of even including this metric is a bit opaque to my mind.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgz5o-eroSWGCTwYKdHq9pYYnGIr6dXPGPrD34-Kq_0FnqixyqMM-Q3HHUduX7tIEKbmhcyN6CtcqDel4CPFqdXhQQ-uizkNsM6WPr4j4nYivb_8hY0JUW6mOC-InUFOArYTYmx3EshQb65mZ915SViOOMtR8SHPESTYbVVBQ5Jb1g4jlltIgQuNMWKXzs/s1074/CPU_comparison.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="325" data-original-width="1074" height="194" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgz5o-eroSWGCTwYKdHq9pYYnGIr6dXPGPrD34-Kq_0FnqixyqMM-Q3HHUduX7tIEKbmhcyN6CtcqDel4CPFqdXhQQ-uizkNsM6WPr4j4nYivb_8hY0JUW6mOC-InUFOArYTYmx3EshQb65mZ915SViOOMtR8SHPESTYbVVBQ5Jb1g4jlltIgQuNMWKXzs/w640-h194/CPU_comparison.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Frametime data from New Atlantis test run...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There is also another concern regarding <a href="https://www.hwinfo.com/forum/threads/logviewer-for-hwinfo-is-available.802/post-41475">the amount of data logging</a> happening which may affect the results on lower-end or bottlenecked systems. However, I can't speak to this particular worry...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, where do *I* think things are going wrong with the introduction of this metric?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">How it Should Have Happened...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">A benchmarking tool like GPU Busy needs to be used in a highly controlled environment where the user (or programme) <i>knows</i> what is being tested - i.e. GPU Busy should only be used in a known benchmark programme environment. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Part of this comes down to the fact that, as I pointed out above, there are too many unknowns involved in the use of GPU Busy - each particular combination of application, resolution and in-game settings, together with the hardware being tested, will define what the meaningful difference between frame presents and GPU Active time will be. In one application, with one set of hardware, a 0.2 ms difference will mean the system is CPU bottlenecked; in another, it might be a 1 ms difference.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Added to this uncertainty, each CPU/GPU pairing (and the average framerate achieved during the benchmark) will also define what the meaningful difference is between the two numbers. 
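</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If the tool shipped with a calibration step, you could at least anchor that interpretation to each specific setup. A purely <i>hypothetical</i> workflow - this is not a feature of PresentMon, and the margin value is a number I made up - might look like this:</div><pre>
# A *hypothetical* calibration workflow, showing how a tool could anchor
# its interpretation to a known state instead of a universal cut-off.

def calibrate_baseline(present_ms: list[float], busy_ms: list[float]) -> float:
    """Measure the typical present-vs-busy gap in a scene that is *forced*
    to be GPU-bound (e.g. an absurdly high resolution) on this exact
    application, settings and hardware combination."""
    gaps = [p - b for p, b in zip(present_ms, busy_ms)]
    return sum(gaps) / len(gaps)

def classify(present_ms: float, busy_ms: float, baseline_gap: float,
             margin_ms: float = 1.0) -> str:
    """Only call a capture CPU-limited when its gap clearly exceeds what
    this setup shows even when it is definitely GPU-bound."""
    gap = present_ms - busy_ms
    return "CPU limited?" if gap > baseline_gap + margin_ms else "inconclusive"

baseline = calibrate_baseline([16.6, 16.7, 16.8], [16.5, 16.6, 16.7])  # ~0.1 ms
print(classify(17.8, 17.1, baseline))
# -> "inconclusive": a 0.7 ms gap against a ~0.1 ms baseline proves
#    nothing with these (made-up) margins.
</pre>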
<div style="text-align: justify;">Essentially: no two benchmarks can be compared, and no two hardware setups can be compared.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">That intrinsically limits the usefulness of the metric.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Further to this, aside from mandating a standardised environment, documentation describing the metric's functionality and specific inner workings should be provided (it was not, when I checked) <i><b>and</b></i>, perhaps more importantly, the metric's use and interpretation should be explained with concrete examples!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As I pointed out above, nowhere is it defined at what point you become CPU or GPU limited except at the extremes of the potential range - but this, of course, feeds back into the point that GPU Busy is a 'dumb' metric that does not understand its environment!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">How can a <1 ms difference between presents and GPU Active support any sort of conclusion? Upon what criteria is it based? Furthermore, how can a ratio of 0.75 between the two be defined as the arbitrating factor? Or, for that matter, any other ratio one might choose! What is the justification and the logic for those arbitrary cut-offs? How much visibility or understanding will the end-user have of these arbitrary decisions by integrated tool-makers?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a result of all these issues and concerns, GPU Busy, whether for benchmarking or analytical use, will remain a snake oil product until more control is exerted over how it is used. Results and conclusions derived from this metric should not be trusted.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">GPU Busy is not fit for purpose in its purported use case.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">The End...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There is one more facet to this fable of faux pas - undoubtedly, GPU Busy will result in many wrong conclusions in online discussions between users, with many using the metric to reinforce their point, not realising they are standing on sand... 
GPU Active / "GPU Busy" requires much more analysis than simply looking at the frametime graph differences between that value and the frametime presents value - in fact, in many cases, as I've hopefully pointed out in this article, it can be misleading in the simplicity of the message being spun around its introduction to the public.</div></div>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-7560610393342650347.post-59698800092525990422023-09-10T11:48:00.004+01:002023-09-10T15:08:05.909+01:00Starfield Performance Analysis...<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhutL6X8Vp4Wa0qLhVeBBWD3a_DfybUfL0zWPIhMPHHSm2TzB_fK4upwqzWIEisdQrkASf8UCuNhKj5LwgOhWKwEvbSE4J4C9WNQDjkmRERf8kWk1ooFpZmoZ6P7pgHD8eYP4LpbGg5QPXLfWZdG4GshNafRxiCi79GyujtuX5pFTsoiE1jz7uy2BEHjkc/s1920/Header.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhutL6X8Vp4Wa0qLhVeBBWD3a_DfybUfL0zWPIhMPHHSm2TzB_fK4upwqzWIEisdQrkASf8UCuNhKj5LwgOhWKwEvbSE4J4C9WNQDjkmRERf8kWk1ooFpZmoZ6P7pgHD8eYP4LpbGg5QPXLfWZdG4GshNafRxiCi79GyujtuX5pFTsoiE1jz7uy2BEHjkc/w640-h360/Header.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Starfield has been a much anticipated title for a number of years now but the hype, counter-hype and social media battles have been raging pretty strongly around this one since it was made an Xbox and PC exclusive. </div><div style="text-align: justify;">Normally, though, those concerns usually melt away shortly after launch when players can actually get their hands on the game<i> and just play</i>. In Starfield's case, this hasn't been quite the experience - there are many players who are struggling to run the game because it can be quite demanding of PC hardware. There are raging debates as to whether the problem lies with the developers, the engine, or the hardware manufacturers...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Since I had my savegame destroyed in Baldur's Gate 3 and didn't feel like repeating thirty hours of gametime I instead decided to switch over to Starfield for a change in pace. As a happy coincidence, I have a lot of familiarity with prior <a href="https://en.wikipedia.org/wiki/Bethesda_Game_Studios">Bethesda Game Studios</a> titles and a penchant for testing on various hardware configurations... 
and it just so happens that I have a new testing rig (mostly) up and running so it has turned out that Starfield is a prime target for the shakedown of this new testing capability...<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXGrzF0FTkgopROoTXdbJT68C4vW6DSqHftdjxCCG5jY8lb_c8-Cc0itzbK9EFxXoEKQEJaw8QG3czDH9nBQnlPuRKEfDIVuoQcx7xjGlhtChiovpFUlmvrcYkN7gpjhsHxUNXMaNFC6d4bgSl0-l0-XdTPm0dYSdkB_P3PCPHsnKs8Z9nfOb3_VMU8wQ/s1097/Complaining.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="886" data-original-width="1097" height="516" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXGrzF0FTkgopROoTXdbJT68C4vW6DSqHftdjxCCG5jY8lb_c8-Cc0itzbK9EFxXoEKQEJaw8QG3czDH9nBQnlPuRKEfDIVuoQcx7xjGlhtChiovpFUlmvrcYkN7gpjhsHxUNXMaNFC6d4bgSl0-l0-XdTPm0dYSdkB_P3PCPHsnKs8Z9nfOb3_VMU8wQ/w640-h516/Complaining.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>OC3D have <a href="https://overclock3d.net/reviews/software/starfield_pc_performance_review_and_optimisation_guide/8">a good review</a> of the performance limitations of the game, along with <a href="https://www.techspot.com/review/2731-starfield-gpu-benchmark/">Hardware Unboxed</a> and <a href="https://youtu.be/raf_Qo60Gi4?si=zJhke0DK-KyfOOKJ">Gamers</a> <a href="https://youtu.be/7JDbrWmlqMw?si=ILLk4-q-HYDmGrfC">Nexus</a>...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: left;"><h3 style="text-align: justify;"><span style="color: #274e13;">Graphics Performance...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">One thing I've been seeing is that people are complaining about how poorly the game runs on high-end hardware when using the highest settings, without upscaling, at (effectively) the highest resolution (aka 4K). 
I really do not see an issue with this - the game is technically quite challenging to run because of the way the engine/game is designed.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">On the one hand, you have the heavily CPU-reliant physics and item tracking systems, along with the NPC AI demands when using crowds at high and ultra settings*; on the other, you have low process RAM utilisation and a heavy reliance on data streaming from storage**, which also heavily taxes the CPU in individual moments due to the way those calls and asset decompression are handled.</div><div style="text-align: justify;"></div><blockquote><div style="text-align: justify;"><b><i><span style="color: #274e13;">*Honestly, this is a game where I can say that the world feels lived-in, in terms of numbers of occupants...</span></i></b></div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><br /></span></i></b></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">**<a href="https://hole-in-my-head.blogspot.com/2023/01/yearly-directstorage-rant-part-3.html">Something I've railed against in the past</a>...</span></i></b></div></blockquote><div style="text-align: justify;"></div><div style="text-align: justify;">The game loads data from the disk in relatively large batches, resulting in large frametime spikes* and causing dips in measured average fps when this loading occurs. Unfortunately, as per my informal testing, it does not appear that this can be mitigated through faster storage, RAM or CPU (though I only have the two SKUs to test on)...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In this sense, the person benchmarking this title needs to ensure that they are not crossing boundaries which trigger streaming of data from storage, as this will negatively affect the benchmark result - in effect, the benchmark ends up testing the game engine's streaming solution rather than CPU and GPU performance.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is not something I've seen mentioned in any technical review of the game, and it potentially calls into question some of the results thrown around the internet, because these loading-induced performance dips are not consistent between runs.</div><div style="text-align: justify;"><b><i><blockquote><span style="color: #274e13;">*<a href="https://youtu.be/i1zR29mUHOg?si=vnO3WuxygGV7Uold&t=53">I have made an example video to show this effect in action</a>...(please note that there is a slight delay in the HWinfo monitoring of the read rate)</span></blockquote></i></b></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Test Systems and Results...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Getting back to things, I've tested the game using the following systems:</div><div style="text-align: justify;"><ul><li><b><u>System 1</u></b></li><li>i5-12400</li><li>Gigabyte B660i Aorus Pro DDR4</li><li>16 GB DDR4 3800 CL15</li><li>SN770 2 TB</li></ul><div><ul><li><b><u>System 2</u></b></li><li>i5-12400f</li><li>Gigabyte B760 Gaming X AX</li><li>32 GB DDR5 6400 CL32</li><li>P5 Plus 2 TB</li></ul></div><div><ul><li><b><u>System 3</u></b></li><li>R5 5600X</li><li>MSI B450-A Pro Max</li><li>32 GB DDR4 3200 
CL18</li><li>SN750 1 TB</li><li>Crucial P1 1 TB (game install drive)</li></ul><div>I also have tested using the following graphics cards:</div></div><div><ul><li>RTX 3070</li><li>RTX 4070</li><li>RX 6800</li></ul></div><div><br /></div></div><div style="text-align: justify;">The following sets of performance benchmarks were obtained on System 2, in a section of New Atlantis near the archives and Constellation's HQ, where I was able to find a path which did not normally incur data loading from disk. This path includes the wooded area, since foliage is known to be challenging for GPUs to render.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiq-zkLPgCtWI8IB0b7SSUUaKZzk5LcF2e6AqZK5sitBslVl-z7qkV9U--dg4kmtEJEDOENiaGIYA7dL3B392m3nTw5po0LySNqEtGRXf0RMoKJNl1RpVAWSzJSZZSu1ansij0GqsxH68vW37qMiXjcIgQNDRFsyXLFTjahOC___d2W2RP0vRT760QjR4M/s770/Settings_UltraQ.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="388" data-original-width="770" height="322" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiq-zkLPgCtWI8IB0b7SSUUaKZzk5LcF2e6AqZK5sitBslVl-z7qkV9U--dg4kmtEJEDOENiaGIYA7dL3B392m3nTw5po0LySNqEtGRXf0RMoKJNl1RpVAWSzJSZZSu1ansij0GqsxH68vW37qMiXjcIgQNDRFsyXLFTjahOC___d2W2RP0vRT760QjR4M/w640-h322/Settings_UltraQ.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The reason for the disparity in memory speed is that the BIOS flipped it back to stock JEDEC after the latest BIOS update and I didn't notice. However, I will address the potential effects of this later...</b></td></tr></tbody></table><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-QWG2LnnbSRcNQ60WeOfSSoZSRDI7VvGQqvp8kvQfloyUIJBQUhskA3_SLZwOZDIDjzm5LZVGScLdvTmiTiT0S3P6zlZudBlV2oPGSIvNLjKrEI9hQ8E48q35g6iCSHHL0CHjXQpzJTzcXypeBqJW16AbG0SY2gHr4UHBGz0e8F1bc8GSES67tVUYHqM/s768/Settings_HighQ.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="385" data-original-width="768" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-QWG2LnnbSRcNQ60WeOfSSoZSRDI7VvGQqvp8kvQfloyUIJBQUhskA3_SLZwOZDIDjzm5LZVGScLdvTmiTiT0S3P6zlZudBlV2oPGSIvNLjKrEI9hQ8E48q35g6iCSHHL0CHjXQpzJTzcXypeBqJW16AbG0SY2gHr4UHBGz0e8F1bc8GSES67tVUYHqM/w640-h320/Settings_HighQ.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><div style="text-align: left;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUH54ELqEME0zLs8OZrPtf9yRevi9CCmnQm2aL5k96s_eJlr8bbJn-o9v7arsCTtfOznqTEFp2HyeQsToplFR9-w0OrhEUP97t2qkMyJ7zWxnsQI4_D39sRy0j_w8kazSgCqo8FG-tmlWWOSVSs_fzdiQTq148kbW3_xnQHJnPiJk_1EUd3BZ8pTlcwY4/s768/Settings_MediumQ.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="385" data-original-width="768" height="320" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUH54ELqEME0zLs8OZrPtf9yRevi9CCmnQm2aL5k96s_eJlr8bbJn-o9v7arsCTtfOznqTEFp2HyeQsToplFR9-w0OrhEUP97t2qkMyJ7zWxnsQI4_D39sRy0j_w8kazSgCqo8FG-tmlWWOSVSs_fzdiQTq148kbW3_xnQHJnPiJk_1EUd3BZ8pTlcwY4/w640-h320/Settings_MediumQ.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">As has been pointed out elsewhere, AMD cards generally perform better than Nvidia's and, though the game does not fully utilise the VRAM quantity available on the card - in fact this is something which potentially feels like a hold-over from the consoles - both my 8 GB RTX 3070 the 12 GB RTX 4070 would utilise around 4.5 GB VRAM with 8 GB system RAM, while the 16 GB RX 6800 would utilise around 6 GB VRAM with 9 GB system RAM.</div><div style="text-align: justify;">This low utilisation of memory could explain some of the issues with framerate dips due to excess or unecessary streaming of data from the storage.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">One upside we can observe from this data is that changing quality settings really does have a good performance uplift in this game and, at least at native resolution without VRS or upscaling /dynamic resolution, we do not really see that much of a visual degradation for pushing these settings down a couple of notches.</div><div style="text-align: justify;"><br /></div><div><div style="text-align: justify;">Doing this takes the RTX 3070 from ~40 fps average at Ultra/High settings to ~60 fps average at all medium settings at 1080p and I feel like this is a very playable framerate. As a result, like many other outlets have done, I've come up with some quality settings of my own which I find <i>do not</i> negatively affect the performance from "all medium" quality settings but will enable a slightly better looking game.</div><div><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUQBrbmUiu-Wa_nQiJ5bkI18u17KXIYH68APl6bhflYOmlNznWKRAJyKUmSwarr6jHdB1EocUDuK7_0PL2cBOcKz36YNx-HbVfmAGIPfF7IAnrwrCBlj__GHDWerhhuUWdN1WD3FKBgYVnndPeUa1NlKngDYSfNPVXe8cd1acBkTznPzld4FGD64hnFgM/s827/Optimised%20settings.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="827" data-original-width="819" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUQBrbmUiu-Wa_nQiJ5bkI18u17KXIYH68APl6bhflYOmlNznWKRAJyKUmSwarr6jHdB1EocUDuK7_0PL2cBOcKz36YNx-HbVfmAGIPfF7IAnrwrCBlj__GHDWerhhuUWdN1WD3FKBgYVnndPeUa1NlKngDYSfNPVXe8cd1acBkTznPzld4FGD64hnFgM/s320/Optimised%20settings.png" width="317" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The optimised settings I determined to not have too much of an impact on performance...</b></td></tr></tbody></table><br /><div><div style="text-align: left;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiu1tADcM5OOKTDYYNbm0SXYp-qBhiyHpvATN2ouHFbfq7gIasQtoQ1CRF-qtB438zUhYrDEB_YtpDcenKXAYejgSI-5i6zW8s5IypkGaXXB4waCWhKXVXGPIdujtub2Y6ePrBUGEvHVvTMjavpyovBh4TyAzlizPUer_bog8gEYtB3eBAGAzgpFLn5TTM/s767/Settings_OptimisedQ.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="387" data-original-width="767" height="322" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiu1tADcM5OOKTDYYNbm0SXYp-qBhiyHpvATN2ouHFbfq7gIasQtoQ1CRF-qtB438zUhYrDEB_YtpDcenKXAYejgSI-5i6zW8s5IypkGaXXB4waCWhKXVXGPIdujtub2Y6ePrBUGEvHVvTMjavpyovBh4TyAzlizPUer_bog8gEYtB3eBAGAzgpFLn5TTM/w640-h322/Settings_OptimisedQ.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>No performance loss is noted for the 3070 and a <i>very slight</i> loss is noted for the other two - and is likely within run-to-run variation...</b></td></tr></tbody></table><br /><div style="text-align: left;"><br /></div><div style="text-align: left;">So, with the broad strokes out of the way, let's get back to that memory speed difference...<br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">What Affects Performance....?</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Given that I've mentioned above that the game is highly reliant on streaming of data from the storage, and that there is minimal data retained in system and video memory, you would think that things like CPU speed/power, PCIe speed, and memory speed would all be heavily tied to the general performance of this game. So, let's take a look!</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2MgLwJJwPhqOHzeug95Bzjr9FgFHJmgrgm96srt_HrenElMYcIxsfiHrww0mtZ6aNdJd0OEuGMvqZ7X9_xrWbTM88FamjU3LRKV2heQwEx7ARz-EbFQ_kEj-S-r0LZ_qmCp8M3FQE8SMMjx-NOHPcPhAu3Vdc22VhsEUAcf7TdTbioxAgUsBlh3SKRxw/s770/Effect_RAM.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="387" data-original-width="770" height="322" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2MgLwJJwPhqOHzeug95Bzjr9FgFHJmgrgm96srt_HrenElMYcIxsfiHrww0mtZ6aNdJd0OEuGMvqZ7X9_xrWbTM88FamjU3LRKV2heQwEx7ARz-EbFQ_kEj-S-r0LZ_qmCp8M3FQE8SMMjx-NOHPcPhAu3Vdc22VhsEUAcf7TdTbioxAgUsBlh3SKRxw/w640-h322/Effect_RAM.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Captured on Systems 1 and 2...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_5xcXBKqxLouacMPTrU6EGzvdsur-RGYhouh7RGRLUfeKaLUJepgXYb5DA2P1PoXv-4B_Gc3br-F97koTHAayJORQ32NwNzH8M3tkdc01yvYWvlaRxfSWllgWd2VORzW3GtYbz_sPPDd9FTg1926gsiygD8G7E0Uov5DxhFAgSY1n94JSbjhl01eFtiE/s648/Effect_RAM_DDR4.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="129" data-original-width="648" height="127" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_5xcXBKqxLouacMPTrU6EGzvdsur-RGYhouh7RGRLUfeKaLUJepgXYb5DA2P1PoXv-4B_Gc3br-F97koTHAayJORQ32NwNzH8M3tkdc01yvYWvlaRxfSWllgWd2VORzW3GtYbz_sPPDd9FTg1926gsiygD8G7E0Uov5DxhFAgSY1n94JSbjhl01eFtiE/w640-h127/Effect_RAM_DDR4.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Captured on System 3...</b></td></tr></tbody></table><br /><div style="text-align: justify;">At high memory speeds, DDR4 3800 and DDR5 4800 and above, there is no noticeable differnce in performance in the game. I haven't had the time to go through and test the situations when there is "bulk" loading of data from the storage (my time is severely limited compared to professional outlets, after all) but <a href="https://youtu.be/ciOFwUBTs5s?si=LY5HQuLuRwByPKtF&t=1532">other outlets have shown a difference</a> but it is not clear whether that difference stems from the CPU or the memory subsystem - perhaps I'll explore this in a future blogpost!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Looking at DDR4, there is a difference, mostly in the maximum noted frametimes, with a slight increase in minimum and average fps.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihaBMQwvz9BGEYITRFVHnpDmeSWMaBDZIzrtURCzDG-rPFy1h8M_oz5ntofWsNuNexszSMlInHCTtLJpdR30Q9rkNGQ9Q2-Sf0wjBNDaj5EkznfdcALy4CS5OgSXpEZN1_AmrJc2y8vcD5Da24ojyFBfBHq_mrVnwLi9gFTfHfxAC0ry251nsSMWA36mo/s770/Effect_CPU.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="386" data-original-width="770" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihaBMQwvz9BGEYITRFVHnpDmeSWMaBDZIzrtURCzDG-rPFy1h8M_oz5ntofWsNuNexszSMlInHCTtLJpdR30Q9rkNGQ9Q2-Sf0wjBNDaj5EkznfdcALy4CS5OgSXpEZN1_AmrJc2y8vcD5Da24ojyFBfBHq_mrVnwLi9gFTfHfxAC0ry251nsSMWA36mo/w640-h320/Effect_CPU.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Moving onto the effect of the CPU, we can see a noticeable increase in performance between the 12400 and the 5600X, with the former providing a good 5 fps advantage in performance, along with a good reduction in maximum measured frametime. 
If I get around to that future blogpost I mentioned above, I'll likely explore this across all three test systems with the RTX 4070 (the RX 6800 will not fit in System 1!)...</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5sfMFRRWkUc_5biq8_fSCdRFAzGqWjpLC7A2-gWXDKcgiGnIeQ2A-yOkMoakKfBa0xowtNPYalk390vyAstMO_UsDRoqm-9VYHKB2ASsF0hl7SH6at0Afzq8npYg2t4xfdtDTql1iUEEkW4J68FLlrwuL0j0dI7Uwi5yxt0pyKHriZhAo1uPpKTzqcGg/s767/Effect_PCIe.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="387" data-original-width="767" height="322" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5sfMFRRWkUc_5biq8_fSCdRFAzGqWjpLC7A2-gWXDKcgiGnIeQ2A-yOkMoakKfBa0xowtNPYalk390vyAstMO_UsDRoqm-9VYHKB2ASsF0hl7SH6at0Afzq8npYg2t4xfdtDTql1iUEEkW4J68FLlrwuL0j0dI7Uwi5yxt0pyKHriZhAo1uPpKTzqcGg/w640-h322/Effect_PCIe.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>It seems I am not completely immune to data streaming issues!</b></td></tr></tbody></table><br /><div style="text-align: justify;">Looking at the effect of PCIe bandwidth, I reduced both the CPU and PCH PCIe gen from 4 to 3 using System 2 and saw no real difference in performance between the two, except potentially in the maximum frametime which might indicate a bottleneck - this would need more testing to ascertain.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Finally, we have the potential effect of driver revisions on the performance - sometimes users don't update their drivers (I'm one of them).</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirchsghBmQE_kXNJZwdb3JI4xAEQWLtaROoKUiEzmCwaev20T-eyq6sFOyAozliBRp8Bbw75IPdF7HrO5oHuQuj2f-v6qIOZS-cXS2e34CF_dq4r7n-lP2R4f3OgglUjTKB45gsdQMWLIWd9eYMTb6DKeKAAnSUHFiYPWdwyPLseBobqslUnh1D5_iWk0/s771/Effect_AMD%20driver.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="386" data-original-width="771" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirchsghBmQE_kXNJZwdb3JI4xAEQWLtaROoKUiEzmCwaev20T-eyq6sFOyAozliBRp8Bbw75IPdF7HrO5oHuQuj2f-v6qIOZS-cXS2e34CF_dq4r7n-lP2R4f3OgglUjTKB45gsdQMWLIWd9eYMTb6DKeKAAnSUHFiYPWdwyPLseBobqslUnh1D5_iWk0/w640-h320/Effect_AMD%20driver.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXxjA5Lv9WmvPHtAAoXZUzWpnMX8k3zJ7W5_6SSU7h7Mi4MNUCgz8VhG7wxFfMoZhwj7b4rvvlbbh7N14qVXMLIZIKiF7IJ4HHQuVgG7FBbzbS5rpHpm2hEBZTUvYpPOHadc2hmTA_nwEFRi5sKC6C1_k19yTueM0h7fHUyFIMlPQpu45r9-ptZZOMf00/s765/Effect_NV%20driver.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="387" data-original-width="765" height="324" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXxjA5Lv9WmvPHtAAoXZUzWpnMX8k3zJ7W5_6SSU7h7Mi4MNUCgz8VhG7wxFfMoZhwj7b4rvvlbbh7N14qVXMLIZIKiF7IJ4HHQuVgG7FBbzbS5rpHpm2hEBZTUvYpPOHadc2hmTA_nwEFRi5sKC6C1_k19yTueM0h7fHUyFIMlPQpu45r9-ptZZOMf00/w640-h324/Effect_NV%20driver.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">My testing shows essentially no difference between any tested driver revisions - the AMD results are within margin of error and the Nvidia results are remarkably consistent (aside from maximum recorded frametime). There really is no reason to update the GPU driver to play this game as long as you have one of the recent versions.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Potential Bugs...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">One thing I did want to test was Hardware Unboxed's claim* that Ultra shadow settings negatively impact performance on Nvidia's cards moreso than AMD's cards.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*I wasn't able to find it in their bazillion videos they have put out addressing Starfield so can't link it...</blockquote></span></i></b></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihZsk73Aom4VI3A0pGGtTM_tyY6US47omqCoaapt4YvIZLH5jZaJzrouIem3diHx2dYCNZADyrska2TMgj4ndMU9Svyjk3AXXO9sgFHO8QzxLw4saICgBWo0cihN7SD4EnFpb_N-q9K3uY2ZE5bPENyzgt-GRztgaRJ1a9SeCNxfGbdRXhxu9Sadx2h8Y/s770/Effect_ShadowQ.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="385" data-original-width="770" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihZsk73Aom4VI3A0pGGtTM_tyY6US47omqCoaapt4YvIZLH5jZaJzrouIem3diHx2dYCNZADyrska2TMgj4ndMU9Svyjk3AXXO9sgFHO8QzxLw4saICgBWo0cihN7SD4EnFpb_N-q9K3uY2ZE5bPENyzgt-GRztgaRJ1a9SeCNxfGbdRXhxu9Sadx2h8Y/w640-h320/Effect_ShadowQ.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">After my testing, I can confirm this observation - I saw the average fps increase by around 30% for both Nvidia cards when reducing this setting from Ultra to High whereas I only saw a 12% increase for the AMD card. Additionally, the minimum fps increased by 50 - 60% on the Nvidia cards but only around 15% on the AMD card.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">To me, this is clearly a manifestation of a potential driver problem, as opposed to the game itself - i.e. the way the Nvidia driver handles this setting is broken. As such, it is not worth keeping at Ultra when using Nvidia hardware - though I've already addressed this in my optimised settings.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Additionally, <a href="https://youtu.be/ciOFwUBTs5s?si=LsMhGEURVCfX9n-V&t=1572">Alex over at Digital Foundry</a> has seen that having too many CPU cores and/or hyperthreading enabled will actually lessen performance by around 10%! 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now, I have not had the chance to test this but it may help to explain some of my results compared to other outlets which are reporting <i>worse</i> numbers than I am at the same quality settings - though this could also be due to the aforementioned streaming dips and a more challenging GPU testing scenario (though, really, the differences are small!)...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Conclusion...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I've been enjoying my time in the game thus-far and have only encountered two bugs - neither of which were game-breaking, one was a slight inconvenience and the other may have been purposeful but felt like a bug...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's clear that Starfield requires decent hardware to get running well. However, the three systems I'm using are firmly in the mid-range territory of today's hardware generations. Are many gamers playing on older systems/hardware? Yes! However, I don't think that this should restrict developers from targeting "current" mid-range systems for their medium quality settings and a little tweaking of settings will yield pretty decent performance gains in this title, which is how it used to be in the past!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, yes, Todd Howard may have been a little insensitive in the way he phrased his answer to that question posed to him in the introduction to this blogpost, but he also isn't wrong, either. A PC with an AMD R5 5500/R5 5600, 16 GB DDR4, a cheap 1 TB SSD and an ~RTX 3070-equivalent card (e.g. 
an RX 7600 or RX 6700) will let you play at 60 fps at 1080p, or at a higher resolution with upscaling.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I don't think that's a huge ask for a new game in late 2023!</div></div></div></div></div>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-7560610393342650347.post-36750726181385883252023-08-02T17:35:00.006+01:002023-08-03T16:21:26.446+01:00What went wrong with RDNA 3...?<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuMC9GMz-zkU6dJ2FuB8mItmDS1YT1P6nNhZhg7YKRaTZOa4c4TGw-E1JH3ZcPiwDPyg2PspVYYYHUKkTTHRd3NVKiu8cEtjJMHno3PjN_i8jC1mNVq_QhwVEstix-N5x_M5_7EEA_xepwTcyV8py9BePIxvzHUDzz-EpDh_J1oOlGfa2A_bEPDyu39Wg/s1866/Die%20block%20diagram.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1050" data-original-width="1866" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuMC9GMz-zkU6dJ2FuB8mItmDS1YT1P6nNhZhg7YKRaTZOa4c4TGw-E1JH3ZcPiwDPyg2PspVYYYHUKkTTHRd3NVKiu8cEtjJMHno3PjN_i8jC1mNVq_QhwVEstix-N5x_M5_7EEA_xepwTcyV8py9BePIxvzHUDzz-EpDh_J1oOlGfa2A_bEPDyu39Wg/w640-h360/Die%20block%20diagram.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Much has <a href="https://www.digitaltrends.com/computing/amd-radeon-rx-7900-xtx-xt-are-unfinished/">been made</a> about <a href="https://www.notebookcheck.net/Dismal-AMD-RDNA-3-refresh-rumor-suggests-all-RDNA-3-RX-7000-SKUs-have-been-canned.684174.0.html">the performance</a>, or <a href="https://youtu.be/eJ6tD7CvrJc">lack thereof</a>, in <a href="https://www.tomshardware.com/news/amd-addresses-controversy-rdna-3-shader-pre-fetching-works-fine">the RDNA 3</a> <a href="https://www.youtube.com/watch?v=eD1YCbzB6mE&ab_channel=Moore%27sLawIsDead">architecture</a>. Before launch, <a href="https://hole-in-my-head.blogspot.com/2022/05/analyse-this-dissecting-new-rdna-3.html">rumours persisted of 3+ GHz core clocks</a>, and I had tried to make sense of them at the time - to no avail - which led to predictions of around 2x the performance of the RX 6900 XT. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Additionally, people (aka leakers) were <i>really</i> misunderstanding the doubling of FP32 units in each workgroup processor, counting more real throughput than actually existed - and I think it's safe to say that this decision from AMD was a disaster from a consumer standpoint. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">But is that really the reason RDNA 3 has been less performant than expected? Let's take a look!<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">I digress...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now, this is the sort of post I tended to engage in back in 2019/2020 when looking semi-logically at all the information out there in order to make an uninformed opinion about how things might bear out in the market. 
This post is, as those were, an interesting and entertaining look at what reality presents to us - and may not reflect that reality, because we have limited and often incomplete information.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With that noted, let's actually delve in...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgC-DoXdS2p-uKkF0ZfBDtdf8oEoxD5qSJhz1rGSfTf1EcZX6hDp4MFOQIaCw7uARAOWZ4VOTWn0qkEG_biUb_iDNrhJ_w-wpcsDHTyhedzqT0fli-FRq_Qv5-r8OwGBO9hec4CZ2E4TX4RWCGkLQCqh-vvlr5CSdMLr9SQRndaUHiohOopBOxJz3gi1Fs/s553/7600_vs_6600%20XT.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="378" data-original-width="553" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgC-DoXdS2p-uKkF0ZfBDtdf8oEoxD5qSJhz1rGSfTf1EcZX6hDp4MFOQIaCw7uARAOWZ4VOTWn0qkEG_biUb_iDNrhJ_w-wpcsDHTyhedzqT0fli-FRq_Qv5-r8OwGBO9hec4CZ2E4TX4RWCGkLQCqh-vvlr5CSdMLr9SQRndaUHiohOopBOxJz3gi1Fs/s16000/7600_vs_6600%20XT.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>This looks familiar...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">The King is dead, long live the King!</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Let's face it: RDNA 3 <i>has</i> been a let-down, and its story has been a strange one. Fitted with extra SIMD cores in each compute unit and laden with experimental chiplet designs, we've seen it all, so far... except for those elusive <a href="https://www.notebookcheck.net/AMD-Radeon-cards-with-3D-V-Cache-may-become-a-reality-as-Navi-31-GPU-found-to-integrate-possible-connection-site-for-3D-cache.688185.0.html">rumoured 3D stacked caches</a> on each MCD.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What we have also seen, in comparisons between the <a href="https://www.techpowerup.com/gpu-specs/radeon-rx-7600.c4153">RX 7600</a> and <a href="https://www.techpowerup.com/gpu-specs/radeon-rx-6650-xt.c3898">RX 6650 XT</a> - whose monolithic die configurations, core clocks, and memory systems are essentially identical (aside from the double WGP SIMD32 layout) - is that there is no performance increase per unit of architecture between RDNA 2 and RDNA 3 (some napkin maths on this follows below). This would imply that the fault lies at the feet of the inability to capitalise on those extra cores in each compute unit.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">All that extra silicon, wasted, for no benefit!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Well, that's probably a large portion of the lack of performance uplift - AMD have been unable to get those extra resources working properly, for whatever reason, and the inability to push RDNA3 to higher core clocks than the prior generation has also been a major drain on the actual performance of the released products.</div><div style="text-align: justify;"><br /></div>
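<div style="text-align: justify;">That napkin maths makes the counting problem plain. Using the usual convention that an FMA counts as two operations, and round-number clocks (approximate boost values, not measured ones):</div><pre>
# Back-of-the-envelope FP32 throughput, illustrating how the dual-issue
# SIMDs inflate the on-paper numbers. Clocks are approximate boost values.

def tflops(compute_units: int, clock_ghz: float, dual_issue: bool) -> float:
    lanes = compute_units * 64          # 2x SIMD32 per CU
    ops = 2 * (2 if dual_issue else 1)  # FMA = 2 ops; dual-issue doubles it
    return lanes * ops * clock_ghz / 1000.0

rx6650xt = tflops(32, 2.6, dual_issue=False)  # ~10.6 TFLOPS on paper
rx7600   = tflops(32, 2.6, dual_issue=True)   # ~21.3 TFLOPS on paper

print(f"RX 6650 XT: {rx6650xt:.1f} TFLOPS, RX 7600: {rx7600:.1f} TFLOPS")
# ~2x the theoretical throughput... for roughly the same measured gaming
# performance - i.e. the extra issue slots are rarely being fed.
</pre>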
<div style="text-align: justify;">The other side of the story is a bit more complicated. You see, top-end RDNA 3 is not like bottom-end. The chiplet approach does something contrary to what the recent trends of silicon architecture have been moving towards: it moves data away from the compute resources.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Seriously, what is the major story of computing in the last five years, for gaming? Yes, that's right - keeping more data closer to the compute resources (CPU cores, GPU cores, etc.) improves a large proportion of game-related applications significantly! Hell, not only have the Ryzen 7 5800X3D, 7800X3D, and 7950X3D CPUs all been in high demand, along with the limited edition Ryzen 5 5600X3D, but we have also seen the evolution of the Zen ecosystem from split CCXs on a CCD to a combined cache hierarchy in the <a href="https://www.techspot.com/news/83347-zen-3-rumored-flaunting-monumental-ipc-gains-early.html">Zen 2 to Zen 3 transition</a>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">These are increasingly performant parts for their energy consumption - thus we can conclude that <span style="color: #274e13;"><b><i><u>latency and data locality matter!</u></i></b></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQF-JQZ_pvrt0kOvUefoFPeBmWuqRN29rJ2w5sQqeyjJ_tJQ28V0Osi1MbI-TEFkRrlnoe9LCaLywuF0RHzK51dDGYfpSvbiIL-bz_N2ENOHoCTmRFwzZVPb1PBZgknCR2tixfVvtw-0wAzYCrmd_3KwHQ-NFAX0aXb73FMKW-0jhu22WWC1NzQjbscrQ/s1866/Fanout%20bandwidth%203.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1050" data-original-width="1866" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQF-JQZ_pvrt0kOvUefoFPeBmWuqRN29rJ2w5sQqeyjJ_tJQ28V0Osi1MbI-TEFkRrlnoe9LCaLywuF0RHzK51dDGYfpSvbiIL-bz_N2ENOHoCTmRFwzZVPb1PBZgknCR2tixfVvtw-0wAzYCrmd_3KwHQ-NFAX0aXb73FMKW-0jhu22WWC1NzQjbscrQ/w640-h360/Fanout%20bandwidth%203.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Latency, schmaetency! We'll increase the clock speed to offset the increase but... oh... we didn't manage to do that! Our bad!</b></td></tr></tbody></table><br /><div style="text-align: justify;">Looking over at the GPU side of the equation has us eyeing the RX 6000 series, which also brought increased cache onto the die (compared to RDNA 1), reducing the latency of, and energy cost of, access to that data. RDNA 2 was an amazing architecture...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, how did AMD plan out top-end RDNA 3? They undid all that work!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Depending on how they've arranged things, there are now two pools of memory that are required to be synchronised between themselves, with data within each pool striped across them. The latency to each pool is increased, and the amount of residency "checking" is likely increased as a result, since the memory controllers for the second pool are physically separated from the scheduler on the main die - increasing energy cost and latency further due to 'communication' traffic. 
What's worse is that AMD was (as noted above) banking on RDNA3 clocking <i>much</i> faster than RDNA2 in order to counter the effect of latency - and I'm pretty sure this was <u style="font-style: italic;">only</u> speaking about the latency of missed requests - not including the larger data management overhead... </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">At any rate, <a href="https://www.techpowerup.com/review/asrock-radeon-rx-7900-xtx-taichi-white/35.html">RDNA 3</a> doesn't clock faster than even base <a href="https://www.techpowerup.com/review/asrock-radeon-rx-6900-xt-oc-formula/32.html">RDNA 2</a>. 100 MHz does not a difference make! However, even RDNA 2 can <a href="https://www.techpowerup.com/review/sapphire-radeon-rx-6950-xt-nitro-pure/38.html">almost reach 3 GHz</a> with a bit of a push...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Added to this, the on-die L2 cache is laughably small, meaning that data <i>has to be </i>pushed off onto the MCDs more often than I would think is ideal... 6 MB will hold sweet F-A (as we like to say where I'm from!) of today's more memory-intensive game titles, and that will cause occupancy issues across the entire GCD.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAOS_Srk4dibF6XjoTLTQoTeD7ITmzjXQ1BjepiUzHQxWIKd8KnJfcxBGFlgQahsajZaOf7aYJ6O0mXPTRit_tK3C1rHIs6_8HKM-QvcCa4N1Gc4byeZAPoS6BF5t1P_FviH-C0RuK_DYPRlci37bXO3dBS__VqPE8AE2kQAbpWF24Br91gHlUe7YtCUo/s1866/Fanout%20bandwidth.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1050" data-original-width="1866" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAOS_Srk4dibF6XjoTLTQoTeD7ITmzjXQ1BjepiUzHQxWIKd8KnJfcxBGFlgQahsajZaOf7aYJ6O0mXPTRit_tK3C1rHIs6_8HKM-QvcCa4N1Gc4byeZAPoS6BF5t1P_FviH-C0RuK_DYPRlci37bXO3dBS__VqPE8AE2kQAbpWF24Br91gHlUe7YtCUo/w640-h360/Fanout%20bandwidth.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Industry-leading... but latency is king for gaming applications, unlike the compute-heavy cloud/AI industries...</b></td></tr></tbody></table><br /><div style="text-align: justify;">Sure, AMD extols the bandwidth of the interconnects but that extra layer of complexity is most probably hurting the potential performance of their top-end cards by allowing for more cases where the occupancy of compute resources may be low.</div>
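<div style="text-align: justify;"><br /></div><div style="text-align: justify;">To put a rough number on how these effects compound, here's a back-of-the-envelope average memory access time (AMAT) calculation. Every figure below is an illustrative guess on my part - I have no knowledge of the real hit rates or cycle counts - but it shows how a smaller, more distant L3 stacks up against an on-die one:</div><pre>
# Back-of-the-envelope AMAT: each cache miss falls through to the next,
# slower level. All latencies (in cycles) and miss rates are GUESSES,
# purely to illustrate the compounding effect.

def amat(l2_hit, l2_miss_rate, l3_hit, l3_miss_rate, vram):
    return l2_hit + l2_miss_rate * (l3_hit + l3_miss_rate * vram)

# Monolithic, RDNA 2-style: big on-die L3 with relatively low latency.
navi21 = amat(l2_hit=30, l2_miss_rate=0.40, l3_hit=80, l3_miss_rate=0.25, vram=350)
# Chiplet, RDNA 3-style: off-die L3 costs more and is missed more often.
navi31 = amat(l2_hit=30, l2_miss_rate=0.40, l3_hit=110, l3_miss_rate=0.35, vram=350)

print(f"Navi 21-ish: {navi21:.0f} cycles, Navi 31-ish: {navi31:.0f} cycles")
# -> ~97 vs ~123 cycles: a meaningful effective-latency penalty, with
#    these made-up inputs, before any extra data-management overhead.
</pre>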
<div style="text-align: justify;"><br /></div><div style="text-align: justify;">Just look at the ratio of L2 and L3 cache between RDNA 2 and RDNA 3:</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbuKf0Z9x1t-MYinFGZ7rOFj8y07XpwJB9T911QquiLa3B16dy9AjVIu8B-qcvgTEzmgLWjLi_O1eY5cLrOdvTIfEIK9GW5_Y2KX_sc0CbPRFcT-dSHtpa0geJGGPLmVsz0YvDfXjxin8_9t24x5ezX7GaM54OT3GmlGl6abcZakuF8UqJZihQb0ScxL8/s798/Core_cache%20comparison.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="152" data-original-width="798" height="122" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbuKf0Z9x1t-MYinFGZ7rOFj8y07XpwJB9T911QquiLa3B16dy9AjVIu8B-qcvgTEzmgLWjLi_O1eY5cLrOdvTIfEIK9GW5_Y2KX_sc0CbPRFcT-dSHtpa0geJGGPLmVsz0YvDfXjxin8_9t24x5ezX7GaM54OT3GmlGl6abcZakuF8UqJZihQb0ScxL8/w640-h122/Core_cache%20comparison.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Having a larger number of compute resources, RDNA 3 should have a higher ratio than RDNA 2, especially given the higher latency related to being off-die... </b></td></tr></tbody></table><br /><div style="text-align: justify;">Sure, Navi 21 had slightly less L2 cache for its workloads than Navi 31, but the leap out to L3 is now much larger and the L3 capacity is also much smaller - meaning that the likelihood of a cache miss in both L2 and L3 is MUCH higher than in the prior generation. That most likely leads to precious cycles of compute time being wasted... The number of L2 misses in Navi 31 will be higher than in Navi 21, and the number of L3 misses will be a lot higher still, because of the ratio of cache to compute resources (Navi 31 has 1.2x the resources of Navi 21*).</div><div style="text-align: justify;"><b><i><blockquote><span style="color: #274e13;">*Not counting the double-pumped SIMDs.</span></blockquote></i></b></div><div style="text-align: justify;">This could theoretically result in situations where the compute units cannot be utilised and the scheduler is unable to use the whole die effectively.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">And it is this, combined with the clock frequency issues of RDNA 3, that I believe is holding the architecture back in gaming terms. In other workloads, it's not such an issue... 
but then wasn't that <a href="https://www.amd.com/en/technologies/cdna">what CDNA was for</a>?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7uGkHHmw83-_4OQCRwGVwSI-Y0d7h-qy7Z8TsVp9y1DFN9mHQXdR17Xfd95WkZW2cy7LDpBI_lsSWfpOUyqTxfhlvITsRLri_YG8BvB_jjzjmDz5Mro-Mp4_9vPAi5n8zvnNVNf0BDxQ-JB8aYf0p4RTrPbr6HcV9NNMdb37q04jn3ROeaM23ZR-CmNc/s403/Current%20RDNA3.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="182" data-original-width="403" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7uGkHHmw83-_4OQCRwGVwSI-Y0d7h-qy7Z8TsVp9y1DFN9mHQXdR17Xfd95WkZW2cy7LDpBI_lsSWfpOUyqTxfhlvITsRLri_YG8BvB_jjzjmDz5Mro-Mp4_9vPAi5n8zvnNVNf0BDxQ-JB8aYf0p4RTrPbr6HcV9NNMdb37q04jn3ROeaM23ZR-CmNc/s16000/Current%20RDNA3.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>A crude rendition of the Navi 31 layout as it currently stands...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Getting back on Track...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As I said, this is all conjecture and inference - based on the limited information available to me. However, if these facets of the architecture are the reasons it's being held back, well, there are some remedies that I think AMD could apply to fix the situation.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">First up is that we need to increase data residency and locality: bandwidth does not make up for exponentially increased latency - especially when there are specific situations that lead to edge case <a href="https://www.techtarget.com/searchstorage/definition/race-condition">race conditions</a>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Secondly, we would need to reduce the latency to data and the latency of data management. 
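</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">To put some (entirely invented) numbers on that intuition, here's a toy average-memory-access-time sketch. I must stress that every hit rate and latency below is a placeholder I've made up purely for illustration - AMD publishes none of these figures - but it shows why a fatter on-die L2 should pay off in a way that raw interconnect bandwidth never could:</div><div style="text-align: justify;"><br /></div><pre>
# Toy AMAT (average memory access time) model - all numbers are invented!
# amat = hit_L2*lat_L2 + miss_L2*(hit_L3*lat_L3 + miss_L3*lat_VRAM)

def amat(hit_l2, lat_l2, hit_l3, lat_l3, lat_vram):
    """Expected latency (cycles) of a data request for given hit rates."""
    miss_l2 = 1.0 - hit_l2
    miss_l3 = 1.0 - hit_l3
    return hit_l2 * lat_l2 + miss_l2 * (hit_l3 * lat_l3 + miss_l3 * lat_vram)

# Guesses for Navi 31 today: a tiny 6 MB L2, with the L3 off-die on the MCDs...
current = amat(hit_l2=0.45, lat_l2=100, hit_l3=0.55, lat_l3=250, lat_vram=600)

# ...versus a hypothetical 32 MB L2 catching far more requests on-die.
proposed = amat(hit_l2=0.70, lat_l2=100, hit_l3=0.55, lat_l3=250, lat_vram=600)

print(f"toy 'current' AMAT:  {current:.0f} cycles")   # ~269 cycles
print(f"toy 'proposed' AMAT: {proposed:.0f} cycles")  # ~192 cycles
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Even with those generous, made-up assumptions, the expected stall per request drops substantially - exactly the sort of thing that keeps occupancy up across the GCD.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">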
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Sounds simple, right?!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVKMM2pSYl6d0BYIVahMgyZ08fe_pPYfVk8GjJfmAZC3_Nw-K2Feo9IV9ia-NkVi2j2RARpCNHa3zoOp-Q62QbKuzlQFtueOlSaXQn8tlNQKERxnhA8mCteYkKHaQMMhrENb45x-vZ2PvE-sJMLiMcKLdEjy3mG4vH95jK-0WTssjT-TDvGRqtXtuhvps/s401/Fixed%20RDNA3.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="219" data-original-width="401" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVKMM2pSYl6d0BYIVahMgyZ08fe_pPYfVk8GjJfmAZC3_Nw-K2Feo9IV9ia-NkVi2j2RARpCNHa3zoOp-Q62QbKuzlQFtueOlSaXQn8tlNQKERxnhA8mCteYkKHaQMMhrENb45x-vZ2PvE-sJMLiMcKLdEjy3mG4vH95jK-0WTssjT-TDvGRqtXtuhvps/s16000/Fixed%20RDNA3.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>My adjusted conception of RDNA 3 - or its successor...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now, I'm going to <b><i><u>heavily</u></i></b> caveat this speculation with the fact that I'm <u style="font-weight: bold;">not</u> an expert or even an amateur in this field - I would rate myself as, at best, a poor layman but (and this is a big 'but') if I am understanding the concept and logic correctly, this is what I would expect AMD to do to fix RDNA3 or to move forward into the next architecture...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li>To increase data locality and residency, there needs to be a larger L2 cache on the GCD. I would posit 32 MB.</li></ul><ul><li>To reduce latency further, and the cost to produce (in terms of interposer complexity and size), as well as improving data management, I would combine two MCDs into a single unit. So, 32 MB (combined) Infinity cache, and 8x 16-bit GDDR6 memory controllers... per MCD.</li></ul><div><br /></div><div>That's it! </div><div><br /></div><div>Sounds ludicrously simple and it probably is stupidly wrong. However, there would be reduced data transfers between all dies, reducing power consumption. There would be better data residency in the three layered memory buffers - reduced latency between buffers <i>and</i>, as a bonus, the ease of choosing between 3x MCD for 24 GB, 2x MCD for 16 GB and 1x MCD for 8 GB configurations... which would also reduce packaging costs because there are fewer dies to be futzing around with on the interposer. The L2 cache could scale better with core size (Navi x1, x2, x3, etc.) while off-die L3 would also scale with both cache and memory configs down the entire stack.</div><div><br /></div><div><br /></div><h3><span style="color: #274e13;">Conclusion...</span></h3></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">A few years ago, I asked an AMD representative if it made sense to scale RDNA2 down the entire stack and they replied that it did. 
If I had the chance, I would ask the same person whether it still made sense for RDNA3 - given the performance indifference between the two architectures.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">AMD have hard questions lying ahead of them and, quite frankly, I wouldn't want to be in their shoes because they keep giving the wrong answers, despite what appears to be their best efforts. RDNA2 is like the time I accidentally answered all questions in an exam where you were meant to only answer 3 or 4 (and aced it!) whereas RDNA3 is like the time when I went to an exam and, despite preparations, all knowledge left my head and I spent the entire time sweating and almost failed.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I hope that RDNA4 will be at least somewhere in between...</div>Unknownnoreply@blogger.com5tag:blogger.com,1999:blog-7560610393342650347.post-66524116812520215262023-07-22T20:03:00.006+01:002023-07-22T20:08:18.931+01:00Let's Talk about System Requirements...<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxia1BXDE_oDapqlNUs-r2R3mq8VcR2cRmtUqvIQHsixaZwQypwD-SMz8KozWdCO2gn0QjXfNi4eXdzu-Z96tbtf7X5FOiCTIHePmELb9BetcCzyFQzdSbZZAnqvpvMJ36BILQgQ3yUG4yucGw4fjMAJPPTcDRwnXP3xC6cW-wv-01HYBw2LNOvilyRhE/s1600/53052590507_573f74be02_h.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="900" data-original-width="1600" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxia1BXDE_oDapqlNUs-r2R3mq8VcR2cRmtUqvIQHsixaZwQypwD-SMz8KozWdCO2gn0QjXfNi4eXdzu-Z96tbtf7X5FOiCTIHePmELb9BetcCzyFQzdSbZZAnqvpvMJ36BILQgQ3yUG4yucGw4fjMAJPPTcDRwnXP3xC6cW-wv-01HYBw2LNOvilyRhE/w640-h360/53052590507_573f74be02_h.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Everyone and their mother, cousin, friend, associate, partner, and long-lost relation has had strong opinions on <b><i><span style="color: #274e13;"><u>System Requirements</u></span></i></b>... Hell, I'm right there with them! We all want detailed listings of settings combined with PC hardware... However, as a group, I think we are <i>a bit</i> too happy when developers give us more than nothing to work with.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, I want to set the record straight on what good system requirements for games actually look like... and I hope that some developers see this and take the advice on board.<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Oh, Fantasy free me...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Let's get this out of the way - the Ratchet and Clank system requirements "sheet" is pretty much one of the best styles of sheet that we have historically seen in the industry. 
That's already above and beyond what is considered acceptable, given how rarely developers and publishers provide such a detailed indication for their game.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">We need these tiered listings: Minimum, Recommended, High and... ugh, "Amazing"(?) Ray tracing, and "Ultimate" Ray tracing...(?)</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Okay, maybe the last two are a bit much. Personally, I think the Dying Light 2 system requirements were a better shout:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIXO6tAbMXaytMShSzAhH2NDPmBUApsy9JIe1SMGkQ3ibA9WTK4uAgPh9DiFhRAgIbWk6i7OK8XKKjc12PmR-SZteupTBreifnLZl6S9M0EFsmdCIP5xE1QpLZEMv9l_FB_cdM8UJHEPhionXq5d2wBzdCYMF1qc-7kQf-WL1K3_VXO7GN_k1WCIJK9gc/s1280/3919103-system.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="720" data-original-width="1280" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhIXO6tAbMXaytMShSzAhH2NDPmBUApsy9JIe1SMGkQ3ibA9WTK4uAgPh9DiFhRAgIbWk6i7OK8XKKjc12PmR-SZteupTBreifnLZl6S9M0EFsmdCIP5xE1QpLZEMv9l_FB_cdM8UJHEPhionXq5d2wBzdCYMF1qc-7kQf-WL1K3_VXO7GN_k1WCIJK9gc/w640-h360/3919103-system.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Simple: RT off and RT on - minimum and recommended...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">You can't get simpler than that - what is required for the minimum acceptable experience with and without raytracing enabled. The contrast between this and other requirements also highlights the subjective nature of the system requirements provided by many developers:</div><blockquote><div style="text-align: justify;"><span style="color: #274e13;">What <i style="font-weight: bold;">IS</i> high? What <i style="font-weight: bold;">IS</i> ultra? Aside from the play on words (because Spider-man); What <i style="font-weight: bold;">IS</i> Amazing?!</span></div></blockquote><div style="text-align: justify;">People were <i>happy</i> with the Ratchet system requirements above but they are just as subjective as the other, considerably worse, requirements given out by development houses in prior years. The only thing it has going for it is that it outlines its biases.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, after all this barely literate criticism, what <i>would</i> good system requirements look like (IMHO, of course)?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Spaced out on Sensation...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Let's face it. There are two reasons you even bother checking out any game's system requirements listings:</div><div style="text-align: justify;"><ol><li>You're curious about what it takes to run the game in comparison to other games.</li><li>You're not sure how the game will run on your PC system.</li></ol></div><div style="text-align: justify;">That's it! 
That's all that there is to know about people who are looking at these things: are you morbidly, slightly obsessively curious... or are you wanting to play this game but thinking that you can't?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Seriously, that's why Steam just has minimum and recommended requirements - that's all it takes! </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, all these multi-layered fancy-schmancy requirement listings are just hardware masturbation. Yep, that's all they are - it's an exercise in performing auto-fellatio as a consumer to see whether you meet whatever standard you present yourself as associating with.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Quite frankly, that's not really that valuable to the community.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, let me tell you what actually good requirements would look like:</div><div style="text-align: justify;"><ol><li>Resolution-based</li><li>Quality setting-based</li><li>Er, oh, that's it. Quite simple, wasn't it!</li></ol></div><div style="text-align: justify;">But somehow no one appears to be able to handle this! Even the much-vaunted Ratchet and Clank: Rift Apart. Minimum is at 720p with low settings, Recommended is at 1080p with medium settings, High is at 1440p high settings... I... how is this useful? It's easy to scale hardware performance between the same resolution at different quality settings. Many humans do this all the time. Similarly, it's easy to scale the expected performance of differing resolutions at the same quality setting... but to scale both at the same time? It leaves the consumer without a static axis - there are too many variables to take into account in the equation!!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Here's my CRAZY idea: What if we ditched qualitative reasoning in presenting these system requirements?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is what that would look like:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li>Minimum Requirements: Hardware to run the game at a resolution, low settings, at a specific fps. (This was already fine!)</li><li>1080p Medium: Hardware to run the game at 1080p medium, at a specific fps.</li><li>1440p Medium: Hardware to run the game at 1440p medium, at a specific fps.</li><li>4K Medium: Hardware to run the game at 4K medium, at a specific fps.</li><li>Max Requirements: Hardware to run the game maxed-out!, at a specific fps.</li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is all we need! Sure, throw in advice about upscaling tech, etc. The important thing is that we know what it takes to run the game at minimum and maxed-out. Then, we know what it takes for a quality setting that is constant between the various resolutions. We, as consumers, can extrapolate from that information! 
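</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">To show just how usable that would be, here's a quick sketch of the idea - the tiers, performance-index values and fps figures are all hypothetical, but with the quality setting held constant, a consumer (or a simple tool) could interpolate where any card lands:</div><div style="text-align: justify;"><br /></div><pre>
# A hypothetical standardised requirements sheet: one fixed quality preset
# ("medium") across resolutions, each with a target fps and a reference GPU
# expressed as a relative performance index. Every number here is made up.
requirements = {
    "1080p medium": {"reference_gpu_index": 100, "target_fps": 60},
    "1440p medium": {"reference_gpu_index": 140, "target_fps": 60},
    "4K medium":    {"reference_gpu_index": 220, "target_fps": 60},
}

def estimate_fps(my_gpu_index, tier):
    """Naive linear extrapolation: fps scales with relative GPU performance.
    Real scaling is never perfectly linear, but holding quality constant
    gives the consumer the static axis needed to even attempt this."""
    ref = requirements[tier]
    return ref["target_fps"] * my_gpu_index / ref["reference_gpu_index"]

# e.g. a card ~20% faster than the (made-up) 1440p reference GPU:
print(f'{estimate_fps(168, "1440p medium"):.0f} fps expected')  # -> 72 fps
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's crude, but it's only possible at all because one axis is fixed...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">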
Where my specific hardware might fall...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgs4BXH_h5q8QbCA9o1d20Sd5Ruk_fVLTUTMMVQOkqwamAsmWiKV879SF0SChSQW5cjg2hfU7LQIk3GSYc3ruBk8YVIyf5RMrTW3SBesKGYETYz8DER2358TXwXTacDDoeCTpn9YMEa760X5T5UKg_f4iOwy1LeirF7bl_sjkfeQESYP_P2BOGok5FWfNU/s1080/returnal_pc_requirements.webp" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="812" data-original-width="1080" height="482" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgs4BXH_h5q8QbCA9o1d20Sd5Ruk_fVLTUTMMVQOkqwamAsmWiKV879SF0SChSQW5cjg2hfU7LQIk3GSYc3ruBk8YVIyf5RMrTW3SBesKGYETYz8DER2358TXwXTacDDoeCTpn9YMEa760X5T5UKg_f4iOwy1LeirF7bl_sjkfeQESYP_P2BOGok5FWfNU/w640-h482/returnal_pc_requirements.webp" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Returnal's System Requirements were much the same... though maybe slightly better.</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Voyeuristic Intention...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I really have the feeling that even these "improved" requirements we've been seeing are essentially useless for consumers. We don't need subjective adjectives like 'Epic' or 'Amazing', we need defined variables like resolution and expected frames per second. Unfortunately, most people in the critical part of the industry have only been calling for more description, instead of standardisation, and I believe that the latter is what we actually require...</div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-7560610393342650347.post-31027207648602469952023-07-21T12:13:00.003+01:002023-07-21T12:13:21.085+01:00The Performance uplift of Ada Lovelace over Ampere...<div style="text-align: left;"> </div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFR0N5PurvGLMWpQ6Dl3qFJnoL2WjIMaSMzyFH3bkQCSr1UZLTJfr_p440g83NhOkcPENBiMqWrExFEMTIbiajTIj74lpESd2Z_MpAtNrBw-A0GFVW_Mi1uLG0eHQ72gqg0vxMFu-vbOnCX00XkBSzqr3ZbeQgw5HwkXofIYOxqroLiDopo6PFC1d84sg/s1920/Title.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFR0N5PurvGLMWpQ6Dl3qFJnoL2WjIMaSMzyFH3bkQCSr1UZLTJfr_p440g83NhOkcPENBiMqWrExFEMTIbiajTIj74lpESd2Z_MpAtNrBw-A0GFVW_Mi1uLG0eHQ72gqg0vxMFu-vbOnCX00XkBSzqr3ZbeQgw5HwkXofIYOxqroLiDopo6PFC1d84sg/w640-h360/Title.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><div style="text-align: justify;"><br /><span style="text-align: justify;">One thing I feel like I may have been known for is being one of the first people to comment on the fact that there was no </span><a href="https://hole-in-my-head.blogspot.com/2020/10/analyse-this-potential-performance-of.html" style="text-align: justify;">"IPC" uplift</a><span
style="text-align: justify;"> for </span><a href="https://hole-in-my-head.blogspot.com/2020/12/analyse-this-potential-of-rx-6000.html" style="text-align: justify;">AMD's RDNA 2</a><span style="text-align: justify;"> over RDNA architecture. Well, I never had an RX 5000 series card to check with but </span><a href="https://youtu.be/ZIDi_PI8R8o" style="text-align: justify;">Hardware Unboxed confirmed this in practice</a><span style="text-align: justify;">. So, it was nice to feel validated. </span></div><div style="text-align: justify;"><span style="text-align: justify;"><br /></span>Now, I am aware that RDNA 3 is nothng more than a frequency adjusted RDNA2 (because their extra FP32 configurations do not appear to be easily used in existing programmes), but the question still burns within me: have Nvidia been able to increase the performance of their architecture from Ampere to Lovelace?</div><div style="text-align: justify;"><br />Let's find out...</div><p style="text-align: justify;"><span></span></p><a name='more'></a><p></p><p style="text-align: justify;"><br /></p><h3 style="text-align: justify;"><span style="color: #274e13;">The Intro...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Previously, <a href="https://hole-in-my-head.blogspot.com/2021/04/analyse-this-relative-efficiency-of.html">I predicted that</a> Nvidia would be mostly relying on the improvement of process node advantages in their next architecture: I wondered if it was really necesary for Nvidia to focus on rasterisation improvement over other features like raytracing, given the lacklustre increase in performance from the RTX 20 to 30 series...</div><div style="text-align: justify;"><br />In contrast, RDNA 2 felt like a big step for AMD. Sure, it had lots of success in improving data management and, as such, helped improve energy use in the architecture. But, as I conjectured back then: performance improvements were coming from the increased clock speeds, not from any other architectural magic.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">On the surface, Nvidia's Lovelace architecture looks a lot like the tech giant took a sneaky glance at AMD's homework and applied their own variation on the theme of increasing cache size and decreasing the width of the memory interface.</div><div style="text-align: justify;"><br />In fact, the internals of the SM is the same between the two architectures. But can the large L2 cache actually help improve calculation speed? On the face of it, you might assume it could since the v-cache on AMD's CPUs vastly helps throughput by retaining important working data locally.</div><div style="text-align: justify;"><br />Now, unfortunately, there is no facility that I'm aware of that allows the user to disable parts of the GPU silicon like we are able to with CPUs, to simulate lower resource unit parts. Fortunately, we (and really, I specifically mean, "me") are blessed with a GPU from each of the RTX 30 and RTX 40 sites lineups that are configured with the same number of cores. In fact, if we take a quick comparison between the RYX 3070 and RTX 4070, we can see that they are remarkably similar! 
(Which is not the case for many of the available parts.)</div><div style="text-align: justify;"><br /></div><p style="text-align: justify;"></p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhbpj4iqAD435aFEB7mOtNqZ0DJFn0kqEy6kmKVA_KeTj3oQ-Ve1F-ACOQOubUz2zBSULW2XHDjzqjXe9V-r9v9F4pVunbh64M12-DOfJVyAIzmiCrRyO5fhW3P1W-ZaHBC3-DSc-jb5w4KNRrHFoL4S0HLdTOcyhZb8cxhPc4j45N7TSzHBJP1v0KOD94" style="margin-left: auto; margin-right: auto;"><img alt="" data-original-height="189" data-original-width="254" src="https://blogger.googleusercontent.com/img/a/AVvXsEhbpj4iqAD435aFEB7mOtNqZ0DJFn0kqEy6kmKVA_KeTj3oQ-Ve1F-ACOQOubUz2zBSULW2XHDjzqjXe9V-r9v9F4pVunbh64M12-DOfJVyAIzmiCrRyO5fhW3P1W-ZaHBC3-DSc-jb5w4KNRrHFoL4S0HLdTOcyhZb8cxhPc4j45N7TSzHBJP1v0KOD94=s16000" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Core specs of the two cards...</b></td></tr></tbody></table><br /><div style="text-align: justify;">Sure, the number of <a href="https://en.wikipedia.org/wiki/Render_output_unit#:~:text=The%20ROPs%20perform%20the%20transactions,MSAA%20is%20contained%20in%20ROPs.">ROP</a>s is reduced and, of course, the amount of L2 memory has been increased (a major part of the architectural upgrade talking points). As AMD did with RDNA2, the video memory bus width is decreased thanks to that larger L2 cache. This last point comes about because the so-called "hit-rate" for data will be higher on-chip, so there is less need to travel to the VRAM to get what is needed to do calculations.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is great, because memory controllers are expensive in terms of die area and energy cost... plus, they are expensive in terms of circuit board design and component cost - you have to run those traces, place the GDDR chip and manage the power phases for the additional components. If you can effectively get the same performance (or close enough!) by removing some of them, then you've got a product that is cheaper to produce...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The trade-off is that this effect of cache is stronger at lower resolutions - at higher rendering resolutions, the wider VRAM bus width wins out - i.e. high-end cards need to have a large memory bus width. Now, some of this discrepancy can be countered by hugely increasing the frequency of the memory: faster working memory can do more in the same period of time. 
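</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a refresher, the peak bandwidth arithmetic is simple - the per-pin transfer rate (in Gbps) multiplied by the bus width (in bits), divided by eight to give bytes. A minimal sketch using the two cards' public specs:</div><div style="text-align: justify;"><br /></div><pre>
def vram_bandwidth_gbs(rate_gbps, bus_width_bits):
    """Peak bandwidth (GB/s) = per-pin rate (Gbps) x bus width (bits) / 8."""
    return rate_gbps * bus_width_bits / 8

print(vram_bandwidth_gbs(14, 256))  # RTX 3070: 14 Gbps GDDR6, 256-bit -> 448.0
print(vram_bandwidth_gbs(21, 192))  # RTX 4070: 21 Gbps GDDR6X, 192-bit -> 504.0
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">(The same sum also explains the mix-up I admit to in the footnotes further down: 16 Gbps on the 3070's 256-bit bus works out to 512 GB/s, while 20 Gbps on the 4070's 192-bit bus is only 480 GB/s...)</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">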
This is precisely what Nvidia have done with the RTX 4070 - pairing it with fast GDDR6X running at 21 Gbps compared to the relatively measly 14 Gbps GDDR6 on the RTX 3070.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, my hope is that this will be an interesting test to see how much the increased L2 cache really helps with the Ada Lovelace architecture...</div><div style="text-align: justify;"><br /></div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSoUi5evsWXzTI6eQl2MQ3B0rQt4bAX1IOlf8hHHMFw4BwH-pw7JrFpl7n5_IJiXd9STBFx24kvXYampW2leaoJg9orUA5tjq-9-zNzmTmMXm3bxdFMO8iQ6BDQxHm6VPX-JJYKYy957_GY6yyxaB7lfMKtP9Uw_pZo_A1RxnxGFq7a3jmRKxpN19qQMQ/s780/rdna_2_deep_dive_infinity_cache.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="439" data-original-width="780" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSoUi5evsWXzTI6eQl2MQ3B0rQt4bAX1IOlf8hHHMFw4BwH-pw7JrFpl7n5_IJiXd9STBFx24kvXYampW2leaoJg9orUA5tjq-9-zNzmTmMXm3bxdFMO8iQ6BDQxHm6VPX-JJYKYy957_GY6yyxaB7lfMKtP9Uw_pZo_A1RxnxGFq7a3jmRKxpN19qQMQ/w640-h360/rdna_2_deep_dive_infinity_cache.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>AMD released this slide, speaking about the trade-off between cache size and data hit-rate...</b></td></tr></tbody></table><br /><div><br /></div><div><br /><h3 style="text-align: justify;"><span style="color: #274e13;">The Setup...</span></h3><div style="text-align: justify;"><br />Just like HUB did in their RDNA 2 testing, the aim is to lock the core clock frequency to a setting that both cards can achieve using a piece of software. I used MSI Afterburner for this as it integrates well with the Nvidia GPUs. In practice, this was more of a challenge for the Ampere card as it kept wanting to drop clocks, very slightly. However, you will see that, in practice, there's not really an issue with a ~20 MHz difference.</div><div style="text-align: justify;"><br />In contrast, the Lovelace card had no issues and took the down-clocking in its stride. Of course, this strategy isn't fool-proof because by reducing the core clocks, we are also slowing down all the on-die caches, and this may have a larger, non-linear effect on performance than we might expect.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a result, I will attempt to extrapolate between the stock performance and when underclocked to see the scaling.<br /><br />One thing I also wanted to mitigate as much as possible was the vast gulf in memory frequency between the two parts. At stock, the 3070 has 14 Gbps memory while the 4070 has 21 Gbps, owing to the use of GDDR6X.</div><div style="text-align: justify;"><br />So, I increased the 3070's memory to 16 Gbps and reduced the 4070's to 20 Gbps*. This doesn't close the gap as much as I'd like but it's better than nothing!** </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">One last note - as usual, I am recording the tests using Nvidia's Frameview as I find that internal game benchmarks do not usually line up with the actual results. 
For instance, though I've used the included benchmark in Returnal, the values given for average, minimum, and maximum fps vary slightly from the results calculated using Frameview.</div><div style="text-align: justify;"><span style="color: #274e13;"><blockquote><b><i>*I am not aware of any way to reduce this further as underclocking the memory is quite limited in the Afterburner software. </i></b></blockquote><blockquote><p><i><b>**In retrospect, I should have left the 4070's memory speed where it was as these alterations actually put the theoretical memory bandwidth of the 3070 at 512 GB/s and the 4070 at 480 GB/s - swapping them! Next time, I won't make the same mistake :)</b></i> </p></blockquote></span><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi577-zGG8Rhq78KqpUyQ7dRvV-o5aLmcIWxBc2UMKp4Hmb421yI0ICc1irfbCBTkKLvPfJhj7mCKMRWRlRCZTzBpJVneNg96Ri0EhIuVNk_BpiiB1qzyn_Vceuay3LTseBA1NQQ4UZMIVmEmMtg7C3ohHo6TGICBz52qytYw447UI-yv5_J-PoqCt1HTY/s503/Returnal%20memory%20scaling_clock%20scaling.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="195" data-original-width="503" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi577-zGG8Rhq78KqpUyQ7dRvV-o5aLmcIWxBc2UMKp4Hmb421yI0ICc1irfbCBTkKLvPfJhj7mCKMRWRlRCZTzBpJVneNg96Ri0EhIuVNk_BpiiB1qzyn_Vceuay3LTseBA1NQQ4UZMIVmEmMtg7C3ohHo6TGICBz52qytYw447UI-yv5_J-PoqCt1HTY/s16000/Returnal%20memory%20scaling_clock%20scaling.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Here, we can compare the effect of memory speed and core clock speed on the result...</b></td></tr></tbody></table><br /><div style="text-align: justify;">Returnal is a game that is completely GPU-limited, so the above results are not being constrained by the i5-12400 in my system. Looking at the results, we can see that the performance drop is, indeed, non-linear: for a 30% drop in core clock speed, we are losing around 18% of the performance, and clawing back another couple of percent as we bump up the memory frequency from 20 Gbps to 21 Gbps and then 22 Gbps, for a loss of only 16%.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I should note that I am not quite sure why the minimum fps is lower at the stock settings but this could be one of those results where there was a hiccup in the system. I would disregard the comparisons to the minimum fps value in the stock configuration for this test, since it was not also observed when raytracing was enabled.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">While I didn't perform the memory scaling test with RT enabled, we have the same difference of 19% performance loss for the 30% core clock frequency loss - which I think is pretty impressive!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What <i>IS</i> interesting to note is that we are confirming, here, that the RTX 4070 is limited by the memory bandwidth. 
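</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">To put a number on that non-linearity: if performance scaled perfectly with frequency, a 30% clock cut would cost 30% of the frames. A trivial calculation using my rounded figures from above:</div><div style="text-align: justify;"><br /></div><pre>
# How much of a core clock cut actually shows up as lost performance?
def clock_scaling(clock_drop_pct, fps_drop_pct):
    """1.0 means perfectly clock-bound; below 1.0 means something else
    (memory bandwidth, fixed-function units...) is also limiting."""
    return fps_drop_pct / clock_drop_pct

print(clock_scaling(30, 18))  # RT off, 20 Gbps memory -> 0.6
print(clock_scaling(30, 16))  # RT off, 22 Gbps memory -> ~0.53
print(clock_scaling(30, 19))  # RT on -> ~0.63
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">A ratio well below 1.0 is consistent with the card leaning on something other than raw core clocks...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">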
Unfortunately, I could not manage higher on my card but it seems clear to me that Nvidia could pull out more performance by upping the spec even further to the 24 or 26 Gbps standard (or, you know, by widening the bus width!).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This, in my opinion, has implications for the RTX 4060 series cards - they are using GDDR6, with the 4060 using 17 Gbps memory and the Ti variants using 18 Gbps memory. We've already seen that these cards are bandwidth constrained in the reviews posted by many outlets but I wonder at the potential for extra performance if they had been paired with GDDR6X like their bigger siblings...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMtVPjk43M0TW_CSTwb94yG4xSUyRtw63MR685Bl5gdMBehrGSaYT3Ds60VlJYtFMIjVwtGpTnRsPPzC9h4SITKTmonEYwNAbDSDil1Hm3vouXWqT-OiKhNiRDztnBAU0lU9OxZ707Mu6C8iQj_znQwpGI9Pst3_v_Xphe6we4WypuLxTlYO7_3omAn_E/s1049/Returnal.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="262" data-original-width="1049" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMtVPjk43M0TW_CSTwb94yG4xSUyRtw63MR685Bl5gdMBehrGSaYT3Ds60VlJYtFMIjVwtGpTnRsPPzC9h4SITKTmonEYwNAbDSDil1Hm3vouXWqT-OiKhNiRDztnBAU0lU9OxZ707Mu6C8iQj_znQwpGI9Pst3_v_Xphe6we4WypuLxTlYO7_3omAn_E/w640-h160/Returnal.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>With Returnal, we can see zero uplift between the two architectures...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With the downclocking, we can see that the RTX 4070 is essentially an RTX 3070 in performance - there is <u><b>zero</b></u> improvement in architecture observed here - just as observed with RDNA2. What is surprising to me is how the RTX 3070 actually slightly beats its successor in almost all metrics - the 4070 has a slightly more consistent presentation, with fewer sequential frametime excursions beyond 3 standard deviations. I would speculate that this might be an effect of the larger L2 cache in action. 
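</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For anyone wondering how such excursions can be counted, the gist is below - a minimal numpy sketch over a frametime capture. This isn't necessarily the exact procedure behind my charts, and the injected spike values are dummy data, but the idea is the same:</div><div style="text-align: justify;"><br /></div><pre>
import numpy as np

def count_excursions(frametimes_ms, sigmas=3.0):
    """Count frames sitting more than `sigmas` standard deviations above
    the mean frametime - i.e. visible hitches, not average throughput."""
    ft = np.asarray(frametimes_ms)
    threshold = ft.mean() + sigmas * ft.std()
    return int((ft > threshold).sum())

# Dummy capture: a steady ~8.5 ms cadence with two injected spikes.
rng = np.random.default_rng(1)
capture = rng.normal(8.5, 0.4, 1000)
capture[200] = 25.0
capture[201] = 24.0  # sequential spikes feel far worse than lone ones...
print(count_excursions(capture))  # -> 2
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">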
However, it's not a large improvement and the player would probably not notice the difference.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsYi3G6Zy5rms6jusvg9n41N2VfN8W0MHmkaCwSjkqzJEWDHDXhGoh-ECguCF4vcyv3poGMQ-QC-6nSvpnYF_-oG3AfPlchJ5ORIc_i4_r29KQffWn9JYPhcbFwGXdQqgpKOOk3cGoPCs_2zkHBNqBmN5RC0c6-X7WsBEEpHpnt2ChQ-3Uhn9o_OHhy5k/s1049/Jedi%20Survivor.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="262" data-original-width="1049" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsYi3G6Zy5rms6jusvg9n41N2VfN8W0MHmkaCwSjkqzJEWDHDXhGoh-ECguCF4vcyv3poGMQ-QC-6nSvpnYF_-oG3AfPlchJ5ORIc_i4_r29KQffWn9JYPhcbFwGXdQqgpKOOk3cGoPCs_2zkHBNqBmN5RC0c6-X7WsBEEpHpnt2ChQ-3Uhn9o_OHhy5k/w640-h160/Jedi%20Survivor.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Being CPU-bound, the frametime spikes can be pretty brutal in areas of high-density assets, even after a few patches...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Jedi Survivor's semi-open worlds provide a different challenge, one that is bottlenecked around the CPU and system memory - resulting in large frametime spikes when traversing the world. Again, the experience between the 3070 and the clock-matched 4070 is very similar, with the 4070 only pulling ahead when ray tracing is enabled. Once again, something I am guessing is related to the larger L2 cache - since we know that some RT workloads can benefit from the larger bandwidth and lower latency of being on-chip.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The lower maximum frametime spike in the test of the RTX 4070 with RT on is explainable: I messed up that run by getting into an unplanned extended fight with some battle droids and just looping around again, without re-loading. That spike, which is pretty consistent for all first-time runs, is significantly reduced on the second go-around as some data must already be resident in RAM. 
I bet you can guess the location of the spike, too - it's just as you enter the Pyloon Saloon...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPUf721jD4n7IK5LDcqi440ettttNsxUKb2OaFawbrlBngzcrkz44BJySYtdX6kU8mSN1iJJjXPKrMVtfRqzPW_c27jzvz1hviwWtkHmqYE-7fX9vtK2jhSsRz90PRPKeQ2NS6Utv95zlSX_ugXAP8yI4JkV-vlqST-wdtVPdnv1Xvr6_IcYWQAxrZRow/s1049/Hogwarts%20-%20Hogsmeade.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="262" data-original-width="1049" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPUf721jD4n7IK5LDcqi440ettttNsxUKb2OaFawbrlBngzcrkz44BJySYtdX6kU8mSN1iJJjXPKrMVtfRqzPW_c27jzvz1hviwWtkHmqYE-7fX9vtK2jhSsRz90PRPKeQ2NS6Utv95zlSX_ugXAP8yI4JkV-vlqST-wdtVPdnv1Xvr6_IcYWQAxrZRow/w640-h160/Hogwarts%20-%20Hogsmeade.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br />Next up, I have two tests from within Hogwarts Legacy as I wanted to test the effect of moving quickly throughout the world as well as the traditional run around Hogsmeade. Each of these tests is performed from a fresh load into the game so there are no shenanigans as above with Jedi Survivor.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With RT off, both cards perform very similarly - though the RTX 4070 is more consistent in its presentation with fewer excursions noted. RT on tells a different story, though: the 3070 outperforms the 4070 by a decent margin in Hogsmeade! To be honest, I cannot fully explain this result... I might have guessed that it was related to the lower memory bandwidth, but I tested with 21 Gbps memory and got a result that was within the margin of error. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's possible that this is related to down-clocking of the die itself, with everything running a little slower.<br /><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj4adsttFOzT98DC5Zj-So23FrEyHxyfUvS1pcu_QPkSzLCxJNltVzcRqslpvsrvEmTvamhjkyy4Y4lUfvJS5K5ouuUyn8NdeCUrerZFpWqKFI4MJs7DKj9aA_4GUzAWsWqYvvCdsxi8fHR6fBhv4Pa9Cr1oyGEWZ1IxFn1Ia4U_w9i1IwE-i_R-ifglY/s1049/Hogwarts%20-%20broom.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="262" data-original-width="1049" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj4adsttFOzT98DC5Zj-So23FrEyHxyfUvS1pcu_QPkSzLCxJNltVzcRqslpvsrvEmTvamhjkyy4Y4lUfvJS5K5ouuUyn8NdeCUrerZFpWqKFI4MJs7DKj9aA_4GUzAWsWqYvvCdsxi8fHR6fBhv4Pa9Cr1oyGEWZ1IxFn1Ia4U_w9i1IwE-i_R-ifglY/w640-h160/Hogwarts%20-%20broom.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Moving on to the broom flight test, we see the expected results - with RT off, the 3070 is very slightly outperforming its successor, while with RT enabled, we see the 4070 pull ahead by a similar margin as the 3070 in the prior test.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Our second-to-last test is one of my favourites, as it's actually fun to swing through the city! Spider-man continues the trend of the RTX 3070 outperforming the 4070, with RT both enabled and disabled. However, once again, we can see that the performance of the 4070 with RT enabled closes the gap significantly, which could be a combination of the improved RT cores and larger L2 cache...</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN-CfdwotkTdiEmpZg1FWCHCOJS6ERCQJsPqhUj-_GBpnPflJnfEXREmUnbZ3PAvs5yClxKLQvIaXWgjfSXNtF6jtXdhN3dzGKZrGhQ4Wx4sp27rAA3WOOjWxjkX_AY4CT19cWtd2Je7AWNHiImkJw05II8kAqapbElgZBJCy28xpe1IoPYKOXineGqBM/s1049/Spider-man.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="262" data-original-width="1049" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN-CfdwotkTdiEmpZg1FWCHCOJS6ERCQJsPqhUj-_GBpnPflJnfEXREmUnbZ3PAvs5yClxKLQvIaXWgjfSXNtF6jtXdhN3dzGKZrGhQ4Wx4sp27rAA3WOOjWxjkX_AY4CT19cWtd2Je7AWNHiImkJw05II8kAqapbElgZBJCy28xpe1IoPYKOXineGqBM/w640-h160/Spider-man.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br />The last of the benchmarks is a return to The Last of Us - which, after many, many patches, is now performing quite well. 
In this test, we see a similar result for both cards, with the RTX 4070 performing slightly better in terms of presentational consistency.<br /><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwRTvMUlAyBFUmCYuIXBn2_QbiSscRih-jG1JzGWWSRUp_6GiDFykalKe2mwfGfbWncyyjM2YzlNCMYzzWlzqY7veGQf_9JmMf8ULaYy4uA0Mjj-GpeVBS7aWZ1XwVLQ0DJZDMGie6hl5xaHEoXApn2lTShCOqNi1vnTTHsX6qnRex9YnVO7fjmvzpo6s/s701/TLOU.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="352" data-original-width="701" height="322" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwRTvMUlAyBFUmCYuIXBn2_QbiSscRih-jG1JzGWWSRUp_6GiDFykalKe2mwfGfbWncyyjM2YzlNCMYzzWlzqY7veGQf_9JmMf8ULaYy4uA0Mjj-GpeVBS7aWZ1XwVLQ0DJZDMGie6hl5xaHEoXApn2lTShCOqNi1vnTTHsX6qnRex9YnVO7fjmvzpo6s/w640-h322/TLOU.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Once again, we're looking at virtually identical performance...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">The Conclusion...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As was the situation with the move from RDNA1 to RDNA2, the main performance benefit in the new architecture of the RTX 40 series over the RTX 30 series is the increase in core clock frequency. Yes, the larger L2 cache helps to mitigate the narrower memory bus width but it is not really having any effect on the absolute graphical performance of the card as the underlying architecture is essentially the same.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><a href="https://www.nvidia.com/en-us/geforce/news/rtx-40-series-vram-video-memory-explained/">Nvidia themselves put forth</a> the idea that (paraphrasing) "increasing cache hit rate increases frame rate".</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In terms of compensating for a narrower bus width and avoiding bottlenecks, they can be correct in a data-starved environment where increased latency will cause frametime spikes - the L2 can reduce those as the rest of the die has to wait less time for the data it requires. They are also correct that this change improves efficiency by a large margin by reducing the memory traffic to VRAM.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, we can see in my data that the L2 is, for the most part, having no noticeable effect on performance compared to the last generation part.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What is likely having a negative effect on the performance is both the lower VRAM bandwidth and the smaller number of ROP units (running at the lower core frequency!) 
on the die and these cut-backs could be masking any potential benefit that the increase in L2 cache might have from a theoretical standpoint in an iso-frequency environment.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgoFp-hGfzqwiuu4N-f0eIIn84FyTUwqCikqbxtXdnXdbb930TBYyNgWJVCkVC7ut6ej2r7NqSXxs49rbSKJHnXzYAZwRbsFSsZTn6EAS-_3IIbWJQAUEVdjhIE4808BXizv1bcvOyW9LkG9tF06-rpqUTj-VLr7AM-DgWTisv3sgWo-tKWi1nbcgfcaY/s3840/nvidia-geforce-ada-lovelace-memory-subsystem-performance-and-efficiency-improvements.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="2160" data-original-width="3840" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgoFp-hGfzqwiuu4N-f0eIIn84FyTUwqCikqbxtXdnXdbb930TBYyNgWJVCkVC7ut6ej2r7NqSXxs49rbSKJHnXzYAZwRbsFSsZTn6EAS-_3IIbWJQAUEVdjhIE4808BXizv1bcvOyW9LkG9tF06-rpqUTj-VLr7AM-DgWTisv3sgWo-tKWi1nbcgfcaY/w640-h360/nvidia-geforce-ada-lovelace-memory-subsystem-performance-and-efficiency-improvements.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Nvidia's comparison with an RTX 4060 Ti with 32MB L2 cache versus a special version with only 2 MB L2 cache...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, Nvidia might not have been telling fibs - they were just speaking in the general sense, not about the specific configurations they were putting into their new GPU lineup compared to the prior generation of cards. After all, they have shown (above) that adding the larger L2 does increase performance on the RTX 4060 Ti.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What would have been interesting to see is the effect of the larger L2 cache coupled with the same number of ROPs and memory bus width as the RTX 3070... Alas, we can only dream. What is undeniable is the amount of energy saving going on in the new architecture. Across all results, the RTX 4070 was pulling roughly half the power compared to the 3070 at the same performance.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">And that, as they say, is that.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The performance of the new architecture is technically the same as the old but only because they have removed aspects that helped with that performance and countered those removals with that additional cache. If the cards were identical in all other aspects then we might actually be seeing a larger performance uplift than we are...</div></div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-14516277259448159222023-07-08T11:08:00.003+01:002023-07-08T12:16:24.887+01:00June Roundup... 
Software highs and hardware lows...<div style="text-align: left;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFy0iltK9QrwXdJ8nugNmmtqtjbukkzRUW4XJul8mprlVKo6FT2GeCyZK2fB5qHPoqfJxxa5a8SuE_vnHgfh9SccSe9-B6fAVVJvnb84q6dz91mJw88eEYwIgiB5WvbNpqZg2L-BvXqnmhL2CCxZP7g8pCnWnNMlLTiA_a4Pf1vGup3jFqX79OH5E7724/s5857/Title.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="3893" data-original-width="5857" height="426" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFy0iltK9QrwXdJ8nugNmmtqtjbukkzRUW4XJul8mprlVKo6FT2GeCyZK2fB5qHPoqfJxxa5a8SuE_vnHgfh9SccSe9-B6fAVVJvnb84q6dz91mJw88eEYwIgiB5WvbNpqZg2L-BvXqnmhL2CCxZP7g8pCnWnNMlLTiA_a4Pf1vGup3jFqX79OH5E7724/w640-h426/Title.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I don't have much time for technical analysis over this coming period though I have a few ideas to explore in the coming months. So, I thought I'd have a bit of a round-up of my thoughts that, more often than not, end up in YouTube video comments and Twitter discussions - especially when I don't see these points replicated anywhere else...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, let's summarize!<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">GPU news </span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The RX 7600, RTX 4060 and RTX 4060 Ti were released and there are a few take-aways from these releases.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li>Enthusiast consumers are still really angry about issues related to prices and availability over the last couple of years.</li><li>RDNA 3 really isn't much of an improvement over RDNA 2 per resource unit.</li><li>RTX 4060 and RTX 4060 Ti are actually impressive cards, just at the wrong price point.</li><li>If RDNA 3 had competed, this graphics card generation could have been an amazing one... probably the best in history (no exaggeration!).</li></ul><div><br /></div><h4><span style="color: #274e13;">The Gnashing of Teeth...</span></h4><div><br /></div><div>Jayz2cents' pulled review of the RTX 4060 Ti* and the general semi-faux** jumping on the hate bandwagon for clicks/views that I've seen on YouTube have highlighted to me how much the last couple of years have damaged the gaming and PC technology scene on the consumer side - let alone the industry side.</div><div><br /></div><div>The way I see it, there is residual aggression and frustration over being unable to get parts, combined with the same emotions over being unable to afford them... 
both during the crypto craze and now with the current generation of cards being hyper-inflated in price (for whatever constellation of reasons).</div><div><br /></div><div>This effect is exacerbated by <a href="https://hole-in-my-head.blogspot.com/2022/02/the-rate-of-advancement-in-gaming.html">the increased duration</a> between graphics card generations which is only getting longer as time progresses, resulting in an unending cycle of consumer desire being unfulfilled and a sharp slow-down in generational increase in performance being available to the low-end of the market.</div><div><br /></div><div>The point, here, is that virtually no one had their nose out of joint when Nvidia released the RTX 20 series because consumers had bought into the GTX 900 and 10 series of cards wholesale (as well as AMD's very serviceable RX 400 and 500 series cards). However, by 2020 - and now, worse, three years later - we are in a situation where consumers need or feel the need to upgrade to improve the performance in modern games and find themselves unable to do so for both reasons outlined above.</div><div><b><span style="color: #274e13; font-style: italic;"></span><blockquote style="font-style: italic;"><span style="color: #274e13;">*Because it wasn't negative and Phil dared to have a positive opinion about the product...</span></blockquote><p></p><blockquote><span style="color: #274e13;">**<i>Oh, I'm sure they think the prices are too high but the hyperbolic language of a good number of techtubers is really playing to the current leanings and expectations of the audience, in a way that both avoids confrontation with them and also drives engagement for people wanting to validate their outrage...</i></span></blockquote><p></p></b></div><div><div>Plus, I feel like the push-back against new technologies like ray tracing is tied heavily into this lack of ability to buy the capable hardware. Although I see some people claiming that they turn off all those features on their RTX 4090s because they only get 100 fps or somesuch... I just can't see the majority of gamers taking that stance!</div><div><br /></div><div>I feel that all those polls of people saying that they don't care about RT in games are purely a reflection of the fact that they don't, for the most part, have hardware in their gaming rigs that can do it. If they did, I would bet a lot of money that the poll percentages would be strongly inverted. 
That's also a reason why you've seen movement over the last year from outlets such as HUB towards favouring RT when they were previously pretty dead set against it - purely based on what was available in the then-current games (which, IMO, as a tech enthusiast is the wrong way to look at things but I digress...).</div><div><br /></div></div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiog6SPcgzkTgLXjLV8zxbNPaDGN_ENJQOYZ8ev-dtbxzi2XyJ3T5uWxnnkMlsQdiA1MRvIzNQBkHZVP4Vdez_SVB7UenIAbaOqvN_IaLAm79J1zV-Pagk5R-xQcpng1S7wVXlv9z7OLPDS4jTv27ZsdSSuFr8tUz8mCnRU8ldzysAayWSroLJxg2ECHu4/s804/Radeon.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="475" data-original-width="804" height="378" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiog6SPcgzkTgLXjLV8zxbNPaDGN_ENJQOYZ8ev-dtbxzi2XyJ3T5uWxnnkMlsQdiA1MRvIzNQBkHZVP4Vdez_SVB7UenIAbaOqvN_IaLAm79J1zV-Pagk5R-xQcpng1S7wVXlv9z7OLPDS4jTv27ZsdSSuFr8tUz8mCnRU8ldzysAayWSroLJxg2ECHu4/w640-h378/Radeon.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The RX 7600 is actually decent value for money, despite not having that great a performance compared to last generation offerings - especially with its continually (and officially) falling price!</b></td></tr></tbody></table><br /><div><br /></div><h4><span style="color: #274e13;">The Double Helix...</span></h4><div><span><br /></span></div><div><span>RDNA has been a very successful architecture for AMD, I would say more than GCN was - it's flexible, very scalable and relatively power efficient. Plus, the improvements they've made to the front end, and even in the registers and the cache hierarchy, have shown similar scalability, too.</span></div><div><span><br /></span></div><div><span>On the other hand, despite architectural changes spanning power- and cost-saving features* and the widening of the DCU throughput**, RDNA has not really delivered on a lot of its promises. 
Most improvements appear to have come through process node optimisation - allowing higher frequencies and the lower voltages required to achieve them.</span></div><div><span><br /></span></div><div><span>Last time, <a href="https://hole-in-my-head.blogspot.com/2023/06/analyse-this-what-would-mid-gen-console.html">I made the point</a> that <a href="https://www.techspot.com/review/2686-amd-radeon-7600/">the monolithic RX 7600</a> <a href="https://www.techpowerup.com/gpu-specs/radeon-rx-7600.c4153">isn't that different</a> from <a href="https://www.techpowerup.com/gpu-specs/radeon-rx-6650-xt.c3898">the monolithic RX 6650 XT</a> and, given that we know that RDNA 2 has no improvements beyond data management and core frequency over RDNA 1, this basically highlights that there's no performance increase across three generations of RDNA architectures.</span></div><div><span><br /></span></div><div><span><br /></span></div><div><span>Let's take a look at what they have in common:</span></div><div><ol><li><span>2048 shader units</span></li><li><span>128 TMUs</span></li><li><span>64 ROPs</span></li><li><span>L0 = 32 KB per WGP</span></li><li><span>L1 = 128 KB per array</span></li><li><span>L2 = 2 MB</span></li><li><span>L3 = 32 MB</span></li><li><span>PCIe 4.0 x8 lanes</span></li><li><span>128-bit bus / 8 GB VRAM</span></li><li><span>1x PCIe power 8-pin required</span></li></ol></div><div>And let's see what's different:</div><div><ol><li>RX 7600 is 86% of the die size of the RX 6650 XT</li><li>RX 7600 has 18 Gbps GDDR6 and RX 6650 XT has 17.5 Gbps GDDR6</li><li>RX 7600 has 13,300 million transistors and RX 6650 XT has 11,060 million</li><li>RX 7600 uses TSMC 6 nm node and RX 6650 XT uses 7 nm</li><li>RX 7600 has a boost clock of 2655 MHz and RX 6650 XT has 2635 MHz</li><li>RX 7600 has a TDP of 165 W and RX 6650 XT has 176 W</li></ol></div><div>The problem here is that this is essentially the same product. None of the improvements that differentiate RDNA 3 from RDNA 2 have had any benefit to the performance of the card - it's all down to process node improvements and memory speed between the two. In fact, given the overclocking potential of RDNA 2, it's very easy to match the RX 7600 performance improvement of 7% through a small core and memory frequency increase. Yes, you may not save that <a href="https://www.techpowerup.com/review/amd-radeon-rx-7600/38.html">~20 - 35 W difference during gaming</a> but the consumer could have had this performance for a similar price for over half a year now. This is not an exciting product and shows the lack of any real improvement between generations on AMD's part.</div><div><br /></div><div><span style="color: #274e13; font-style: italic; font-weight: bold;"></span><blockquote style="font-style: italic; font-weight: bold;"><span style="color: #274e13;">*The increase in all caches but mostly the L3 cache allowed a reduction in the number of memory controllers required to feed the compute units. The implementation of chiplets also allows other power-savings which are not fully realised on RDNA3 because of an apparent bug in data transfer between the chiplets, VRAM and the GPU die (at least from my reading of the situation surrounding the idle power draw)... 
Both of these changes also concurrently result in manufacturing cost savings for AMD.</span></blockquote><p style="font-style: italic; font-weight: bold;"><span style="color: #274e13;"></span></p><blockquote><span style="color: #274e13;"><b><i>**Which, in theory, should allow developers to implement certain visual effects with greater speed (thus improving duration between frametimes) but which requires either intense driver effort or specific effort from the developers of each individual game to implement... This (if my understanding is correct) should have a stronger effect at higher resolutions.</i></b></span></blockquote><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwWXUMFfodFUgSwE7tSLdx6klrv_PFqo7mjP2M52km4_pT-dFcVrZCX1-yxODmhDr6cFiq15l4hNDCFNDesHXN-dxtuw3XoHB2LX8vCGvLfJMp1hZFReezZ5c7Qn4AxxruEl-E5jUth9_ejDjuTsBgeqMizC1GgKR8d-aCxj6rdCyY9QK0-iKL_5rfKUE/s711/Capture.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="504" data-original-width="711" height="284" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwWXUMFfodFUgSwE7tSLdx6klrv_PFqo7mjP2M52km4_pT-dFcVrZCX1-yxODmhDr6cFiq15l4hNDCFNDesHXN-dxtuw3XoHB2LX8vCGvLfJMp1hZFReezZ5c7Qn4AxxruEl-E5jUth9_ejDjuTsBgeqMizC1GgKR8d-aCxj6rdCyY9QK0-iKL_5rfKUE/w400-h284/Capture.PNG" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>This picture of the 4060 Ti is small because that's how small the die is! *Ba-dum-pish!*...</b></td></tr></tbody></table><br /><div><br /></div><h4><span style="color: #274e13;">The Racers...</span></h4><p><br /></p><div>Nvidia have stumbled in the eyes of enthusiast gamers this generation only because their actions during the last generation were mostly obscured from public view by the surrounding market conditions. Now their strategies are on view for all to see, without any filters - <a href="https://www.marketwatch.com/story/moores-laws-dead-nvidia-ceo-jensen-says-in-justifying-gaming-card-price-hike-11663798618">despite what they may want you to believe</a>.</div><div><br /></div><div>As a result, combined with the general anger in enthusiast circles that I noted above, we are looking at consistently strong levels of criticism, the likes of which I've never seen in the industry. It's not just gamers; it's tech press, game devs, AIB partners (most notably with the departure of EVGA from the GPU market!), and even some industry analysts...</div><div><br /></div><div>Nvidia is lucky that AI has taken off substantially this year, otherwise there wouldn't be much positive press for them to focus on. </div><div><br /></div><div>Saying all that, the hatred their products lower in the stack have received is unfortunate in a technical sense because, price aside, they are decent - especially when viewed from an efficiency perspective (which I feel is an increasingly important aspect of PC hardware). Yes, we have some regressions from the last generation of products but, let's be fair, there's the same situation on AMD's side as well!</div><div><br /></div><div>Unfortunately, Nvidia's choice to use lower-than-usual-tier silicon in their products has resulted in a situation where the consumer is losing and, thus, their products can generally not be recommended.
Even relatively placid commentators <a href="https://youtu.be/rE_vzPR6tfA?t=1866">like NX Gamer</a> cannot get behind their shenanigans! However, <a href="https://youtu.be/4A1n6Mmj1M0">Craft Computing has stirred the waters</a> by pointing out that many reviews and reviewers are only assessing the absolute performance on an academic level, rather than as a practical measure.</div><div><br />This isn't a new argument and outlets such as <a href="https://www.youtube.com/@DigitalFoundry">Digital Foundry</a>, <a href="https://www.youtube.com/@NXGamer">NX Gamer</a> and <a href="https://hole-in-my-head.blogspot.com/">myself</a>* have been pivoting to cover the "user experience", which is, IMO, becoming more important than ever in this post-pixel age**.</div><div><br /></div><div>No particular review method is incorrect - all review types are fine as long as they are internally consistent! However, there is a growing need for "real world" experiential testing, showing the reasons for (and potentially against) buying the new "thing". Academic reviews do not address this need - or at least they are addressing it less and less... though I, myself, value them incredibly as I'm a bit of a nerd like that.</div><div><br /></div><div><b><i><span style="color: #274e13;"><blockquote>*If I can be considered an outlet after more than 15 years!</blockquote><p></p><blockquote>**The pixel is dead, upscaling is here to stay, hardware advances are slowing and tech cost is inflating beyond what the majority can afford... </blockquote><p></p></span></i></b></div><div><br /></div><div>Coming back to Nvidia, the RTX 40 series really is the RDNA 2 of their lineup: as I will show in my next article, their improvements this generation really only relate to process node and operating frequency - the same as RDNA 2 was over RDNA 1. Though, to be fair, RDNA 3 appears to also be in the same boat... so you <i>could</i> argue there's more performance stagnation on AMD's side of the fence...</div><div><br /></div><div><br /></div></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgX5SMXVrvMLJlyZB2xFIkDol2od-kFozGF98DaCitYET4-kd2ttQ3Z5PZUCYGlvhd-G6sjRfZLwiR-hcSWilpHHFD8s0z4qWKIGTox1VBQuJBhwyxOm9zmP1AX8w4EMbwAz_KxhwQO7DidUNXpAvfqmqlIM114CR3s5rfRP0aefMbKvlaNd1U3-ubabdU/s800/SF_sm.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="450" data-original-width="800" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgX5SMXVrvMLJlyZB2xFIkDol2od-kFozGF98DaCitYET4-kd2ttQ3Z5PZUCYGlvhd-G6sjRfZLwiR-hcSWilpHHFD8s0z4qWKIGTox1VBQuJBhwyxOm9zmP1AX8w4EMbwAz_KxhwQO7DidUNXpAvfqmqlIM114CR3s5rfRP0aefMbKvlaNd1U3-ubabdU/w640-h360/SF_sm.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Starfield is in contention for the most controversial game of all time! However, I believe that Mass Effect 3 still wins out...</b></td></tr></tbody></table><br /><div><br /></div><h3><span style="color: #274e13;">Software Gains...</span></h3><div><br /></div><div>Much has been made about Xbox's "lack" of software titles - in reality, that's just shorthand for "big games". As a result, Starfield has become the idol upon which many have rested their hopes.
For a studio that is RENOWNED for its technical glitches and shortcomings, this seems like one of those cosmic jokes one reads about from time to time in science fiction... appropriate.</div><div><br /></div><div>At any rate, the "<a href="https://www.gamespot.com/articles/starfield-will-run-at-30-fps-on-xbox-series-x-s/1100-6515086/">30 fps</a> <a href="https://screenrant.com/starfield-30-fps-good-performance/">on the Series</a> <a href="https://www.ign.com/articles/bethesdas-todd-howard-confirms-starfield-performance-and-frame-rate-on-xbox-series-x-and-s">consoles</a>" <a href="https://youtu.be/aeeH0N-MFuM">issue isn't</a> <a href="https://youtu.be/i9ikne_9iEI">going to go away.</a> It's a shock, to be sure. Unfortunately, because of the prolonged segue into the current console generation, a lot of games were able to <i>easily</i> get away with 60 fps modes and higher. Now, as we move into a period of games releasing that only focus on the current generation of consoles, we will probably find that some developers have bitten off more than they can chew - or more than they expected to chew.</div><div><br /></div><div>Let's face it: it's not like 60 fps has been an expectation for the last ten years of gaming on console. 30 fps was considered fine from both sides: developer and consumer. So, everyone and their dog has likely been making games with a "flexible" frametime budget in mind over the last 3-5 years for their debut on the new consoles. Unfortunately, in that timeframe, consumers have realised that higher fps is generally better, resulting in lower input latency and a smoother visual experience.</div><div><br /></div><div>Developers cannot spin on a dime - so every game that did not have this consideration in its development, or which is unable to optimise for this aspect of presentation in the time available before release, will be unable to meet this new consumer expectation.</div><div><br /></div><div>However, in comparison with other games, I am not really worried about this release on the Xbox. <b style="text-decoration-line: underline;">Starfield is a 10-year game.</b> The performance at release doesn't matter - it will be patched, re-released on new hardware, played, modded and analytically dissected over on the PC platform.</div><div><br /></div><div>Does the <a href="https://www.eurogamer.net/digitalfoundry-face-off-skyrim">poor performance of Skyrim's release on PS3 or Xbox 360</a> mean that the game wasn't successful or long-lasting? In a word: No.</div><div><br /></div><div>Yes, it's disappointing in the here and now but this isn't a flash in the pan (assuming the same trajectory as prior Bethesda games) - it is a marathon upon which player experiences will be built...</div><div><br /></div><div><br /></div><h3><span style="color: #274e13;">Conclusion...</span></h3><div><br /></div><div>And that pretty much wraps up where things are at, IMO.
As I hinted at earlier in the blogpost - next time I'll be looking at the performance uplift of the Ada Lovelace architecture over the Ampere lineup.</div></div></div>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-7560610393342650347.post-12229256472345741522023-06-21T20:17:00.003+01:002023-06-22T15:26:22.897+01:00Analyse This: What would a mid-gen console refresh look like...?<div style="text-align: left;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUtopjhkwgEyRAQ5we7p0jyZ5K-dmva0OoO9p9uYQs-wFOB4KqT1Hl-809KFKF7r5knRroRW_MH4JfXpy8UhyboGFsTDr2W4mcDzVk9x8jgSyG0P1duIPtLLiF-Xe4_WPGwhQPl2zsJTPwFGAQe9ry_Uvcv2N3r18MPwP7uwQ3uJNvdDFvEMwIUpu5GgY/s740/Ryzen_Z1%20header1.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="416" data-original-width="740" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUtopjhkwgEyRAQ5we7p0jyZ5K-dmva0OoO9p9uYQs-wFOB4KqT1Hl-809KFKF7r5knRroRW_MH4JfXpy8UhyboGFsTDr2W4mcDzVk9x8jgSyG0P1duIPtLLiF-Xe4_WPGwhQPl2zsJTPwFGAQe9ry_Uvcv2N3r18MPwP7uwQ3uJNvdDFvEMwIUpu5GgY/w640-h360/Ryzen_Z1%20header1.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">There have been a lot of rumours and meanderings surrounding a potential mid-gen "pro" console for both Sony and Microsoft's current consoles. However, I've not really seen much analysis for what form such a device would take or why such a device might exist. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For this post, I'm going to delve back into my hardware speculation territory to see if we can't imagine some devices for both companies that might make some sense in the market.<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Adapting to Reality...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I'm going to come right out and say the quiet part out loud: There isn't a really strong use-case for releasing a "pro" console for either Microsoft or Sony. The beauty of both the Series X and PS5 is that they are pretty powerful consoles and developers have yet to really master or stretch the hardware to its limit. Also, both consoles were in such short supply for the first two years of their lifecycle that pent-up demand is still being satisfied, with consoles selling-through pretty quickly to this day!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If anything, the fact that <a href="https://blog.playstation.com/2022/08/25/ps5-price-to-increase-in-select-markets-due-to-global-economic-environment-including-high-inflation-rates/">Sony increased the price of the PS5</a> in any market where there is little competition from the Xbox... with Microsoft <a href="https://www.gamesindustry.biz/microsoft-increases-the-cost-of-xbox-series-sx-units-in-sweden">following suit in two countries</a> where they barely make a mark... puts paid to the notion!</div><blockquote><div style="text-align: justify;"><b><i><span style="color: #274e13;">Scratch that! 
During the writing of this blogpost, <a href="https://www.theverge.com/2023/6/21/23768400/microsoft-xbox-series-x-xbox-game-pass-price-increase">they've now done a Sony</a>!</span></i></b></div></blockquote><div style="text-align: justify;">However, saying that, there are commercial realities aside from the nebulous "economic conditions" excuse. One of these could be the shift away from the previously readily available and mass-produced 14 Gbps GDDR6 that is used in both Xbox and PS5 consoles: most new products are shipping with 16, 18, and soon-to-be 21 and 23 Gbps GDDR6 modules.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In all likelihood, sooner rather than later both companies will be purchasing the faster memory and downclocking it to match the listed specs of their consoles. For this very same reason, AMD essentially updated their lineup with the 6X50 XT variants of cards, dropping the slower 16 Gbps spec on their memory in favour of the 18 Gbps spec.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><span style="color: #274e13;">The first improvement: increasing memory to the 18 Gbps spec.</span></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This would bring a free performance boost to both consoles with the Series X going from 560 GB/s to 720 GB/s maximum bandwidth and the PS5 increasing from 448 GB/s to 576 GB/s. Not too shabby!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Unfortunately, the compromises made in the Series S are quite large. Five modules of 2 GB each cannot be doubled or optimised better than they already are because the four memory controllers on the APU die are already over-provisioned. You can use 18 Gbps GDDR6, as with the other consoles, but the increase is mediocre at best and, at worst, does not address the capacity issue: 224 GB/s increases to 288 GB/s on the main portion and the single GDDR6 module increases from 56 GB/s to 72 GB/s.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This cannot save this heavily compromised design and so will likely never even be on the table for consideration... unless GDDR6W 4 GB modules enter the market soon...
and cheaply: which is very unlikely!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgi9TJXyDhjMrDSvXGvC6fPuF_t8g8qgIFaaw8nStjz13OeKJJ4g4_DrYDF3DC7s2GJiNKa3fK2q0n8XCQygrlvR6XwQXWl8uoREfq2Sg2AuCf8sWKKNx7myNl7l6mPy03V9zbjNwTR0-TUAKKSCaLZSN7q27JlZ2S1u3iYEUPWE9Q_xgNHPdw8252ytOY/s727/1c23d72c-2f8b-47d9-a14f-d5bdad4f3ad2.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="346" data-original-width="727" height="304" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgi9TJXyDhjMrDSvXGvC6fPuF_t8g8qgIFaaw8nStjz13OeKJJ4g4_DrYDF3DC7s2GJiNKa3fK2q0n8XCQygrlvR6XwQXWl8uoREfq2Sg2AuCf8sWKKNx7myNl7l6mPy03V9zbjNwTR0-TUAKKSCaLZSN7q27JlZ2S1u3iYEUPWE9Q_xgNHPdw8252ytOY/w640-h304/1c23d72c-2f8b-47d9-a14f-d5bdad4f3ad2.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The Series S has too many compromises in its design that are impossible to work around...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The other obvious choice, for Microsoft, is ditching the asymmetric design of the memory layout on the Series X. No more two pools of memory - just use 2 GB modules and be done with it. The cost difference isn't that large and, if we're honest, 1 GB modules will be phased out of mass production over the coming couple of years, with the newer JEDEC specs and GDDR6W on the horizon, anyway...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><span style="color: #274e13;">The second improvement: Series X moving to a unified memory design - 20 GB memory across a 320 bit bus.</span></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This would remove any sub-optimal aspects of the Series X and give the best opportunities for developers to use the system well.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">But what about the other parts of the consoles?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-s-lGd1kcb-ejPmJCkqHuJpTBZyewI3A6_mp4ik5nWZouMRAGspRAWSnNvCA5SijgFgxdy28tbZkteaeYcRUHeUIJ2_5kQ83ZTnGYJuZaHONrtwmiEoshqdU_Q7VAByekv1Fqkk_RwKF1C56fUUYciENV764rcBJIIKG6N5d8-dGY1czu3IkE2pz0YyM/s1920/Zen3_arch_19.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-s-lGd1kcb-ejPmJCkqHuJpTBZyewI3A6_mp4ik5nWZouMRAGspRAWSnNvCA5SijgFgxdy28tbZkteaeYcRUHeUIJ2_5kQ83ZTnGYJuZaHONrtwmiEoshqdU_Q7VAByekv1Fqkk_RwKF1C56fUUYciENV764rcBJIIKG6N5d8-dGY1czu3IkE2pz0YyM/w640-h360/Zen3_arch_19.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Zen 2 doesn't have the most optimal L3 cache layout... 
and the APUs reduced it by 3/4.</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">The Almost-Rans...</span></h3><h4 style="text-align: justify;"><span style="color: #274e13;"><br /></span></h4><h4 style="text-align: justify;"><span style="color: #274e13;">V-cache...</span></h4><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I'm sure people would love to see the application of the 3D stacked L3 cache that we've observed in the desktop Zen 3 and Zen 4 lineups, especially given the paucity of the stuff on the console APUs*. Unfortunately, there are at least three reasons why this will not be the case:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ol><li>Zen 2 (as far as we know) never had any hardware designs with this overlaid structure in mind - i.e. the Zen 2 design did not include TSVs (Through Silicon Vias) that enable power delivery and communication to the silicon wafer stacked atop the main die. <a href="https://www.tomshardware.com/news/amd-3d-vcache-in-development-for-years">Zen 3 had these design elements from the start</a>.</li><li>Stacked V-cache will cause cooling to become more of an issue. Loading up layers of silicon (especially active silicon) on top of each other reduces heat transfer and increases heat production on the chip. This could lead to lower clockspeeds or more expensive cooling solutions.</li><li>The cost of such an implementation would be quite large. The APUs in the consoles are significantly larger than the CCDs used in Zen 3. Not only is the packaging and layering itself an added expense, but if anything goes wrong with that manufacturing step, the loss of the much larger APU die is also a big cost to eat in terms of overall process efficiency...</li></ol></div><blockquote><div style="text-align: justify;"><b><i><span style="color: #274e13;">*Zen 3 and Zen 4 both use 32 MB of L3 cache shared across 8 CPU cores. The consoles, however, use the old Zen 2 design, which splits the cache in two, sharing 16 MB between 4 CPU cores (x2 on the 8 core designs, as in the image above). Meanwhile, mobile parts shaved this L3 cache down to only 4 MB per 4 cores in order to save on precious space...</span></i></b></div><div style="text-align: justify;"></div></blockquote><div style="text-align: justify;">So, IMO, V-cache is off the table when it comes to potential console updates.
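</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a quick aside before moving on: all of the bandwidth figures quoted in the memory section above fall out of one simple relationship - bus width (converted to bytes) multiplied by the effective per-pin data rate. A throwaway sketch, purely illustrative (the function name is made up for this example, not any real API):</div><div style="text-align: justify;"><br /></div><pre># Peak GDDR6 bandwidth: (bus width in bytes) x (per-pin data rate in Gbps).
def gddr6_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    return (bus_width_bits / 8) * data_rate_gbps

print(gddr6_bandwidth_gb_s(320, 14))  # Series X today: 560.0 GB/s
print(gddr6_bandwidth_gb_s(320, 18))  # Series X at 18 Gbps: 720.0 GB/s
print(gddr6_bandwidth_gb_s(256, 14))  # PS5 today: 448.0 GB/s
print(gddr6_bandwidth_gb_s(256, 18))  # PS5 at 18 Gbps: 576.0 GB/s
print(gddr6_bandwidth_gb_s(128, 18))  # Series S main pool at 18 Gbps: 288.0 GB/s</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">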
But what about using an updated Zen core design?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijLGP6kL4fYTLkFbJfnwQAbNugrudI3647x1QBt0jVAUJU3i7A_ub3c6K4KE1KzgF92tN9QEmFtBSBL2yuzRVe-UEf5kCogdW67O9m6jydcFXvXK2uH5HsZjDZ1xl5QsjIcW7BolIFw-WNBbqXOZTwhV49vfOzq06sBBTjVPO_XFV-dmzNqoMPvGBsZ6Q/s1759/Chip%20layout.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1440" data-original-width="1759" height="328" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijLGP6kL4fYTLkFbJfnwQAbNugrudI3647x1QBt0jVAUJU3i7A_ub3c6K4KE1KzgF92tN9QEmFtBSBL2yuzRVe-UEf5kCogdW67O9m6jydcFXvXK2uH5HsZjDZ1xl5QsjIcW7BolIFw-WNBbqXOZTwhV49vfOzq06sBBTjVPO_XFV-dmzNqoMPvGBsZ6Q/w400-h328/Chip%20layout.jpg" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>RDNA 3 has been a huge letdown - whether that's the chiplet-based N31 or the monolithic N33...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><h4 style="text-align: justify;"><span style="color: #274e13;">Zen 3 or 4... or RDNA 3...</span></h4><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Yeah, no... Seriously! </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The problem with this is that the consoles are not PCs. This idea of the "phone model" of console designs is just rubbish. Many commentators bring up the "end of console generations" time and time again. It's been <a href="https://hole-in-my-head.blogspot.com/2016/08/the-end-of-console-generations-or-just.html">literally eight years</a> that we've been talking about this and there is ZERO evidence that this is going to happen!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The reason is that console manufacturers have to write custom APIs and drivers to utilise the hardware in those devices. This isn't the situation where a phone can just use the same OS across different silicon... Oh, wait? That does happen, right? Well, no. Phones use layers of abstraction to "hide" the idiosyncrasies of the hardware versions from the developers. Consoles do not do this - in fact, if anything, it's more akin to the opposite situation: developers of console games have to code to the idiosyncrasies of the hardware to get the best out of it!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">You see, the phone model relies on the fact that the applications are never really going to challenge the hardware in current or future iterations of the architecture. On the contrary, games* tend to be pretty demanding of the hardware they are run on! The end result is that the hardware manufacturers cannot (or do not want to) spend the money on providing very optimised abstraction layers for the game engines to hook into, and game developers would not want to support lots of different architectures within the same ecosystem**.</div><div style="text-align: justify;"><blockquote><b><i><span style="color: #274e13;">*I'm going to caveat this with a "this refers to graphically demanding and simulation-heavy titles... 
Like Cyberpunk and Starfield!"</span></i></b></blockquote></div><div style="text-align: justify;"><b><i><blockquote><span style="color: #274e13;">**It's hard enough with various performance targets!</span></blockquote></i></b></div><div style="text-align: justify;">All of this adds up to the fact that neither Sony nor Microsoft will want to incorporate fully-fledged Zen 3 or Zen 4 architectures into their current generation of consoles. Nor would they wish to do the same on the graphics front with RDNA 3 (relative flop as it is!)...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Optimisations...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There is, however, some light at the end of the tunnel: process node optimisations.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Both TSMC 7nm and 6nm process nodes are said to be "compatible", ostensibly meaning that there is minimal work to get the same or similar designs working across them. <a href="https://www.tomshardware.com/news/playstation-5-refresh-boasts-new-6nm-amd-oberon-plus-soc">We already have confirmation</a> that Sony moved the PS5 APU onto the more optimised node, which has allowed them to reduce the cooling systems in the console by a considerable amount, in order to save money (despite them increasing the price - as noted earlier!).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, there is something that is possible: increased clock speeds.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's no secret that both console vendors are limiting the frequencies of their APUs as much as possible in order to get the power and heat under control. However, <a href="https://www.techpowerup.com/gpu-specs/radeon-rx-7600.c4153">the RX 7600</a> has shown us that keeping (almost) the same specs <a href="https://www.techpowerup.com/gpu-specs/radeon-rx-6650-xt.c3898">as the prior generation card</a> can result in around a 7% increase in performance for 6% less power consumption.
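</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Taking those rounded figures at face value, a quick back-of-the-envelope calculation (a sketch, nothing more) shows why the combination matters more than either number does alone:</div><div style="text-align: justify;"><br /></div><pre># ~7% more performance for ~6% less power (rounded figures from above):
perf_ratio = 1.07
power_ratio = 0.94
print(perf_ratio / power_ratio)  # ~1.14 - i.e. roughly 14% better performance-per-watt</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">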
That's not nothing - especially for a theoretical Series X successor that can harness a properly implemented unified memory system.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If you kept the same power consumption, you could push that increase in performance up by, potentially, the same amount (even without increasing clock frequency*) <i>and</i> grant the CPU extra performance headroom, to boot!</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*It seems to me, from my testing, that a lot of modern GPUs and CPUs are more limited by power per unit time than they are by frequency...</blockquote></span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlEW5exfp-xBASTBZaZo2cwiopLHa0LtAaWE6yUWyImDKY_scyXZND7mtmLJEF4ix6RYubh-Wjdun_u7ecF14Qw6CECF7XDhbSDO5xRJ_Gz3LI0eAKThjrfn5_cY3CGxMOET83jp7XxXhFm8jq620u0w9PGaiv98F7R3tonAEpdoSl6LnJphCwnFsx_Ak/s1920/81cff7c9-07b2-4c9b-aa76-5586b624fc33.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="660" data-original-width="1920" height="220" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlEW5exfp-xBASTBZaZo2cwiopLHa0LtAaWE6yUWyImDKY_scyXZND7mtmLJEF4ix6RYubh-Wjdun_u7ecF14Qw6CECF7XDhbSDO5xRJ_Gz3LI0eAKThjrfn5_cY3CGxMOET83jp7XxXhFm8jq620u0w9PGaiv98F7R3tonAEpdoSl6LnJphCwnFsx_Ak/w640-h220/81cff7c9-07b2-4c9b-aa76-5586b624fc33.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The Xbox One X had a pretty decent performance gain over the Xbox One S, but it didn't set the world on fire in terms of sales...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Conclusion...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">These potential implementations would put us firmly in PS4 Pro and Xbox One X levels of performance increase over the base PS4 and Xbox One consoles. The chief question there is whether Microsoft or Sony saw any real benefit from doing so <i>last</i> generation. Did the sales of the Pro and X offset the development costs - both on the hardware and software sides of things? My guess is, "no".</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In that frame of reference, despite the fact that both console makers <i>could easily </i>do something to improve upon the base (main) console designs to provide mid-gen upgrades equivalent to those of last generation, the probability of them doing so seems slim, at best.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The phone model doesn't translate well to consoles. Sales do not correlate with sunk cost.
The benefits of an increase in performance do not really work at this juncture when neither console is being pushed to its limits...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a result, I would be <i>shocked</i> if any such mid-gen refresh materialised!</div></div>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-7560610393342650347.post-76717874642771143362023-06-01T19:49:00.004+01:002023-06-01T19:49:53.554+01:00Mid-Range Hardware: RTX 4070 review (Part 2)<div style="text-align: left;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOJ-bohBZvtdfET-cOVgbYNdruaJaKHDGpnJhSlF8nSCDGLF6Js3TSjkLvcl9bpWwLvkDYZG7qU_ogJ9g5pqEasWu-h3-loO6JW2-HlA2b8ia8xqfXvJfs0eM_ZNJKPLU9-j1gNUcAUh4-HnNz4tH_YbXrrOEftT2m4xTnQ0Mkj-mePJyh8HtuIeTp/s1920/Title%20part%202.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOJ-bohBZvtdfET-cOVgbYNdruaJaKHDGpnJhSlF8nSCDGLF6Js3TSjkLvcl9bpWwLvkDYZG7qU_ogJ9g5pqEasWu-h3-loO6JW2-HlA2b8ia8xqfXvJfs0eM_ZNJKPLU9-j1gNUcAUh4-HnNz4tH_YbXrrOEftT2m4xTnQ0Mkj-mePJyh8HtuIeTp/w640-h360/Title%20part%202.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Last time, <a href="https://hole-in-my-head.blogspot.com/2023/05/mid-range-hardware-rtx-4070-review.html">I looked at the relative performance</a> between the RTX 4070 and an RTX 3070 on an Intel-based system. This time, I've chucked these two cards into my AMD system and compared them with my RX 6800 to see the performance scaling on a mid-range system.</div><div style="text-align: justify;"><span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">On Strong Foundations...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The system in question:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li>Ryzen 5 5600X</li><li>32 GB DDR4 3200 CL18 Corsair Vengeance</li><li>MSI B450-A Pro Max</li><li>WD Black SN750 1TB</li><li>RTX 3070 Zotac Twin Edge OC</li><li>RTX 4070 MSI Ventus 2x</li><li>RX 6800 XFX Speedster SWFT 319</li></ul></div><div style="text-align: justify;">Just another quick explainer on the benchmarking. The average fps metric is self-explanatory. However, the minimum fps is calculated based on a moving average covering a period of approximately 1 second. This differs from how some review outlets discuss percentile fps numbers, using a singular frametime value to turn it into an fps value. As I've discussed on Twitter <a href="https://twitter.com/Duoae/status/1637439784527228930">when raising this subject</a>, this is not a correct procedure as a singular frametime value is not an averaged experience and thus cannot become a "frames per second" number - it is taken entirely out of context.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, instead, I present the maximum frametime value as a pure number in milliseconds and also, alongside that, present the number of frametime excursions.
This is a term I've made up to help describe the overall experience of the title being tested without having to present a million overlaid frametime graphs. It is drawn from the practice of process performance monitoring - a way of checking that your process is under control. In this instance, because I am testing with uncapped framerates instead of trying for a target, this value is set to be 3 standard deviations from the mean frametime value. Any time a frametime is registered above that limit, it is counted; the higher the number of frametime excursions, the more stuttery the experience may be for the player. (For the curious, I've included a small code sketch of the minimum-fps and excursion calculations just after the first set of results below.)</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With that out of the way, let's move onto the testing!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Results...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">First things first. Even though I used a slightly lower power limit on the OC profile for this setup, a lot of the older titles' results are quite similar to those obtained on the Intel system. However, the keen-eyed among you will note the overall <i>very slightly</i> lower performance across the entire test suite. This is because <a href="https://docs.google.com/spreadsheets/d/1Vsk0DI3SMw9S8me8eO99vezdTYRC1cPhkE5BoJyz2Ts/edit?usp=sharing">the 12400 Intel CPU is a stronger part than the 5600X</a>* - it allows the GPUs to push out more frames so, aside from some instances where we were already CPU limited, we're actually even slightly <i>more</i> CPU limited than we were last time...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><blockquote><b><i><span style="color: #274e13;">*Yes, using DDR4 3200 will cause a slight decrease but my prior testing showed an equal or <5 fps average difference between DDR4 3800 (tuned) and the 4 stick setup we're using here.</span></i></b></blockquote></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtmgxMqkhU68--CXsVCprA6foyDJ6RZ8UFR7-5ts65q8lD4ZTOgdo8EVUWtlm2wZe7uRTrEtG4JMPiVcAxiTJjRSBdNtlD0a50P9RcUzQ0bElJCaJzcySTg2GZZ_1fdMDbM4_9ZgWtX0LOMWK9t23LV2RBdSX2AhkDsHtjI8ZflDd3xzxIhPx8pbZb/s1408/Heaven_Superposition.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="351" data-original-width="1408" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtmgxMqkhU68--CXsVCprA6foyDJ6RZ8UFR7-5ts65q8lD4ZTOgdo8EVUWtlm2wZe7uRTrEtG4JMPiVcAxiTJjRSBdNtlD0a50P9RcUzQ0bElJCaJzcySTg2GZZ_1fdMDbM4_9ZgWtX0LOMWK9t23LV2RBdSX2AhkDsHtjI8ZflDd3xzxIhPx8pbZb/w640-h160/Heaven_Superposition.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Unigine Heaven actually shows the opposite trend to the rest of the results (discounting game engines that are biased towards AMD CPUs/GPUs :) ) where the Ryzen setup performs on par or slightly better at stock.
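</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">And here is that promised sketch of the two frametime metrics - the ~1 second moving-average minimum fps and the 3-standard-deviation excursion count. It's a minimal illustration, assuming a flat list of frametimes in milliseconds (the kind of thing most capture tools can export); the function names are made up for this example and this isn't my exact tooling, just the logic:</div><div style="text-align: justify;"><br /></div><pre>from statistics import mean, stdev

def min_moving_avg_fps(frametimes_ms, window_ms=1000.0):
    # Worst framerate averaged over any ~1 second run of consecutive frames.
    worst = float("inf")
    window, window_sum = [], 0.0
    for ft in frametimes_ms:
        window.append(ft)
        window_sum += ft
        # Drop frames from the left while the window still covers ~1 second.
        while window_sum - window[0] >= window_ms:
            window_sum -= window.pop(0)
        if window_sum >= window_ms:
            worst = min(worst, 1000.0 * len(window) / window_sum)
    return worst

def count_excursions(frametimes_ms):
    # Count frametimes more than 3 standard deviations above the mean frametime.
    limit = mean(frametimes_ms) + 3 * stdev(frametimes_ms)
    return sum(1 for ft in frametimes_ms if ft > limit)</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Anyway - back to the results: as I said, Heaven has the Ryzen setup on par or slightly ahead at stock. 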
This is roundly reversed in Superposition, though.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9qXm6QlQJGHbFEAD-NzjSG3zghUQF2HW5rxirD4vNzlJ1SeVaEmoLc1Uh4kuL2wK5d55lEivrmxDBu5sJk6hc3vIF0Jhdn02bJrvhY92aUTadvc5Qo1pb9y8jSKLTQmGfR_HuyKkRNc6ysGJEl0bXXmteGG2bANEANb0Q55cHjxNnMICksFKbiviz/s1412/Valhalla_Timespy.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="349" data-original-width="1412" height="158" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9qXm6QlQJGHbFEAD-NzjSG3zghUQF2HW5rxirD4vNzlJ1SeVaEmoLc1Uh4kuL2wK5d55lEivrmxDBu5sJk6hc3vIF0Jhdn02bJrvhY92aUTadvc5Qo1pb9y8jSKLTQmGfR_HuyKkRNc6ysGJEl0bXXmteGG2bANEANb0Q55cHjxNnMICksFKbiviz/w640-h158/Valhalla_Timespy.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><div style="text-align: justify;">For Assassin's Creed Valhalla, there's essentially zero difference between the Nvidia card results on both AMD and Intel systems. In Timespy, we see the difference in CPU performance (yes, it's the graphics test but the graphics card still requires the CPU to feed frames to it!). What <i>was</i> interesting here is that I <i>could not get</i> the RX 6800 stable in the graphics 2 test under any circumstances. Even if I kept the core and memory frequency at stock with a -5% power limit, the test would crash every time. So, this was a bit of a fail...</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwun-hmoPdWI8cuwD9teTrWxn-ZFVAtAKGPTxvEnx14ue1HTG488DExbGkUF8_xoONqZBR88Yvgsuqser0yeWoJ7B3DQb96qVAub5a07G_UryWqwLgqlrq9ZwJmmaA5Jo32sPD2xf1l1Qe1Gddq-2Oz2FWFE8NyuJ9QZ9IJ030Ib36PQuaJ4EbVaNW/s1410/Arkham_Metro.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="350" data-original-width="1410" height="158" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwun-hmoPdWI8cuwD9teTrWxn-ZFVAtAKGPTxvEnx14ue1HTG488DExbGkUF8_xoONqZBR88Yvgsuqser0yeWoJ7B3DQb96qVAub5a07G_UryWqwLgqlrq9ZwJmmaA5Jo32sPD2xf1l1Qe1Gddq-2Oz2FWFE8NyuJ9QZ9IJ030Ib36PQuaJ4EbVaNW/w640-h158/Arkham_Metro.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>In hindsight, I shouldn't have tested with these settings on Arkham Knight...</b></td></tr></tbody></table><br /><div style="text-align: justify;">Since AMD cards cannot use the Nvidia gameworks technologies, the performance is higher on that part because the game is less demanding to run! D'oh! Otherwise, the results are consistent between the platforms. Metro Exodus, on the other hand, appears to show a slight decrease on the AMD system. 
I would theorise that this is a result of the lower RAM speed in this RT-based title.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Moving onto my manual benchmarks, things begin to get a bit more interesting:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfTiGBDcZIXe6uDPdVQaf9IdCgA1HVa1r9h3NEKShXGX7sUwjODQbvPi2A-1kk1XtMZCOVBIOUMqjSCYI_lK5bC8MDLTzokJmBIgk6GqnorWoB5jyy0fcpTKQW9qtXgrIwc4gvzYDEqWkjgNlRXw3F3qB8CK3GCXsa8Eo2O6MbRKhI20ZpS6cZZ_pb/s1401/Hogwarts_AMD.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="349" data-original-width="1401" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfTiGBDcZIXe6uDPdVQaf9IdCgA1HVa1r9h3NEKShXGX7sUwjODQbvPi2A-1kk1XtMZCOVBIOUMqjSCYI_lK5bC8MDLTzokJmBIgk6GqnorWoB5jyy0fcpTKQW9qtXgrIwc4gvzYDEqWkjgNlRXw3F3qB8CK3GCXsa8Eo2O6MbRKhI20ZpS6cZZ_pb/w640-h160/Hogwarts_AMD.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>I tested this result four times...</b></td></tr></tbody></table><br /><div style="text-align: justify;">Hogwarts Legacy displays the difference between the performance of the 12400 and the 5600X - with higher GPU utilisation across the board corresponding to higher fps output. Surprisingly, the AMD GPU is highly utilised and has a better minimum <u style="font-style: italic;">and</u> average fps than either Nvidia GPU, which I think is related to the dreaded Nvidia driver overhead. Otherwise, the CPU limitation of this title is <i>worse</i> using the 5600X with both the 3070 and 4070 cards than it was on the 12400 - where the 4070 had better performance than the 3070 and only reached a real limitation when it was OC'ed.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The number of frametime spikes was about the same across the two systems and essentially correlates to the 3070 not having enough VRAM*, while the magnitude of those spikes is generally slightly worse on the AMD system, too. I'm not <i>quite</i> sure why the 3070 has such a drop in reported power usage from just a 5% difference in the OC between the two systems but I think it is likely due to the CPU limitation reducing the fps by around 10, on average.</div><blockquote><div style="text-align: justify;"><b><span style="color: #274e13;"><i>*Yes, I know it's a meme, now. However, this is a real effect in this (and other) titles.
The 3070 performs worse in frametime metrics across both systems...</i></span></b></div></blockquote><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjE09uTNH1PS6OPw1gpvl__qdhBRgrf7K6-SBZzZxaSqcFZdl82PNfcZ0jYKHI37bZaUfUo3XPNzhTeQdXf9XMyxayWuK3WX3Wh36N0RDOb3dl8oBWhQ3EdFcGpkAsZEzMI4bkAuVAPkSs4s95h1JIQwoF9Q1UPAI1BuewZ3XLh3A9sJTG576ErL_uw/s1399/Spider-man%20AMD.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="349" data-original-width="1399" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjE09uTNH1PS6OPw1gpvl__qdhBRgrf7K6-SBZzZxaSqcFZdl82PNfcZ0jYKHI37bZaUfUo3XPNzhTeQdXf9XMyxayWuK3WX3Wh36N0RDOb3dl8oBWhQ3EdFcGpkAsZEzMI4bkAuVAPkSs4s95h1JIQwoF9Q1UPAI1BuewZ3XLh3A9sJTG576ErL_uw/w640-h160/Spider-man%20AMD.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /></div><div style="text-align: left;">These test results show that Spider-man is a really well-optimised title. Yes, we observe generally lower performance than on the Intel system due to the aforementioned difference in CPU performance but the RX 6800 performs wonderfully in this title with generally higher minimum and average fps compared to the Nvidia parts (aside from the overclocked 4070).<br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">While the GPU utilisation is very high on the RX 6800, power consumption is almost as low as the RTX 4070's (which is incredibly impressive!) and whatever optimisations were performed by Nixxes to port this title to the PC have resulted in generally low maximum frametime spike values and consistent frame presentation, as evidenced by the low number of excursions during the test! Interestingly, these excursions are lower on the AMD system than on the Intel system. I don't have an explanation for this other than (potentially) the fact that I have four sticks of RAM on the AMD system and it's allowing better system memory access.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXUeF7RkXOG4--v6ugX01cyvUWjspIgTZs0vW7TjX0bas7xbAq9Fk51pFyaPCJ-mYaLjPwKFdMq-cZ_E9Jo_lAHa4H0std9BoP9LHoN-lRQmJop-0jUwNm06K1lSd1_phaNhAJaBdhJOrPtqlqSxnJsU7b_8MoGS2xm1D_AE-xoiKsQSv-zwEzbBXc/s1401/TLOU_AMD.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="351" data-original-width="1401" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXUeF7RkXOG4--v6ugX01cyvUWjspIgTZs0vW7TjX0bas7xbAq9Fk51pFyaPCJ-mYaLjPwKFdMq-cZ_E9Jo_lAHa4H0std9BoP9LHoN-lRQmJop-0jUwNm06K1lSd1_phaNhAJaBdhJOrPtqlqSxnJsU7b_8MoGS2xm1D_AE-xoiKsQSv-zwEzbBXc/w640-h160/TLOU_AMD.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Ironically, not the last test - The Last of Us continues to show us the disparity between the performance of the two CPUs, with AMD's results falling slightly behind.
However, once again, this is a title where the AMD GPU is performing better than either Nvidia card and at comparable power consumption to that of the 4070 (which we know <a href="https://hole-in-my-head.blogspot.com/2023/05/literally-again-exactly-happens.html">is a very efficient part</a>). </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's also interesting to note that the maximum frametime spikes are <i>worse</i> on the Nvidia cards here compared to their performance on the 12400 but that AMD's 6800 really provides a better experience. This appears to be a title with optimisations based around the RDNA architecture's design. Additionally, in case you missed it in the chart above, the stability of the presentation is much greater on the 6800, with less than half the frametime excursions.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiaZS_fsH7TGzxM9wG6aPWJfGcBiRog2daGDtMLUYAv8i0jJDO62lIdg_lflCYg4oTS8LLPIdkbERTmF6n3NRgE4s_zFGA9RgWCEk42MohwsU_eGiJW0B-_DdX0n-d22ClUBpfQXjM_Htqmy39WHgV-5T8j4ZyZsAm6yZ83ALW4wtOSWy0VwFhWyTb/s1174/TLOU%20frametimes.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="313" data-original-width="1174" height="170" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiaZS_fsH7TGzxM9wG6aPWJfGcBiRog2daGDtMLUYAv8i0jJDO62lIdg_lflCYg4oTS8LLPIdkbERTmF6n3NRgE4s_zFGA9RgWCEk42MohwsU_eGiJW0B-_DdX0n-d22ClUBpfQXjM_Htqmy39WHgV-5T8j4ZyZsAm6yZ83ALW4wtOSWy0VwFhWyTb/w640-h170/TLOU%20frametimes.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">This can easily be seen in the frametime plot over the course of the benchmark - the RX 6800 essentially decimates the (usually more potent) RTX 4070 across the entire run. It's not perfect, of course - this is a title that suffers from relatively poor optimisation on PC.
However, users with AMD RDNA GPUs will likely not have noticed as many problems as those using Nvidia's products...</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9DkyrrB8ChNMu5wa4lbRbTk2L6edv5Oz5H7584HmaL7dSO_MlYVPVOrnUG4g1DFQab5cXW55TMww7CmcD9OHGpf1vIqQWncEm1e6N0Itr3bgRSqvyjXCfML99YjNR4uhNquLYTsasuGgN81RXvyIimaLTyV4E2X0muP-j_btYELNeD2SjauhqwXWC/s1402/Plague%20Tale_AMD.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="349" data-original-width="1402" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9DkyrrB8ChNMu5wa4lbRbTk2L6edv5Oz5H7584HmaL7dSO_MlYVPVOrnUG4g1DFQab5cXW55TMww7CmcD9OHGpf1vIqQWncEm1e6N0Itr3bgRSqvyjXCfML99YjNR4uhNquLYTsasuGgN81RXvyIimaLTyV4E2X0muP-j_btYELNeD2SjauhqwXWC/w640-h160/Plague%20Tale_AMD.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><div style="text-align: justify;"><br /></div></div><div style="text-align: justify;">Moving onto A Plague Tale: Requiem, the CPU limitation on the R5 5600X is more apparent than ever - with the 4070 performing essentially the same as the 3070 and the RX 6800. What is once again impressive is the power usage of the 6800 - matching and slightly beating the 4070 for a similar performance*.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Additionally, once again, the frametime excursions and max frametime spikes are incredibly good on this title - Asobo Studio have crafted an amazing engine that is able to moderate the experience to an extent that most other studios cannot. While I may not be <i>as</i> in love with this title as many other commentators appear to be (the hair strand rendering on the characters really puts me off!), this game is very impressive!</div><div style="text-align: left;"><b><i><blockquote><span style="color: #274e13;">*Of course, with a stronger CPU, this would most likely not be the case...</span></blockquote></i></b><br /><h3 style="text-align: justify;"><span style="color: #274e13;">On frequency...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">You may have noted that I made quite a few mentions of the CPU bottleneck experienced in this mid-range review and I am happy to report that <a href="https://twitter.com/davidgburns">DGBurns from over on Twitter</a> has deigned to aid me in some CPU-to-CPU comparisons on the AMD side. He also has an RTX 4070 but his CPU is an R9 5950X.
Since the 4070 is very tightly controlled at stock settings, the card-to-card variation should be absolutely minimal, meaning that this CPU should be able to show us the effect of having a faster-clocked Zen 3 CPU than my 5600X.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi56rJvjZofuubqzJk7_hTQK4RZ-aswhx98xQNm1hBIRf_8bwtKdsS78LPDLxhjNZVJAaE_2Davo78XNsZ1hvKg8_Vri9Ttl8K3y6EyJ2xQuEORRZcZoL_UDq57rM4d_HVrKZFCQdpP-LflvpMfXqa25ePNXnelr9wtCol8nPmOwPgdWv-_-lNvpnhu/s1408/5950X_Heaven_Superposition.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="352" data-original-width="1408" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi56rJvjZofuubqzJk7_hTQK4RZ-aswhx98xQNm1hBIRf_8bwtKdsS78LPDLxhjNZVJAaE_2Davo78XNsZ1hvKg8_Vri9Ttl8K3y6EyJ2xQuEORRZcZoL_UDq57rM4d_HVrKZFCQdpP-LflvpMfXqa25ePNXnelr9wtCol8nPmOwPgdWv-_-lNvpnhu/w640-h160/5950X_Heaven_Superposition.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br />Heaven shows that we are essentially identical in performance. However (and you can't see this from the graph above), I can see that his CPU is only 6% utilised - meaning that only one or two logical CPU threads are being used. This means that the Heaven benchmark is entirely GPU-bound in modern scenarios (the 10 point difference is basically run-to-run variation).</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The more modern Superposition has a higher CPU load, in comparison. That ~100 pt difference is most likely indicative of the difference in clock speed between the 5950X (~4.9 GHz) and the 5600X (~4.2 GHz).<div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsWhcgln5lujcDV53u0E-clTxTUy3wgI26Ss2kz94Yt2F90Pj7wlA47sjcZG-dE5bBkVAVX7L7Vuw0ImvCvxO_MrO6hzhQpBB8lrbJpKfrksTXzEuBlqVNWJP3e8Sf0uM1hr-1SWMmzZT63fjycW6JNMS4zDmdF7Qk0-DgC3SbxIeAND9o6hTEVL-S/s1414/5950X_Knight_metro.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="352" data-original-width="1414" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsWhcgln5lujcDV53u0E-clTxTUy3wgI26Ss2kz94Yt2F90Pj7wlA47sjcZG-dE5bBkVAVX7L7Vuw0ImvCvxO_MrO6hzhQpBB8lrbJpKfrksTXzEuBlqVNWJP3e8Sf0uM1hr-1SWMmzZT63fjycW6JNMS4zDmdF7Qk0-DgC3SbxIeAND9o6hTEVL-S/w640-h160/5950X_Knight_metro.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /></div><div style="text-align: left;">Moving on to Arkham Knight, we can see that result repeated, with the average fps increasing by around 10 fps and the minimums slightly less so...
Metro Exodus Enhanced Edition shows us that there is a similar, slight uplift when increasing processor frequency and core count.</div><div style="text-align: left;"><br /></div><div style="text-align: left;"><br /></div><div style="text-align: left;"><h3 style="text-align: left;"><span style="color: #274e13;">Conclusion...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Like last time, we find ourselves concluding that, in the mid-range, we are potentially more CPU-bound in modern game titles than most might realise. However, it is not purely a case of simply upgrading your CPU to a higher-end part. CPU architecture is more important than clock speed, these days. The difference between a low-to-mid-range part and a higher-end one is not that great - even with RAM tuning. Sure, you might also manage to overclock your CPU... but that's never a given! </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">From what I'm seeing, on the AMD side of the equation, a v-cache part is most likely to guarantee the best performance for the price. In our testing, the 5950X was not <i>that</i> much faster than the 5600X, despite clocking around 500 MHz faster on average.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Similarly, the 12400 gave only slightly less performance than the 5950X - showing that architecture's superiority - even at a lower cost.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">On the other side of things, the RX 6800 is a beast of a GPU for a mid-range PC; giving performance between a 3070 and a 4070, sometimes equalling the 4070 and sometimes beating it. The fact that it easily matches the 4070 on power usage is very impressive. Unfortunately, there are some issues with software and hardware that can crop up on the RX 6000 cards. Yes, these are much better than for the 5000 series but I have experienced several software problems with Radeon Adrenalin that I could only solve through user modification of the powerplay tables using MorePowerTool. I have also, unfortunately, experienced failures of all three of the card's DisplayPort outputs for reasons that I cannot understand*.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"></span><blockquote><span style="color: #274e13;">*I thought it may have been a faulty DP cable until I remembered that I was using an HDMI cable with a DP adapter...</span></blockquote></i></b></div><div style="text-align: justify;">The other issue I have with cards like the 6800 is the size! I cannot fit this card into my SFF PC - it just won't go! On the other side of the coin, the 4070 can easily be purchased in two-slot, two-fan configurations that fit into any build, along with a single 8-pin power connection. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It really is a shame that the RTX 3070 wasn't released with 12+ GB of VRAM because the card can be seen to be right on the cusp of consistently good performance when using very high or maximum settings at 1080p. As a result, both the 6800 and 4070 are sort of overkill for a low-to-mid-range PC setup. However, their VRAM quantity is adequately matched for gameplay at 1440p.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Personally, the cheaper RX 6000 cards are the way to go - as long as you are okay with some potential troubleshooting and other issues.
The real issue, right now, is that modern games are CPU limited and there's not much mid-range or low-end system owners can do about it - it's not a graphics bottleneck, for the most part...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Anyway, I hope you found this review series helpful. I plan to cover more games in the future with a keener eye on framerate performance targets, as I did in the <a href="https://hole-in-my-head.blogspot.com/2023/03/analyse-this-technical-performance-of.html">Hogwarts analysis</a>. Until next time!</div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-72155828714028087322023-05-17T20:35:00.007+01:002023-05-17T21:01:07.803+01:00The Power Scaling of the RTX 4070<div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyfrRsYlMHmj3Y63RNJKqrFnKb83UoLaz4oeiCk-wLuHygvGJizqh-97VypZOG8nBtFbHYMlP9Hizn3EDuSOIlAP2BaduydztnhtFPSJNDvZiC4E-9xx2Xz_MBi_PRnNLGWoor4t0ni7hQehJBsaunZu3RHjlbgzT-vvteGtFW7wj7wy4ixe_uJK1x/s1920/Title.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyfrRsYlMHmj3Y63RNJKqrFnKb83UoLaz4oeiCk-wLuHygvGJizqh-97VypZOG8nBtFbHYMlP9Hizn3EDuSOIlAP2BaduydztnhtFPSJNDvZiC4E-9xx2Xz_MBi_PRnNLGWoor4t0ni7hQehJBsaunZu3RHjlbgzT-vvteGtFW7wj7wy4ixe_uJK1x/w640-h360/Title.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Literally, again, <i>exactly</i> what happens...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><a href="https://hole-in-my-head.blogspot.com/2022/06/the-power-curve-of-rtx-3070-and-ampere.html">As is</a> <a href="https://hole-in-my-head.blogspot.com/2023/02/the-power-curve-of-rx-6800-and.html">my wont</a>, I have decided to poke and prod at any and all hardware within my nefarious reach. Since I've picked up the RTX 4070, why should this particular product be spared? Because it's small and cute? </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">"NO!", says the wise man. "They should be subject to the woes and wiles of mortals as much as any other product!"</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">And so the story goes, again and again... Join me within these pages* where I will outline the limitations of the new sacrificial lamb.</div><blockquote><div style="text-align: justify;"><b><i><span style="color: #274e13;">*Technically, no pages are present given the format of this blog...<span><a name='more'></a></span></span></i></b></div></blockquote><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">A Game of Two Halves...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The RTX 4070 is a surprisingly complicated release. As I noted in <a href="https://hole-in-my-head.blogspot.com/2023/05/mid-range-hardware-rtx-4070-review.html">the prior blogpost</a>, the range in power delivery of this card is quite large, with the cheapest cards pulling a hard limit of 200 W and the higher-end models drawing up to ~240 W. 
The problem here is that this extra 40+ W only grants a 6.6% performance bonus (<a href="https://www.techpowerup.com/review/msi-geforce-rtx-4070-gaming-x-trio/41.html">according to TechPowerUp</a>, when using one of the strongest systems available in a synthetic benchmark).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In a rather unexpected turn of events, I'm going to start out with the summary of my findings, rather than the data:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">My initial assessment was that the card is, in general, extremely power limited, but I am now questioning that logic. Aside from my own testing confirming this, <a href="https://www.techpowerup.com/review/msi-geforce-rtx-4070-gaming-x-trio/37.html">TechPowerUp's reviews</a> show a card that runs at a <a href="https://www.techpowerup.com/review/asus-geforce-rtx-4070-dual/37.html">consistently high frequency</a>, without thermal throttling. All differences (and there are a couple of fps of <i>average</i> difference between the highest- and lowest-end models) really come down to sustained core frequency, rather than any other limit such as data or power. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As such, my current assessment is that the RTX 4070 is essentially a card pushed almost to its absolute limit: there is virtually no headroom to be had, regardless of how much power you are able to push to it. The reason for this is the hard voltage limit on the core: you just can't push it at all! In that vein, we can see that the upper limit of core frequency is limited by the core voltage stability... which I have found is <i>extremely tight</i> around the verified core target.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSkgJKTdnQeZrP-GQWFAp9XIKTjnxHIkrW_mo0XjprJq2tJCXwGj623DsUg_o7uBjL2QupjH_0oLV2Yi06KNaEzgPJLX4wR6DjViSc6b65ge4YoFpsuaj2-zq0ssmQOL1e5P-B0H1TjZyyi9-HhzPNDG9VeSnPDX-EkT6l27t-Hp9gE7EOmDXQTAZv/s706/Metro%20Power%20Scaling_numbers_3070.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="284" data-original-width="706" height="257" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSkgJKTdnQeZrP-GQWFAp9XIKTjnxHIkrW_mo0XjprJq2tJCXwGj623DsUg_o7uBjL2QupjH_0oLV2Yi06KNaEzgPJLX4wR6DjViSc6b65ge4YoFpsuaj2-zq0ssmQOL1e5P-B0H1TjZyyi9-HhzPNDG9VeSnPDX-EkT6l27t-Hp9gE7EOmDXQTAZv/w640-h257/Metro%20Power%20Scaling_numbers_3070.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The RTX 3070 allowed decent variation of voltage settings when playing with the card...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With the 30 series releases, by contrast, you were able to drop the core voltage by a fair margin - maybe 100 mV - and maintain target core frequencies without issue.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Perhaps the most surprising aspect, for me, was the similarity between the behaviour of the RTX 4070 and the RX 6800 - okay, these are both chips that are manufactured on TSMC processes, but the scaling behaviour
was more similar between the two than not: like the RX 6800, the 4070 would try to reach the maximum set core frequency regardless of its ability to do so (causing crashes). In comparison, the 3070 was easier to manage because it would automatically throttle core frequency instead of pushing it to a point of instability. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Maybe this observation is a coincidence...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The end conclusion, for my part, appears to be that the RTX 4070 is an incredibly finely balanced part - neither power limited nor memory bandwidth limited. The only potential limitation is the core voltage and, as I said, this appears to be either a limitation on the part of the cards... or of TSMC's process node.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, with that out of the way, let's get to the data...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Power Scaling tests...</span></h3><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjypRCMGsX6IX3MsqMY-RaRlOZmVQ0Fbcdw_Su92RnDcweHBtlF2jFekeBwoJseieLMc0hpTFVTok5n0x5WgJA0N8AKtRU8fDDEYDOW6DQzXBFIerXFKjCdurrGS_SPETC-36UJXsC1OvSEtJEce7Sppdcc6mD9EN0F4cpozz2s35bco2SNwEpQrDpF/s749/Metro%20Power%20Scaling_numbers_3070.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="319" data-original-width="749" height="273" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjypRCMGsX6IX3MsqMY-RaRlOZmVQ0Fbcdw_Su92RnDcweHBtlF2jFekeBwoJseieLMc0hpTFVTok5n0x5WgJA0N8AKtRU8fDDEYDOW6DQzXBFIerXFKjCdurrGS_SPETC-36UJXsC1OvSEtJEce7Sppdcc6mD9EN0F4cpozz2s35bco2SNwEpQrDpF/w640-h273/Metro%20Power%20Scaling_numbers_3070.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Metro Exodus: Enhanced Edition tasks the entire silicon, and I've found it to be an excellent stability predictor during <a href="https://hole-in-my-head.blogspot.com/2023/02/the-power-curve-of-rx-6800-and.html">my RX 6800 scaling tests</a>...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Going back to the mainstay of Metro Exodus: Enhanced Edition, I found that the RTX 4070 was able to clock around 180 MHz higher than at stock settings, to a ~3000 MHz core frequency. Additionally, I was able to increase the memory throughput from 21 Gbps to 23 Gbps without performance regression or introducing instability. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Unfortunately, all of this effort was essentially for naught: a measly 4% gain over the stock settings in Metro Exodus. Additionally, I found (as stated above) that the core was not reliably stable when messing around with the voltage. Playing around with the RTX 3070 also gave me around a 6% increase in performance with a 10% power boost from stock (with core and memory frequency, and voltage, optimisations) in Unigine Superposition... 
but the RTX 4070 had no further bonus to add to its 2% gain because there is no way to feed more power to the silicon. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The end result is that both parts are better suited to undervolting/underclocking/power-limiting than to trying to push past their stock performance.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwgVs_VjtWMLeipv-TpVKKKIi8R8a3_tYf2UQTW2lNzi0w-M9uPq64Cyp46uORtJeOywwTwY-63cyytMM3RCmkVm7oxkqFiEugjoFPyS54y13tSW1ieTyOUaSRTquFnORfxnTraTzL_DXdKvGH96eT8nxZbz3SbmPJc6RvrwJyoevyEsn3wHKv4LUK/s721/Metro%20Power%20Scaling_numbers.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="325" data-original-width="721" height="288" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwgVs_VjtWMLeipv-TpVKKKIi8R8a3_tYf2UQTW2lNzi0w-M9uPq64Cyp46uORtJeOywwTwY-63cyytMM3RCmkVm7oxkqFiEugjoFPyS54y13tSW1ieTyOUaSRTquFnORfxnTraTzL_DXdKvGH96eT8nxZbz3SbmPJc6RvrwJyoevyEsn3wHKv4LUK/w640-h288/Metro%20Power%20Scaling_numbers.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>At an 80% power limit, with the overclock, we're getting near-stock performance...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8efm2v8QmlTcAqdZq0sUrNDy_LMmEcF5HPEfi5bWOI0IteawjyNWzdrzaUw5prw8BEwQUrOKHmif2tegyqh1d3puX3qh7dDz0nBAcO3MmP4U1pTP4_uDmpH-LgLz7xrawiTwS5f37DnTtmeqPooQ5b_nAoruutflBc9qhreAnSUCwnfexHGJ4iQcB/s971/Metro%20Power%20Scaling.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="562" data-original-width="971" height="370" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8efm2v8QmlTcAqdZq0sUrNDy_LMmEcF5HPEfi5bWOI0IteawjyNWzdrzaUw5prw8BEwQUrOKHmif2tegyqh1d3puX3qh7dDz0nBAcO3MmP4U1pTP4_uDmpH-LgLz7xrawiTwS5f37DnTtmeqPooQ5b_nAoruutflBc9qhreAnSUCwnfexHGJ4iQcB/w640-h370/Metro%20Power%20Scaling.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>While unable to go past the 100% power limit, it is clear that, with a slight core and memory frequency overclock, the RTX 4070 is able to maintain stock performance at 80% of the power!</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">One important item to note is that the RTX 4070's performance drops off at a faster rate with lower power limits, compared to the RTX 3070 - a rough way to quantify that drop-off is sketched below. 
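As a minimal sketch (the sweeps here are placeholder numbers for illustration, not my measured results), each card's run can be normalised to its own 100% power limit score so the two curves are directly comparable:<pre>
# Normalise a power-limit sweep to "% of stock performance retained",
# so two cards with different absolute scores can be compared directly.
def retention(scores_by_limit):
    stock = scores_by_limit[100]  # score at the 100% power limit
    return {limit: round(100.0 * score / stock, 1)
            for limit, score in sorted(scores_by_limit.items())}

# Hypothetical sweeps, purely for illustration:
print(retention({100: 6000, 80: 5820, 65: 5400, 50: 4680}))  # steeper fall-off at low limits
print(retention({100: 4500, 80: 4320, 65: 4140, 50: 3960}))  # holds up better at 50%
</pre>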
The latter card is able to hold 10% more performance at 1080p and a 50% power limit when compared to the RTX 4070.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As for the RTX 3070, I retested that power curve with the knowledge I've gained from the RX 6800 and RTX 4070 and it appears that the card has had around a 2% performance increase, at stock (and also when overclocked), purely through driver updates since my original testing.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Conversely, Unigine Superposition shows that the RTX 4070 scales less well when not all of the silicon is put to the test - despite it being a ray tracing benchmark, the ray tracing is performed in software (in shaders), meaning that the dedicated ray tracing hardware is never exercised. The result of this can be observed in the flat profile of this test's scaling curve.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIWJU_n-MuUsi0N3GNL8fYNYJiak1XXK2ornAdKL-SsoZ1JzhQ0f1wzpYwOfnlL4JUyQBngogmnLhEj2xQYrwTZGE_3u19pz0cwfd-eY3iM23Fp1glhMS9Lxd1GHqzmGwRV5wv2SKQSDKb4BoUhBfbRZEY5NhsywEb4NcbCeJ0mg_VNWk4Vj3d5xCS/s750/Superposition%20Power%20Scaling_numbers.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="326" data-original-width="750" height="278" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIWJU_n-MuUsi0N3GNL8fYNYJiak1XXK2ornAdKL-SsoZ1JzhQ0f1wzpYwOfnlL4JUyQBngogmnLhEj2xQYrwTZGE_3u19pz0cwfd-eY3iM23Fp1glhMS9Lxd1GHqzmGwRV5wv2SKQSDKb4BoUhBfbRZEY5NhsywEb4NcbCeJ0mg_VNWk4Vj3d5xCS/w640-h278/Superposition%20Power%20Scaling_numbers.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Superposition shows the same trend in power/performance/overclocking...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN_TdDm4HakjKAKZItag7hUL05cR-EAwzYcfbfquLhVD17hzzQMqRecwy5XNXKB9YtPS1cEIZRhuufBBCln3hx4GxodJHf_Kv9LUnwTl-XYnUNa_RPN_mPPGTiAmX2A3iwWoBoTG0IxhzVOH8jDhBi-VB63E4TPy7sP-YvCv6qxo5CZTmqNg9ZwxO2/s976/Superposition%20Power%20Scaling.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="562" data-original-width="976" height="369" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN_TdDm4HakjKAKZItag7hUL05cR-EAwzYcfbfquLhVD17hzzQMqRecwy5XNXKB9YtPS1cEIZRhuufBBCln3hx4GxodJHf_Kv9LUnwTl-XYnUNa_RPN_mPPGTiAmX2A3iwWoBoTG0IxhzVOH8jDhBi-VB63E4TPy7sP-YvCv6qxo5CZTmqNg9ZwxO2/w640-h369/Superposition%20Power%20Scaling.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>While more linear in power usage w.r.t. 
performance, Superposition scaling shows the limitations of a pseudo-synthetic test...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Conclusion...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is a rather short and sweet post. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The RTX 4070 doesn't have a lot of upward mobility within its sheathed design. However, it is a very power efficient part that is readily able to reduce power usage by 20%, whilst maintaining the same performance. This is great from a user perspective, especially in the current period where electricity prices are at a premium in certain parts of the world.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Despite similar performance scaling curves to the RTX 3070, the RTX 4070 shows us that it is capable of a 28% performance increase over that card with 20 W less power usage, at stock and when optimised. While the 30 series part is clearly limited from pushing forward by a lack of available power, the 40 series part has no clear road to gaining more performance, other than new voltage regulators on the circuit board - something that is not so simple to pull off.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What is clear, though, is that the 3070 has been <i>very slightly</i> improved by the Nvidia drivers since launch - meaning that any disparities between the launch performance of that card and the launch version of the 4070 have been <i>very slightly</i> closed. As such, I fully expect the performance advantage of the RTX 4070 to extend beyond the current 28% at 1080p to around 30 - 32%, over time.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Yes, this isn't an exciting result, but I am certain the performance figures will be improved on systems with more powerful processors that can better feed the GPU hardware... 
However, as it stands, in the mid-range, users are looking at a decent, if unexciting, uplift in real-world computational performance.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-63045954832995719902023-05-06T13:03:00.005+01:002023-06-24T12:29:26.392+01:00Mid-Range Hardware: RTX 4070 review<div style="text-align: left;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgs8nIU6KZtqTzarwgnTIZX8Re0Js6TGM5iUiWyuzd24GxMHYha6snBUj0bE1PLLAS3gYC-zQz0BeMW66853gGcgNpfHTNbJPvp020Ui3Ji_OERR_6D8grdjZkw8PyTOzLFW1jPYW7QGjNYL91mk8RY_J9MeGe6dz6gQzHUq4TdGWMeeaa1IyUQzjq7/s1920/Title.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgs8nIU6KZtqTzarwgnTIZX8Re0Js6TGM5iUiWyuzd24GxMHYha6snBUj0bE1PLLAS3gYC-zQz0BeMW66853gGcgNpfHTNbJPvp020Ui3Ji_OERR_6D8grdjZkw8PyTOzLFW1jPYW7QGjNYL91mk8RY_J9MeGe6dz6gQzHUq4TdGWMeeaa1IyUQzjq7/w640-h360/Title.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The RTX 4070 is an interesting card. $100 more expensive than the 3070 on paper, but cheaper than the price most people were probably able to purchase their 30 series card for during the period of late 2020 through to just before the release of the 40 series in 2023.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Certainly, I paid slightly less than MSRP on the base model and that was around €50 cheaper than I paid for my 3070 when, in desperation, I just bought the first card I could get my hands on because I didn't have a graphics card.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, the 40 series cards are languishing on store shelves due to their high prices and lacklustre performance increases over their 30 series counterparts, with the notable exception of the RTX 4090.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With that in mind, I want to focus on that step forward in performance and look at how the RTX 4070 improves on the 3070 on a mid-range system.<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Test setup and Introduction...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The system in question:</div><div style="text-align: justify;"><ul><li>Intel i5-12400</li><li>16GB DDR4 3800 CL16 Patriot Viper 4400</li><li>Aorus B660i</li><li>WD Blue 570 1 TB</li><li>RTX 3070 Zotac Twin Edge OC</li><li>RTX 4070 MSI Ventus 2x</li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Sure, there are a lot of reviews that look at the absolute performance of graphics cards by pairing them with top-tier computer systems, but it is a known fact that people on lower-end hardware might not experience quite the level of gains observed in those reviews.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is due to the various overheads that are present in software, particularly for 
Nvidia GPUs, and it has been <a href="https://youtu.be/JLEIJhunaW8">noted by outlets such as Hardware Unboxed</a> that AMD GPUs have a performance advantage on lower-end CPUs.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The stock experience is one point of view. However, something I've been tentatively dipping my toes into over time has been undervolting/power limiting and overclocking. In that vein of thought, one of the main points from all the reviews I've seen has been how the RTX 4070 is effectively an RTX 3080, but at ~120 W lower power requirements.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">That, in and of itself, is impressive but all the 4070s I can find for sale/review have a single 8-pin power connector and a <a href="https://www.techpowerup.com/review/msi-geforce-rtx-4070-ventus-3x/42.html"><i>hard</i> 200 W power limit</a> on most reference designs. There are exceptions to this power limit and I was able to find cards allowed up to a <a href="https://www.techpowerup.com/review/asus-geforce-rtx-4070-tuf/41.html">216 W limit</a>, with the Nvidia founders edition having 220 W and the MSI Gaming X Trio having a much higher power limit of 240 W*. This means that, quality of silicon aside, the potential to overclock most models of this card is minimal at best in comparison with its predecessor, the RTX 3070, which had many models <a href="https://www.techpowerup.com/review/asus-geforce-rtx-3070-noctua-oc/39.html">able to supply 250 W and even one up to 330 W</a> from the stock power limit of 225 W.</div><div style="text-align: justify;"><i><b><span style="color: #274e13;"><blockquote>*Both of these latter cards use the 12VHPWR connector</blockquote></span></b></i></div><div style="text-align: justify;">So, how much of the gap can be closed by an overclocked 3070 and how much can the gap be extended, in kind, by an OC'ed 4070?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I'll get into the power and clock scaling of the 4070 next time. So, let's just look at the overclocked and power limited results from each version of the card I own. It is important to note that I am using an optimised overclock - reducing the power limit to 90% on both cards. This actually reduces the amount of performance gain by around 2% on the 4070 and between 2 - 4% on the 3070 from what is actually possible on my specific cards (and with my specific hardware setup), but I think this allows for wiggle-room on quality of silicon, card power and user skill to give a generalised picture.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">The Tests...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I have two types of test - canned benchmarks and custom runs. 
The difference here is that for the games where I'm using my own custom data, I'm adding in extra information such as the %GPU utilisation, the GPU power draw, the largest frametime spike and the number of frametime excursions above 3 standard deviations from the average frametime during the run.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This last metric is a modification of <a href="https://hole-in-my-head.blogspot.com/2023/03/we-need-to-talk-about-fps-metrics.html">what I was doing last time</a>, when I was defining a deviation of at least 8 ms from the prior frametime as a visible stutter - following on from GamersNexus' methodology. My new methodology allows for the change in user experience with the scaling of performance*. It could be a bit deceptive, though: if the standard deviation of the data is quite small, there is a chance that the number of excursions will be over-represented.</div><div style="text-align: justify;"><i><b><span style="color: #274e13;"><blockquote>*i.e. An 8 ms deviation from a 16.66 ms average is more noticeable than the same deviation at a 33.33 ms average.</blockquote></span></b></i></div>
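<div style="text-align: justify;">For clarity, here is a minimal sketch of both stutter metrics (my illustration of the calculations, not the exact script I use), assuming a list of frametimes in milliseconds from a capture tool:</div><pre>
from statistics import mean, stdev

def excursions_3sigma(frametimes):
    # My metric: frames slower than the run's average by more than 3 standard deviations.
    threshold = mean(frametimes) + 3 * stdev(frametimes)
    return sum(1 for t in frametimes if t > threshold)

def stutters_8ms(frametimes):
    # The GamersNexus-style metric: a frame at least 8 ms slower than the one before it.
    return sum(1 for prev, cur in zip(frametimes, frametimes[1:]) if cur - prev >= 8.0)
</pre><div style="text-align: justify;">Because the 3-sigma threshold scales with the run itself, a very consistent run (with a tiny standard deviation) can over-report excursions - hence the caveat above.</div><div style="text-align: justify;"><br /></div>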
<div style="text-align: justify;">The other metrics I will include are relatively self-explanatory. So, let's get on with the show!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyjeZJeFBPodeG8TAT6p1KIkxkGr3SGQTAZ7mSl7HeNW1-xZaVpGBgUHS5d2UpR6erKeF6TxhleUx39iAzo3MHygWxBVxvDIUIP2DGSj0UxOTCkbgG_z6yoFi_9QMB5MNJnG9u1--bH38jcpvBnHNxvR-MUKlOFubvyvJunZEDUhLlAJvKXYsKdMZg/s1407/Superposition_Heaven.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="351" data-original-width="1407" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyjeZJeFBPodeG8TAT6p1KIkxkGr3SGQTAZ7mSl7HeNW1-xZaVpGBgUHS5d2UpR6erKeF6TxhleUx39iAzo3MHygWxBVxvDIUIP2DGSj0UxOTCkbgG_z6yoFi_9QMB5MNJnG9u1--bH38jcpvBnHNxvR-MUKlOFubvyvJunZEDUhLlAJvKXYsKdMZg/w640-h160/Superposition_Heaven.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Starting with the old stand-bys: <a href="https://hole-in-my-head.blogspot.com/2023/02/the-power-curve-of-rx-6800-and.html">Unigine Heaven isn't so suitable for testing modern graphics cards</a> as I've seen that it's less demanding on the functional hardware within the newer GPU designs and is very core clock frequency sensitive - so it's relatively easy to "make the numbers go up". Superposition uses more of the silicon on the chip with its screen space ray traced global illumination but is still quite easygoing on modern mid-range and higher end GPUs. While my scores aren't anything to write home about (I'm somewhere in the 800 - 900th range with the above Superposition results), it's clear that the RTX 40 series silicon can clock very high, very easily.</div><div style="text-align: justify;"><blockquote><b><i><span style="color: #274e13;">I will make a note here that on the reference model of the card there is no way to increase the power limit - I am unable to go past 200 W (100% limit). Additionally, there is minimal ability to adjust the core voltage - my card will not go above or (stably) below 1.1 V (<a href="https://www.reddit.com/r/nvidia/comments/12pihwo/rtx_4070_efficiency_undervolting/">unlike some others</a>). On top of that, I am not even able to get my card to heat up above 69 degrees Celsius. The only conclusion I can make is that the card is power and voltage limited to such an extent that there is minimal headroom to overclock. The best score I've been able to obtain in Superposition has been 10202 pts, which is really not that much higher than stock.</span></i></b></blockquote></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7ckmIMY_2ufqveBCO3e4mJhDF0X-jznF0DzDpljKMwn8TiEkEoN5sYwUB_kYOVFn_Ff8wPkYJrrWRzOS4NaBd3O90JZtOj82G8CpWKC-MZwkNI85DdYgaEMw4tEafPy3O0GCz4Wf4dSeVqXhX4Isc89Tf55bgp-ttwPlWlx8oSBbjqmwZbXA7lWHn/s1404/AC%20Valhalla_Timespy.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="350" data-original-width="1404" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7ckmIMY_2ufqveBCO3e4mJhDF0X-jznF0DzDpljKMwn8TiEkEoN5sYwUB_kYOVFn_Ff8wPkYJrrWRzOS4NaBd3O90JZtOj82G8CpWKC-MZwkNI85DdYgaEMw4tEafPy3O0GCz4Wf4dSeVqXhX4Isc89Tf55bgp-ttwPlWlx8oSBbjqmwZbXA7lWHn/w640-h160/AC%20Valhalla_Timespy.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_kFWR2YFL8dXbc7LMYLfhLe1lOKINB6o-PjcybGr34XeEc5FxDB6-qEbBpU58xgvhYZddmgwzNhJL6WS65xdXhdYnQyXHgbIrPP6pEsQ3zY7Fesib-uP4Nc9BmyLwHv5IButHYpPbYVPgdgrN8fVBsDQlw93hr_2Y7i0yuiYNfZ9uU5hmQuj7Qqnz/s1404/Knight_Metro.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="352" data-original-width="1404" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_kFWR2YFL8dXbc7LMYLfhLe1lOKINB6o-PjcybGr34XeEc5FxDB6-qEbBpU58xgvhYZddmgwzNhJL6WS65xdXhdYnQyXHgbIrPP6pEsQ3zY7Fesib-uP4Nc9BmyLwHv5IButHYpPbYVPgdgrN8fVBsDQlw93hr_2Y7i0yuiYNfZ9uU5hmQuj7Qqnz/w640-h160/Knight_Metro.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Going through the other canned benchmarks, we're looking at around a 20 - 30 % performance increase (<a href="https://www.techpowerup.com/gpu-specs/geforce-rtx-3070.c3674">which is pretty typical</a> for the RTX 3080/4070 over the RTX 3070). What is most impressive is the fact that the card is doing that whilst saving around 20% of the power use. The only issue here is that these are not modern titles and none of them involve real gameplay. So, let's take a look at those real-world scenarios...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">First up, there's Hogwarts Legacy. This manual benchmark is a run around Hogsmeade on the latest patch as of 06/05/2023. The performance is much improved since the launch, though I am also testing with mostly high settings, outside of materials and textures - which are set to ultra. 
You can see that GPU utilisation is quite low - even for the 3070. There are also small stutters related to the loading of data onto the GPU, and we can see that more of these are experienced by the user of the 3070, even if the 4070's average fps is only very slightly higher.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's clear to me that we are slightly CPU bottlenecked here, while the lower amount of VRAM on the 3070 is the cause of its increased number of stutters (hence the improvement in the minimum fps when its memory is overclocked!).</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7OS-UhnLIKd_bFhJPSn-RxZ1fhejatePpTaIYoGbhSU7iZuX8Dhox35PxgwSy8b6UP6OkfFH0l8rYGEF8MSeCEsRU5RzIwsjP4Mp9UOisIn_h5CJaQHjAioWoTt0xUB7hXEKrlpDdQSHJ09gmSs9TNrElzZuDIwVzIo0QiFD0r1qblx1SeR5kRa0t/s1405/Hogwarts.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="352" data-original-width="1405" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7OS-UhnLIKd_bFhJPSn-RxZ1fhejatePpTaIYoGbhSU7iZuX8Dhox35PxgwSy8b6UP6OkfFH0l8rYGEF8MSeCEsRU5RzIwsjP4Mp9UOisIn_h5CJaQHjAioWoTt0xUB7hXEKrlpDdQSHJ09gmSs9TNrElzZuDIwVzIo0QiFD0r1qblx1SeR5kRa0t/w640-h160/Hogwarts.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Next, we have Spider-man, using high settings. Once again, we're in quite a similar situation to that in Hogwarts - the game is very heavy on the CPU, but one interesting facet is that the RTX 3070 struggled a bit when overclocked and I am at a loss to explain why*. The frametime graph shows a consistent stutter, so I am inclined to throw this result out as being affected by something going on in the OS/background tasks - unfortunately, I have not had the time to re-re-install the 3070. 
However, best case scenario, we are looking at equivalent performance between all four configurations, only at a much lower power draw on the 4070 (due to its low utilisation).</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhP7r7AlLPREJCG6Eh_mG7Q8qZ6TIlY6xNIXCkHe8oEXAUNr9gbyDLjBXmjMVjzfrsQRPycStDIIjASo2ASgcFFBy5GmQZuugDlyLJhZzUWqseR0RSi3XPCTJzHyyNfKik-UDGCt6dT-l9J6-A3ZDCpCLj8StvNc8zviJWomBimwlcyzMe12s4hRWQG/s1402/Spiderman.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="350" data-original-width="1402" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhP7r7AlLPREJCG6Eh_mG7Q8qZ6TIlY6xNIXCkHe8oEXAUNr9gbyDLjBXmjMVjzfrsQRPycStDIIjASo2ASgcFFBy5GmQZuugDlyLJhZzUWqseR0RSi3XPCTJzHyyNfKik-UDGCt6dT-l9J6-A3ZDCpCLj8StvNc8zviJWomBimwlcyzMe12s4hRWQG/w640-h160/Spiderman.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Apologies - I forgot to include this chart when I posted this review...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8j4d9LJhfnFbzqTXfZHbsZ5EB9MKXeiqFxX0EE9yG8epTAGjwp9TDC3GSFrQxUI0s7oFiPMB187kevyGdlzo33TRhi8EzImNF3doWf-sBfV1hMd86HWsF9tSvJK_7UwXMfrwRo7ezW80TuugAt18gnu8Q8mA6oyEB4RrSZfuETPX6MidLBzAmt8Id/s905/Unexplained%20stutters_Spiderman.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="471" data-original-width="905" height="209" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8j4d9LJhfnFbzqTXfZHbsZ5EB9MKXeiqFxX0EE9yG8epTAGjwp9TDC3GSFrQxUI0s7oFiPMB187kevyGdlzo33TRhi8EzImNF3doWf-sBfV1hMd86HWsF9tSvJK_7UwXMfrwRo7ezW80TuugAt18gnu8Q8mA6oyEB4RrSZfuETPX6MidLBzAmt8Id/w400-h209/Unexplained%20stutters_Spiderman.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The RTX 3070 OC result was marred by a consistently spaced system interrupt during testing of Spider-man...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><b><i><span style="color: #274e13;"></span></i></b><blockquote><b><i><span style="color: #274e13;">*[Update 24/06/2023] It seems that there was, indeed, a resource issue occurring during the test. From a similar observation made during <a href="https://www.igorslab.de/en/test-series-with-cards-from-nvidia-and-amd-when-suddenly-the-memory-of-the-graphics-card-is-no-longer-enough/3/">Igor Wallossek's VRAM testing</a>, it seems most likely that there was contention between the game and another programme for VRAM utilisation. </span></i></b></blockquote></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Following that, we have The Last of Us. This release has also been marred by terrible performance at launch, some issues of which have since been resolved (such as inappropriate LOD/MIP levels on assets when selecting medium and low quality textures). 
While the increased graphical grunt of the 4070 is giving a small boost to the average, the minimum fps are entirely dictated by the incessant stutters and poor frametimes that are experienced by the user in this title. As a result, at this resolution, my system is entirely CPU bound.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGWCrcUnqnkxxFsroHwRFhb2j1FEV80B_SSBtGl8SIhl-f2Li41co2dXxVkhsF87S1WpWWB2ZtpaXC8oc4Xs0JdTfrq1vk-nWecXHH2vAFUk4Jhxgr4hScy1jIuwVAFFAYGE7v3dsyZZDa7ZnxYUntMlWKQ7hjz2I4EYJKB8s-jsm40GekXnMOMnnt/s1402/TLOU.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="351" data-original-width="1402" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGWCrcUnqnkxxFsroHwRFhb2j1FEV80B_SSBtGl8SIhl-f2Li41co2dXxVkhsF87S1WpWWB2ZtpaXC8oc4Xs0JdTfrq1vk-nWecXHH2vAFUk4Jhxgr4hScy1jIuwVAFFAYGE7v3dsyZZDa7ZnxYUntMlWKQ7hjz2I4EYJKB8s-jsm40GekXnMOMnnt/w640-h160/TLOU.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUY1hXy2zuMbSM7pVZyBS6TWBucSE9x8ULXQdkyfM20lOjbki1Pp38X7cvapyGVvI5wMY2fgYNyxzJj8J0kq93aLf4548hLWK3pV5OAFWzI7ABLtc-lN6IwTstjcKk3fJw-6dnJS3lndI4YekZKelYz-XAeOHoD66Ib5_2ZSrKVP5jTIYSMAe0Cww-/s1169/TLOU%20stutters.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="309" data-original-width="1169" height="170" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUY1hXy2zuMbSM7pVZyBS6TWBucSE9x8ULXQdkyfM20lOjbki1Pp38X7cvapyGVvI5wMY2fgYNyxzJj8J0kq93aLf4548hLWK3pV5OAFWzI7ABLtc-lN6IwTstjcKk3fJw-6dnJS3lndI4YekZKelYz-XAeOHoD66Ib5_2ZSrKVP5jTIYSMAe0Cww-/w640-h170/TLOU%20stutters.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Finally, we take a look at A Plague Tale: Requiem. 
Taking a turn around the busy town scene, not only can we see a small performance improvement when moving to the 4070, we also observe that the game delivers frames exceptionally well, across the board.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhLwsgCoSJZ_aJQyET_uG9P9-Gc5MvLgZPHk7JtQDNZflKTQGn2ezg1f-yl5lWkCeGQJ2Zl6X8Tv-Uzx4QmgjTjQklud6-26vlYZGGNFsgWgfuA_wMa1xcgOaii0GwjCvytg8xZ7z8CfkjBucKOeoj3Y1m6jDYeDhKuEmeF6wE6ZFZ4egqS6eMp5rt/s1405/Plague%20Tale.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="350" data-original-width="1405" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhLwsgCoSJZ_aJQyET_uG9P9-Gc5MvLgZPHk7JtQDNZflKTQGn2ezg1f-yl5lWkCeGQJ2Zl6X8Tv-Uzx4QmgjTjQklud6-26vlYZGGNFsgWgfuA_wMa1xcgOaii0GwjCvytg8xZ7z8CfkjBucKOeoj3Y1m6jDYeDhKuEmeF6wE6ZFZ4egqS6eMp5rt/w640-h160/Plague%20Tale.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Conclusion...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">And so ends our first little foray into mid-range gaming, and we can draw an important conclusion here - while top-of-the-line CPUs and new systems with DDR5 might be able to eke out <a href="https://www.techpowerup.com/review/asus-geforce-rtx-4070-tuf/31.html">a 20% performance advantage</a> for the RTX 4070 over the RTX 3070, on a pretty decent, modern mid-range CPU, there's effectively zero difference in performance at 1080p when running modern triple-A games. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For older titles, or titles where the CPU is not a bottleneck, we are looking at that expected 25 - 30 % performance increase, and that's acceptable.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Going back to those CPU-limited titles: at higher resolutions, and in situations where the 8 GB of VRAM is limiting the performance (for instance, if I bumped up the settings to the very maximum), there will be a more noticeable performance uplift, even on the i5-12400. However, it seems like the better choice is to invest in a faster CPU than a faster GPU, as a mid-range system isn't going to get the best performance out of that graphics card!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In the next entry in this series, I will explore the differences on my AMD system and compare against the RX 6800... 
So, stay tuned!</div></div>Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-7560610393342650347.post-80444431386136851082023-04-01T15:29:00.010+01:002023-04-01T20:19:23.537+01:00Analyse This: The Technical Performance of Last of Us Part 1...<div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga7FrS-fWlf6xTpgdiAMY3sFzJn0mTKyBnwdu0p0vMKOVn16B5NhN6toblkndVAbggo-dgxKKMQCy2-RhtR8K0uBLgKjlvBPKUgs3yGW5jZQU00mEwtVfDqBAx3v7o_n1eXeoHwR1XXKFJNHGgmEWedhp4u0lkbcXcUyOBKUgxAvGFw7RwtupBa4Jt/s1920/Title%201.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga7FrS-fWlf6xTpgdiAMY3sFzJn0mTKyBnwdu0p0vMKOVn16B5NhN6toblkndVAbggo-dgxKKMQCy2-RhtR8K0uBLgKjlvBPKUgs3yGW5jZQU00mEwtVfDqBAx3v7o_n1eXeoHwR1XXKFJNHGgmEWedhp4u0lkbcXcUyOBKUgxAvGFw7RwtupBa4Jt/w640-h360/Title%201.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Where we're going, we don't need roads...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Although I'm fairly sure we do... The Last of Us Part 1 (from here on in: TLOU) has had a rocky start on its PC release. Problems have been reported far and wide, and players have complained, and complained, about performance issues on this port of a PS3-era game.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">But is this really a fair comparison? Does TLOU really have the performance problems that people ascribe to it? Let's find out!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There will be no spoilers here, today, because I have barely played ten minutes of this release. However, I think I know enough to have a very short post on the demands of this game.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, without further ado:<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I never had a Post Thoughts on TLOU when it released on the PS3. In fact, I don't even remember if that "structure" existed on this blog at that point in time. At any rate, I hold the original game with a certain fondness in my heart, if not my mind*. This was one of those few games that you could sit with, absorb and continue to enjoy past the moment.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"></span><blockquote><span style="color: #274e13;">*The ending made no scientific sense, and as such, I found it very easy to just blast away at the fraudulent people who were trying to chop up Ellie for no gain...</span></blockquote></i></b></div><div style="text-align: justify;">The game has since been <a href="https://www.youtube.com/watch?v=Xn8H0vOfIL0&ab_channel=DigitalFoundry">updated for PS4</a> and then <a href="https://www.youtube.com/watch?v=0mv0dAwPqCs&ab_channel=DigitalFoundry">re-mastered for PS5</a>... leading to this very moment, where exclusively-PC gamers get to experience the story and world that was crafted by Naughty Dog. Personally, I think this port is long overdue. 
It could and should have happened after the PS4 port, but it didn't. What we have been given, now, appears to be - at the surface level - a difficult port based on the PS5 remaster. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What this is not, essentially, is an old game. This is a new game, using new technologies. However, this game is based on a game that has been transitioned between various hardware, perhaps without forethought on its interoperability with the PC gaming ecosystem. As such, in its current state, it has strange demands on PC hardware that are difficult to explain beyond being simple technological patches on the original engine.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNYHMr0efWvjdcmqoqdIiPLZA78Xd8Ggjc-K7WnBwdgB3hGaiszO9DmIXz4Xvy-KFWOFRMG22DtLf89Bf_Q_E6r2bry-b8-t7bYELAMi4NtA522fk_xPKESWALxrYCJVIyDMztFWmx34SQoizj3nAZpUaJczpf7dByi9Yv5FtDP7xpBz2PX_xKqwuk/s1920/20230401143725_1.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNYHMr0efWvjdcmqoqdIiPLZA78Xd8Ggjc-K7WnBwdgB3hGaiszO9DmIXz4Xvy-KFWOFRMG22DtLf89Bf_Q_E6r2bry-b8-t7bYELAMi4NtA522fk_xPKESWALxrYCJVIyDMztFWmx34SQoizj3nAZpUaJczpf7dByi9Yv5FtDP7xpBz2PX_xKqwuk/w640-h360/20230401143725_1.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Texture settings have a noticeable effect on the visuals of the game but do scale well with PC resources...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Getting over it...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">TLOU is not a difficult game to run, graphically. The options included in the game are quite versatile and enable decent levels of scaling for the available GPU power. However, this isn't the whole story. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Aside from lacking a benchmarking feature to enable players to understand how their choices really affect the technical performance, the options in this game scale in a way that most other games do not: for instance, VRAM is easily eaten up by the higher texture settings in the game (see above), but those settings scale very strongly across the available range, meaning that they will work with whatever hardware is available - unfortunately, with an associated loss in visual fidelity...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I would say that "medium" settings are fine... though not pretty compared to various other titles.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a result, it is not quite clear why or how the game is using so many resources to generate visuals which are on par with other titles of the current generation - without ray tracing. 
However, users choosing not to set their options to lie within the VRAM requirements of the game (which are realistically reported, aside from one menu bug) is most likely one of the main causes of the current conflagration surrounding TLOU on PC.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyvFaIxi26BQdx5BwbWIRweOn400SVVBysFrx0U8nU4IfrO5TaVIsbd-XUCTUoymylYNDmy5MlhIvzBdI1Gy4pLnZhkXj_uKEj2V6tam8NpBtQRkrmSRAgThspzH9FoTOUZ5wYM7JHQbCz2XEU_2InoiA9ttRzSZkhQjxWi7yPPA_BSNViaLdf9loQ/s3004/Menu%20bug.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="772" data-original-width="3004" height="164" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyvFaIxi26BQdx5BwbWIRweOn400SVVBysFrx0U8nU4IfrO5TaVIsbd-XUCTUoymylYNDmy5MlhIvzBdI1Gy4pLnZhkXj_uKEj2V6tam8NpBtQRkrmSRAgThspzH9FoTOUZ5wYM7JHQbCz2XEU_2InoiA9ttRzSZkhQjxWi7yPPA_BSNViaLdf9loQ/w640-h164/Menu%20bug.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The VRAM utilisation bar in the settings menu is very accurate, aside from this one bug I noticed when playing with settings - the High and Medium settings <i>must</i> be accidentally swapped...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Worse still, this title exhibits a strange reliance on CPU processing power, which may be leading many users to finger the wrong piece of their hardware as the culprit.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I have two systems available to me and I have quickly tested the game on both - at the very start. It may not be representative of scenes later into the game, but this testing does highlight some differences in the setups and identify potential solutions for those looking for them.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">System Limitations...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The systems I use are <a href="https://hole-in-my-head.blogspot.com/2022/08/analyse-this-performance-of-spider-man.html">covered in prior blogposts</a>, so I won't go over them again. However, the long and short of it is that I have a weaker processor paired with a stronger GPU, and a stronger processor (and RAM) paired with a weaker GPU. This is not some Machiavellian stroke of genius to whittle out issues on PC releases... it's just dumb luck*.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*If it is even that...</blockquote></span></b></i></div><div style="text-align: justify;">The point I'm trying to make, here, is that we are potentially able to observe where system-level bottlenecks or application (a.k.a. game) bottlenecks, or demands, are focussed.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The aspirations I will look at for this analysis are 60 fps at 1080p. These limitations are a reflection of the hardware available to me, but also information which any user will find useful in terms of their own experience. 
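That 60 fps aspiration translates to a 16.666 ms per-frame budget; as a minimal sketch (an illustration of the idea, not the exact script behind my charts), a frametime log can be judged against that budget like so:<pre>
def summarise(frametimes, budget_ms=1000.0 / 60.0):
    # Average fps derived from the frametimes themselves (not an average of fps values).
    avg_fps = 1000.0 * len(frametimes) / sum(frametimes)
    worst_ms = max(frametimes)                            # largest single spike
    missed = sum(1 for t in frametimes if t > budget_ms)  # frames over the 60 fps budget
    return avg_fps, worst_ms, 100.0 * missed / len(frametimes)

print(summarise([15.2, 16.1, 33.4, 16.5, 15.9]))  # toy data: one 33 ms frame misses the budget
</pre>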
Many other reviewers will be able to spit out bar charts covering the average and minimum or percentile lows (wrong as they may be), but they do not explore the more common situation of a v-sync'ed experience.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This analysis will focus on frametimes, instead of 'fps', with a target of 16.666 ms for our "perfect" user experience. The benchmark is an approximately two-minute period during the escape of Joel and his daughter during the initial outbreak, covering in-engine cutscene and movement gameplay, both in an urban environment with a large number of AI NPCs and in a small, lightly forested area with minimal AI NPCs.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">During the analysis I will often refer to each system as the AMD/Intel system, for ease of writing - this is <i>only</i> referring to the platform in each system... and does not imply any poor performance on either manufacturer's part.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">First up, let's look at the effect of the two systems on the game:</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAzHh5zMTclx14YgkPZ6KKWZCZ0xmsNifuxm2e-l9xBeLB0yGiSr2igu_yuozMj3SmbhlAQGCue7iW3lPn3f_ZIRmIl14ylvcP5OFbCBTVCj4LrNDx0uSumAiiZxX6ghv2cFLEugvJjXPDLr3GiSZjizQNLyiW6EFSqm8VIQGO3uJN1RUW89QANy6W/s1193/AMD_comparison.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1193" data-original-width="982" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAzHh5zMTclx14YgkPZ6KKWZCZ0xmsNifuxm2e-l9xBeLB0yGiSr2igu_yuozMj3SmbhlAQGCue7iW3lPn3f_ZIRmIl14ylvcP5OFbCBTVCj4LrNDx0uSumAiiZxX6ghv2cFLEugvJjXPDLr3GiSZjizQNLyiW6EFSqm8VIQGO3uJN1RUW89QANy6W/w526-h640/AMD_comparison.jpg" width="526" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The game scales well, graphically, but it seems that the texture detail is where the majority of the problems are emanating from...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The AMD system with an R5 5600X paired with an RX 6800 does well in terms of presentation to the user - with minimal frame drops or missing frames at medium or high quality settings. Ultra quality, on the other hand, provides a sudden challenge to the system, with a frametime spike in the hundreds of milliseconds.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Dialling back the four texture setting options (seen above in the image) to medium, we see that these spikes are severely reduced at ultra quality settings. What is odd in this benchmark run is that we <i>should not</i> be anywhere near to breaching the RX 6800's 16 GB of VRAM. 
Additionally, we do not consistently observe frametime spikes at the same points during the benchmark run (taking into account run-to-run variance).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Moving onto the Intel system:</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPGtQD_2OUf1OGhQZZgGuHC9_QtDglmLWakw9B1dwBLX0r20P3yEvc_LKXss0WkXk7sJ_bm-6YXwIYgRAuiD0t6z4wUBr3cX1De4oZlmTYP_19Z7fBOH7wvR2H1PqOksp7nH7dDs3k5uS9VDB7CHGeS4J16oZGvTFYcSf8YQZW3-WJAk0c_6AKlfLM/s1193/Intel_comparison.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1193" data-original-width="982" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPGtQD_2OUf1OGhQZZgGuHC9_QtDglmLWakw9B1dwBLX0r20P3yEvc_LKXss0WkXk7sJ_bm-6YXwIYgRAuiD0t6z4wUBr3cX1De4oZlmTYP_19Z7fBOH7wvR2H1PqOksp7nH7dDs3k5uS9VDB7CHGeS4J16oZGvTFYcSf8YQZW3-WJAk0c_6AKlfLM/w526-h640/Intel_comparison.jpg" width="526" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The RTX 3070 gives a much worse presentation to the player... but behind the scenes we see a different story playing out...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The i5-12400 paired with the RTX 3070 is struggling.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Presentation-wise, we're looking at lots of stutters and dropped frames throughout the benchmark. Yes, at ultra we're seeing terrible frametime spikes - though much better than with the AMD system. Primarily, the reason for these is the 3070's 8 GB VRAM buffer - at high and ultra quality, the game flat-out tells the user that this is above the poor GPU's VRAM limit. So, fair enough. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, dropping the four texture settings to medium at ultra quality, as we did with the RX 6800, we do not observe a huge improvement in presentation compared to the high quality preset.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This indicates that the issue is not only related to VRAM quantity.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Playing the long game...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The thing is - are we really saying that the RTX 3070 is <i>that</i> much worse than the RX 6800 at 1080p with a non-raytraced game? Personally, I would not expect it to be... So what <i>IS</i> going on?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Removing graphical considerations, as much as we can (in this particular comparison), I used the optimised settings <a href="https://hole-in-my-head.blogspot.com/2023/01/analyse-this-does-ram-speed-and-latency.html">I have previously validated</a> for the Intel system at DDR4 3200 at the medium quality preset. 
For the main portion of the testing, the RAM was operating at the DDR4 3800 optimal settings.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih7z8-XbiP6vtO6r0f7UM1qA1FMzXEIVuv6NZuo_dkG6xr2t1_lY3l-jcTxbnPyxEzYx_Qux1ebh1rLcpr5umbEh_P_19MDtXIaW_DSzuBG8mDLHe8D0fAjXIHjLKxrQ0qQdhMAd5pprh_LgzE9O_sEe_F0YnoYU8eyHNES0O7HXKIHLdmdrPHGlVV/s977/Memory%20comparison.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="596" data-original-width="977" height="390" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEih7z8-XbiP6vtO6r0f7UM1qA1FMzXEIVuv6NZuo_dkG6xr2t1_lY3l-jcTxbnPyxEzYx_Qux1ebh1rLcpr5umbEh_P_19MDtXIaW_DSzuBG8mDLHe8D0fAjXIHjLKxrQ0qQdhMAd5pprh_LgzE9O_sEe_F0YnoYU8eyHNES0O7HXKIHLdmdrPHGlVV/w640-h390/Memory%20comparison.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>With the faster RAM speed (aka higher bandwidth), the game plays more smoothly...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What we are able to observe is an appreciable effect of memory bandwidth on the presentation of the game! The faster RAM reduces sequential frametime variance and also reduces delays in frame presentation for the user on the screen.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This isn't the whole story though.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Look back up at the graphs for each system and compare each quality setting between them. At both medium and high quality settings, the 12400 outperforms the 5600X. At the higher quality settings, the GPUs become more of a limiting factor. This means that the game is also heavily affected by the throughput of the CPU: frametime spikes are smaller on the 12400 with faster RAM.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is specifically happening in the sections of the benchmark where there are physics interactions and large numbers of AI NPCs running around.</div><div style="text-align: justify;"><br /></div>
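<div style="text-align: justify;">As a point of reference, the "sequential frametime variance" I keep mentioning is nothing exotic - it is just the delta between each frame and the one before it. A minimal sketch, reusing the frametimes list from the sketch above:</div><div style="text-align: justify;"><br /></div><pre>
from statistics import mean, stdev

# Frame-to-frame ("sequential") frametime deltas - the quantity behind my
# frametime differential plots. Large swings are what read as stutter.
# Reuses the "frametimes" list (in milliseconds) from the sketch above.
diffs = [b - a for a, b in zip(frametimes, frametimes[1:])]
print(f"mean delta: {mean(diffs):+.3f} ms")
print(f"stdev of deltas: {stdev(diffs):.3f} ms (lower = smoother pacing)")
print(f"worst single swing: {max(diffs, key=abs):+.1f} ms")
</pre><div style="text-align: justify;"><br /></div>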
<div style="text-align: justify;">Looking at the 5600X/RX 6800 system, we can see that in the later sections, when we move away from those situations, the sequential frametime reaches its true potential, in between some undue frametime spikes.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIJ_lXwieJPT_U9LFdzXFGvwJ9Eo-bQHMmY6loiBQeGEwwhOu20Rx_IPc5smy_k5EojjXWfmv11JimYBCmoQ1AaSfHI0E_CV_yXuBmQ9VbZ7kLFlMSi6ETmPs64whC-O0DL2HCurb9Z7DS8g05N_oK02UChye3UIKtCGCREybDvMKxIcKeFiW3_cDk/s1920/20230401161854_1.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIJ_lXwieJPT_U9LFdzXFGvwJ9Eo-bQHMmY6loiBQeGEwwhOu20Rx_IPc5smy_k5EojjXWfmv11JimYBCmoQ1AaSfHI0E_CV_yXuBmQ9VbZ7kLFlMSi6ETmPs64whC-O0DL2HCurb9Z7DS8g05N_oK02UChye3UIKtCGCREybDvMKxIcKeFiW3_cDk/w640-h360/20230401161854_1.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>There is some (post?) processing happening that is heavily impacting the AMD CPU... on medium quality settings.</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Conclusion...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The Last of Us is, apparently, a very heavily CPU and RAM-limited game. From my brief time with it, the game seems to scale well with available graphical processing resources, but the CPU limitation is a difficult aspect to explain without an understanding of how the game engine is working. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It is important to note that the game will heavily load up the system memory of any system. I have observed that the game will easily eat up 16 GB as a buffer between storage and the VRAM - this is actually optimal behaviour and can explain why the bandwidth of memory is quite an important performance attribute for this game.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Speaking of VRAM, the game does not look its best at medium settings - with smeary surfaces and horrid reflection quality. High or Ultra is where you would like to be for the best experience - 
which means you would ideally have more than 8 GB of VRAM on your GPU...</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyDDnmTErTtKhn2cnaeWepQZvrND2n3-K5NcVLvelyMJjn-_YLRzBlGgvxBWuzqrQMUTJEpD9ZJoCfnl4jdFTQ1StcUvU23_Oz7KKwok0UYvCvVVigjnFKts7dMcrhQQtu17-kipZ41aKZYQILZ7L18FVXI-d-YDIbCbhzHI9O7uHGDu7kiWLkqAYS/s1920/Reflection_medium_ultra.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyDDnmTErTtKhn2cnaeWepQZvrND2n3-K5NcVLvelyMJjn-_YLRzBlGgvxBWuzqrQMUTJEpD9ZJoCfnl4jdFTQ1StcUvU23_Oz7KKwok0UYvCvVVigjnFKts7dMcrhQQtu17-kipZ41aKZYQILZ7L18FVXI-d-YDIbCbhzHI9O7uHGDu7kiWLkqAYS/w400-h225/Reflection_medium_ultra.jpg" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Medium quality setting reflections look absolutely terrible...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, neither the R5 5600X nor the i5-12400 is a weak CPU compared to what is available in the current generation of consoles, but here we are... a game wreaking havoc on any CPU that is not at the peak of current performance levels. In this, I am at a loss as to <i>what</i>, exactly, the game is doing to warrant such high CPU usage. In the portion I've been testing, I can see that the worst sections are those with explosions and heavy physics interactions. However, this isn't the only game with large numbers of NPCs (Hogwart's Legacy/Spider-man) or physics interactions...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Could it be that the engine on PS5 uses the CPU in some unusual manner which doesn't translate well to the hardware setup on PC?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">At any rate, players of this game, at this point in time, ideally need a strong, modern CPU with fast, low-latency system memory and a GPU with a VRAM framebuffer larger than 8 GB in order to have the best experience...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's possible that <i>some</i> of this can be addressed in an eventual patch, but if the CPU demands are hard-coded into the engine, it seems like that is not a reasonable item to fix, going forward... 
This means that, overall, TLOU is a disappointing technical release on PC, arriving years after the original experience.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-38840449509919811402023-03-19T12:55:00.006+00:002023-03-20T05:58:01.680+00:00We Need to Talk About FPS Metrics Reporting...<div style="text-align: left;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSKArFhf1UWS30YpKfM1pwao4CMRFv_-BHFrSzuuZM7u3eorHQ9JZkK2iVS-uvO2oY44Ee9zBlrMMbVLOcu-C41eI5hJcdUYKATBrmCGDWvhqXGgEWZNFwdOnX4jNucvI8BWLw64ZJumuDLhNHM85ZBRw_alSgdk6cbikA384GWV0241IJ-J-R_x__/s1920/Title.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSKArFhf1UWS30YpKfM1pwao4CMRFv_-BHFrSzuuZM7u3eorHQ9JZkK2iVS-uvO2oY44Ee9zBlrMMbVLOcu-C41eI5hJcdUYKATBrmCGDWvhqXGgEWZNFwdOnX4jNucvI8BWLw64ZJumuDLhNHM85ZBRw_alSgdk6cbikA384GWV0241IJ-J-R_x__/w640-h360/Title.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Recently, I've been on a bit of a tear, running around trying to get people to listen to me about what <i>I</i> believe is a better way to <a href="https://hole-in-my-head.blogspot.com/2023/02/analyse-this-forspoken-demo-analysis.html">analyse</a> <a href="https://hole-in-my-head.blogspot.com/2022/08/analyse-this-performance-of-spider-man.html">the performance</a> of <a href="https://hole-in-my-head.blogspot.com/2023/03/analyse-this-technical-performance-of.html">individual games</a>, as well as the hardware used to run those games, than what is currently being done by the vast majority of hardware reviewers in the tech space...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As with all new ideas, things are still developing and I'm still choosing what to keep and what to drop - what works, and what doesn't. Today, I'm going to summarise the conclusions I've come to, thus far, and also introduce a new concept that I feel like <i>everyone</i> in the tech space should be doing. However, some are getting it wrong: wrong to <b><u>SUCH</u></b> an extent that it literally took me a while to double check the concept because I can't be the only person to have realised this over the last ten years...</div><span><a name='more'></a></span><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">What works...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The journey started back in 2021, listening to a <a href="https://www.youtube.com/live/miYNwPAJ7mM?feature=share&t=2881">Techdeals video</a>. Sure, Techdeals has kinda fallen off the deep end in the intervening period, but his thoughts on frametime vs stability vs presenting that information to the consumer of a tech/game review got my neurons all fired up.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This was surely a solvable problem! 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">We've gone through advancements in presentation of benchmark data (heck, even with the benchmark process and best practices!) before. Why not another time? So, I set myself this goal - to try and understand what is important in benchmarking data and how to further improve what we have.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-g_od5LSUXe0fpL4MV6YqFH4uRtVBioEJaRH1HKBgg3n2ZktVHL8a4uuoFvViNqu28ZSmscMKACJ1aDtjqWuHp6Lfeg_oNRWHCQEkVaQxrTUscy-Jh_u-fTDOlJGmy1XcZcn0LI64AgExfCEMMZYSV6W_1bG2z0u2I-daUds044Mv0We4WZGFp1yZ/s862/Original%20idea%202_1.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="104" data-original-width="862" height="77" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-g_od5LSUXe0fpL4MV6YqFH4uRtVBioEJaRH1HKBgg3n2ZktVHL8a4uuoFvViNqu28ZSmscMKACJ1aDtjqWuHp6Lfeg_oNRWHCQEkVaQxrTUscy-Jh_u-fTDOlJGmy1XcZcn0LI64AgExfCEMMZYSV6W_1bG2z0u2I-daUds044Mv0We4WZGFp1yZ/w640-h77/Original%20idea%202_1.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The germ of an idea...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The first time I managed to get my act together was for the review of <a href="https://hole-in-my-head.blogspot.com/2022/08/analyse-this-performance-of-spider-man.html">Spider-man's performance</a>, almost a year later. I was spurred into action when a load of people claimed that performance was better with hyperthreading off on the CPU. Initial probings by just screencapping weren't working... a snapshot doesn't capture an experience! So, I went back to that idea and made it work... mostly.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Things weren't 100% right.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Sure, I think the frametime differential graphs are really easy for the consumer of the data to read. They are great for doing comparisons where the difference observed can be large. They are not so great when you have a lot of data to compare (e.g. when testing multiple GPUs): you lose clarity when too many plots are overlaid upon each other.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The idea of using process performance and demarcating a control space within three standard deviations of the average was also good... but, again, we're talking about a single CPU/GPU combo. Though I haven't given up on that idea just yet - I need to spend more time to see if it can be utilised outside of narrow applications, as was the case for the Spider-man analysis.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">One thing I <i>absolutely</i> effed up was the static numbers: instead of comparing frametime numbers, I was converting everything into calculated "fps" from those static frametime numbers.</div><div style="text-align: justify;"><br /></div>
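<div style="text-align: justify;">To make the error concrete: the only safe way to get an average fps from frametime data is to divide total frames by total time. Taking the plain mean of each frame's instantaneous fps weights the fast frames too heavily and inflates the result. A small, self-contained illustration (the example numbers are invented for demonstration):</div><div style="text-align: justify;"><br /></div><pre>
# Two ways to turn frametimes (ms) into an "average fps" - only one is
# right. The example data is invented: a mostly-60fps second with two
# slow frames.
frametimes = [16.7] * 58 + [33.3, 33.3]

correct = 1000.0 * len(frametimes) / sum(frametimes)  # frames / total time
naive = sum(1000.0 / ft for ft in frametimes) / len(frametimes)

print(f"correct (time-weighted) average: {correct:.1f} fps")
print(f"naive mean of per-frame fps:     {naive:.1f} fps (inflated)")
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The gap between the two grows with the severity of the spikes - which is exactly when accuracy matters most.</div><div style="text-align: justify;"><br /></div>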
<div style="text-align: justify;">From then on, I moved to speaking about frametimes - and that lines up with the analysis I do when using frametime differential plots to see the actual smoothness of the game running.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">From that point on, when <a href="https://hole-in-my-head.blogspot.com/2022/12/analyse-this-does-ram-speed-and-latency.html">I was performing</a> my <a href="https://hole-in-my-head.blogspot.com/2023/01/analyse-this-does-ram-speed-and-latency.html">RAM testing</a>, I also began using the median frametime value in order to show a bias from the average in what the player will experience - will it be better, or worse? I also combined this data with the natural log of the frametime in order to have a visual reference for this bias: Is the distribution normal? How narrow is the peak? How high is either "end" of the plot? Each of these visual aspects has a bearing on how "good" the performance of the application is within each given test.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjR9CPL7aW2uJLh7cqOh_mg02Bj9NN4jxmwDj8Y9qfLEdGmrewXz2PxpQTJDhBpE2ORZw4IRgKxwsnQhM3PY7EpWIVPTCBdc8a6q0CD2162aPhCUoOF5tJ7dlxA78GI2FH0MDN1cOnbd0k3fQpJUBKh8u8SEtmoUFyFeEHmkPfriTduA55p7OhnxCgR/s1008/RT_Intel_comparison.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="432" data-original-width="1008" height="274" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjR9CPL7aW2uJLh7cqOh_mg02Bj9NN4jxmwDj8Y9qfLEdGmrewXz2PxpQTJDhBpE2ORZw4IRgKxwsnQhM3PY7EpWIVPTCBdc8a6q0CD2162aPhCUoOF5tJ7dlxA78GI2FH0MDN1cOnbd0k3fQpJUBKh8u8SEtmoUFyFeEHmkPfriTduA55p7OhnxCgR/w640-h274/RT_Intel_comparison.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Yes, with this many experiments, the frametime differential plot becomes messy. The ln(frametime) plot becomes more useful...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div>
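<div style="text-align: justify;">Both the median/mean comparison and the ln(frametime) distribution are easy to reproduce for your own captures. A minimal sketch, assuming the frametimes list from the earlier sketches - the crude text histogram here just stands in for a proper plotting library:</div><div style="text-align: justify;"><br /></div><pre>
import math
from statistics import mean, median

# Assumes "frametimes" is a list of per-frame times in ms, as before.
print(f"mean:   {mean(frametimes):.3f} ms")
print(f"median: {median(frametimes):.3f} ms")

# Crude text histogram of ln(frametime): a narrow single peak means
# consistent frame pacing; a fat right-hand tail means spikes.
logs = [math.log(ft) for ft in frametimes]
lo, hi = min(logs), max(logs)
width = max(hi - lo, 1e-9)  # guard against a perfectly flat capture
bins = [0] * 12
for x in logs:
    bins[min(int((x - lo) / width * 12), 11)] += 1
for i, count in enumerate(bins):
    centre_ms = math.exp(lo + (i + 0.5) * width / 12)
    print(f"{centre_ms:7.2f} ms | " + "#" * count)
</pre><div style="text-align: justify;"><br /></div>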
<div style="text-align: justify;">It was around this time that I began to want to focus on the user experience of achieving a given performance level in a game. E.g. the player is using a 60 Hz monitor: how stable is the presentation of the game with "X" hardware when targeting that refresh rate?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, summing this all up, I have the following conclusions:</div><div style="text-align: justify;"><ul><li>In general, it is preferable to speak about frametime consistency rather than frames per second.</li><li>Visualising the frametime consistency can give a better user understanding than static numbers.</li><li>The statistical tools for process performance can work well to understand how well a hardware combination can meet a performance expectation from the user's side.</li><ul><li>Applying the concept of process control through <a href="https://sixsigmastudyguide.com/process-capability-pp-ppk-cp-cpk/">Ppk/Cpk</a>.</li></ul></ul><div><br /></div><div>Now to get to today's new topic: percentile fps numbers.</div><div><br /></div><div><br /></div></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">FPS Metrics...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The reason that it's better (and safer) to speak about frametime numbers is that these are <i>actual</i> data provided to the reviewer/user by the hardware and software combination under their control. However, the whole point of gaming, much like movies and TV shows, is that we want to create the illusion of fluid movement from combining a stack of still images. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Get the number of still images high enough per period of time and humans no longer experience the images as individual, but instead stitch them together into moving dioramas. This is where things get a bit murky.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The concept of refresh rate is pretty simple. You have a certain number of presentations on the screen per second - usually measured in Hertz. But what about frames per second? How are you measuring that? By definition, it is an averaged metric, but there are different types of averages we use in statistics to understand populations of data when certain criteria are met.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For the average fps result, it's easy: just divide the number of frames obtained by the total time of testing. However, one of the more recently (as in, for over ten years, now) favoured metrics has been 1% low fps, or something similar (e.g. 0.1% low fps). How is that number calculated?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Gamer's Nexus <a href="https://youtu.be/uXepIWi4SgM">posted their methodology</a> back in 2016 and they explained that they used the average of the "Xth" percentile frametimes to then calculate the "Xth" percentile fps number. It appears other outlets also have the same methodology - and applications like <a href="https://twitter.com/CapFrameX/status/1636792570310455296">CapframeX</a> appear to use it as well.</div><div style="text-align: justify;"><br /></div>
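<div style="text-align: justify;">As far as I can reconstruct it from those public descriptions, the calculation looks something like the following - to be clear, this is my own best-guess reconstruction for illustration, not any outlet's or application's actual code:</div><div style="text-align: justify;"><br /></div><pre>
# My best-guess reconstruction (for illustration only) of the common
# "X% low fps" method: average the worst X% of frametimes, invert to fps.
def percentile_low_fps(frametimes, pct=1.0):
    slowest_first = sorted(frametimes, reverse=True)
    n = max(1, int(len(slowest_first) * pct / 100.0))
    avg_worst_ms = sum(slowest_first[:n]) / n
    return 1000.0 / avg_worst_ms

print(f"1% low:   {percentile_low_fps(frametimes):.1f} fps")
print(f"0.1% low: {percentile_low_fps(frametimes, pct=0.1):.1f} fps")
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">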
This can be derived from <a href="https://techreport.com/review/21516/inside-the-second-a-new-look-at-game-benchmarking">work done by Scott Wasson</a> over on Techreport back in 2011*, though I'm sure others came to the same conclusions separately.</div><blockquote><div style="text-align: justify;"><b><i><span style="color: #274e13;">Unfortunately, the images of the article are lost to the mists of time...</span></i></b></div></blockquote><div style="text-align: justify;">Further work was also pioneered by the team at PC Perspective - something which I have only very recently discovered. They were thinking deeply about this issue as well and applied a more nuanced approach than Techreport. However, <a href="https://pcper.com/2013/03/frame-rating-dissected-full-details-on-capture-based-graphics-performance-testing/4/">Ryan Shrout had some different perspectives</a>* on the issues we face in terms of understanding the output. He instead focussed on the output to the display and what the user will experience. </div><blockquote><div style="text-align: justify;"><b><i><span style="color: #274e13;">*Ba-dum-PISH!</span></i></b></div></blockquote><div style="text-align: justify;">This isn't incorrect by any means. In fact, I'm pretty sure this is a version of the type of analysis that Digital Foundry perform in their tech analysis. Unfortunately, this methodology precludes much of the userbase from any testing themselves as it requires quite a lot of knowledge in terms of using the software, interpreting the data, and access to not insubstantial amounts of expensive hardware in addition to the user's gaming rig.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgd6DXVwYMTBy0Q_JQUv8wEOmXCAhSBq8dC3BwZSpvbsjrnb6tfpCXYJFIBz07snYmd8IQlKqzLaR55EqRUiQmQzsnte5g3uBtP0JYSB9_ODyStd7qTPyg91CJN1m7h7AqY_VPfZM_sxMXaSPnRKIbQR9NKYWHHzdvusMIlUJydPRztTfpmtpIJmkxV/s603/12400_rebar_differentials.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="357" data-original-width="603" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgd6DXVwYMTBy0Q_JQUv8wEOmXCAhSBq8dC3BwZSpvbsjrnb6tfpCXYJFIBz07snYmd8IQlKqzLaR55EqRUiQmQzsnte5g3uBtP0JYSB9_ODyStd7qTPyg91CJN1m7h7AqY_VPfZM_sxMXaSPnRKIbQR9NKYWHHzdvusMIlUJydPRztTfpmtpIJmkxV/s16000/12400_rebar_differentials.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>In the use of process control, we define upper and lower performance limits through the use of the statistical tool of standard deviation...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikgu5F_UCKbf0Dsj_oLYXEmCWH1XZHxLnkT2K9iRmprT-FkwAxjKlNN5hVQnXy0dOFl_oYQto3yAMQ5mPU6tgk4zXFoo869KTm55Vbyea3MTz2DqwguE-KROCfXk2ieRMITnGAAx5KGUL3X-v2GVgfS604UxGeGTk-oNy-t4g0n4wQexcVDDZrUttc/s264/Rebar_Upper_lower_limits.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="211" data-original-width="264" height="211" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikgu5F_UCKbf0Dsj_oLYXEmCWH1XZHxLnkT2K9iRmprT-FkwAxjKlNN5hVQnXy0dOFl_oYQto3yAMQ5mPU6tgk4zXFoo869KTm55Vbyea3MTz2DqwguE-KROCfXk2ieRMITnGAAx5KGUL3X-v2GVgfS604UxGeGTk-oNy-t4g0n4wQexcVDDZrUttc/s1600/Rebar_Upper_lower_limits.jpg" width="264" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>And the excursions above and below the UDL/LDL (or more correctly UCL/LCL) can be neatly tabluated...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The issue lies in the fact that any frametime value is just a snapshot of the time to deliver a specific frame. It has very little relevance to the player of the game in terms of what they experience - unless there is a <i>large</i> sequential frame time difference, which will result in visible stutter - which is why I plot those frametime differentials!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Knowing that the game suffers from stutters <i><b>IS</b></i> an important piece of data for the consumer... but that can be presented as frametime data, not fps data.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Going back to the "Xth" percentile fps figures, since fps is a time averaged metric, it only makes sense to take the average of the player's experience around each frametime spike. The reason for this is simple - context. If you are running a game with a frametime of 16.666 ms (60 fps) and see a single frametime spike to 33.333 ms (30 fps), that means you lost a frame. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What was your fps for the period encompassing the frametime spike? 59 fps. Instead of getting 60 frames per second, you dropped one. Your fps <i>didn't</i> drop to 30 fps - the presentation of the game was only momentarily affected and, for many people, the loss of one frame out of sixty would not be noticeable. However, if you take the first methodology for describing "Xth" percentile figures, if you were asked the minimum framerate, you would have replied 30 fps... which is false. <u><b>You never experienced a period in the game where you were presented with 30 frames in a 1 second period.</b></u></div><div style="text-align: justify;"><u><b><br /></b></u></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqbcHc9_hpOtS6PrDWT4tnqlHv1aqOxt9KIAnXvrMLPZJ3QPjPpf1EVDy5PQln-lFhIRoXPp9UVeAJWisgZjiDMa9Ul6TzcuNyVF4nKwrG1GTXipQgyWtrMXAytWi6MtBsbe67AjXi3N9_ubM0_JsjXySIGBtsOaw1Ozs1JIZuLBHGN3_Id36Jmc0y/s1186/7950X3D_tomb%20raider%20comparison.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="515" data-original-width="1186" height="278" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqbcHc9_hpOtS6PrDWT4tnqlHv1aqOxt9KIAnXvrMLPZJ3QPjPpf1EVDy5PQln-lFhIRoXPp9UVeAJWisgZjiDMa9Ul6TzcuNyVF4nKwrG1GTXipQgyWtrMXAytWi6MtBsbe67AjXi3N9_ubM0_JsjXySIGBtsOaw1Ozs1JIZuLBHGN3_Id36Jmc0y/w640-h278/7950X3D_tomb%20raider%20comparison.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b><strike>There isn't any way that two pretty much identical systems have VASTLY different 1% lows... 
this is likely due to different methodology in deriving those values.</strike> Hardware Unboxed have corrected my incorrect assumption about the two tests - the scenes used for testing are completely different here, which can explain the differences in results. [<a href="https://youtu.be/PA1LvwZYxCM?t=669">GN</a>] [<a href="https://www.techspot.com/review/2636-amd-ryzen-7950x3d/">HUB</a>]</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Other outlets, such as <a href="https://www.eurogamer.net/digitalfoundry-2022-amd-radeon-rx-7900-xt-7900-xtx-review?page=5">Digital Foundry</a>, <a href="https://www.tomshardware.com/reviews/pny-rtx-4090-xlr8-rgb-review/5">Tom's Hardware</a> and Techspot/<a href="https://www.youtube.com/c/Hardwareunboxednow">Hardware Unboxed</a> appear to use an averaging technique. This makes sense - as pointed out above, if you're talking about a time-averaged metric, you should time-average it, not percentile average it!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, this assumption is also fraught with danger because no one, and I mean literally zero publications I've been able to search (aside from Gamer's Nexus and historical PC Perspective*), actually reports how they get to any of their metrics. Okay, I can <i>see</i> that the framerate graph for Digital Foundry is smoother than the frametime graph, but what is the period of time that is being averaged? Is it a 1 second moving average? Is it 0.5 seconds?</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*They do not use this methodology any longer...</blockquote></span></i></b></div><div style="text-align: justify;">Hardware Unboxed and Tom's Hardware, on the other hand, have no such visualisations (I haven't become a subscriber so I don't know if this data is available to them...) and, aside from being able to see a difference from Gamer's Nexus' results to draw the conclusion that they must be using a time-averaged process, the actual process used is completely opaque.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now, I don't really blame publications for not discussing the ins and outs of their testing methodology aside from minor, very shallow, and generally understandable principles. The larger outlets have HUGE followings and many open detractors who associate their output with personal attacks on their favourite <insert system/game here>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, I am a scientist. This is what I do and experience from within and without the community on an almost daily basis. It's not easy and I can appreciate that... and it's why many institutions shield their researchers from the ravenous hordes.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">On the other side of the coin, I also <i>want</i> and <i>need</i> to understand the output of such publications. As a scientist and as someone who generally likes to know the <i>why</i> of things, I also understand the people who want to question and reach their own understandings... So, having this important data obscured renders the output of all of these tech review publications into a "trust us" approach... 
which people tend to not enjoy, as <a href="https://linustechtips.com/topic/1451415-linus-messed-up-with-the-trust-me-bro-warranty-t-shirt/">Linus Sebastian found</a>...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">While I'm not a pitchforks and bile kind of person and I am pretty forgiving of circumstances, I really want the quality of hardware and software reviewing to improve even further than it currently is. I'm not "calling people or publications out" here. What I am doing is questioning the status quo.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Why? Because many publications have since regressed to displaying or discussing <u style="font-style: italic;">only</u> the average fps when speaking about different setups. It's not the complete picture and it's not really the best metric to be focussing on, in my opinion.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I believe that my suggestions, in part or in full, can aid in public understanding of the performance of a given hardware configuration within the tested software. But it needs adoption and understanding... and explanation.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, I will now attempt to illustrate it with some examples...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4FTaUbdNgxiwmcAsICsRJkNw-PtcnrphXwMJRxXcRR9QXReF6fDlsCOgPWMLOtwlZ6Naje6SNhwyCD3WlADNupsTM8ZqQ90CHIvQgdOm92jJJds-X9Wumg7hJSv77vDFrFQL4_zZr5nggeOz4VWlW4Bl2CFddzIpRS1-udxoV2IKrLJway4y4aDHP/s577/Intel%20_frametime.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="345" data-original-width="577" height="239" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4FTaUbdNgxiwmcAsICsRJkNw-PtcnrphXwMJRxXcRR9QXReF6fDlsCOgPWMLOtwlZ6Naje6SNhwyCD3WlADNupsTM8ZqQ90CHIvQgdOm92jJJds-X9Wumg7hJSv77vDFrFQL4_zZr5nggeOz4VWlW4Bl2CFddzIpRS1-udxoV2IKrLJway4y4aDHP/w400-h239/Intel%20_frametime.PNG" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>That looks pretty messy - but this was before the v-sync issue was fixed and with RT on...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYGphtPOl4TzER5rxGywKKebWAk6cZ0IQaHsbHgD4eCDZdlvNV_JfwcdV__oolpPuUvx__8yOPClD8dXXX585GNwBoqWYE4FpD2YUy5vypxuo-8wK18svO8FRuOfBxhwKBM5xVnxh3baZ5RhiGt9y9cV5sB3b401aJCo4zslmN82B0V5BwLCz7PzpA/s565/AMD%20_frametime.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="340" data-original-width="565" height="241" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYGphtPOl4TzER5rxGywKKebWAk6cZ0IQaHsbHgD4eCDZdlvNV_JfwcdV__oolpPuUvx__8yOPClD8dXXX585GNwBoqWYE4FpD2YUy5vypxuo-8wK18svO8FRuOfBxhwKBM5xVnxh3baZ5RhiGt9y9cV5sB3b401aJCo4zslmN82B0V5BwLCz7PzpA/w400-h241/AMD%20_frametime.PNG" width="400" /></a></td></tr><tr><td 
class="tr-caption" style="text-align: center;"><b>The AMD system fares no better, here...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">E.g. You're Wrong...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Let's take some of the data I generated from the Hogwart's legacy testing that I performed recently. This will be used as just an example since I know that with this testing, HL had a problem with improper frame presentation when v-sync was enabled, data streaming was not optimised, and RT had a big hit on the performance of the systems. I imagine that the situation is a bit different <i>as of today</i>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><a href="https://docs.google.com/spreadsheets/d/1FsQ2zF26fxLiWzx4pbgqrrWfO82VtsaZgnO6OwkGDaM/edit?usp=sharing">All data is presented here</a> for your perusal.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If we were to take the data from the frametimes, we would get the results in the "overall" section in both tables below. This gives us the <i>correct</i> average fps but, if we take the percentile values as the average of the associated frametimes, without taking each of them into context, we see pretty dire "percentile" fps results.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Taking a look at <i>where</i> the maximum frametime happens in the benchmark run, we are able to extract the second of data surrounding that datapoint. In the case of the Intel system it's at 14-15 seconds and for AMD, it's 29-30 seconds.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfI7wR7DbAoVhfNeq7pSJ4sgeoe0VomsFehmTWD7ZQvZwFN2Dbj8mdyd37hXx6-KTWNmJzguoSz1EovE7vTLCixZ590GRyg96l2oeT384zpIXiAoayfdZK55xgzUaTk5uWpm9gSIC3FxmIqpQwAH7OgRAuciRub_v_JAVkoqMfwxS0zuJ8jm5vCaQN/s522/Intel_summary.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="522" data-original-width="374" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfI7wR7DbAoVhfNeq7pSJ4sgeoe0VomsFehmTWD7ZQvZwFN2Dbj8mdyd37hXx6-KTWNmJzguoSz1EovE7vTLCixZ590GRyg96l2oeT384zpIXiAoayfdZK55xgzUaTk5uWpm9gSIC3FxmIqpQwAH7OgRAuciRub_v_JAVkoqMfwxS0zuJ8jm5vCaQN/s320/Intel_summary.jpg" width="229" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Data obtained on the Intel system...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Of course, looking at the average fps for this approxmiate 1 second shows a completely different reality to the 1% low previously reported for the average of all the data, using the singular frametime data points to do so...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For the Intel system, we see an average of 50 fps and AMD 59 fps. So, in reality, there is <i>never</i> a point at which the player of this game on either system would experience a framerate of 18 fps or 21 fps. Stutters? Yes! Of course, we observe stutters. 
<div style="text-align: justify;">Stutter events like these are best communicated, in frequency and severity, in terms of frametime, not fps.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Then, finally, applying a moving average to the data - encompassing a 1 second period - gives us the true, experienced reality <i><b><u>from the perspective of the graphics card</u></b></i>. Somewhere around 45-49 fps across the two systems for the 1% low fps values.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRbd7U_VBjTCYk7DWSIfCOq-kTW_wiqqV_V5LcQTzuqEZF6fxU66Pk5955I4sFGi7tUFDNAXvbkMOAgqF46l7DbI8Iox3_Yxss2ncQeQ856hBkDtKl-f2gjtOjo5Frz2FshY7GKEMgaEZ6v6YXZ0gSM_Bs10sFIjo5nG7-4lZRtUJlWNFsSY8iQVm9/s522/AMD_summary.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="522" data-original-width="371" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRbd7U_VBjTCYk7DWSIfCOq-kTW_wiqqV_V5LcQTzuqEZF6fxU66Pk5955I4sFGi7tUFDNAXvbkMOAgqF46l7DbI8Iox3_Yxss2ncQeQ856hBkDtKl-f2gjtOjo5Frz2FshY7GKEMgaEZ6v6YXZ0gSM_Bs10sFIjo5nG7-4lZRtUJlWNFsSY8iQVm9/s320/AMD_summary.jpg" width="227" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Data obtained on the AMD system...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Lastly, knowing that the experience of the user is not related to individual events (in most normal working conditions) - as defined by Gamer's Nexus as ± 8 ms between frames* - we can change our analysis to a moving average. I've chosen a sixty-frame moving average for this data since I am targeting sixty frames per second. If I had more coding skills, I would average the number of frames every one second. Unfortunately, as it stands, I am performing all of this analysis manually, so this rough approximation can be applied to this dataset to get a feel for its use (a sketch of the time-based version follows below).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Further refinements can, of course, be applied!</div><div style="text-align: justify;"><i><b><span style="color: #274e13;"><blockquote>*Personally, I feel like this is a bit tight and dependent on the display refresh rate (and whether the VRR feature is present or not)...</blockquote></span></b></i></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What we immediately see is that the user experience from the standpoint of the GPU is much smoother than when taking the individual frametimes into account. There are <i>definitely</i> no periods where we are seeing 22 - 30 fps on either system. These are nonsense numbers that should not be communicated to the consumer of a review!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What we <i style="font-weight: bold;">DO</i> observe, though, is that the worst period of performance may not actually be where we thought it was... Consider the periods we defined before as the worst. For the Intel system, it was the 14 - 15 second period. For the AMD system, it was the 29 - 30 second period. The understanding for the Intel system was pretty much correct.</div><div style="text-align: justify;"><br /></div>
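<div style="text-align: justify;">For anyone with a little Python to hand, that time-based version is not much code. A minimal sketch, again assuming a frametimes list in milliseconds - the function and the 1000 ms window are my own illustrative choices, not anything FrameView or CapFrameX actually does:</div><div style="text-align: justify;"><br /></div><pre>
# Time-based moving average: for each frame, the fps over the preceding
# ~1000 ms of delivered frames - the "average the number of frames every
# one second" idea, rather than a fixed sixty-frame window.
def moving_avg_fps(frametimes, window_ms=1000.0):
    finish, acc = [], 0.0
    for ft in frametimes:
        acc += ft
        finish.append(acc)
    rolling, start = [], 0
    for i, t in enumerate(finish):
        while finish[start] <= t - window_ms:  # drop frames older than the window
            start += 1
        frames = i - start + 1
        span = t - (finish[start - 1] if start else 0.0)
        rolling.append(1000.0 * frames / span)
    return rolling

rolling = sorted(moving_avg_fps(frametimes))
print(f"worst 1-second average: {rolling[0]:.1f} fps")
print(f"'1% low' of the rolling average: {rolling[len(rolling) // 100]:.1f} fps")
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The nice property of a time-based window is that it tracks what a "per second" metric actually claims to measure, regardless of how quickly the frames are arriving.</div><div style="text-align: justify;"><br /></div>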
<div style="text-align: justify;">The AMD system, however, now <i>also</i> experiences its worst performance at this same period.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Clearly, from a logical standpoint, this makes sense.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The heaviest section of a game benchmark is the same regardless of the hardware you throw at it. How the hardware deals with that section is what matters... The other benefit of changing to a moving average is that it <u style="font-style: italic; font-weight: bold;">removes</u> the random frametime spikes we were observing in Hogwart's Legacy from the equation. They are an important data point for the user, for sure... but they are not related to an average experience.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga7qwHdRst1KHYkxKIQQU3VTO1dOp44nTDXXIbQzy0wrUh24_7fmHuEmgSmdiPZTCslskUy2uSI7bAc-RHIEpDWl-iplMvkCpVLWu1khA2kbjY8wYrLTS5-MY_ep-OiPm3JQTESxGrjyUYoO0mRK-ECZzlDOhZ0VhLZPLjeYf7_Fe9aTr1nRKWfch-/s565/Intel%20_moving%20average.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="343" data-original-width="565" height="243" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga7qwHdRst1KHYkxKIQQU3VTO1dOp44nTDXXIbQzy0wrUh24_7fmHuEmgSmdiPZTCslskUy2uSI7bAc-RHIEpDWl-iplMvkCpVLWu1khA2kbjY8wYrLTS5-MY_ep-OiPm3JQTESxGrjyUYoO0mRK-ECZzlDOhZ0VhLZPLjeYf7_Fe9aTr1nRKWfch-/w400-h243/Intel%20_moving%20average.PNG" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Sure, we get a lot of frametime spikes, but we can see that the outputted data from the graphics card per second is nowhere near as bad as those erroneously reported data would be...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkaUqHcb-XaO9O-LNVHYUVA-Z61PCPZ5Cm16qI1Q-RTjpnZH9yTt-ZgNQkaBaJVMOhc8MU0kZ_Hubv1hE3ayUwDl6jZAR-02w_qoUDVoWkzQ6x5zDbJmUaHWMB1-KPOI5hoZm4v_Eqo-3sdTG6uba3GZETtvR8233TrJXuJdRXLJ6jG9b4xbV79cNH/s563/AMD%20_moving%20average.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="340" data-original-width="563" height="241" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkaUqHcb-XaO9O-LNVHYUVA-Z61PCPZ5Cm16qI1Q-RTjpnZH9yTt-ZgNQkaBaJVMOhc8MU0kZ_Hubv1hE3ayUwDl6jZAR-02w_qoUDVoWkzQ6x5zDbJmUaHWMB1-KPOI5hoZm4v_Eqo-3sdTG6uba3GZETtvR8233TrJXuJdRXLJ6jG9b4xbV79cNH/w400-h241/AMD%20_moving%20average.PNG" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>On average the AMD system is performing better but has that huge bulge between 14-16 seconds... but that wasn't identified as the worst frametime experience?</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><div>The final problem with all this analysis is that there is some delay between a frame being presented and that frame actually being displayed. 
There are, as Ryan Shrout points out in his explainer article linked above, times where rendered frames are discarded by the system as a whole because they fall between (or outside of) the required refresh intervals of the display.</div><div><br /></div><div>This is where reviewers like Ryan/PC Perspective and the people at Digital Foundry and NX Gamer come into their forte with analysis tools, external to the system in question, where the experience of the gamer can be quantified. They have the absolute "ground truth" of the experience down to a science.</div><div><br /></div><div>But what if we could approximate that and simplify it for the general user in a way that would not require massive amounts of overlaid graphs? For me, that would enhance the data that other tech reviewers pump out at rates that defy anything that the more detailed and thorough analysis can maintain.</div><div><br /></div><div>Luckily, FrameView (the tool I am using) has an approximation for what is sent to the screen from the graphics card.</div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0TlZxbtXyajkED7-wSnGD4il1dvznuBpFUT7bBY41CYM-HZSRF_UXaUFYOHRoTzS77GOrQ04qAK4Hb2EjUxHFigMaIHz-vHCslvZG3U91DUdNv1-_EkHIr8mmAMVcXg_IDtKM3Ryjmk6Yve_2SwwpxI7rAETKcaM2-EAjb_I5TtmLtapbUKF33Ovc/s956/Intel_ground%20truth.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="635" data-original-width="956" height="426" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0TlZxbtXyajkED7-wSnGD4il1dvznuBpFUT7bBY41CYM-HZSRF_UXaUFYOHRoTzS77GOrQ04qAK4Hb2EjUxHFigMaIHz-vHCslvZG3U91DUdNv1-_EkHIr8mmAMVcXg_IDtKM3Ryjmk6Yve_2SwwpxI7rAETKcaM2-EAjb_I5TtmLtapbUKF33Ovc/w640-h426/Intel_ground%20truth.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The data for the Intel system, fitted with the RTX 3070...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is as close as you can get to the absolute ground truth user experience without using external lossless capture hardware and post-processing analysis software. For my example here, we can see that, by defining specification limits at 3 standard deviations from the average frametime (USL/LSL), we can observe the areas of the benchmark run where we encounter the difficulties in experience presentation to the user in screenspace. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">i.e. Wherever the frametime differential spikes cross the dotted lines, we see worse presentation of the application to the user.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The Intel system is struggling to maintain a consistent output throughout the benchmark, with many skipped frames and periods where fluidity is lacking. The individual frames are not as much at issue as the prolonged skips where multiple frames are missed (e.g. around 14 - 19 seconds there are four periods where 3 sequential frames are missed).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">By comparison, the AMD system manages the benchmark reasonably well, though I would say <i>just barely</i>! 
The system is running at the edge of the USL/LSL throughout the benchmark run and, as a result, manages to maintain the presentation to the user observing the screen more consistently. (This is a more powerful GPU with more VRAM!)</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_UONCz2LB1XbZQHtQH-LDwEAlGzryg94YwkDdJ8uE2L1b0j3z24IkUU7QfyThd2DTiTxtqueiO-OAFCCEm5tnuDFIoPtp4i8kSiyrLk_AYSc-L6p_KgZ3QqskgY0dlXdPl3w1gIhqkmdXTqBtC50_mCJ837XZ7xEzxouu22TT0Rhkq01ByulpOpsy/s943/AMD_ground%20truth.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="629" data-original-width="943" height="426" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_UONCz2LB1XbZQHtQH-LDwEAlGzryg94YwkDdJ8uE2L1b0j3z24IkUU7QfyThd2DTiTxtqueiO-OAFCCEm5tnuDFIoPtp4i8kSiyrLk_AYSc-L6p_KgZ3QqskgY0dlXdPl3w1gIhqkmdXTqBtC50_mCJ837XZ7xEzxouu22TT0Rhkq01ByulpOpsy/w640-h426/AMD_ground%20truth.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The data for the AMD system, fitted with the RX 6800...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I have also defined a control limit (UCL/LCL) for the process as the median of the frametimes ± 3*stdev of the frametime differential. This limit is somewhat arbitrary and could be tightened, though not below the baseline of 3 stdev from the frametime average - this is more just a tool to see if the process is in control. We would say that it is not in this scenario...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As I mentioned above, we cannot present all this information for comparisons of multiple graphics cards - it just gets too messy. It's fine for individual tests or comparisons. However, by measuring excursions below and above the USL/LSL, we can adequately quantify the performance of the benchmark data for each piece of hardware.</div><div style="text-align: justify;"><br /></div>
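<div style="text-align: justify;">Counting those excursions is, again, trivial to automate. A minimal sketch, assuming the same frametimes list as before - note that this uses the simple average ± 3 standard deviations definition of the USL/LSL from above, not the median-based control limits:</div><div style="text-align: justify;"><br /></div><pre>
from statistics import mean, stdev

# Count excursions beyond the USL/LSL (average frametime +/- 3 standard
# deviations) - one pair of numbers per card, easy to put in a bar chart.
def excursions(frametimes, sigmas=3.0):
    mu, sd = mean(frametimes), stdev(frametimes)
    usl, lsl = mu + sigmas * sd, mu - sigmas * sd
    above = sum(1 for ft in frametimes if ft > usl)
    below = sum(1 for ft in frametimes if ft < lsl)
    return usl, lsl, above, below

usl, lsl, above, below = excursions(frametimes)
print(f"USL {usl:.2f} ms / LSL {lsl:.2f} ms")
print(f"{above} frames above the USL, {below} below the LSL")
</pre><div style="text-align: justify;"><br /></div>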
<div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">So, Where Does That Leave Us...?</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Using these methods and types of analysis, I believe that we are able to approximate the ground truth user experience with minimal investment in hardware and, with an appropriately designed automated spreadsheet or application, the analysis could be performed for the user themselves.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">An application like <a href="https://www.capframex.com/">CapFrameX</a> could easily incorporate these ideas or, at the very least, correct the issues with data presentation that exist in the application's current form.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, what would presentation of review data look like with these ideas in mind? Well, I've drawn up a simple bar chart below to present this concept. The average fps are unchanged, but the minimum fps are now incorporated as an averaged statistic - meaning that the user will experience this fps over the course of 2 seconds (1 second moving average). The maximum frametime spike is presented to show the absolute amount of stutter that could be expected, and the consistency of the presentation in frametimes is represented by the deviations above or below the USL/LSL (3 standard deviations from the average).</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtko_nKunNnvQkjhBGCznVPzMoZokGeMBR52bQ0whi7LdFVl75MPmc2blJwBa4qXw9AGuaAG7O0000lFWuUFntmdLrD8Z8JmGc5OhRH5jPsVHx5li66MC1bRsybPh4YVYE9Mf9KnNFXznFXbZmDB6bW5-JI2o0U_3NnS614GdewN_uh-W-ct7i_9nP/s562/summary_example.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="344" data-original-width="562" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtko_nKunNnvQkjhBGCznVPzMoZokGeMBR52bQ0whi7LdFVl75MPmc2blJwBa4qXw9AGuaAG7O0000lFWuUFntmdLrD8Z8JmGc5OhRH5jPsVHx5li66MC1bRsybPh4YVYE9Mf9KnNFXznFXbZmDB6bW5-JI2o0U_3NnS614GdewN_uh-W-ct7i_9nP/s16000/summary_example.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>This bar chart covers all the major aspects of the testing - highest frametime spikes, average fps, minimum experienced fps, and process controls...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">To make sense of this new presentation style, I will now go through the summary of this data and give it a simulated reviewer's context:</div><blockquote><div style="text-align: justify;"><span style="color: #274e13;"><b><i>Overall, the AMD system performs better, with a higher average fps closer to the system refresh rate of 60 Hz. Minimum experienced fps is actually better on the Intel system, though stutters are also considerably worse on that system as well. The AMD system is more performant, with consistently lower frametimes than required to maintain a 60 fps average, though it is - overall - not achieved in this benchmark.</i></b></span></div></blockquote><div style="text-align: justify;">And that is where we leave this point in my understanding. This isn't the be-all and end-all - this is a journey to improvement. I don't consider myself 100% correct - some of this is open to interpretation, as much as the human experience is. However, I hope that I have demonstrated some aspects of current hardware and software testing methodology that are, in my opinion, lacking and actually incorrect.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Maybe this post will inspire or inform someone else to actually change their interpretation of data. 
I can only hope!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Until next time...</div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-58610452635975773132023-03-05T19:49:00.006+00:002023-03-05T20:49:39.366+00:00Analyse This: The Technical Performance of Hogwart's Legacy...<div style="text-align: left;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9iaGr_QAjhZkL4JduOOyGAhD24wKXgpaSevhd7mORPDIOUbxZnepc0V0FBipbgscDHu0xTh93nllnjjyrxrWOs0xVRFeOWNqGt99wh5BNVBbwaq0kwh-jkEssClKg7fc-b54kWFF7weWnzSMiZ-OcQollLDscCZ62BO-oju98Q87z9JSIqEpbd3hO/s1920/Title.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9iaGr_QAjhZkL4JduOOyGAhD24wKXgpaSevhd7mORPDIOUbxZnepc0V0FBipbgscDHu0xTh93nllnjjyrxrWOs0xVRFeOWNqGt99wh5BNVBbwaq0kwh-jkEssClKg7fc-b54kWFF7weWnzSMiZ-OcQollLDscCZ62BO-oju98Q87z9JSIqEpbd3hO/w640-h360/Title.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">I've never really been a fan of the Harry Potter series - I've never read the books, though I have seen most of the films and thought that they were okay. However, I was immediately interested in Hogwart's Legacy when I saw the game for the first time due to the graphical effects used (i.e. ray tracing) and also once I saw the recommended system requirements for the game. I wanted to test it to see <i>why</i> the requirements were comparatively high and to see how the game runs.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This isn't a review of the game - I might do one of those at a later point - this is a look at how the game scales with certain system resources...<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Test System and Benchmark...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Hogwart's Legacy ("HL" from now on!) doesn't feature an in-game benchmark that can be used to make a comparison run between different hardware setups. However, many different reviewers have identified Hogsmeade village as one of the more strenuous sections in the game due to the number of NPCs and other graphical effects going on. I've chosen an arcing route from the southern bridge entrance, through the streets around to the left where the potion shop is located. It's approximately 32 seconds' worth of data each run, all in all, but it can give us a decent look at how the game performs.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The important thing to note about this testing is that both DLSS and FSR 2.0 are active - running at the Quality setting. 
The game settings are set to high, with all RT options enabled; material and texture quality are set to ultra for the RX 6800, and only material quality to ultra for the RTX 3070 (due to its lower quantity of VRAM).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">One last thing to note about the game settings is that V-sync is enabled - and both monitors are set to a 60 Hz refresh rate*.</div><blockquote><div style="text-align: justify;"><span style="color: #274e13;"><b>*Since we're using all RT features, and FSR/DLSS, we are not guaranteed to keep this framerate consistent, so it is a good test...</b></span></div></blockquote><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The test systems are as follows:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li>Intel i5-12400</li><li>2x 8 GB Patriot Steel series DDR4 4400 (Samsung b-die)</li><li>Geforce RTX 3070</li></ul><div><ul><li>AMD Ryzen 5 5600X</li><li>4x 8 GB Corsair Vengeance LPX DDR4 3200 (Samsung c-die)</li><li>Radeon RX 6800</li></ul></div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I will be using the RAM configurations that <a href="https://hole-in-my-head.blogspot.com/2023/01/analyse-this-does-ram-speed-and-latency.html">I identified last time</a> in my analysis of the best settings in Spider-man (ray tracing). Both graphics cards are slightly overclocked, undervolted and power limited (both perform above stock in these configurations). The CPUs are running at stock settings.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What I hope to show with this testing is where the bottlenecks are for this game, and perhaps guide other players in how to optimise (or what to change) in cases where they feel like they aren't achieving the performance they would like. Additionally, I will show whether this hardware, at these settings, is able to maintain an approximately 60 fps experience for the user.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I feel like this is an under-represented type of analysis, with only <a href="https://www.youtube.com/channel/UC7Jo0VTzeyYbZ8cVk3k-EhA">a couple</a> <a href="https://www.youtube.com/@DigitalFoundry">of outlets</a> focussing on this aspect - most look at raw, <a href="https://youtu.be/qxpqJIO_9gQ">unlimited performance</a> <a href="https://www.techpowerup.com/review/hogwarts-legacy-benchmark-test-performance-analysis/6.html">of GPU hardware</a>, without any eye towards <a href="https://www.dsogaming.com/pc-performance-analyses/atomic-heart-pc-performance-analysis/">stability of the presentation</a> on more realistic configurations... 
or the graphical effects are analysed, <a href="https://www.gamesradar.com/hogwarts-legacy-fidelity-vs-performance-graphics-modes/">without any real view towards performance</a>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzbv_rfKZbTQS0emUOaH9lIjdVcWl-OtUZXsAu6bUop2vLwV8Hx9eigx7Du6UvKfNmacW3iS_G-vZnBdhsQDVXhuUor_qIbS9xy_Uw3-8l5CVn0PTwnq7XLccS0ncttGSrcn4V5pfp1b-jvv0l0FSXJf7_udkzdCofQScgJwDg8sSGsfMkqyQG2tT_/s1920/20230212011658_1.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzbv_rfKZbTQS0emUOaH9lIjdVcWl-OtUZXsAu6bUop2vLwV8Hx9eigx7Du6UvKfNmacW3iS_G-vZnBdhsQDVXhuUor_qIbS9xy_Uw3-8l5CVn0PTwnq7XLccS0ncttGSrcn4V5pfp1b-jvv0l0FSXJf7_udkzdCofQScgJwDg8sSGsfMkqyQG2tT_/w640-h360/20230212011658_1.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>RT shadows, light sources and high texture and material qualities, all in combination with a well-chosen near-realistic art style make the game seriously good-looking...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Visual Design...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">HL is a heavy game - especially with raytracing enabled - and was not without its share of bugs at launch. Lower quality settings for materials, textures, and models result in some pretty poor visual experiences but luckily these have been fixed with patches released just after launch and it is clear, now, that the game can visually scale <i>incredibly</i> well.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The game features densely packed interiors and quite a large open world with a good draw distance and a primary feature of the game which I find particularly 'next generation' in terms of experience is the number and detail of NPCs in the various locations of the game world. Both the castle and the villages feel alive in a way that many last gen and older games did not. Sure, Spider-man has a LOT of generic NPCs walking around, fading into and out of existence, but these are copy-pasted entities of pretty low quality in terms of materials and textures, and you will easily spot repeating instances of NPCs. HL has a combination of randomly generated NPCs and hand-crafted NPCs, which you can run into repeatedly throughout the game.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Combined with the high level of geometric and texture detail to make those NPCs and environments look good, ray tracing options for shadows, reflections, and light bouncing put a real strain on system resources which have historically been neglected by both Nvidia and AMD on lower-end hardware. 
This is the essential crux of why the system requirements may be considered so high by many, but in my opinion, HL is one of the first truly next gen experiences and, while not perfect, the game is asking a lot from the PC hardware it is running on.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmEL4PbXVRkcPRz0OrZK7GhLRRSbbvDJxFATBmnvWRKqWcDc-Y1_fYZIX3Moi3zmcx6ERHRE6T9dNtUNCs73iaXMb5_Tdu-cYHFQbKOHvotjbrL98Wtp_4RUX-xCjqOUgV1QeP_4uTfV3FYukw_fjKMlpsme83T1s51k9kbUWHI2u9WjDfjxTrNsZL/s3840/RT%20comparison.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="3840" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmEL4PbXVRkcPRz0OrZK7GhLRRSbbvDJxFATBmnvWRKqWcDc-Y1_fYZIX3Moi3zmcx6ERHRE6T9dNtUNCs73iaXMb5_Tdu-cYHFQbKOHvotjbrL98Wtp_4RUX-xCjqOUgV1QeP_4uTfV3FYukw_fjKMlpsme83T1s51k9kbUWHI2u9WjDfjxTrNsZL/w640-h180/RT%20comparison.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Without ray tracing enabled, the game looks dull, flat, and washed-out...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Quite honestly, I believe that the ray tracing (RT) implementation in HL is almost vital to the visual make-up in practically every scene. I've seen many games where the RT implementation results in very slight differences between that mode and the purely rasterised presentation, with Spider-man being a standout with respect to the RT reflections available in that game. 
However, while I find it unfortunate that the RT reflections in HL are muddier and lack clarity in comparison with the implementation in Spider-man, in combination with the dynamic weather and time-of-day systems they give rise to visual experiences that are at the very top of the industry today - assuming you have the hardware to run the experience at an acceptable level.</div><div style="text-align: justify;"><div><br /></div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBS5qAmadRApw0aNhpJhehpy-A4F_gfN3VSFl-KtJeHevLCxkdTFNlTR0vWRAAbBn_fm7IdCQdi1an_csnOS0hgzGOIFCZ1-usal-knDZrbjTYrk6SPnpJVlIvbQIhHQ8gp9l9fjl_zgntC4CKhZjOnxl6dvJUm_9MC0Qh1anYgfs4jDmeNTFtRHZD/s3840/RT%20comparison%203.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="3840" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBS5qAmadRApw0aNhpJhehpy-A4F_gfN3VSFl-KtJeHevLCxkdTFNlTR0vWRAAbBn_fm7IdCQdi1an_csnOS0hgzGOIFCZ1-usal-knDZrbjTYrk6SPnpJVlIvbQIhHQ8gp9l9fjl_zgntC4CKhZjOnxl6dvJUm_9MC0Qh1anYgfs4jDmeNTFtRHZD/w640-h180/RT%20comparison%203.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Spider-man's reflections were impressive on a level I don't think I've seen before...</b></td></tr></tbody></table><br /><div>Added to this, the RT ambient occlusion (RTAO) and shadows/reflections are so superior to the terribly distracting screen space alternatives that I am willing to suffer the increased amounts of stutter and judder (due to data management), and slight softness of the image due to upscaling, along with lower overall performance in frames per second from enabling all three RT techniques.</div><div><br /></div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8B0ZO0gkIzxWbCWjPh9jgvMvBnf-GB38bAVAHDm5cIdDf5dnB2FmRqGuLorckCJrjgT1PQu9ZCMJQ-purisuawbCPOIejN92NNmAdALpLTXJHon1cAUkgyLLN5-h3uknkaZIzpX_xEyNurUatckvkdLZWHCi4SkWjvzMgsi_bxBHS2IWgnJZGlb_B/s3840/RT%20comparison%202.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="3840" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8B0ZO0gkIzxWbCWjPh9jgvMvBnf-GB38bAVAHDm5cIdDf5dnB2FmRqGuLorckCJrjgT1PQu9ZCMJQ-purisuawbCPOIejN92NNmAdALpLTXJHon1cAUkgyLLN5-h3uknkaZIzpX_xEyNurUatckvkdLZWHCi4SkWjvzMgsi_bxBHS2IWgnJZGlb_B/w640-h180/RT%20comparison%202.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The ambient occlusion is vastly better than the traditional methods... Just look at those trowels!</b></td></tr></tbody></table><br /><div>I've casually mentioned elsewhere that I believe that <i>this</i> game, out of all the games that "core" gamers might want to play, will likely be the one that encourages a large population of people to upgrade their PC hardware. Back in the day, that would have been a game like Quake or, later, Half Life/Crysis... 
but these days, we really need a knock-out property with huge mainstream appeal to move the needle in terms of modern hardware adoption.</div><div><br /></div><div>But what do users need to upgrade to? What is important for this game when aiming for the best experience*? I'm choosing to test at the high end of the settings options, as the bottlenecks become more obvious at that point.</div><blockquote><div><b><span style="color: #274e13;">*Good quality visuals, combined with a good framerate.</span></b></div></blockquote><div>First up, let's take a look at the GPU scaling in Hogwart's Legacy. Then we'll move onto the RAM scaling.</div><div><br /></div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3Qkre5HsCEH8VXIGjPiQ_zUrVFhiw9X9JXpGXabscIpYwB2ddqpK6hs_3EZ1PGwQG9pBK5zWe4htkN9vOEHtEfMSZhKYpbej0F2Ct_QN9tIPWWXShFopPYwFWp-QIVcvVfpI6xddCgBdkH6mb7UHWwDUneMf5YPqOm4hEX7zRVzHawSLQMMTMRxpV/s855/Nvidia%20frametime%20summary.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="207" data-original-width="855" height="155" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3Qkre5HsCEH8VXIGjPiQ_zUrVFhiw9X9JXpGXabscIpYwB2ddqpK6hs_3EZ1PGwQG9pBK5zWe4htkN9vOEHtEfMSZhKYpbej0F2Ct_QN9tIPWWXShFopPYwFWp-QIVcvVfpI6xddCgBdkH6mb7UHWwDUneMf5YPqOm4hEX7zRVzHawSLQMMTMRxpV/w640-h155/Nvidia%20frametime%20summary.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>A small comparison between day/night, as well as power/memory frequency scaling for the RTX 3070...</b></td></tr></tbody></table><div><br /></div><div><br /></div><div><br /></div><div><h3><span style="color: #274e13;">GPU clout: Nvidia's Nightmare...</span></h3></div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As I mentioned before, I'm doing a run in Hogsmeade. However, it's important to note that I'm performing this run in the daytime. With ray tracing enabled, I observed a difference in strain between day and night on the system equipped with the RTX 3070.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's possible that this is because, in the day, we mostly have a single point light source in the form of the sun, whereas, at night, we're looking at multiple light-casting objects (candles, lamps, etc.) which may eat into the budget of the RTX 3070's 8 GB of VRAM.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I've not seen another outlet mention this difference - nor at what time of day they are testing. 
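</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If you'd like to check the day/night behaviour on your own system, the comparison is trivial to script - assuming two captures of the same route (the file names and the single-column frametime format here are hypothetical):</div><div style="text-align: justify;"><br /></div><pre>import numpy as np

# Two captures of the same Hogsmeade run: one in game-time day, one at night.
for label in ("day", "night"):
    ft = np.loadtxt(f"hogsmeade_{label}.csv")   # frametimes in ms, one per line
    print(f"{label}: avg {1000.0 / ft.mean():.1f} fps, "
          f"worst spike {ft.max():.1f} ms, "
          f"std dev {ft.std():.2f} ms")
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">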
Anyway, just a note - night appears to be the worst case scenario for VRAM-limited graphics cards: I'm seeing an average drop of approximately 6 fps on my RTX 3070, but the stutters the player experiences during play are much worse*: an effective timespan of 5 missed frames at an expected 16.666 ms frametime target (60 fps) - roughly 83 ms of a single stalled image - is quite an evident stutter!</div><div style="text-align: justify;"><blockquote><span style="color: #274e13;"><b>*Though inconsistently so - which I will get to in a little bit...</b></span></blockquote></div><div style="text-align: justify;">Regarding the capability of the GPU - back when I described the <a href="https://hole-in-my-head.blogspot.com/2022/06/the-power-curve-of-rtx-3070-and-ampere.html">power scaling of my RTX 3070</a>, I showed that at the 50% power limit, we are approximately at the level of a stock RX 6700 XT/RTX 3060 Ti. At the 85% level, with the undervolt, I am at essentially the stock performance for the RTX 3070, only with less power drawn and less heat produced. The recommended specs ask for an RX 5700 XT at high quality settings to enable a 1080p/60fps experience - which is approximately 25% slower than an RTX 3060 Ti.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3ClvgmAUm8WzG0nLIXsMVb05XQwIc4NP9FCHIexTzfKDWYS746qycDxA1-I3una5c8GgOv3-DiWdOKaz_5foetIFaX-XaEiMCmWowBUjugWxGxzUyyD-RapGE871zbqnxtfLbO_94i_ytwO2DWHTGVy-zvg3nQ2sP2Wf_SvKNbDgB3bzxrpz0S8Ax/s537/Nvidia%20bar%20chart.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="312" data-original-width="537" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3ClvgmAUm8WzG0nLIXsMVb05XQwIc4NP9FCHIexTzfKDWYS746qycDxA1-I3una5c8GgOv3-DiWdOKaz_5foetIFaX-XaEiMCmWowBUjugWxGxzUyyD-RapGE871zbqnxtfLbO_94i_ytwO2DWHTGVy-zvg3nQ2sP2Wf_SvKNbDgB3bzxrpz0S8Ax/w400-h233/Nvidia%20bar%20chart.PNG" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>This is the same data as in the table above...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, when we look at the individual values in the table above, we see that a "stock" 14 Gbps RTX 3060 Ti is performing better than a "stock" 3070 in every primary metric other than those damnable frametime spikes. That doesn't make sense!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Let's take a look at the effect of video memory speed on the trended values: we sometimes see a very slight regression in the faster memory configurations, but the standard deviation and frametime spikes are, once again, tighter/reduced - indicating overall better performance, because we're seeing fewer stutters. So, clearly the increased frequency is doing <i>something</i> beneficial - even though, if we looked only at the average and 1% or 0.1% lows*, as many outlets do, we'd be saying the performance is worse...</div><div style="text-align: justify;"><div><blockquote><span style="color: #274e13;"><b>*This may be presented in a slightly confusing manner, here. 
Because I'm labelling the 1% low (fps) as the 99th percentile (high frametime).</b></span></blockquote></div><div><br /></div><div>Sequential frame presentation is also improved, with fewer spikes observed with the faster GPU configuration and VRAM frequency. This can be observed in the graphs below and is <i><b>exactly</b></i> the reason why I started doing this sort of analysis in the first place - you just <i><u>cannot</u></i> get all the information from a static number like average, minimum or percentile figures. There is more going on than just those in the moment to moment experience...</div><div><br /></div></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgej6zK4Mca_urxuGfFKWbKT0kAcsZYOdN3DqnDPNFuJB5JMG3yr2WmFqaVrAaTV9oqKlx0KMUUffu_I3jz0dzg8puKg7SAZ5uWYA7U1ku9da86RlJcXF_aY7xszFfSAZhSLxCTzpbdaY2IkdIp6HvcyuTWqtUhxxghYAPum-lMD9Ox6MmpxwKkpItE/s1074/Frametime_day_night.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="927" data-original-width="1074" height="552" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgej6zK4Mca_urxuGfFKWbKT0kAcsZYOdN3DqnDPNFuJB5JMG3yr2WmFqaVrAaTV9oqKlx0KMUUffu_I3jz0dzg8puKg7SAZ5uWYA7U1ku9da86RlJcXF_aY7xszFfSAZhSLxCTzpbdaY2IkdIp6HvcyuTWqtUhxxghYAPum-lMD9Ox6MmpxwKkpItE/w640-h552/Frametime_day_night.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>For the RTX 3070: Ignoring frametime spikes, there's not a big difference between the two performance targets except that it can be seen that both increased GPU power and memory speed result in less inconsistent sequential frame presentation...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><div>Getting back to those inconsistencies in the numbers - what could be happening?</div><div><br /></div><div>Well, what seems most likely to me is that, while raw GPU performance and VRAM speed are helping in terms of smoothness, the bottleneck is elsewhere in the system - causing instances where the GPU/VRAM are waiting for data to operate on and this is occurring, essentially, at random times. Meaning that, on one run, we might not see a spike, on another we might see two (given the short duration of the testing period). </div><div><br /></div><div>If we are seeing that the frametime spikes are reduced by increasing GPU performance and memory speed, and that the overall presentation is becoming a little more consistent, but that there is a slightly lower overall average performance, that would indicate that we need to provide things to the GPU in a faster/more consistent manner. i.e. 
the GPU is not the absolute bottleneck for this game.</div><div><br /></div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6wDOB5MMrldsOG-s1HzEZrjLvw_hOBZGtsWj31HWJbBFLPU7p-YcvJGbKAtU8Gbw70NT9zRfOgXlWHx1vYT0gv1nMD70Yh3UyG-CIdoBedtZD93k7oV2BRfCkOZlst60wqyVD3ri1HfwWy7f0NTd--ab_DgMFLH3HssVhITnnzZCJM50H_tcqh78C/s747/AMD%20frametime%20summary.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="164" data-original-width="747" height="141" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6wDOB5MMrldsOG-s1HzEZrjLvw_hOBZGtsWj31HWJbBFLPU7p-YcvJGbKAtU8Gbw70NT9zRfOgXlWHx1vYT0gv1nMD70Yh3UyG-CIdoBedtZD93k7oV2BRfCkOZlst60wqyVD3ri1HfwWy7f0NTd--ab_DgMFLH3HssVhITnnzZCJM50H_tcqh78C/w640-h141/AMD%20frametime%20summary.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>We see a similarly confusing story for the RX 6800 as we did with the RTX 3070...</b></td></tr></tbody></table><br /><div><br /></div><div><br /></div><h3><span style="color: #274e13;">AMD's Saving Grace...</span></h3><div><br /></div><div>Moving over to the RX 6800, we observe similarly strange behaviour as we did with the Geforce card though, as expected for a stronger graphics card, results are improved somewhat. </div><div><br /></div><div>Back in my <a href="https://hole-in-my-head.blogspot.com/2023/02/the-power-curve-of-rx-6800-and.html">power scaling article</a>, I found I could push my card from stock to around 9% faster by increasing the frequency and memory speed a little. I didn't push the memory to the maximum in the slider, but I have also included some testing to see if the excess memory speed is actually helping in this particular title.</div><div><br /></div><div>To add some value to the testing, the types of tests are slightly different between the two setups due to the different possibilities offered by the software controls available to the user... i.e. I'm not so much interested in testing a lower performing part, here. Instead, I wanted to look at frequency scaling.</div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQRO1OyMaVQsj69IswYS4O2vv_DIDrwCUgJtLlgAiRxjry0iW7SRlpF6oriFFVhD39pwb3f3m1XQftzl0WlJBflHPSPmb4zpmvPChXjDMa6Z02Jf1ARvrSBbtP1XXAASNmKQ7HsT7n6cZwKeF8lnKpLAL7r9x0Y5zXV0GIIfesdLL-F16UJhtWfvU8/s535/AMD%20bar%20chart.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="310" data-original-width="535" height="231" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQRO1OyMaVQsj69IswYS4O2vv_DIDrwCUgJtLlgAiRxjry0iW7SRlpF6oriFFVhD39pwb3f3m1XQftzl0WlJBflHPSPmb4zpmvPChXjDMa6Z02Jf1ARvrSBbtP1XXAASNmKQ7HsT7n6cZwKeF8lnKpLAL7r9x0Y5zXV0GIIfesdLL-F16UJhtWfvU8/w400-h231/AMD%20bar%20chart.PNG" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The data from the table above, plotted. 
As with the RTX 3070 results, it doesn't really make a lot of sense...</b></td></tr></tbody></table><br /><div><br /></div></div><div style="text-align: justify;">I would like to be able to say that I was no longer observing any frametime spikes but, as we have already deduced from multiple runs on the RTX 3070, these are not originating from the GPU - so, while the max frametimes listed in this table <i>appear</i> lower than on the Nvidia part, that was merely the luck of the draw, as you will see in just a moment.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In the meantime, the frametime differential plots show us that we are getting more consistent sequential frame presentation with increasing memory frequency (and bandwidth) - even without the extra GPU performance granted by increasing the core clock frequency.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjg1dwDdE03AfW1uj8X0pivFS_cNhWvRFSxUyiKEyCcpwUKZWoq8duCVCtuBIl1wCSaxicxLuRyMwllq9VZgY-dhbKjmYfBU6qsFidnQmJpt-U-rFYOatJ9gQSr93ocq2yLAj9sYktWznhU1sw2MC4PfV0TGMq6nwjCA_GVT-6YlngwD1hGN0k7zQ7a/s1072/Frametime_day.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="920" data-original-width="1072" height="550" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjg1dwDdE03AfW1uj8X0pivFS_cNhWvRFSxUyiKEyCcpwUKZWoq8duCVCtuBIl1wCSaxicxLuRyMwllq9VZgY-dhbKjmYfBU6qsFidnQmJpt-U-rFYOatJ9gQSr93ocq2yLAj9sYktWznhU1sw2MC4PfV0TGMq6nwjCA_GVT-6YlngwD1hGN0k7zQ7a/w640-h550/Frametime_day.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Aside from being very lucky not to get many spikes in these tests, we can see the effect of increased video memory speed (and thus bandwidth) on narrowing down the inter-frame lag...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The question now is whether we see the same hiccups we saw at night when testing the RTX 3070. The answer is "no". We see no issues with VRAM occupancy on the RX 6800 - the 16 GB of video memory is keeping us in good stead in this regard...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, we are observing the <i>opposite</i> of what we saw during the daytime. We see a tighter sequential frame presentation with the slower memory configuration. Now, I'm not sure if this is down to errors being introduced as the memory is clocked faster, or whether there is some other aspect that I'm not understanding... 
but the essential takeaway is that we're not seeing a difference when faster memory is present on the RX 6800.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoGSvzcBTYlulazea3ILEfNXiXDlUNXYdixAqGL06WTXV_hyYF12iVp_-U_j99BiKXeXJ6zr_0skG18NYjNJzRUcOqKT9gwLdoxHua_q4J6T-kdrFTtyeNOQUo5M0tk0VYsj0X5BJ9Mh3GhyTOfFwUk2ju4OfqdrtWZlmQruwqGi6-6_GhDjcAHvLl/s765/AMD%20frametime%20summary_NIGHT.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="105" data-original-width="765" height="88" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoGSvzcBTYlulazea3ILEfNXiXDlUNXYdixAqGL06WTXV_hyYF12iVp_-U_j99BiKXeXJ6zr_0skG18NYjNJzRUcOqKT9gwLdoxHua_q4J6T-kdrFTtyeNOQUo5M0tk0VYsj0X5BJ9Mh3GhyTOfFwUk2ju4OfqdrtWZlmQruwqGi6-6_GhDjcAHvLl/w640-h88/AMD%20frametime%20summary_NIGHT.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>There's nothing to say here. They're essentially equivalent by these metrics...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqU2mn5XXaeWwQUKVjus_OlWHk9JLoKr8gALWbP2bZ6fcYjO9R-XSbydgYl0bOBTzunsJvrlcG6-XVYyPRV0ltnb4tMPY4zCUjcVtlQ_7HG0IH_ENwomZr-7UZKLSr10vBZmphztboyT4uPozw3a6UVCUd5tIqL5zJ_bQY-vT-ffDHtFj5ia5xrseg/s919/Frametime_Night.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="919" data-original-width="535" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqU2mn5XXaeWwQUKVjus_OlWHk9JLoKr8gALWbP2bZ6fcYjO9R-XSbydgYl0bOBTzunsJvrlcG6-XVYyPRV0ltnb4tMPY4zCUjcVtlQ_7HG0IH_ENwomZr-7UZKLSr10vBZmphztboyT4uPozw3a6UVCUd5tIqL5zJ_bQY-vT-ffDHtFj5ia5xrseg/w372-h640/Frametime_Night.jpg" width="372" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Backwards results? I'm at a loss to explain the difference observed between night and day...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">To Cap It All Off...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Going back to the point that the frametime spikes can strike at any time and anywhere: note the 102 ms spike in the data below...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">While searching for an answer to the thorny issue above, I came across something interesting. Enabling v-sync within the game appears to introduce quite a lot of frame scheduling overhead when ray tracing is enabled. When v-sync is disabled, sequential frametimes become much more consistent! The unfortunate side effect is that, on a non-variable refresh rate display, the experience for the player is absolutely horrendous because of the screen tearing...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is a shame because I'm seeing improved average framerate performance here... 
meaning that I <i>could</i> be having a better experience!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-Eu81A4w0_ifOV5c91TD8Wqj8TKK9jYCVYBG5I_Qpx6SY5nbFlRAzfzdHygB6AOBqFn-7DZVdC0Wn-3WkfAYDXZpiyFzPNQBabDAiOzM2KXpRCildFb9XlmCKAYoJb35V_kug0VPw30q3dAvoZmRuy44O4FbHyURfx6bVCHW868yeCIZOSV02EedI/s894/AMD%20frametime%20summary_frame%20cap%20latency.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="153" data-original-width="894" height="110" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-Eu81A4w0_ifOV5c91TD8Wqj8TKK9jYCVYBG5I_Qpx6SY5nbFlRAzfzdHygB6AOBqFn-7DZVdC0Wn-3WkfAYDXZpiyFzPNQBabDAiOzM2KXpRCildFb9XlmCKAYoJb35V_kug0VPw30q3dAvoZmRuy44O4FbHyURfx6bVCHW868yeCIZOSV02EedI/w640-h110/AMD%20frametime%20summary_frame%20cap%20latency.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Without ray tracing enabled, the framerate cap adds very little frame scheduling overhead...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXlOcb58sH2f-PK1u0hU4sj_yiKPDxfK0sccCxz9ZobnBccXUTRZEWvAzcEZjkJt69S5sSUpTvdh0N1nyozdJAktRet7sPJof-I6TTyIhXJtwyf_V8horEiPMTwxihLaLfZJWwytRRjd02wxPOUpBNW7XIkKaRzVWDJR9jZSvLkOO6MuTAY1UtGxCZ/s1071/Frame%20cap_no%20RT.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="617" data-original-width="1071" height="368" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXlOcb58sH2f-PK1u0hU4sj_yiKPDxfK0sccCxz9ZobnBccXUTRZEWvAzcEZjkJt69S5sSUpTvdh0N1nyozdJAktRet7sPJof-I6TTyIhXJtwyf_V8horEiPMTwxihLaLfZJWwytRRjd02wxPOUpBNW7XIkKaRzVWDJR9jZSvLkOO6MuTAY1UtGxCZ/w640-h368/Frame%20cap_no%20RT.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The framerate cap introduces a lot of overhead in terms of sequential frame presentation when ray tracing is enabled...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As part of this testing, I also discovered that in non-ray tracing situations, the frametimes are consistent with the "RT on and v-sync disabled" scenario... and achieve a smooth 16.666 ms frametime when v-sync is enabled. 
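</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">To make "consistent" concrete, this is roughly how I would classify captured frames against the 60 Hz v-sync window - a sketch of my own, not something the capture tools report directly (file name and format assumed, as before):</div><div style="text-align: justify;"><br /></div><pre>import numpy as np

ft = np.loadtxt("capture_vsync.csv")    # frametimes in ms (assumed format)
REFRESH_MS = 1000.0 / 60.0              # 16.666... ms per refresh at 60 Hz

# With v-sync, each frame should occupy a whole number of refresh windows;
# anything spanning more than one window means the previous image lingered.
windows = np.rint(ft / REFRESH_MS)
missed = np.count_nonzero(windows > 1.0)
print(f"{missed} of {ft.size} frames missed the 16.7 ms window "
      f"({100.0 * missed / ft.size:.1f}%)")
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, with ray tracing off, v-sync costs us essentially nothing. 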
This means that the game can achieve much better scheduling, along with the better presentation of zero screen tearing!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">While I might be wrong, I believe that this indicates a CPU bottleneck due to the extra demands driven by ray tracing when trying to schedule rendered frames to match the v-sync setting (60 Hz in this case)...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYNJKAL4SqBsX9jjZhAm-yUVzUWFvssYKOgaL7yQxpUgkE2kylZoXaTWJPXcQaQzbI0KbWygu0tK5088oh3CY0ThnqpmcmqDHwib7B6zyidO0Vc7y9htuwQ1QO5x-NB4c7Tkpp88N-CrXbQm2V5i9AenK7KosUihtXgJjrsvbjn9H6yqla0V3uF-e1/s533/Frametimes.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="148" data-original-width="533" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgYNJKAL4SqBsX9jjZhAm-yUVzUWFvssYKOgaL7yQxpUgkE2kylZoXaTWJPXcQaQzbI0KbWygu0tK5088oh3CY0ThnqpmcmqDHwib7B6zyidO0Vc7y9htuwQ1QO5x-NB4c7Tkpp88N-CrXbQm2V5i9AenK7KosUihtXgJjrsvbjn9H6yqla0V3uF-e1/s16000/Frametimes.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Frametimes compared between the various DDR4 frequencies tested..</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">RAM Speed...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Like the open world of Spider-man, the game relies heavily on asset streaming to get all this graphical data into RAM/VRAM, and on RAM/PCIe bus speed and bandwidth to deal with the ray tracing calculations more effectively. 
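</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Since the memory comparisons that follow (and several of the charts above) are built on "sequential frame differentials", it's worth pinning down exactly what I mean by that. A minimal sketch, assuming the same single-column capture format as before:</div><div style="text-align: justify;"><br /></div><pre>import numpy as np
import matplotlib.pyplot as plt

# The differential series behind these plots: the change in frametime from
# one frame to the next. A perfectly consistent presentation sits at 0 ms;
# excursions in either direction are felt as stutter or judder.
ft = np.loadtxt("capture.csv")          # frametimes in ms (assumed format)
diff = np.diff(ft)

plt.plot(diff, linewidth=0.5)
plt.axhline(10.0, linestyle="--")       # the +/- 10 ms band referenced below
plt.axhline(-10.0, linestyle="--")
plt.xlabel("frame number")
plt.ylabel("frametime change (ms)")
plt.show()
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">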
Consequently - and given we've seen odd behaviour in our GPU performance scaling tests - I am going to explore the effect of both quantity of RAM and RAM bandwidth/speed, in order to gauge their importance to stability of presentation for the end user.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This testing was performed on the RTX 3070 system with ray tracing enabled.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Unfortunately, because the frametime spikes are effectively random, it is difficult to draw a definitive conclusion from this testing - the absence of spikes in any given test does not mean that there are no spikes present during gameplay on that configuration, just that we did not capture one.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOHf9-9MIYZVbgWWiwLIAIhEjfS_500q6sOpmNquT2-B03xvwxiuhp7ucSf8gwlzk3UBCKsgi2oVsaqetceQBQYj0PgEnFAKq7diZSHw-X6bKOe13pIrsfwm4UzcDhtBQJuIksLVl4nFAlkTef5e9cqDpVJDsi2hI8FHenHNrNAK0vCBRxnKUW_s6K/s1069/Memory%20frequency%20scaling.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="930" data-original-width="1069" height="556" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOHf9-9MIYZVbgWWiwLIAIhEjfS_500q6sOpmNquT2-B03xvwxiuhp7ucSf8gwlzk3UBCKsgi2oVsaqetceQBQYj0PgEnFAKq7diZSHw-X6bKOe13pIrsfwm4UzcDhtBQJuIksLVl4nFAlkTef5e9cqDpVJDsi2hI8FHenHNrNAK0vCBRxnKUW_s6K/w640-h556/Memory%20frequency%20scaling.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The majority of the memory speeds keep the frametime differential within +/- 10 ms but we can see that the DDR4 3800 configuration has more sections where it is much lower than that - though with more small frametime spikes to break +/- 20 ms than DDR4 4400...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What I can say, though, is that the spikes and firsthand gameplay experience were worst at DDR4 3200. In contrast, it appears that DDR4 3800 gave the best experience and frametime presentation. However, in terms of subjective experience, I could not tell the difference between any configuration above DDR4 3600. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For DDR4 3800: this particular tuned configuration has the lowest primary timings of all the setups I tested, despite them all being validated as "the best" <a href="https://hole-in-my-head.blogspot.com/2023/01/analyse-this-does-ram-speed-and-latency.html">during my Spider-man testing</a>... 
but, from the results obtained here - it does not seem like Hogwart's Legacy is all that RAM sensitive.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5dLsECGybX8oJLM3IHO1tMxURbtBI0tjx2PfYaqgdV9oI9HeiJxw8qsJhE7Un7IJ8XrZNpjuJu7nOFJ_twqKBs5cs2mXoPhizMdJyOXrg-7FT5W1XJBjSxW50GBoTtaoflcM-Sq7sKBnPD_6yrg8yDXtIoW3CZatF3ZOVK7p1_Aes8N-VOrN5xJT3/s512/RAM%20timings.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="253" data-original-width="512" height="198" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5dLsECGybX8oJLM3IHO1tMxURbtBI0tjx2PfYaqgdV9oI9HeiJxw8qsJhE7Un7IJ8XrZNpjuJu7nOFJ_twqKBs5cs2mXoPhizMdJyOXrg-7FT5W1XJBjSxW50GBoTtaoflcM-Sq7sKBnPD_6yrg8yDXtIoW3CZatF3ZOVK7p1_Aes8N-VOrN5xJT3/w400-h198/RAM%20timings.PNG" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The tuned RAM configurations...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZPLN8TWCz_EC7IBxSt7n_Mr_x-w3QBDS9q0H6Grl6W9xncoANj9tAqXSTkDbOLi7JyIHdnNF6YuWUjt7MPGW3VV7iOXGt-ljDOwlOwRWzlnWuHfBYrFVM5aMGVGKnFfDzdPTbBW1WMXWvyHrxpOTuu_z-aKc9n-Zfs7tV2eWabTpyBIE6kdzhv_gZ/s519/RAM%20latencies.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="308" data-original-width="519" height="238" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZPLN8TWCz_EC7IBxSt7n_Mr_x-w3QBDS9q0H6Grl6W9xncoANj9tAqXSTkDbOLi7JyIHdnNF6YuWUjt7MPGW3VV7iOXGt-ljDOwlOwRWzlnWuHfBYrFVM5aMGVGKnFfDzdPTbBW1WMXWvyHrxpOTuu_z-aKc9n-Zfs7tV2eWabTpyBIE6kdzhv_gZ/w400-h238/RAM%20latencies.PNG" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>And the associated latencies and bandwidths...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Lastly, let's take a quick look at the effect of system memory quantity in this game.</div><div style="text-align: justify;"><div><br /></div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">RAM Quantity...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Playing with v-sync and RT enabled, we are still seeing that frame presentation overhead, resulting in more messy sequential frame differential plots (the DDR4 3800 being a strange exception). However, the experience when playing with the 8 GB of RAM was not fun. Not only were the stutters <i>really bad</i>, they were constant. 
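</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If you suspect you're in the same boat, it's easy enough to log memory pressure (and per-core CPU load, which is relevant to the CPU observations further down) while you play. A rough sketch using the psutil library - the polling interval and warning threshold are arbitrary choices of mine:</div><div style="text-align: justify;"><br /></div><pre>import time
import psutil

# Poll system memory and per-core CPU load while the game runs; sustained
# readings near 100% RAM use line up with the constant stutters described
# above. (Note: the first cpu_percent() reading may be zero.)
while True:
    mem = psutil.virtual_memory()
    busiest_core = max(psutil.cpu_percent(percpu=True))
    warn = "  ** low on RAM **" if mem.percent > 90.0 else ""
    print(f"RAM {mem.percent:5.1f}% used ({mem.available / 2**30:.1f} GiB free), "
          f"busiest core {busiest_core:.0f}%{warn}")
    time.sleep(5.0)
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">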
16 GB of RAM is both the minimum and recommended system requirement for this game and my results confirm that - 32 GB adds a bit of stability when running at DDR4 3200 but going to 8 GB is a whole lot of pain...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBk2DAbpVueeEbvwXBAJClzBv766L6gtWBXTm99umaX_NDst1ubobqCwQE-2rh7iwW0kD1KPL8EJJqTPA9sxI2eQtC7BwxY5vL0RBVRVlW8ldzPAAFB5GW3GS9VKC8O-P6rf5VElOiY4zzxqPzlu75ohjLjudeOGNEI8NAT0-OQOu5mlrah5SoNt3Y/s679/Frametimes_memory%20quantity.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="112" data-original-width="679" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBk2DAbpVueeEbvwXBAJClzBv766L6gtWBXTm99umaX_NDst1ubobqCwQE-2rh7iwW0kD1KPL8EJJqTPA9sxI2eQtC7BwxY5vL0RBVRVlW8ldzPAAFB5GW3GS9VKC8O-P6rf5VElOiY4zzxqPzlu75ohjLjudeOGNEI8NAT0-OQOu5mlrah5SoNt3Y/w640-h106/Frametimes_memory%20quantity.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Oof!</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxuopNw_m5iUMCNhd6SmQ_uZSRRQEKNAmLPgdtXIl0ym8cPxLI5gWnT4JcO1qkZSdpSY2XLenaveseD9c2QHyvyvbeMXGdMcdCjePQD6ajmVWKhv_i52KA1afyfrhfnXl6pyNvT1rVLu0ZamMjrBnSNmanszJx7xEIj-PbEE5UhfSYXD3u1eX1yINe/s1076/Memory%20quantity%20scaling.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="938" data-original-width="1076" height="558" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxuopNw_m5iUMCNhd6SmQ_uZSRRQEKNAmLPgdtXIl0ym8cPxLI5gWnT4JcO1qkZSdpSY2XLenaveseD9c2QHyvyvbeMXGdMcdCjePQD6ajmVWKhv_i52KA1afyfrhfnXl6pyNvT1rVLu0ZamMjrBnSNmanszJx7xEIj-PbEE5UhfSYXD3u1eX1yINe/w640-h558/Memory%20quantity%20scaling.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Don't play this game with 8GB of RAM... just don't do it, please!</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">CPU demands...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I haven't really ever delved into CPU overclocking, and I wasn't about to blindly do that for this particular analysis but just looking at the on-screen display during testing, I can see that the game isn't utilising the processor all that well. On one hand, it's really not demanding for non-RT play, and we can see that there is better utilisation in that scenario.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Turning on ray tracing effects drops overall utilisation, as I believe that this is reflective of a bottleneck caused by something being tied to a single processing thread. This also significantly increases system memory usage...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's also fun to note how much more energy RT causes to be used on both the CPU and GPU! 
That's got to be good for my energy bill.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilsp9gO4LnSkdIK5VOw02PHU1a00VG76HGrL8g94OqukkyUPbGZe7iuq-HWetPzVWff6fcZm1N9rF5B9rdjpNz99xlmwe7eOpQICMfzvOEdYnWq29tvaP5VCN7KmGyGv4GR5kyJmgO0nCWD9fr1BbHtAaTWFB3dQBn9VOuU_QoOXB9aZj3e0ZKPYED/s849/RT%20comparison%202_thread%20use.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="672" data-original-width="849" height="506" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilsp9gO4LnSkdIK5VOw02PHU1a00VG76HGrL8g94OqukkyUPbGZe7iuq-HWetPzVWff6fcZm1N9rF5B9rdjpNz99xlmwe7eOpQICMfzvOEdYnWq29tvaP5VCN7KmGyGv4GR5kyJmgO0nCWD9fr1BbHtAaTWFB3dQBn9VOuU_QoOXB9aZj3e0ZKPYED/w640-h506/RT%20comparison%202_thread%20use.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Performance monitoring on the i5-12400 + RTX 3070 system...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Conclusions...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I apologise for the long-winded, sometimes meandering post. Given the data generated, the amount of testing performed and the amount of hardware configurations checked (and double-checked) I feel like I've done an okay job trying to go through everything without going crazy. The amount of text here is not terrible, it's just that the graphics don't lend themselves to a short format.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, I feel like there are some takeaways for people who want to play this game and get the most out of their current or soon-to-be-upgraded hardware:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li>V-sync is essential if you don't have a VRR display but this setting incurs a performance and presentation penalty when ray tracing is enabled.</li><li>The game isn't <i>that</i> graphically demanding. It really isn't! I had no issues managing to achieve approximately 60 fps for the majority of the time based on GPU hardware that scaled between the performance of an RTX 3060 Ti and an <i>almost</i> RX 6800 XT at 1080p. Yes, those are now mid-range cards in 2023 but I was also using high settings... turn those settings down to get more performance.</li><li>Graphics cards with faster VRAM will help - if you can overclock your video memory a bit, do it!</li><li>Graphics cards with 8 GB or less will struggle with certain parts of the game at certain times of day when ray tracing is enabled. However, even with RT disabled, at higher settings, you are pushing the limit on an 8 GB card... Either lower texture settings to medium or use a GPU with more than 8 GB if you want to use RT effects.</li><li>The game is CPU-bound, not only due to the large numbers of NPCs in the most demanding scenes such as the castle and Hogsmeade, but also due to the demands of data management and ray tracing. Having a higher clocked CPU will likely alleviate some of the problems.</li><li>You need 16 GB of RAM, no more, no less. 
Ideally, that RAM has as tight timings as possible and should be at least DDR4 3600 - but that's just a general idea rather than a specific observation. Try not to run 16 GB of DDR4 3200 - if you must run that slow a configuration of RAM, try and use 4x 8 GB sticks (32 GB total) or dual rank 2x16 GB sticks to improve bandwidth and stability a little.</li></ul><div><br /></div><div>Lastly, and though I didn't test this aspect here - use an NVMe SSD. Do NOT use HDDs or SATA SSDs for newly released modern games like Hogwart's Legacy. An NVMe drive will make the experience better by a rather large margin.</div><div><br /></div><div>Anyway, it's time for me to go and veg out. I'm a little tired after all of this. I hope it was useful to some of you!</div></div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-82749185876514314542023-02-09T04:35:00.005+00:002023-02-19T17:48:28.028+00:00The power curve of the RX 6800... and improving system energy use.<div style="text-align: left;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN1YJkvnxoYuQn7AxDP_-_lFkHCqSPDpq1WLkffbYY0EjFIfwbyw8G134Q6Id2gkKn4yweo6elvg7tHV2d7sbaETUxABopVv2K_Gv8dZs1bPX0PWbcMeo9Ae2d5-TPSByG_jMwU-p63bus99FYasXxDx3HlJyFE6ToJJnHbfFEVkI8e1seMb3L0Lv0/s1920/Title.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN1YJkvnxoYuQn7AxDP_-_lFkHCqSPDpq1WLkffbYY0EjFIfwbyw8G134Q6Id2gkKn4yweo6elvg7tHV2d7sbaETUxABopVv2K_Gv8dZs1bPX0PWbcMeo9Ae2d5-TPSByG_jMwU-p63bus99FYasXxDx3HlJyFE6ToJJnHbfFEVkI8e1seMb3L0Lv0/w640-h360/Title.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Again, literally what happens...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It may have escaped your notice that I picked up an RX 6800 last November. It's not been entirely smooth sailing, however: two inexplicably dead DisplayPorts later*, I think I've managed to get a handle on how to scale the RDNA 2 architecture.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">*Completely unconnected to any tinkering on my part - since they happened when bringing the computer back from sleep, I don't keep my PC in sleep mode any longer. Never had a problem with it on my RTX 3070!</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Last year, I looked at the <a href="https://hole-in-my-head.blogspot.com/2022/06/the-power-curve-of-rtx-3070-and-ampere.html">power scaling of the RTX 3070</a> because I was conscious of getting the most out of my technology, with a focus on efficiency (both power and performance). So, I'm applying that same focus to the new piece of kit in my arsenal. 
Come along for the ride...</div><span><a name='more'></a></span><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Princes and paupers...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There's quite a difference between the process of optimising Radeon cards and their Geforce counterparts. Astonishingly, Nvidia's cards are not really locked down, whereas AMD's offerings are pretty limited in what options are available to the consumer through first- or third-party apps... at least, at a surface level.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This was something that I was not aware of until after I purchased the card and wanted to optimise the performance per Watt, only to find that, at stock, the user is only able to minimally affect the power limit - though clock frequency and core voltage are relatively free for the user to adjust. On the other hand, the memory frequency can only be increased (though, to be fair, I would not want to decrease it... but it could be useful for scientific purposes).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPmFwPzhXD29nyScNNvTGub4rHeZTJF1t4Jf88m45-DL29wzNVYPznqDRKEdA2M8NGJlMvWB56iFoBoXEoUH2cn2CSLxZa59BKRIxvQVDwfKYcN0DW5AYxfsPt2Sed9t0Xbl-kiTgYuEoA_KKY1e5LUAbPhCLyoEs1xIwQG5_Gv5pWnxj5yX2g7CuT/s1243/Adrenaline.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="849" data-original-width="1243" height="438" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPmFwPzhXD29nyScNNvTGub4rHeZTJF1t4Jf88m45-DL29wzNVYPznqDRKEdA2M8NGJlMvWB56iFoBoXEoUH2cn2CSLxZa59BKRIxvQVDwfKYcN0DW5AYxfsPt2Sed9t0Xbl-kiTgYuEoA_KKY1e5LUAbPhCLyoEs1xIwQG5_Gv5pWnxj5yX2g7CuT/w640-h438/Adrenaline.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>A lot of people say that you should raise the minimum core frequency along with the maximum target... but I found that it introduces unnecessary instability in certain scenarios.</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In order to actually do what I set out to do for this article, I had to rely on the community tool called <a href="https://www.igorslab.de/en/red-bios-editor-and-morepowertool-adjust-and-optimize-your-vbios-and-even-more-stable-overclocking-navi-unlimited/">MorePowerTool</a>, hosted over at Igor's lab. It's a very detailed programme that allows the user to really delve into the inner workings of their GPU and affect its performance in many ways. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Unfortunately, the documentation isn't that great and the example walk-through given at the download page explicitly tells potential users NOT to do the thing they must do in order to use it as I have done. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I.e. you must press "Write SPPT" (Soft Power Play Table) to actually affect the accessible range of the exposed settings on your GPU. 
A very simple step, but one that took me a lot of time to actually confirm...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">At any rate, I'm not currently that interested in playing around with anything other than the power limit, so that I can perform my testing. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is where the second difference between Nvidia cards and AMD cards rears its head:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">An Nvidia card will effectively throttle the core clocks when power is limited, without the user having to put a hard limit on the card's performance; the user only needs to provide the card with enough voltage to achieve whatever frequency is defined in the voltage/frequency curve. AMD cards, or at least the RX 6000 series, will attempt to boost to the maximum set core frequency whether or not there is enough headroom in the power limit, or enough voltage, to do so. However, even <i>if</i> there IS enough voltage to maintain the clock, if the power limit is not sufficient, it will still result in a crash of some sort...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This difference in behaviour means that stability in one programme does not mean stability in all programmes (in a less predictable manner than for Nvidia cards), and may indicate why AMD have chosen to lock down their graphics cards so heavily (even more so for the RX 7000 series*): to avoid support threads from people mucking around with the operation of their cards and causing driver and/or game crashes. </div><div style="text-align: justify;"><span style="color: #274e13;"><b><i><blockquote>*Though this is looking more likely to be the result of AMD's relative lack of gen-on-gen performance increase, leading to them having to artificially limit the potential performance of the products in the RX 7000 series, so that the user cannot bump up their purchase to meet or exceed the product above...</blockquote></i></b></span></div><div style="text-align: justify;">That doesn't make the choice to do that any easier to swallow for their more experienced customers, though. 
</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSObbHtVebhibX8gcy3-g9VMy2qVeVEyUPb-4V1e4Xox8aiBe59NRdTTaIMywh55HU8W196zz5dNh7hMfdrTY8JyBnNXvO5oSXG7ebJZnplJaStSDhcW8thPPU5tZgIYHr4_hQ2J__vcxS7U5bvY34ehDgqPqC1Pa5UqWrrsctg6FIRAnKwFlLGpce/s686/Scaling_Heaven%201.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="283" data-original-width="686" height="264" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSObbHtVebhibX8gcy3-g9VMy2qVeVEyUPb-4V1e4Xox8aiBe59NRdTTaIMywh55HU8W196zz5dNh7hMfdrTY8JyBnNXvO5oSXG7ebJZnplJaStSDhcW8thPPU5tZgIYHr4_hQ2J__vcxS7U5bvY34ehDgqPqC1Pa5UqWrrsctg6FIRAnKwFlLGpce/w640-h264/Scaling_Heaven%201.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Unigine Heaven just didn't scale until I was forced to drop the core frequency at the 55% power limit...</b></td></tr></tbody></table><br /><br /><h3 style="text-align: left;"><span style="color: #274e13;">Performance scaling...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, knowing what I just laid out above, you can understand why the scaling tests performed here are set up as they are: just as when overclocking - where the boost clock is only stable with enough board power behind it - decreasing the power limit reaches the same scenario, only from the underclocked direction. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">First up, to explore the scaling of the RX 6800, I went to the trusty Unigine Heaven benchmark. To repeat what I did with the RTX 3070, I set a modest overclock on both the core and memory, whilst undervolting, and scaled back the power limit to a minimum of 50%, comparing against the stock settings.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Unfortunately, Heaven is just too old and too light a workload for the RX 6000 series and I essentially saw no real scaling down to the 65% power limit - performance was flat, almost identical to the stock result. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What this appears to show is that, because Heaven is not demanding enough, the card is able to reach high clockspeeds without needing more power to keep those speeds stable. Additionally, I was worried that portions of the die - or, more accurately, functions of the silicon design - that would require more power or voltage to work at high clocks were not being activated and, thus, not stressed. 
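</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a sanity check on that interpretation, the telemetry logs can tell you whether a workload is actually power-bound: if the average core clock barely moves as the power limit drops, it isn't. A minimal sketch of that check - the numbers here are made up for illustration, not my logged values:</div><div style="text-align: justify;"><br /></div><pre>
# If the average core clock barely moves as the power limit drops, the
# workload isn't power-bound and a flat fps curve is exactly what you
# should expect. All numbers below are made up for illustration.

runs = [  # (power limit %, average core clock MHz, average fps)
    (100, 2280, 141.0), (85, 2275, 140.5), (75, 2270, 140.1), (65, 2260, 139.8),
]

base_clock, base_fps = runs[0][1], runs[0][2]
for limit, clock, fps in runs[1:]:
    clock_drop = 1 - clock / base_clock
    fps_drop = 1 - fps / base_fps
    verdict = "power-bound" if clock_drop > 0.02 else "NOT power-bound"
    print(f"{limit}% limit: clock -{clock_drop:.1%}, fps -{fps_drop:.1%} -> {verdict}")
</pre><div style="text-align: justify;">In Heaven's case, the clock stayed pinned all the way down to the 65% limit - hence the flat line in the chart above.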
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, I needed to corroborate this result with another type of workload.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjm0KSL6blQeZXkBI_Br4huHZpxk7YEWZ9PSJMot17pd5OTL3m5QknTGg7m9UCX7-ZaJTPlZ-v8XuWmIDMMpvNZCiGTiOutZPgFvvTZvSLeCqAd00GZid0MtFRtHyeHOuZ9auEEeEjnlJJvEZupyl8VZcMHrAS8tkVGxfDcD-tTlPxR0nAcKrGqAyS/s684/Scaling_Heaven%202.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="332" data-original-width="684" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjm0KSL6blQeZXkBI_Br4huHZpxk7YEWZ9PSJMot17pd5OTL3m5QknTGg7m9UCX7-ZaJTPlZ-v8XuWmIDMMpvNZCiGTiOutZPgFvvTZvSLeCqAd00GZid0MtFRtHyeHOuZ9auEEeEjnlJJvEZupyl8VZcMHrAS8tkVGxfDcD-tTlPxR0nAcKrGqAyS/w640-h310/Scaling_Heaven%202.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Performance is flat... Heaven just isn't demanding enough. </b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Picking out Metro Exodus Enhanced Edition as a demanding RT benchmark, I set out to see how the scaling would progress. I was rewarded with a similar scaling of performance with power... (actually a little worse) and so I concluded that this could be normal behaviour for the RX 6000 series.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With this application, in order to avoid crashes and artefacts at lower power limits, I had to play with both core voltage and frequency - showing that, indeed, this type of workload was more stressful on the silicon than Heaven.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizIdDFhDVo5C0XQXfkAJhnF6cLMUQ8hcSC5BDXY7G3qe6jXAy4L4FKFhQEZQTHOXC9cvhLooniQqk3D-PIMHUdD541dVt3GjGVAxdd1hdwRgJ784rVdJqaM5EvNrhAqH8PvHdUjNY9ySXc1zrwnjxeUk_TiAF_n1oYdkMxxNJ7DTR1Whngu764K2mX/s612/Scaling_Metro%201.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="286" data-original-width="612" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizIdDFhDVo5C0XQXfkAJhnF6cLMUQ8hcSC5BDXY7G3qe6jXAy4L4FKFhQEZQTHOXC9cvhLooniQqk3D-PIMHUdD541dVt3GjGVAxdd1hdwRgJ784rVdJqaM5EvNrhAqH8PvHdUjNY9ySXc1zrwnjxeUk_TiAF_n1oYdkMxxNJ7DTR1Whngu764K2mX/s16000/Scaling_Metro%201.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Performance drops very quickly at the 55% power limit...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">But this is a very specific type of game engine... would I get a flatter profile, like the one we saw in Heaven, at the same power in a more generalised stressful workload? 
Last time, I had used Unigine Superposition to test the power scaling of the RTX 3070 and found that performance started to properly drop around the 75% power mark (165 W). </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Of course, total power draw is not accurately comparable between the two products because Nvidia and AMD differ in how much of the board's power draw is actually measured and reported to the available sensors. So, in theory, we always need to add some amount on top of the reported value in programmes like HWiNFO. I'm not going to do that here because I would just be guessing the amount. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, the ratio between 100% and each step can be compared, and in Metro Exodus we're seeing a drop at around the 65% mark, which would mean that, potentially, the RX 6800 can be much more efficient in its operation... but let's compare apples to apples.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSZWXQwpWnhhsaOJWbYBOYKiqG3OxDFetJLTbb9ct3n70O8sSTy-dlR5V_aypIlAtIWtdRj4T0IuVoJUwu9LjsgLTTwz5wdbzi__Feqi9Leb4kRatbyMAUD64E1h2Drq_VU5FW5sJaL2TrD9CRP_BiHvhpgfpdN2_0AUuYLbXtNQR11Rr8L2RRI-ft/s683/Scaling_Metro%202.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="335" data-original-width="683" height="314" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSZWXQwpWnhhsaOJWbYBOYKiqG3OxDFetJLTbb9ct3n70O8sSTy-dlR5V_aypIlAtIWtdRj4T0IuVoJUwu9LjsgLTTwz5wdbzi__Feqi9Leb4kRatbyMAUD64E1h2Drq_VU5FW5sJaL2TrD9CRP_BiHvhpgfpdN2_0AUuYLbXtNQR11Rr8L2RRI-ft/w640-h314/Scaling_Metro%202.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>RT performance drops quite quickly with the reduced power limit...</b> </td></tr></tbody></table><br /><div style="text-align: justify;">Knowing that the card was now stable in a demanding workload that tests all parts of the silicon - no errors or artefacts with these clocks, voltages, and power limits - I was able to apply the same configurations to the benchmark I used when looking at the RTX 3070: Unigine Superposition.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLtY258_xzYXF1U_0p-NhIG2gau1f2q-t46WaaHzclwr1TZXtvr8650_M1L82_Ao3tgquT_GE968H1cgOr6rtyTVuwqpVCZsM7gvll6hx5bzHpDjrZLs38mn8LalwmNtZmSPxYyBahTt3RavdrNgzD0_44eQOhbrGni_tiilE1-_CbKvE-_roNX7r9/s727/Scaling_Superposition%201.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="375" data-original-width="727" height="330" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLtY258_xzYXF1U_0p-NhIG2gau1f2q-t46WaaHzclwr1TZXtvr8650_M1L82_Ao3tgquT_GE968H1cgOr6rtyTVuwqpVCZsM7gvll6hx5bzHpDjrZLs38mn8LalwmNtZmSPxYyBahTt3RavdrNgzD0_44eQOhbrGni_tiilE1-_CbKvE-_roNX7r9/w640-h330/Scaling_Superposition%201.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Unigine Superposition scales nicely, but if we didn't have that substantial overclock the results 
would be much worse...</b></td></tr></tbody></table><br /><div style="text-align: justify;">Something that I observed during this testing, and I'm not sure if it's a 'me' thing or not, is that RDNA2 can get relatively large gains in performance from undervolting and overclocking compared to Ampere. Here, we're looking at a 9% performance increase above stock at the same power draw. That's pretty impressive given that I was not seeing the same when testing the 3070. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Sure, I'm getting a 9% uplift with 231 extra MHz, but with Ampere I was getting a 1% uplift with 60 MHz... if it scaled the same, we'd expect 2.25%.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, Ampere doesn't scale or clock as well as RDNA 2 does. The big difference is that Ampere didn't need its hand held to determine what stable core frequency it could maintain at any given power/voltage limit.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEii_N1IsNowanK1zZ0HBcaSGlU0yipstqIc-vNkd2iSKyxV_7CmhrzL18-cBA5Ekgp7NVKDiaHTYruN_Qen79dQY13FBHJz3wZDyA8o2G81zylHRH-72tvYanin38iTx7WTsJFyqBEvsihRYQ3CJu_QU6gaNo6ifj_S2S2zqInbE8eOJ6QGyoCmP5F6/s808/Scaling_Superposition%202.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="391" data-original-width="808" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEii_N1IsNowanK1zZ0HBcaSGlU0yipstqIc-vNkd2iSKyxV_7CmhrzL18-cBA5Ekgp7NVKDiaHTYruN_Qen79dQY13FBHJz3wZDyA8o2G81zylHRH-72tvYanin38iTx7WTsJFyqBEvsihRYQ3CJu_QU6gaNo6ifj_S2S2zqInbE8eOJ6QGyoCmP5F6/w640-h310/Scaling_Superposition%202.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Scaling performance with power shows the efficiency of the architecture...</b></td></tr></tbody></table><br /><div style="text-align: justify;">If I had just jumped straight into running Superposition, I wouldn't even have gotten these numbers. I actually tried raising the clock further during this testing and found that the card could be stably benchmarked in configurations that were provably not stable in Metro Exodus (and presumably other applications, too).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In a way, this makes RDNA 2 more dangerous for casual overclockers like myself, because there is a higher chance that they could be running an unstable configuration on their card, thinking that any crashes they encounter are due to poorly optimised software. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Going back to stock scaling, without an overclock, the RX 6800 starts really losing performance at the 85% power limit and, from the above chart, we can see that it's because the core cannot maintain the specified frequency. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If that were the whole story, then the AMD card would be the loser here. As it stands, it's really the winner, because that drop happens from a 9% advantage - noting the caveats mentioned above. 
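</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For the record, the arithmetic behind that comparison - a simple linear pro-rating of performance uplift per extra MHz, which is itself only an approximation - lands in the same ballpark as the 2.25% quoted above:</div><div style="text-align: justify;"><br /></div><pre>
# Worked version of the uplift comparison: pro-rate the RDNA 2 gain per MHz
# onto Ampere's overclock. Linear scaling is only an approximation.

rdna2_uplift, rdna2_extra_mhz = 0.09, 231   # RX 6800: +9% from +231 MHz
ampere_uplift, ampere_extra_mhz = 0.01, 60  # RTX 3070: +1% from +60 MHz

per_mhz = rdna2_uplift / rdna2_extra_mhz
expected = per_mhz * ampere_extra_mhz
print(f"Expected Ampere uplift at RDNA 2-like scaling: {expected:.2%}")  # ~2.3%
print(f"Actual Ampere uplift: {ampere_uplift:.2%}")
</pre><div style="text-align: justify;">Ampere delivering less than half of the pro-rated expectation is what I mean by it not scaling as well.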
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So now that we've explored the scaling and found correct and stable configurations at each power limit, let's look at how much performance we're losing across the same applications as we did for the RTX 3070...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">50% scaling...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Looking over the results, we get a geomean of around 0.85x - approximately the same as with the RTX 3070. The big difference is that Metro is the application dragging the average down, in contrast to it buoying the results of the 3070.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If there's one takeaway from that, it's that in future game applications that incorporate ray tracing, performance scaling will be worse for the AMD card... and that's not a surprising result. </div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqLBxDcBmRbQVLuXGNgY4uy327YK5ks9ZgkZYqyzi9f0o_KY12j9QH2qGBQ_y83Af3CqpUcq_9TjMbuSklQz5Lb1ItmNN0Zp4UaI5akELXEXzW2yI1DdrmQ933t93mXT-xoT5ff98Lfs64A2YrS2_i5pabb1dkGitK7VCjDpBuCwVJzCFX_aD1tgWz/s432/Results_summary.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="432" data-original-width="370" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqLBxDcBmRbQVLuXGNgY4uy327YK5ks9ZgkZYqyzi9f0o_KY12j9QH2qGBQ_y83Af3CqpUcq_9TjMbuSklQz5Lb1ItmNN0Zp4UaI5akELXEXzW2yI1DdrmQ933t93mXT-xoT5ff98Lfs64A2YrS2_i5pabb1dkGitK7VCjDpBuCwVJzCFX_aD1tgWz/s16000/Results_summary.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>I find it impressive that minimum fps is not affected as much as the maximum...</b></td></tr></tbody></table><br /><div style="text-align: justify;">What I will note, though, is that due to the massively oversized cooler, temperature and associated fan noise were never really a concern during the entirety of this testing - in fact, I stopped listing the fan speed in later tables because of the very low rpm when in operation.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There are two big things to note, though. First, the fan curve out of the box for this XFX card was ABSOLUTELY HORRENDOUSLY LOUD... *ahem*. Seriously, it was obnoxious to the point of driving me insane. That was luckily fixed by tweaking the fan curve in Adrenaline. Which brings us to the second point...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The Radeon software is just buggier than GeForce's. Even at stock settings, with just my fan curve applied, Adrenaline's tuning settings will crash (sometimes at system boot), returning everything to stock - without the user even being aware unless they load up the application to double-check. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Adrenaline also had some small conflicts with MSI Afterburner being installed (the aforementioned fan curve could not be set, initially, until Afterburner was uninstalled).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgXduR7mcgcl73ouIqs7jYl_nLR-Jy0pRdR0dLgbDSOtfvtf-uOHn1vSlnCrONftS82tGzyKlq7q3w7tMej8cZIMgBYOVEBTxo-YKJknrAFqeQ3R4u3kxr9ftIwsaK6-sOKsF6LO-UXEq1LRGK8hq-lMXd2Ifmph4XPy8XQfpgElqkycKDHj275X-d/s499/Average_summary.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="44" data-original-width="499" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgXduR7mcgcl73ouIqs7jYl_nLR-Jy0pRdR0dLgbDSOtfvtf-uOHn1vSlnCrONftS82tGzyKlq7q3w7tMej8cZIMgBYOVEBTxo-YKJknrAFqeQ3R4u3kxr9ftIwsaK6-sOKsF6LO-UXEq1LRGK8hq-lMXd2Ifmph4XPy8XQfpgElqkycKDHj275X-d/s16000/Average_summary.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Scaling in various applications is pretty good...</b></td></tr></tbody></table><span style="text-align: justify;"><div style="text-align: start;"><br /></div><div style="text-align: justify;">I also could not work out how to get the monitored parameters showing on the screen in-game without recording them, like you can with FrameView, Afterburner/RTSS, and CapFrameX. Maybe you can't even do that... which seems a huge oversight. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><span style="color: #274e13;"><b>[EDIT 19/Feb/2023]</b></span></div><div style="text-align: justify;"><span style="color: #274e13;">It seems that this is a software bug I encountered in Adrenaline and the actual setting I needed to select was, for some reason, hidden until it was enabled. The only way to find it and select it was to search in the application search bar for the keyword "monitoring". Now, I have the full functionality available and the option shows up in the menu like it should. </span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I also dislike how much AMD is locking down their GPU hardware. I understand them not letting people dangerously overclock/overvolt, which could lead to hardware damage, but undervolting and downclocking can do no such harm... there's no reason to stop any enthusiast from saving as much energy as they wish.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Conclusion...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">AMD's RDNA2 architecture has a good amount of overclocking potential hidden behind the curtain, but that potential needs to be thoroughly tested in multiple applications in order to actually prove what is stable, due to the way that performance is tied to the set core and memory clock frequencies. In contrast, Ampere appears to have less extra in the tank, so to speak, but is much more adjustable and friendlier for the user to manipulate. 
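</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">To spell out the testing discipline that implies, a minimal sketch: a tuning configuration only counts as stable if every workload passes. The pass/fail entries below are invented placeholders - in reality, each entry means actually running the benchmark and watching for crashes and artefacts:</div><div style="text-align: justify;"><br /></div><pre>
# A config only counts as stable if EVERY workload passes - as seen with
# Superposition vs Metro Exodus, passing one test proves very little.
# The pass/fail entries below are invented placeholders.

WORKLOADS = ("Unigine Heaven", "Unigine Superposition", "Metro Exodus EE")

PASSED = {  # (workload, max clock MHz, power limit %) -> survived the run?
    ("Unigine Heaven", 2250, 65): True,
    ("Unigine Superposition", 2250, 65): True,
    ("Metro Exodus EE", 2250, 65): False,  # stable everywhere *except* Metro
    ("Unigine Heaven", 2100, 65): True,
    ("Unigine Superposition", 2100, 65): True,
    ("Metro Exodus EE", 2100, 65): True,
}

def is_stable(clock, limit):
    return all(PASSED.get((w, clock, limit), False) for w in WORKLOADS)

for clock in (2250, 2100):
    print(clock, "MHz @ 65% limit:", "stable" if is_stable(clock, 65) else "unstable")
</pre><div style="text-align: justify;">The Superposition-versus-Metro result above is exactly the first case: two passes out of three, and still not a stable card.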
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Unfortunately, for the user to really get the most out of their expensive graphics card, they need to go behind AMD's back and get help from the community. This shouldn't be the case and, unfortunately, <a href="https://wccftech.com/community-driven-morepowertool-for-amd-gpus-will-not-support-rdna-3-due-to-hard-lock-users-to-pay-for-power-limits-features/amp/">it's worse for the current generation of GPUs</a>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Interestingly, once you've gotten around the hurdle that AMD have put in your path, both the RTX 3070 and the RX 6800 manage to achieve approximately 85% of their stock performance at 50% of their power limit, showing that both architectures are equally power efficient when they are set to be.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Looking at the Superposition benchmark: at stock, the RTX 3070 leads in performance, but once overclocked, the RX 6800 blasts it out of the water. Moving to a 50% power limit, both cards deliver approximately the same performance for around the same power draw (approx. 110 W).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It has been said many times that Ampere was a very power inefficient architecture, but my experience shows that it's basically just as efficient as RDNA2, with the caveat that it cannot clock as high due to the Samsung process node. Additionally, as we have known for a while now, Ampere's doubling of the FP32 units does not correspond to an actual doubling of utilisation, due to the dual (FP32/INT32) nature of the second set. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, dividing the shader unit number (5,888) by an approximate 1.5x modifier gives us ~3,925 - very close to the number in RDNA2's RX 6800 (3,840), which explains their relatively close performance scaling (except at the high end).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With this knowledge in hand, it seems obvious, in hindsight, that Nvidia's Ada GPUs would be massively more efficient than AMD's RDNA3. 
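</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a footnote, the shader-count arithmetic from the previous paragraph as a worked snippet - bearing in mind that the 1.5x figure is a rule of thumb for how much of Ampere's doubled FP32 capacity gets used in practice, not a measured constant:</div><div style="text-align: justify;"><br /></div><pre>
# Back-of-the-envelope shader comparison. Half of Ampere's FP32 units share
# their datapath with INT32 work, so the doubled count is treated here as
# worth ~1.5x rather than 2x - a rule of thumb, not a measurement.

ampere_fp32_units = 5888   # RTX 3070 (post-doubling count)
rdna2_shaders = 3840       # RX 6800

effective_ampere = ampere_fp32_units / 1.5
print(f"Effective Ampere shaders: {effective_ampere:.0f}")              # ~3925
print(f"Ratio vs the RX 6800: {effective_ampere / rdna2_shaders:.2f}")  # ~1.02
</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">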
But that's it for now - hope you enjoyed the post!</div></span></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-24341113951680624812023-02-04T15:05:00.004+00:002023-02-10T15:22:50.097+00:00Analyse This: Forspoken Demo analysis...<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhczzFykb69lIaPMelNuhN_OqyikhjxaCe80xccvCsNEJpAKS85v_gVtWS7xAziPwV7k7KA0szyD5PJbwIEPZA0xQAHuzp84X-2z3FAbkhS181AF0e6PBfK7uK8ThHO1NL0NACHsCXMmNhfHKTIodaJHoL0qxjviFjPpDZaKxi8emmkwwzH-4hSrYFj/s1920/Title_alt.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhczzFykb69lIaPMelNuhN_OqyikhjxaCe80xccvCsNEJpAKS85v_gVtWS7xAziPwV7k7KA0szyD5PJbwIEPZA0xQAHuzp84X-2z3FAbkhS181AF0e6PBfK7uK8ThHO1NL0NACHsCXMmNhfHKTIodaJHoL0qxjviFjPpDZaKxi8emmkwwzH-4hSrYFj/w640-h360/Title_alt.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">I've been eagerly anticipating the release of the first DirectStorage title, mostly to see whether my prediction and/or understanding of the tech was correct or not. However, it seems like this particular DirectStorage implementation, much like Forspoken itself, is a bit of a disappointment...</div><div style="text-align: left;"><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In the end, I did not splash out on the main game, given the very mixed reception and <i>absolutely jam-packed</i> release schedule for the first quarter of the year - I chose to spend my limited money on other titles that I actually might want to play/test, instead.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Luckily for me, Square Enix released a PC demo and, while I am not <i>entirely</i> sure that everything is <i>exactly</i> the same between it and the main release, it is what I am able to test in this scenario. Perhaps the conclusions I draw will be limited because of it, but I do sort of question whether there are large codebase or engine optimisations in the main release that are not part of the demo... But, let's see.<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">What this is, and what it isn't...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The specific DirectStorage implementation on PC included in Forspoken is version 1.1 compatible... but the actual featureset used in the game appears only applicable to version 1.0 of the API - i.e. there is no GPU decompression of assets, only CPU. So, the reported "version" of the .dll is not relevant here.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I am testing the loading of a save game which is not the latest autosave point. This is important because the game pre-loads the last autosave, meaning that when a user presses "continue" on the title screen, they will be put into the gameworld instantaneously. 
I have been careful not to make this mistake.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Additionally, I am testing the performance of a run through the world, across an area that requires streaming of new world data. Both tests are taken as an average of three repeats.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The testing has been performed across two systems: </div><div style="text-align: justify;"><ol><li>Ryzen 5 5600X, RX 6800 (undervolted, overclocked, power limited), 32 GB 3200 Mbps DDR4</li><ul><li>Coupled with a Western Digital SN750 1 TB; Crucial P1 1 TB; Samsung 860 SATA 1 TB</li></ul><li>Intel i5-12400, RTX 3070 (undervolted, overclocked, power limited), 16 GB 3800 Mbps DDR4</li><ul><li>Coupled with a Western Digital SN 570 1 TB</li></ul></ol></div><div style="text-align: justify;">Why are my graphics cards undervolted and power limited? Because I like saving money and power, and I can see that <a href="https://hole-in-my-head.blogspot.com/2022/06/the-power-curve-of-rtx-3070-and-ampere.html">there is a decent amount of potential headroom</a> in modern GPUs when heat is not a problem*. The point is: neither card is performing <i>worse</i> than stock, out-of-the-box performance.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*The article covering my foray into optimising the AMD RX 6800 is still coming...</blockquote></span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As pointed out in the introduction, I am testing the demo - not the main release. I <i>do not</i> expect huge differences in the actual code and optimisations between the two so close to release, but I cannot discount it. So, some of the comparisons with reported numbers from other commentators may be suspect. 
However, the numbers I am presenting here will, at least, be internally comparable.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqu8LHJSPRIucBPkXQm9LMiqbDEZmVwe1baJeUj-GZvHIWDsZsAPm-EJIvD-S3ZmD1YLDjJar004wWG4RHa2QEb0o9snV-79EiE2-mCZaFi6VctM-3yhG-npSHQNtcCrpF8pqYDfQoShYiaBRpum8LV_0Mrlqg6K3uho5gVkxOOhOE_BOdt9DIsH9L/s1920/Digital%20Foundry%201.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqu8LHJSPRIucBPkXQm9LMiqbDEZmVwe1baJeUj-GZvHIWDsZsAPm-EJIvD-S3ZmD1YLDjJar004wWG4RHa2QEb0o9snV-79EiE2-mCZaFi6VctM-3yhG-npSHQNtcCrpF8pqYDfQoShYiaBRpum8LV_0Mrlqg6K3uho5gVkxOOhOE_BOdt9DIsH9L/w640-h360/Digital%20Foundry%201.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Loading comparison <a href="https://www.youtube.com/watch?v=j8_HcLb4ajY&ab_channel=DigitalFoundry">provided by Digital Foundry</a>...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Loading times...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I had <a href="https://hole-in-my-head.blogspot.com/2022/03/directstorage-again.html">previously noted</a> that Forspoken did not appear to be a good choice for showcasing the benefits of DirectStorage (DS). It appears that the game is <i>very</i> heavily optimised in terms of I/O, resulting in minimal impact of the technology. <a href="https://hole-in-my-head.blogspot.com/2023/01/yearly-directstorage-rant-part-3.html">I have also pointed out</a> that DirectStorage is very dependent on the quality of the storage medium being utilised by the player - SSDs are not created equal - with DRAM, controller, and NAND all having an impact on the performance of the device.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In the case of a DS implementation utilising the CPU to decompress data, this should mean that the performance level of that CPU will also have an effect on the loading performance of the software in question... 
and, in the case where a GPU is utilised to decompress the data, I expect the performance level of that GPU will have an effect as well - though this is yet to be shown in a real-world example.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5ffRcQaNqEbDoTmchj0uYleDUF86jtpiZQGuZZjLXFR8Czo-3pLymoQm4mGkM2xsFP_PXqICqhfGsD6ALi8JupQcotl0M96gmnxQwW7lttZwlXlxJzgcTe4CKWYBDqe3upDmzOV3WymRcmzO541GmlN3KUE63UFvN-U-hN0bkxKogPhm9B9QLoGLK/s1200/PCworld%201.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="697" data-original-width="1200" height="233" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5ffRcQaNqEbDoTmchj0uYleDUF86jtpiZQGuZZjLXFR8Czo-3pLymoQm4mGkM2xsFP_PXqICqhfGsD6ALi8JupQcotl0M96gmnxQwW7lttZwlXlxJzgcTe4CKWYBDqe3upDmzOV3WymRcmzO541GmlN3KUE63UFvN-U-hN0bkxKogPhm9B9QLoGLK/w400-h233/PCworld%201.jpg" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b><a href="https://www.pcworld.com/article/1486755/tested-microsofts-directstorage-tech-signals-the-sunset-of-sata-ssds.html">PC world</a> performed the above tests on the game... tested with a 13900KF + RTX 4090 + 32 GB DDR5 5200 Mbps.</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8SvXJ4hzyn0FI1zC1tLxtDMshZH0hLV4ZV1Q35QE6bCcIWVS-oIaZl6Tr0SC0FlS2Lp3n1DoqxnkmcrOe0b2uRdvHI0QnPN4XFhyncnVyLQj_Kvg8KqOPrD7TWKaCI1osPz9ltVWtQhFIoeU_8dfVXE0RwLLmPxFXQ8wlUdcKDPhQoq1HEboXOg3Q/s1920/Techtesters%202.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8SvXJ4hzyn0FI1zC1tLxtDMshZH0hLV4ZV1Q35QE6bCcIWVS-oIaZl6Tr0SC0FlS2Lp3n1DoqxnkmcrOe0b2uRdvHI0QnPN4XFhyncnVyLQj_Kvg8KqOPrD7TWKaCI1osPz9ltVWtQhFIoeU_8dfVXE0RwLLmPxFXQ8wlUdcKDPhQoq1HEboXOg3Q/w400-h225/Techtesters%202.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b><a href="https://www.youtube.com/watch?v=ZN5m1_A08JQ&t=451s&ab_channel=Techtesters">Techtesters</a> performed the above tests on the game... tested with a 3900K + RTX 4090 + 32GB DDR5, at max settings, 4K resolution.</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The problem I have had, post-launch, is that various sources of commentary/analysis have not produced very consistent results. Mostly, purely because the storage devices they are all testing are completely different!</div><div style="text-align: justify;"><div><br /></div><div>I would LOVE for someone with more access to disposable (i.e. 
their personal data is not on the particular drives in question) hardware to actually test the scaling of a specific drive across multiple CPUs and - when the implementation is ready - across multiple GPUs.</div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, saying all of that... I do believe that we can join some dots here... Looking at the available (and VERY sparse) data, we have the following conclusions:</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMZh1pCuNrDJGSnLazDi39Pl1h4bFeGQoCFabspregCNlxI2m0ph3lukQLEPYOD6WZdPiUdBnRlkrluCVGXgsi8V3oUOpukdDmjN9oJ0mpxB_WrcU01lBKKCEacKhE8fEPdMwpj8xcaHFK-2L2vtMeAuAIEbu9xvOloBTEKB8T9la0kBqUOiyHmlrq/s390/CPU%20scaling.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="126" data-original-width="390" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMZh1pCuNrDJGSnLazDi39Pl1h4bFeGQoCFabspregCNlxI2m0ph3lukQLEPYOD6WZdPiUdBnRlkrluCVGXgsi8V3oUOpukdDmjN9oJ0mpxB_WrcU01lBKKCEacKhE8fEPdMwpj8xcaHFK-2L2vtMeAuAIEbu9xvOloBTEKB8T9la0kBqUOiyHmlrq/s16000/CPU%20scaling.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Taken from the above-linked video from Digital Foundry...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br />We can see that, using the same hardware, loading times under the DirectStorage 1.0 implementation (utilising CPU decompression) do indeed scale when less powerful CPUs are installed in the system. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, my own testing shows that the effect of the storage device is of much greater importance than the relative power of the CPU. A 12400 is considered to be more performant than the 5600X - in fact, <a href="https://hole-in-my-head.blogspot.com/2022/08/analyse-this-performance-of-spider-man.html">my own Spider-Man testing corroborated this</a>. However, I have to report some strange results in this regard:</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeZIS4n6A4NvpGIQKqybzWwB2YNz9u__yUNxH_ObY_hfqlxU22LFrks0-wB1cEFtlTUhYY6cA8WR9CwvGy4lFW_v_mKYR-N4mRGOyVTlfejEYiIY-vyVfNVWmBUzORMdjCz-KFkHbu-MaRiMMw5uEnkdWXmPCW0Oe2ATD927W9mOg1MBSUrlZ1aycW/s390/Loading%20comparison_alt.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="149" data-original-width="390" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeZIS4n6A4NvpGIQKqybzWwB2YNz9u__yUNxH_ObY_hfqlxU22LFrks0-wB1cEFtlTUhYY6cA8WR9CwvGy4lFW_v_mKYR-N4mRGOyVTlfejEYiIY-vyVfNVWmBUzORMdjCz-KFkHbu-MaRiMMw5uEnkdWXmPCW0Oe2ATD927W9mOg1MBSUrlZ1aycW/s16000/Loading%20comparison_alt.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>My testing, performed on a R5 5600X + RX 6800... 
and i5-12400 + RTX 3070 - both on Win 10.</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br />Yes, the slower drives perform worse, in general, but (and this was repeatable across multiple tests) the DRAM-less WD SN 570 performed worse with DS <strike>disabled</strike> <span style="color: #274e13;"><b>enabled</b></span> on Win 10 than with the feature <span style="color: #274e13;"><b>disabled</b></span> <strike>enabled</strike>. I actually do not know the reason for this strange behaviour.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Ignoring this one aspect, it seems that the performance of the drive has more of an impact on loading performance than the performance of the CPU - though the CPU does also have a role to play in this tale.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Secondly, I did not see any real improvement with regards to DirectStorage being on or off. Even with the SATA drive, DS only made a difference of around 0.5 seconds. I <i>would</i> hypothesise that this might be due to the fact that the DirectStorage flag doesn't work in the demo - considering that Digital Foundry showed a tangible difference between Win 10 and Win 11 environments in the released game - but I <i>can</i> see a slight, repeatable difference between the flag turned on and off... meaning that it is doing <i>something</i>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is a point I am unsure of, and one that I really need someone with better access to hardware to test.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3kJZaJUrQfov8ICL0HT4IuQ8MvZleDug5KsJtgB7csSc44o379PJt0ElnyH0U9VksDMDWgeK_ncn9q_Ml3KKgigiILon4CSe4ci80I40Py_0ZW0xW0e-d62SUxsZS1URw9pEx4Jhp-wgmt8GhMe59s-dRI8QtH5E5C3_L5TrnHf4QeYvk6g3UYS3m/s1920/Techtesters%201.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3kJZaJUrQfov8ICL0HT4IuQ8MvZleDug5KsJtgB7csSc44o379PJt0ElnyH0U9VksDMDWgeK_ncn9q_Ml3KKgigiILon4CSe4ci80I40Py_0ZW0xW0e-d62SUxsZS1URw9pEx4Jhp-wgmt8GhMe59s-dRI8QtH5E5C3_L5TrnHf4QeYvk6g3UYS3m/w640-h360/Techtesters%201.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>From the Techtesters video linked above...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Gameplay Performance...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is where things get weird. While I had always predicted that shifting a load such as this onto the GPU would reduce graphics performance, I never thought that we would get reduced performance from such a scenario on the CPU - after all, modern CPUs have a lot of spare parallel capacity that is hardly utilised by the majority of games (Forspoken included), so there should be no equivalent reduction in game performance. 
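</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Before getting into the numbers, a quick refresher on where the "X% low" figures discussed below come from - they're derived from the frametime log, not from an fps counter. A minimal sketch; the exact definition varies between outlets, and the frametime list is a made-up example:</div><div style="text-align: justify;"><br /></div><pre>
# One common way to derive "X% low" figures from a frametime log: take the
# worst X% of frametimes and report the average framerate across them.
# Definitions vary between outlets; the data below is a made-up example.

frametimes_ms = [16.7] * 95 + [25.0, 30.0, 33.0, 40.0, 55.0]

def pct_low(frametimes, pct):
    worst = sorted(frametimes, reverse=True)  # largest frametimes first
    n = max(1, int(len(worst) * pct / 100))
    avg_ms = sum(worst[:n]) / n
    return 1000.0 / avg_ms  # convert average frametime to fps

for p in (10, 5, 1):
    print(f"{p}% low: {pct_low(frametimes_ms, p):.1f} fps")
print(f"minimum: {1000.0 / max(frametimes_ms):.1f} fps")
</pre><div style="text-align: justify;">With that out of the way, on to the weirdness.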
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, setting aside the <a href="https://youtu.be/u_3hTbE482M">erroneous claim</a> that enabling DS reduced performance by around 10% on high-end hardware... <a href="https://www.youtube.com/watch?v=ZN5m1_A08JQ&t=451s&ab_channel=Techtesters">Techtesters</a> have also published that they observed a <i>very</i> (and I mean incredibly) slight reduction in average framerate and 1% low framerate during their manual in-game benchmarking. But, like my strange, repeatable results - they must be reported as-is, without any real understanding of why they are observed.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">From my side, in Windows 10 and using the demo, I see no reduction in average performance when using slower storage devices, though there is an increase in the maximum frametime spikes (which correspond to minimum fps) as we move to the slower devices:</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtU_6EQNwMpcYqrswUtJuWyRr5mD_5UZFTuu916brl87Rtxum48gOX-MZx6eLAhanfX_p4hx9g2yrFe4VUE8gG-9YEkF5h2KfYIJqEDMFZQ_Gc5KOjDAMIWacGHijhGkZUYdg_naRWhpN9LaVE7MnQLY49W1sMWGbZ0ErF-PayW0QYvGic4jAER3AO/s756/RX%206800%20performance%20comparison.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="126" data-original-width="756" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtU_6EQNwMpcYqrswUtJuWyRr5mD_5UZFTuu916brl87Rtxum48gOX-MZx6eLAhanfX_p4hx9g2yrFe4VUE8gG-9YEkF5h2KfYIJqEDMFZQ_Gc5KOjDAMIWacGHijhGkZUYdg_naRWhpN9LaVE7MnQLY49W1sMWGbZ0ErF-PayW0QYvGic4jAER3AO/w640-h106/RX%206800%20performance%20comparison.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">However, I would not ascribe this to DirectStorage, but rather to the fact that the engine/game is designed to expect data to be present for processing at any given point... and the slower devices are less likely to be able to provide that data in a timely manner. Alternatively, in the table below, we can see the effect of bringing the RX 6800 back to stock settings: worse 10%, 5%, 1%, and minimum framerate values - which would indicate a GPU bottleneck. That seems strange, so I would presume this is actually some CPU bottleneck, or an inefficiency in the transfer of data to the VRAM (despite the larger quantity on the 6800 compared with the 3070).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In fact, I find the allocation and use of VRAM to be quite strange in this title - with 11+ GB utilised by the game when running on the RX 6800 but only 6+ GB on the RTX 3070... I encountered no places in my short testing where stutters occurred due to running out of VRAM, so I wonder if it is possible that the 6800 might run into issues with respect to data management between RAM and VRAM at inopportune times... 
which can be alleviated by increasing the memory frequency.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Additionally, I see that even the system with the SN570 (which gave very strange results) performed just as well as the AMD system - though it appears that the i5-12400 grants a slightly better maximum frametime when the system is CPU-constrained.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYS8bBw5ob-70B2po8I8e3WtS2NQWC_Yd7vgRJqV6_HRjG9u8em99pCX8GjddV3SfzSQ3MXpLMiq6g1z0ufhQtyvOQHcenHtLCWJp5Utw5Uz7lLWdtlaM83rQ1PoiLN7gCL5_53Xbxx56EJH0UowvJ_9V0dc8pUqpmx1d1Azpw91CcMrLzE3ic0SK-/s757/Performance%20comparison.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="173" data-original-width="757" height="146" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYS8bBw5ob-70B2po8I8e3WtS2NQWC_Yd7vgRJqV6_HRjG9u8em99pCX8GjddV3SfzSQ3MXpLMiq6g1z0ufhQtyvOQHcenHtLCWJp5Utw5Uz7lLWdtlaM83rQ1PoiLN7gCL5_53Xbxx56EJH0UowvJ_9V0dc8pUqpmx1d1Azpw91CcMrLzE3ic0SK-/w640-h146/Performance%20comparison.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><br /><h3 style="text-align: left;"><span style="color: #274e13;">Conclusion...</span></h3><br /><div style="text-align: justify;">All in all, the impression I have is that this title is mostly CPU-bound and that adding the additional overhead of DirectStorage decompression and batch loading is actually not helping it at all. In the meantime, I am looking forward to future implementations of DS, in order to properly analyse them and how they affect performance scaling on various system components. Of course, storage will <i>always</i> be a bottleneck... which is why I am primarily against the use of this technology on PC in the first place...</div><div style="text-align: justify;"><br /></div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-56975681520654082352023-01-22T15:53:00.002+00:002023-01-22T16:44:10.837+00:00Next Gen PC gaming requirements (2022 update)<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgASyxrg_ENxpn4_vlKV6KUhuWD_kj1tsY0zYUN19muzTtaL5K-7XT_CXuq3fSFo8WCL8p4KWRP7tLKuIrOTPT50msGwWRuCNAYjE4ds1dL0KJ32IrYSwnYNa0p--jqOoCp8cNvwKxnkRW5eeu8gi3pATr6Ifo6aMNcLAPOSzuYQyW230shTXy00tEr/s1024/Header%202022.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="681" data-original-width="1024" height="426" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgASyxrg_ENxpn4_vlKV6KUhuWD_kj1tsY0zYUN19muzTtaL5K-7XT_CXuq3fSFo8WCL8p4KWRP7tLKuIrOTPT50msGwWRuCNAYjE4ds1dL0KJ32IrYSwnYNa0p--jqOoCp8cNvwKxnkRW5eeu8gi3pATr6Ifo6aMNcLAPOSzuYQyW230shTXy00tEr/w640-h426/Header%202022.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">It's time for the yearly round-up of game recommended system specifications trending data! 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">First up, I want to pay tribute to the person who helped inspire me to begin this yearly endeavour. It was <a href="https://www.shamusyoung.com/twentysidedtale/?p=49109">his cataloguing</a> of the games released over a ten-year period on Steam that jump-started <a href="https://hole-in-my-head.blogspot.com/2020/07/next-gen-game-pc-hardware-requirements.html">this whole thing</a>. <a href="https://www.shamusyoung.com/twentysidedtale/?p=54513">He will be missed</a>...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, games keep being released and keep requiring more demanding hardware... so my Sisyphean task remains.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Let's jump in!<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Covering the basics...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I have to quickly go over some things before relating the data - <a href="https://docs.google.com/spreadsheets/d/1O1_0bsnKrmazhoxtyJzuzXRKGFTIjvnRheF3nlVFDu4/edit?usp=sharing">which can be found here</a>. I have refreshed the data from the current <a href="https://browser.geekbench.com/v5/cpu/search?utf8=%E2%9C%93&q=Ryzen+7+3700X">Geekbench</a> and <a href="https://www.videocardbenchmark.net/gpu_list.php">Passmark</a> databases, as there has been a significant amount of drift over the last year as these benchmarking programmes have evolved. Similarly, Userbenchmark (whose actual data I never had a problem with, and found lined up quite nicely with other benchmarking services) has also evolved in how they apply their biases/assumptions, etc. However, updating all the numbers for just two benchmarks was enough work. So, if I get around to it, I will update the Userbenchmark data at a later point. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The games that are included in the assessment are hand-picked by myself, so there is an inherent bias in this data. However, I am picking games that I feel are popular for whatever reason (competitive, cultural, etc), require some amount of hardware to run (i.e. not a game that can run in a browser), or are representative of the latest challenges placed on PC gaming rigs.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Since a LOT of high-profile games were delayed from a 2022 release to, well, just about <i style="font-weight: bold;">now</i>, I think the relative increase this year has been quite low compared to what it <i>should</i> or <i>could</i> have been. Looking at the first few games released in 2023, if they indicate any sort of trend, we are looking at a generational leap in system requirements as the inter-generational period finally ends...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Buckle up! 
2023 is shaping up to be a wild ride...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Boy, don't hurt your brain...</span></h3><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNCDdOAfMG9jeV8j3hOBs28jT_j-hncKvHEcR2odAgOQYw0RPXtcFdrKLCBf9FyxdbQNBYmaxp9I3fj9HZ3aczb6jbQXnPcPeWfiTRGzjJe6dGZ2p0cMTF9qLvaAJFl6lM0j1oen1rnPvBmg6wt77IZA3PHk-YrwPo_6wDHDbv08qbC77bXYLxKX4B/s667/Passmark_CPU_single_predicted_poly_EOY_2022.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="467" data-original-width="667" height="448" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNCDdOAfMG9jeV8j3hOBs28jT_j-hncKvHEcR2odAgOQYw0RPXtcFdrKLCBf9FyxdbQNBYmaxp9I3fj9HZ3aczb6jbQXnPcPeWfiTRGzjJe6dGZ2p0cMTF9qLvaAJFl6lM0j1oen1rnPvBmg6wt77IZA3PHk-YrwPo_6wDHDbv08qbC77bXYLxKX4B/w640-h448/Passmark_CPU_single_predicted_poly_EOY_2022.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Last year, we saw a drop in the trend, and this drop is observed once again... (Grey line was the fit from 2021)</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As noted above, the delay of some of 2022's most anticipated titles into 2023 has severely reduced the single-core requirements for games released this year. We do see an increase in requirements, at least - unlike last year, where we observed a slight regression. However, we are still looking at single-core performance greater than an Intel 10600K/10700K's.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgn92qotZvTrnIkAwIOYiuplt4ZM7rI7myQh0rMpLF6JUpW88DDYCSDAydZ-4nhPhVseYUrb1wRguWJgpMSVZHAPCLOGzl5GkcI3BZQ30iXK0tpd2XJ9GBuwo5fOuxLHxANrVDLrpAApRkdnRHE73tMKfiN1rs71C_IBO6uiyzPPCxssxFxiX6Yed0C/s661/Passmark_CPU_multi_predicted_poly_EOY_2022.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="461" data-original-width="661" height="446" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgn92qotZvTrnIkAwIOYiuplt4ZM7rI7myQh0rMpLF6JUpW88DDYCSDAydZ-4nhPhVseYUrb1wRguWJgpMSVZHAPCLOGzl5GkcI3BZQ30iXK0tpd2XJ9GBuwo5fOuxLHxANrVDLrpAApRkdnRHE73tMKfiN1rs71C_IBO6uiyzPPCxssxFxiX6Yed0C/w640-h446/Passmark_CPU_multi_predicted_poly_EOY_2022.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>We also see an associated drop in the multicore predictions, too, but the jump in actual requirements this year was much larger, here... 
</b><b>(Grey line was the fit from 2021)</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Multicore sees a smaller decrease in predicted average recommended requirements in 2025, but we're still looking at around the performance of a Ryzen 3700X.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I haven't included any performance thresholds for the latest generations of CPUs, but the short of it is that they pretty much all blow the included CPUs out of the water. Anything Intel 12th gen or newer will absolutely smoke these parts.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The one caveat here is the Ryzen 7 5800X3D. On paper, it has worse single-core performance than the 5600X and a worse multicore score than the 10700K.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Due to the way the cache is configured on this chip, there are no standardised synthetic benchmarks that I'm aware of that will accurately capture its real-world performance in games. Going forward, these parts with vastly inflated cache sizes will have non-standard scaling in comparison with other chips in the same stack, or even between generations.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The only data point I can see that reflects the real-world performance, and that is available in a large standardised database, is the memory latency figure provided by Userbenchmark. The performance difference, in those games that can take advantage of the 3D cache, is approximately equal to the difference in the memory latency test. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's something that I need to look into, especially as more parts are released (from both AMD and Intel) that incorporate more on-die cache. There is a possibility that, somewhere down the road (in 2025 or so...), I would need to apply a scaling factor based on this metric...</div>
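<div style="text-align: justify;"><br /></div><div style="text-align: justify;">To illustrate the kind of correction I have in mind, a toy sketch: scale the synthetic score by the ratio of memory-latency results. Every number below is an invented placeholder - not a database value - and the linear scaling is almost certainly too simplistic:</div><div style="text-align: justify;"><br /></div><pre>
# Toy version of a cache-aware correction: scale a CPU's synthetic score by
# the ratio of memory latencies, since the latency test appears to track
# real-world gaming gains better than the raw score does. All numbers here
# are invented placeholders, and linear scaling is a big simplification.

def cache_adjusted_score(raw_score, latency_ns, baseline_latency_ns):
    # Lower latency than the baseline inflates the effective gaming score.
    return raw_score * (baseline_latency_ns / latency_ns)

baseline = {"name": "baseline part", "score": 100, "latency_ns": 68}
big_cache = {"name": "big-cache part", "score": 98, "latency_ns": 51}

adj = cache_adjusted_score(big_cache["score"], big_cache["latency_ns"],
                           baseline["latency_ns"])
print(f"{big_cache['name']}: raw {big_cache['score']}, adjusted {adj:.0f}")
</pre><div style="text-align: justify;"><br /></div>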
<div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Brawn for the beast...</span></h3><div><span style="color: #274e13;"><br /></span></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhc3dtbTRlxtEgk_XrAyjDI_vqmqOzZT4npkzAj4qQyaFxyASkSTmlxsjnRrhNEK9Hr1TdIhiZTzyRgddJq8S19Ejb0zdwMHs-eFmVobwXUzXf2cFr09hde6zs5M1yLomYDV2TFqKwKJshmhp559Q2GxzlDRgNuQnlchfQYLqj4fG5OJujOv7UO-jhD/s670/Geekbench_GPU_predicted_poly_EOY_2022.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="436" data-original-width="670" height="416" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhc3dtbTRlxtEgk_XrAyjDI_vqmqOzZT4npkzAj4qQyaFxyASkSTmlxsjnRrhNEK9Hr1TdIhiZTzyRgddJq8S19Ejb0zdwMHs-eFmVobwXUzXf2cFr09hde6zs5M1yLomYDV2TFqKwKJshmhp559Q2GxzlDRgNuQnlchfQYLqj4fG5OJujOv7UO-jhD/w640-h416/Geekbench_GPU_predicted_poly_EOY_2022.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Predicted GPU requirements also decreased, somewhat...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, the good news is that CPUs are cheap, performant, and plentiful. Even the lower end of both current product stacks is far beyond what is expected to be required for the average recommended system requirements in 2025. The problem here is the GPU situation...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">GPUs are expensive, lacking in performance*, and <i>still </i>somehow not that plentiful in supply. Sure, we had a glut of cheaper sales in September and October, but those appear to have dried up. Yes! Graphics settings scale decently well in most games... but it's not an absolute truth. We're talking about the recommended system requirements for games at 1080p! 
Not some outlandish scenario!</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*Except at the high-end, where we're still making huge strides each generation...</blockquote></span></i></b></div><div class="separator" style="clear: both; text-align: justify;"><br /></div><div class="separator" style="clear: both; text-align: center;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYCGVn9hoF6JAANhev2wIw4zdCiFwg_eusLc-jFE0kqFmJwg8-0m5wFVtcFDXzRAgR0Zi7P-SN4uM0TgDOyQb7AmJRbpKt3dQMbrU7_EbC8chG9bFHh72LTpiJhk8iThCeVVn1QjT9Dir3Nd0br_U_xX8D1IaBFjQiBIjvaeKX_Yrs23-V5wvumbFf/s762/Steam_TPU_GPU.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="762" data-original-width="700" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYCGVn9hoF6JAANhev2wIw4zdCiFwg_eusLc-jFE0kqFmJwg8-0m5wFVtcFDXzRAgR0Zi7P-SN4uM0TgDOyQb7AmJRbpKt3dQMbrU7_EbC8chG9bFHh72LTpiJhk8iThCeVVn1QjT9Dir3Nd0br_U_xX8D1IaBFjQiBIjvaeKX_Yrs23-V5wvumbFf/s320/Steam_TPU_GPU.jpg" width="294" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>TechPowerUp's database has become less representative of the performance scaling between GPUs more recently, but it's still one of the better databases for visually representing this metric... Steam's hardware survey must be taken with a pinch of salt but until we get another source of installed marketshare, this is the best we have...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Added to the absolute cost of getting a certain level of graphics performance in your computer, we have issues, further down the stack with VRAM quantities - but I'll get to that next.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">And the memories bring back you...</span></h3><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkCdjxvpC3WjfNNg7tBl3PMsjEuo2sv14tOrJLy4mVjKUrGt4W4kOk1DS1TtcaP9cSIPNcZkpn_4sDzXOq_rR8bUXk_reL7uM--wSuLevy6hv3f4Sh30ByLY_0n9XBUaubuWFuRflWbSW4RNqiLuIIOZ60aSxfdHAIOgu-At-Cp48tt6OfjE6hzWHB/s585/RAM_system.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="369" data-original-width="585" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkCdjxvpC3WjfNNg7tBl3PMsjEuo2sv14tOrJLy4mVjKUrGt4W4kOk1DS1TtcaP9cSIPNcZkpn_4sDzXOq_rR8bUXk_reL7uM--wSuLevy6hv3f4Sh30ByLY_0n9XBUaubuWFuRflWbSW4RNqiLuIIOZ60aSxfdHAIOgu-At-Cp48tt6OfjE6hzWHB/s16000/RAM_system.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The required amount of system memory is increasing...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Given the vastly divergent memory design paradigms of the current gen consoles and the PC, we haven't observed any unexpected movement in system memory requirements over the last three years. 
Yes, we were in the inter-generational period, where games were still catering for the last gen consoles, so it's not surprising that requirements haven't shot up (though I have been advocating for it). </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">But there is a small, interesting movement in the secondmost required spec - from 8 to 12 GB of RAM. That's interesting on two levels - one, because there is basically no <i>recommended</i> way to build a PC with 12 GB of RAM... and two, because it tells <i>me</i> that developers want to push things further but are trying to boil the frog, so to speak.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Yes, system requirements will have to go up over time as games get more demanding... but any system that has 12 GB of RAM will actually, most likely, have 16 GB. But, somehow, a quarter of the games I polled are requesting only 12 GB. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I can see this from the perspective of developers, as well: there is often a backlash from gamers at system requirements that <i>appear</i> too high... the problem being that gamers have gotten used to games not advancing in requirements due to the lacklustre PS4/Xbox One and PS3/Xbox 360 console generations*. They do not remember the times when you upgraded your GPU every year or two... So, maybe, there's a logic to <i>slowly</i> increasing specs so that they can be pointed back to later, when more demanding games come out.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*In terms of hardware performance...</blockquote></span></i></b></div><div style="text-align: justify;">The other side of things is what worries me slightly. Are developers stating system requirements as an absolute requirement for the game alone to run or are they factoring in system overhead, too? If it's the former - are there developers out there that state 16 GB of RAM as a requirement, not including the OS and other background tasks' overhead? I sincerely doubt it, but I had to explore the thought...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGTE2klxDLUyQZqZxqYP7xpoUE9CKMjy4eeVZqNzHd3Tx8Qj_kYXG-kzEvGvrH8q1tR506LX3sDD-3CntJBkGVDMGciwp24UQV_fRhp8GFDZ9Ddc1m75sfBP8ykxktzuBeYL1t2MPIWI7saqnRyMqchBvpHGfRV0-kvL_yFDz3s8No9zvZKaLPWlmc/s650/RAM_video.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="370" data-original-width="650" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGTE2klxDLUyQZqZxqYP7xpoUE9CKMjy4eeVZqNzHd3Tx8Qj_kYXG-kzEvGvrH8q1tR506LX3sDD-3CntJBkGVDMGciwp24UQV_fRhp8GFDZ9Ddc1m75sfBP8ykxktzuBeYL1t2MPIWI7saqnRyMqchBvpHGfRV0-kvL_yFDz3s8No9zvZKaLPWlmc/s16000/RAM_video.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>... and video memory is increasing at a faster rate.</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">VRAM requirements are a bit more shocking - mostly because the advancement in VRAM quantities on low-end graphics cards has been incredibly slow since 2016. 
Not only has the recommended amount jumped to 8 GB but this is the first year in which a game's recommended system requirements listed GPUs with a minimum of 10 GB VRAM*.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*Often two GPUs are listed in requirements and I will take the VRAM quantity from the card with the lower amount... in this case, only the RTX 3080 was listed, so maybe it's not that representative.</blockquote></span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Going back to the GPU trending analysis, we're in a bit of a strange transition period: graphics card performance requirements are increasing at a steady pace but actual, on-card VRAM quantities are stagnating. Memory modules have to be paired with a certain bus width, and GPU designers limit the bus width on lower-end cards to keep costs down, because the memory controllers/interfaces take up large portions of the GPU silicon die.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Taking into account that APIs like DirectStorage could increase VRAM requirements, along with general increases due to higher graphical fidelity and implementations of ray tracing, there appears to be a movement towards requiring higher VRAM quantities, as standard. However, up until the current generation of cards, GDDR6 module capacities are limited to 8 Gb (1 GB) and 16 Gb (2 GB). This means that virtually all low-end and mid-range cards (ignoring the anachronistic RTX 3060 12 GB) are not suitable for higher graphical settings at 1080p, going forward, due to their typical 4-8 GB framebuffers - see the quick sketch below for how those numbers fall out of the bus width.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Thankfully, AMD's mid-range RX 6700, and XT, jumped to 10 and 12 GB respectively, meaning that these cards should provide enough performance - and memory - to meet the expected requirements of newer, triple A releases in just a couple of years.</div>
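<div style="text-align: justify;"><br /></div><div style="text-align: justify;">To make the bus width/capacity relationship concrete: each GDDR6 module hangs off a 32-bit channel, so the possible VRAM configurations fall directly out of the bus width. A minimal sketch, using the standard 8 Gb and 16 Gb module capacities mentioned above:</div><div style="text-align: justify;"><br /></div><pre># One GDDR6 module per 32-bit memory channel
MODULE_CAPACITIES_GB = [1, 2]  # 8 Gb and 16 Gb GDDR6 modules

def vram_options(bus_width_bits):
    channels = bus_width_bits // 32
    return [channels * capacity for capacity in MODULE_CAPACITIES_GB]

for bus in (128, 160, 192, 256):
    print(bus, vram_options(bus))
# 128 -> [4, 8] GB, 160 -> [5, 10] GB, 192 -> [6, 12] GB, 256 -> [8, 16] GB</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Which is why the 192 bit 6700 XT lands on 12 GB with 16 Gb modules, while a 256 bit card fitted with 8 Gb modules is stuck at 8 GB. 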
Owners of even the similarly powerful RTX 3070 will likely have to limit some of their graphical settings in order to maintain performance - as I had to do for Deathloop.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTaAJFYp0IUR-If0Tubb4dPP3CzY_ElolzZDKFKGlqzLwVt5FQjDEfS7C482BESqMJ-hr0hlc1cgoYZaov3S0r0PSSfCx5bjGssfNRpcaaDpQkfTylsdH7ON-p-TtYCq_qc0V8XKLIFt4euoBFi7Vqu1DUtIF564_T_vXYTRAwGEaasXp3KP9Te2DR/s1493/RAM%20percentages.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="374" data-original-width="1493" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTaAJFYp0IUR-If0Tubb4dPP3CzY_ElolzZDKFKGlqzLwVt5FQjDEfS7C482BESqMJ-hr0hlc1cgoYZaov3S0r0PSSfCx5bjGssfNRpcaaDpQkfTylsdH7ON-p-TtYCq_qc0V8XKLIFt4euoBFi7Vqu1DUtIF564_T_vXYTRAwGEaasXp3KP9Te2DR/w640-h160/RAM%20percentages.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Percentage data for each RAM configuration per year saw a first for the VRAM requirements...<br /><br /></b></td></tr></tbody></table><br /><div style="text-align: justify;">Now, while this is a potential issue for owners of current and prior generation cards, there is a potential solution on the horizon: <a href="https://semiconductor.samsung.com/newsroom/tech-blog/a-bridge-between-worlds-how-samsungs-gddr6w-is-creating-immersive-vr-with-powerful-graphics-memory/">GDDR6W</a>. This DDR technology allows for modules of up to 32 Gb (4 GB) capacities, meaning that larger capacity modules could be paired with more anaemic memory setups - as is found on lower-end cards.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The issue with this is that it is not certain which memory technology will be applied for the next generation graphics chips. The high-end dies (used for the most expensive graphics cards) will mostly benefit from higher bandwidths - meaning that <a href="https://news.samsung.com/global/samsung-electronics-envisions-hyper-growth-in-memory-and-logic-semiconductors-through-intensified-industry-collaborations-at-samsung-tech-day-2022">GDDR7</a> seems like the more obvious choice for those future releases since it will increase speed up to 36 Gbps from GDDR6's current 16-18 Gbps. (Though, Samsung also <a href="https://www.allaboutcircuits.com/news/samsung-launches-industrys-first-24-gbps-gddr6-dram/#:~:text=A%20Brief%20Overview%20of%20GDDR6,-GDDR%20SDRAMs%20share&text=Furthermore%2C%20GDDR6%20is%20the%20successor,a%20decrease%20in%20power%20consumption.">announced 24 Gbps</a> speed GDDR6 late last year). 
Neither GDDR7 nor this faster GDDR6 is confirmed to come in higher capacity modules, meaning that, if either technology is chosen, the VRAM drought on lower-end cards might not change. Cards such as the RTX 4050 could increase their memory bus to 192 bit but, given that the RTX 4070 Ti is already at a 192 bit bus width, this seems unlikely.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">All in all, I think that memory technology and interfaces look like they will be the most interesting facet of PC gaming over the next few years...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Prediction, prediction foretold...</span></h3><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5r5wKMKx4mdndbAf9Cp5HtMqYIJ9feAn3jmqPfdsB2gN9v16_srZGEJGruUg5Zy1tXRwgNnpaYYYoo9QXycXXjfq-9dzNFWRqjupMb95o_KO9x3U_M4WvBajqbOmgf-3A8h3AlLtg6Cyp9AB-oiI4gSXy9_cbUPAYzgN-f4Hj0W2w3swNq4JlVZrI/s586/RAM_system_predicted.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="368" data-original-width="586" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5r5wKMKx4mdndbAf9Cp5HtMqYIJ9feAn3jmqPfdsB2gN9v16_srZGEJGruUg5Zy1tXRwgNnpaYYYoo9QXycXXjfq-9dzNFWRqjupMb95o_KO9x3U_M4WvBajqbOmgf-3A8h3AlLtg6Cyp9AB-oiI4gSXy9_cbUPAYzgN-f4Hj0W2w3swNq4JlVZrI/s16000/RAM_system_predicted.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>I was quite close for RAM requirements, this year...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Just like the CPU and GPU prediction graphs, I also draw up RAM and VRAM capacity prediction graphs. The difference here is that I am the one predicting the curve and the most required, secondmost required and thirdmost required quantities because, unlike CPU and GPU performance, these are discrete quantities and not a continuous scale. You either have 8 or 16 GB - you can't have 14 GB.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">These predictions are, essentially, everything I believed would come to pass, way back in 2020...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, how am I doing? Well, the most required 16 GB is correct but, to my surprise, 12 GB has overtaken 8 GB RAM as the secondmost required by games in 2022. Things look like they're advancing faster than I thought.</div>
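<div style="text-align: justify;"><br /></div><div style="text-align: justify;">For transparency, those "most required" rankings are nothing fancier than frequency counts over the polled requirements - a minimal sketch, with hypothetical per-game figures standing in for my actual dataset:</div><div style="text-align: justify;"><br /></div><pre>from collections import Counter

# Hypothetical recommended RAM figures (GB), one per polled game
ram_requirements = [16, 16, 12, 8, 16, 12, 16, 8, 12, 16, 16, 12]

ranking = Counter(ram_requirements).most_common(3)
print(ranking)  # [(16, 6), (12, 4), (8, 2)] -> most, secondmost, thirdmost required</pre>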
<div style="text-align: justify;"><br /></div><div style="text-align: justify;">Looking forward, though, I think I was likely over-ambitious to expect 24 GB quantities to be the secondmost required next year, despite <a href="https://www.tweaktown.com/news/90077/micron-has-announced-new-ddr5-memory-modules-in-24gb-and-48gb-capacities/index.html">24 GB DDR5 modules being announced</a>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhLhZWKeW5oR4izYSxWTby9jdS74YElH-UHwjraKrYoTIp2gZdRDw2Pl9OSG4fEd6iOMUNHSEwfiuMrzr9MxHGmRBEUMyrkw1HCV4havkorwCqUItCnZsRkKFKiSLnoaXjBceqW6N7Z4eqpzqco91oShEiTGV8tM8MU58gZYLbQA2D-FROXbajOgk5/s647/RAM_video_predicted.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="371" data-original-width="647" height="367" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhLhZWKeW5oR4izYSxWTby9jdS74YElH-UHwjraKrYoTIp2gZdRDw2Pl9OSG4fEd6iOMUNHSEwfiuMrzr9MxHGmRBEUMyrkw1HCV4havkorwCqUItCnZsRkKFKiSLnoaXjBceqW6N7Z4eqpzqco91oShEiTGV8tM8MU58gZYLbQA2D-FROXbajOgk5/w640-h367/RAM_video_predicted.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>I was more bullish for VRAM predictions...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">While I was mostly accurate for the first two required VRAM quantities, I was over-optimistic for the thirdmost required quantity. 10 GB VRAM may become commonly stated as a recommended requirement one day, but when only the most expensive graphics cards have framebuffers of sufficient size to accommodate that, this seems like an improbable outcome - even for next year's secondmost required prediction.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4lp4qsPdUXE9MHgTpOUm8NK4xl73_xePEs89f618EYyE-jNWGU1wPQwXerWrSWqPOFQq8xv1E48-eSBl-nhbdyLGY5IlpmkklAvsKItrvjjZOIXqU6tJv6j8eJDfapR14B3PWDTtiOIMtVVTFxemUg6mOnbyhGaDdvKnW62FniUG8DVc1GPTs-8DY/s582/Predicted%20cores.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="370" data-original-width="582" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4lp4qsPdUXE9MHgTpOUm8NK4xl73_xePEs89f618EYyE-jNWGU1wPQwXerWrSWqPOFQq8xv1E48-eSBl-nhbdyLGY5IlpmkklAvsKItrvjjZOIXqU6tJv6j8eJDfapR14B3PWDTtiOIMtVVTFxemUg6mOnbyhGaDdvKnW62FniUG8DVc1GPTs-8DY/s16000/Predicted%20cores.PNG" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>From the graph, I am FAR away from reality...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I also took a swing at predicting the number of cores most required by games, too. The graph, above, makes it look like I got things pretty wrong. However, looking at the raw data, we're almost at my predictions: 12 games required 4 cores and 11 required 6 cores, while 10 games required 8 threads and 9 required 12 threads.</div>
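<div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a sanity check on what "average" means here, a quick back-of-the-envelope calculation using just those four tallies (the full dataset includes other configurations, so treat the output as illustrative):</div><div style="text-align: justify;"><br /></div><pre>core_tallies   = {4: 12, 6: 11}   # cores -> number of games requiring them
thread_tallies = {8: 10, 12: 9}   # threads -> number of games requiring them

avg_cores = sum(c * n for c, n in core_tallies.items()) / sum(core_tallies.values())
avg_threads = sum(t * n for t, n in thread_tallies.items()) / sum(thread_tallies.values())
print(round(avg_cores, 1), round(avg_threads, 1))  # 5.0 cores, 9.9 threads</pre>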
<div style="text-align: justify;"><br /></div><div style="text-align: justify;">What we are seeing, though, is an increase in the average cores and threads required to run games. Sure, these don't correspond to an actual processor you can buy (at least I'm not aware of any 6 core 10 thread SKUs!) but they do indicate a trend of increasing CPU resources being required.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, next year, I am expecting to be correct on this one.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Wrapping up...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6lSTdJ4rEDTmq5zxG5IDFZIVRJ40C3Mc_uQhUzV4yziV80LjrzLEVkSgQhe9McAJZhZ1zQjekCVWZE7OVmpnZTUjf8kn6kzlK3e60UnmiN8nCz35KgFhOvZjDGfI642HHeMd5VQvj9EGdcwllaPJ9fbUTepBRaLgbQMa9V_EjV-uLRXUyLHoFBNUg/s739/Relative%20performance%20to%20consoles%20.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="444" data-original-width="739" height="385" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6lSTdJ4rEDTmq5zxG5IDFZIVRJ40C3Mc_uQhUzV4yziV80LjrzLEVkSgQhe9McAJZhZ1zQjekCVWZE7OVmpnZTUjf8kn6kzlK3e60UnmiN8nCz35KgFhOvZjDGfI642HHeMd5VQvj9EGdcwllaPJ9fbUTepBRaLgbQMa9V_EjV-uLRXUyLHoFBNUg/w640-h385/Relative%20performance%20to%20consoles%20.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>I think that this graphic shows how underpowered the prior generations of console CPUs were...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">We're now moving into the third year of this current generation of consoles and we're only now, in 2023, really beginning to get ports of the games that are exclusive to this generation of hardware. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I really don't have an issue with the expense and performance of either processors or system memory - as gamers, we appear to be in good shape there. Graphics cards are a different story and, despite their hugely increasing expense, I am left wondering if we are really getting enough value for our money. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In the here and now, we're doing okay. An RX 6600 or an RTX 3060 8 GB will play 1080p games at the recommended settings <i>now </i>but, in just two short years, will they be able to? 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Even mid-to-high end cards like the RTX 3070 and 3070 Ti lie on that cusp of being functionally hindered by their VRAM capacity, whereas the AMD equivalents all have sufficient memory to last - just as we saw back in 2016 with the RX 400 series releases.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In the past, I've advocated for spending a lot (and keeping your GPU for years) or spending as little as possible (and upgrading more often). </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I think that advice still stands.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Next time, I want to update the cost of buying a computer that meets the current averaged recommended system requirements for gaming, covering 2020 to 2022. I'm also working on a blogpost covering the power scaling of the RX 6800 as I would like to see how the efficient chip of RDNA 2 compares to the efficient chip of Ampere - the RTX 3070.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-4963651712807105722023-01-15T14:46:00.004+00:002023-07-20T16:27:20.336+01:00Yearly DirectStorage rant (Part 3)...<div style="text-align: left;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxFvzNixfxbc9NXwTUbpb_XlgI8lqZGolFumbMU0VzlMt8ZDeZtaVmATYkJzjvVvc05xXP0ola-UVq5IlC3nFI_YPzTv0haakrUdZuIYFd52OVFBvdNPgal57QKDFsAAevidIs_M8VEKXoS5A8TGd6_Goynz72h51I2_SPhpNvRfUnUdl-FtIWSHc2/s1911/Avocado%201.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1037" data-original-width="1911" height="348" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxFvzNixfxbc9NXwTUbpb_XlgI8lqZGolFumbMU0VzlMt8ZDeZtaVmATYkJzjvVvc05xXP0ola-UVq5IlC3nFI_YPzTv0haakrUdZuIYFd52OVFBvdNPgal57QKDFsAAevidIs_M8VEKXoS5A8TGd6_Goynz72h51I2_SPhpNvRfUnUdl-FtIWSHc2/w640-h348/Avocado%201.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">Yeah, <a href="https://hole-in-my-head.blogspot.com/2021/04/directstorage-and-its-impact-on-pc.html">I know this</a> <a href="https://hole-in-my-head.blogspot.com/2022/03/directstorage-again.html">is getting bothersome</a> and tiring but, as we finally approach the release date of Forspoken - the first game to include DirectStorage as a way of managing data from the storage device on your PC - I've noticed a trend of people posting about the topic in a very uncritical manner.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, let's take a quick look at that and let me tell you my doubts...<span><a name='more'></a></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">I don't hate it, I just don't like it...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Look, by now you probably have the feeling that I have some sort of vendetta or family blood feud with DirectStorage (DS for short); I don't! I really don't. 
But I do have the opinion that these developmental resources could have been better spent elsewhere: developers could be demanding a larger quantity of system RAM and using more CPU cores, rather than requiring more expensive NVMe SSD technology and pressing the already fully taxed GPU into service to improve data loading. The PC is not a console and shouldn't have console-focussed technologies applied to it because their purposes and strengths are largely very different.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I also do not like how this technology has been presented... with very misleading comparisons and obfuscated improvements (<a href="https://hole-in-my-head.blogspot.com/2022/03/directstorage-again.html">as I detailed last time</a>).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is, once more, one of those times.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjg_e4KfNFHVxMgfCqlb9FEZoLpaRs3s4pCE1Qn7dnThr-qfTyCjN0AfWdNqyaaBldY4R85xJ3RQUPhnOnss2k_ZAowXR-guu3paMeYR2yaY-a4Xeo077nDFWgqefRyZdhOI2ggOcZUZQKSwHbM0Gg_IKx8Ap0WmJz4sH699WGOzhVvcq-Di8Li92be/s1122/Social%20blow%20up.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="519" data-original-width="1122" height="296" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjg_e4KfNFHVxMgfCqlb9FEZoLpaRs3s4pCE1Qn7dnThr-qfTyCjN0AfWdNqyaaBldY4R85xJ3RQUPhnOnss2k_ZAowXR-guu3paMeYR2yaY-a4Xeo077nDFWgqefRyZdhOI2ggOcZUZQKSwHbM0Gg_IKx8Ap0WmJz4sH699WGOzhVvcq-Di8Li92be/w640-h296/Social%20blow%20up.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Why all the fuss?</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Avoca-doh's...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The thing that got me to write about this subject again was the recent resurgence of the <a href="https://github.com/microsoft/DirectStorage/blob/main/Samples/BulkLoadDemo/README.md">Avocado bulk load demo</a> (that was put out last November) among the tech media/tech-affiliated people I have ended up in a circle with on Twitter. Oh, and another <a href="https://www.tomshardware.com/news/directstorage-performance-amd-intel-nvidia">follow-up post</a> from Tom's Hardware regarding Arc performance.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The trouble here is that I don't think these numbers mean what people think they mean.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The first, most important caveat is that the bulk load demo <i><b>is not doing any bulk loading of different data</b></i>. It's just a demo for how to implement the feature/API calls in a shipping piece of software. 
There is just a single model and (as far as I can see) two textures for that model, which are batch loaded multiple times in a (presumably) single request.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a result, there are several things that stand out to me:</div><div style="text-align: justify;"><ul><li>The whole programme, plus model and textures, clocks in at around 30-40 MB, at most.</li><li>Many SSDs use a <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7051071/#pone.0229645.ref005">DRAM read cache</a>. Some do not.</li><li>System RAM is still used in this demo.</li></ul></div><div style="text-align: justify;">So, what is actually being measured here?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It's certainly not moving 8 GB of data into the VRAM (based on my 16 GB buffer). What it is doing is moving the repeatedly requested compressed files into RAM and then shoving them onto the VRAM/GPU for decompression.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrKKz5P-O3jjg_AyE_YDKDP3M_yx99iiySV09Iv21imYuA4vPl5wQBzbiLH3yusrTgIoQEH3Gva9ztQION0GbR0jIjUVs8a6FK1VeDDjP6FZ4uTJa0i0MAhFaK6BRDiCNP-fOcJ1MBpPEnpeTyg-cXTcUs6tw1iQHCmqmHNoE1g0q6SqZhqDau-6Np/s1078/VRAM%20usage.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="627" data-original-width="1078" height="372" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrKKz5P-O3jjg_AyE_YDKDP3M_yx99iiySV09Iv21imYuA4vPl5wQBzbiLH3yusrTgIoQEH3Gva9ztQION0GbR0jIjUVs8a6FK1VeDDjP6FZ4uTJa0i0MAhFaK6BRDiCNP-fOcJ1MBpPEnpeTyg-cXTcUs6tw1iQHCmqmHNoE1g0q6SqZhqDau-6Np/w640-h372/VRAM%20usage.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Seems my SN750 and RAM are working at the correct bandwidth... but what's with that VRAM value?</b></td></tr></tbody></table><br /><div style="text-align: justify;">It's easy to show huge bandwidth numbers when you're taking the final uncompressed number. First off, the numbers reported are just theoretical - they take the uncompressed amount of data and divide it by the time taken to move the compressed data and decompress it. If we wanted to be more accurate, the values should be labelled "effective" instead of being presented as an absolute number.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">My own calculations put the actual amount of data transferred from the storage device at around 2.4 GB (taken as an average across three different drives).</div>
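<div style="text-align: justify;"><br /></div><div style="text-align: justify;">To illustrate the difference, a minimal sketch with mostly made-up numbers (only the 2.4 GB estimate above is mine; the uncompressed size and run time are hypothetical):</div><div style="text-align: justify;"><br /></div><pre># Illustrating "effective" vs actual bandwidth - hypothetical figures
compressed_gb   = 2.4   # my estimate of what is actually read from the drive
uncompressed_gb = 8.0   # what the demo counts as "moved" (made-up figure)
elapsed_s       = 1.5   # made-up run time

actual_bandwidth    = compressed_gb / elapsed_s     # ~1.6 GB/s from storage
effective_bandwidth = uncompressed_gb / elapsed_s   # ~5.3 GB/s headline number</pre>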
<div style="text-align: justify;"><br /></div><div style="text-align: justify;">If you have a smaller VRAM quantity, then the number of textures transferred is adjusted appropriately - so my RTX 3070 sees only 2560 textures transferred, for example.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The problem I have with people really loving these high numbers is four-fold:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ol><li>The SSD doesn't need to look up multiple addresses for myriad files spread across the storage media - so the latency and penalty for doing so is not accounted for here.</li><li>Once read, the DRAM read cache on the SSD will likely be storing these files - they are no longer being read from the disk at this point.</li><li>The data is still being written into system memory before being read to VRAM and then wiped - it's an egregious waste of RAM. The subsequent reads of the data in the repeating tests could be <i>MUCH</i> faster than they are.</li><li>For multiple copies of the same object with the same textures, there is no reason or need to even perform this (though, of course, I appreciate this is a concept demo - as I stated above).</li></ol></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What we are essentially seeing here is a scenario which will <i>never</i> play out in a real-world application.</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgpAarhD6MSaI7rroOVAr97vVLccppIbKb1elwwl8jpXuKGStapglWvmwz6dQC_glx6ZAkNu0pFotGHWncXVBmea_vBYFImsdIN8PTdep7MGv0VgMPYPBGqqufMF5ecz94OTHG2tU45tzENCycJTMEiUANC2Cd59yPDnHF9lvIg7GUaYZq35ypFdq1/s1082/Crucial%20P1.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="634" data-original-width="1082" height="376" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgpAarhD6MSaI7rroOVAr97vVLccppIbKb1elwwl8jpXuKGStapglWvmwz6dQC_glx6ZAkNu0pFotGHWncXVBmea_vBYFImsdIN8PTdep7MGv0VgMPYPBGqqufMF5ecz94OTHG2tU45tzENCycJTMEiUANC2Cd59yPDnHF9lvIg7GUaYZq35ypFdq1/w640-h376/Crucial%20P1.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The Crucial P1 performs in-line with its specs relative to the WD750 (above)...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For points 1 and 2, there are many penalties when accessing large amounts of random data from the storage device. These inefficiencies, such as reaching a DRAM or controller bottleneck, are not accounted for or addressed due to the nature of this demo and, as such, the numbers we're seeing are inflated "best case scenarios".</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Even with the appreciably worse performance of the DRAM-less SN570, despite the numbers in HWInfo not making any sense* for a smaller amount of data transferred, we observe degradation of performance for each sequential test cycle down to a minimum. 
That's still better than the Crucial P1 but, again, this is one of the huge pitfalls of a non-standardised hardware system.</div><div style="text-align: justify;"><b><span style="color: #274e13;"><blockquote><i>*I double checked in Crystaldiskmark and can see that the drive is behaving normally but the numbers actually match up with </i><a href="https://www.tomshardware.com/reviews/wd-blue-sn570-review/2" style="font-style: italic;">Tom's Hardware's findings</a><i> of a large data transfer - potentially pointing towards a controller limitation and not </i>just<i> a cache issue.</i></blockquote></span></b></div><div style="text-align: justify;">Additionally, on the decompression side of things, virtually no assets will be this efficient in a released game and they will not all be nicely decompressed in the same timeframe (due to size, complexity or whatever) across the GPU cores in the manner that these textures are. So, we are likely to see sizeable deviations from these idealised "effective bandwidth" numbers.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiykm3AZt2L3TzBhKjEDxcw51ipARFlX1g04NqdakZQVk4yT9TxTJ-i3x2u8Q9kTs5PZUl_7gdChMOtkWwE8RsSaDws-AoXV6N6armnbNVpecJVDHm3FNK7z0qZQesnxNlpFl33W2uHL7-PvhwTVhOF0z0zfAPNer7X_YUEY55mRpGzCpfW94LhqTNT/s1315/SN570_degradation.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="879" data-original-width="1315" height="428" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiykm3AZt2L3TzBhKjEDxcw51ipARFlX1g04NqdakZQVk4yT9TxTJ-i3x2u8Q9kTs5PZUl_7gdChMOtkWwE8RsSaDws-AoXV6N6armnbNVpecJVDHm3FNK7z0qZQesnxNlpFl33W2uHL7-PvhwTVhOF0z0zfAPNer7X_YUEY55mRpGzCpfW94LhqTNT/w640-h428/SN570_degradation.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>But the DRAM-less SN570 has relatively terrible performance, in comparison, as the SLC cache fills up...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Point 3 is the most egregious from my point of view: the data has been copied to RAM - there is no reason to discard it in case it is required again. It's easily provable that RAM has much faster read and write speeds than even the fastest PCIe gen 5 SSDs (and lower latency, to boot!)... 
in my book, it's a literal crime to focus on moving assets directly from storage to VRAM like this, ignoring one of the cheapest and most performant parts of the PC in the process.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtaUwqP4lBP5yQWXr7TsBcZrenIEuEpbhW1XdiO-XLqL_XyiIZfxK5DJl-g8A4nqrtPyQwy1BvTXT4-wIPn7PszX8WBnYlFZq-3MYgTz6kmx08FsC5uxmtON8z-kC0WzZi_Ms1QNTCTXV2yPVT6oJ6gqFdCiaaT6UASxZ_Zaf2RmRB1bLILepbS0j_/s1074/Samsung%20EVO.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="624" data-original-width="1074" height="372" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtaUwqP4lBP5yQWXr7TsBcZrenIEuEpbhW1XdiO-XLqL_XyiIZfxK5DJl-g8A4nqrtPyQwy1BvTXT4-wIPn7PszX8WBnYlFZq-3MYgTz6kmx08FsC5uxmtON8z-kC0WzZi_Ms1QNTCTXV2yPVT6oJ6gqFdCiaaT6UASxZ_Zaf2RmRB1bLILepbS0j_/w640-h372/Samsung%20EVO.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Finally, the SATA-based EVO 860 did an admirable job... but it just isn't designed for batch requests - though at least the DRAM cache made sure its performance was consistent.</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">...and your little dog, too!</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">My mind boggles at people gawping and gaping at the huge (non-real) numbers in this demo meant to help developers implement this function in their software. A little critical thinking would render these test results a little less impressive...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, even ignoring that, I am <i>again</i> brought back to what I wrote last time:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">"The last thing to add to this equation of confusing decision-making is that SSD performance can vary wildly between drives and even for the same drive depending on how full it is! With drives filled to 80% capacity losing up to 30% random read performance for queue depths larger than 64 and 40% sequential read performance for queue depths of less than 4. Games are big, they take up space!"</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">And <a href="https://hole-in-my-head.blogspot.com/2021/01/2021-pc-gaming-rant-or-how-directx-12.html">the first time</a>:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><div><i><b><span style="color: #274e13;">"The PC gaming experience is floundering because of this hyper-optimised console crap and it's getting really frustating. Requiring an SSD to cover up your lack of using system memory? It's crazy! Once games have standardised 32 GB of RAM, it's there - for EVERY game. You can't guarantee the read throughput of an SN750 black (gen 3) or an MP600 (gen 4) or ANY SSD. 
In fact, the move to QLC NAND for higher densities completely destroys this concept.</span></b></i></div><div><i><b><span style="color: #274e13;"><br /></span></b></i></div><div><i><b><span style="color: #274e13;">Worse still, taking that degradation into account, you can't even fill up your SSDs! You need to leave them at around 70% (depending on NAND utilisation, DRAM presence, controller design, etc, etc...)."</span></b></i></div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">We can add to those observations DRAM-less SSDs being unable to maintain consistent performance when under heavy load and, <i>more worryingly, </i>an <a href="https://www.tomshardware.com/features/the-directstorage-advantage-phison-io-ssd-firmware-preview?utm_source=twitter.com&utm_campaign=socialflow&utm_medium=social">increase of wear</a> on the SSD due to the heavier access - something that Phison wants to mitigate with "special" firmware on specific drives. Which just screams to me "these will be more expensive for no reason"...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdHjmfZGbrue7Dv_dGMm_puHW6lv5gUS5_2n0uIbAWSx0x-howa-yMsgXBoOJEWbWeNbxV7WDfDDmFNhS07jqHmwtQ8D3nlqwkvk61FTVkfnI-2nixiedHgjHBwDIkM9ZFYenjCFl0SSM5wNvSmpLZa-BoxP9ucipG9FfG7NorlHo2H49Mn5z964Wy/s778/Bandwidth%20comparison.PNG" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="91" data-original-width="778" height="74" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdHjmfZGbrue7Dv_dGMm_puHW6lv5gUS5_2n0uIbAWSx0x-howa-yMsgXBoOJEWbWeNbxV7WDfDDmFNhS07jqHmwtQ8D3nlqwkvk61FTVkfnI-2nixiedHgjHBwDIkM9ZFYenjCFl0SSM5wNvSmpLZa-BoxP9ucipG9FfG7NorlHo2H49Mn5z964Wy/w640-h74/Bandwidth%20comparison.PNG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Last time I published this comparison, DDR5 wasn't even present. Now? There's even less reason to implement DirectStorage...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Adding fuel to the fire: DDR5 is now in wide circulation and adopted by both CPU manufacturers. The extra possible bandwidth and, in theory, denser DIMMs*, coupled with the proliferation of CPU cores even at lower price points**, mean that the window for DirectStorage to be actually relevant (if it ever could be) should be essentially closed. This is a technology developed with standardised and limited hardware in mind (like the consoles) - a low-end PC will not have a super-duper expensive NVMe drive and, instead of pushing developers to require more RAM (a one-time, inexhaustible purchase!), Microsoft is trying to implement a technology that is backwards-looking.</div>
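<div style="text-align: justify;"><br /></div><div style="text-align: justify;">The gap in the comparison above is easy to ballpark from published specs - peak theoretical figures, so treat them as rough orders of magnitude rather than measured values:</div><div style="text-align: justify;"><br /></div><pre># Peak theoretical bandwidths in GB/s (approximate, from published specs)
ddr4_3200_dual_channel = 2 * 3200e6 * 8 / 1e9   # ~51 GB/s
ddr5_6000_dual_channel = 2 * 6000e6 * 8 / 1e9   # ~96 GB/s
pcie_gen4_x4_nvme      = 7.0                    # top sequential reads, roughly
pcie_gen5_x4_nvme      = 14.0                   # early gen 5 drives, roughly

print(ddr5_6000_dual_channel / pcie_gen4_x4_nvme)  # RAM is ~14x faster</pre>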
<div style="text-align: justify;"><br /></div><div style="text-align: justify;">Still, I've made what I consider to be a bold prediction at the beginning of this year that <a href="https://hole-in-my-head.blogspot.com/2023/01/looking-back-at-2022-and-predictions.html">new games will begin recommending 32 GB of RAM</a> as standard... let's see if it comes true.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><blockquote style="color: #274e13; font-style: italic; font-weight: bold;">*Though this was a promise that has not come to pass, thus far...</blockquote><blockquote style="color: #274e13; font-style: italic; font-weight: bold;"><p>**The i5-13400 is now a 6+4 core part!!</p></blockquote><p>Because, if it doesn't, we will be in a situation where we are pushed to buy much more expensive storage than is necessary, worry about SSD wear, and be made to juggle games around on various drives in order to get the best performance... and I just don't want to see that happen.</p></div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-79850348783076552672023-01-08T18:57:00.004+00:002023-01-08T18:57:55.419+00:00Analyse This: Does RAM speed and latency make a difference for gaming...? (Part 4)<div style="text-align: justify;"> <table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggNoDgNi7ttW12Pg0-29rpgOgsoCBgDNmiVtOZPJ8E1jqld33wHqelgJoeji-EZ6LFRLW1ZXsSY4Dy9VdW8enPM7SMq44-GTjkTx0AJzGHAbzg3e2MxaQYZ1qtSPtfHP4a_PiEHnuQK7HlPkpWFxjWo42X0ZErqt9kpoh2YWzt-mNVGOuxqbTqv83M/s1920/Title%20image_pt%204.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggNoDgNi7ttW12Pg0-29rpgOgsoCBgDNmiVtOZPJ8E1jqld33wHqelgJoeji-EZ6LFRLW1ZXsSY4Dy9VdW8enPM7SMq44-GTjkTx0AJzGHAbzg3e2MxaQYZ1qtSPtfHP4a_PiEHnuQK7HlPkpWFxjWo42X0ZErqt9kpoh2YWzt-mNVGOuxqbTqv83M/w640-h360/Title%20image_pt%204.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Uber RAM...</b></td></tr></tbody></table><br /></div><div style="text-align: justify;">I've looked at the performance of RAM speed <a href="https://hole-in-my-head.blogspot.com/2022/10/analyse-this-does-ram-speed-and-latency.html">over</a> <a href="https://hole-in-my-head.blogspot.com/2022/11/analyse-this-does-ram-speed-and-latency.html">the last</a> <a href="https://hole-in-my-head.blogspot.com/2022/12/analyse-this-does-ram-speed-and-latency.html">few entries</a> and come to a few conclusions:</div><div style="text-align: justify;"><ul><li>People are wont to interpret data incorrectly - or to use too little data.</li><li>On mid-range systems (or below): Pushing RAM to get the lowest possible latency (and system latency) in synthetic tests really does not correlate well with actual game performance...</li><li>On mid-range systems (or below): Pushing RAM to get the highest possible bandwidth (and system bandwidth) in synthetic tests really does not correlate well with actual game performance...</li><li>Intel and AMD architectures handle memory access in quite different ways - this may be an indication as to why Intel has historically had better gaming performance than AMD.</li><li>On mid-range systems (or below): RAM speed past DDR4 3200 really doesn't matter too much in gaming applications.</li><ul><li><u>What DOES matter is the quality of the memory IC!</u></li><li>Samsung B-die is well-known for its overclocking and latency-reducing ability... but, even at the same stock settings as another chip, it shows a marked improvement on both AMD and Intel systems for higher-framerate gaming. 
No overclocking or tightening of timings required!</li></ul><li>You cannot <i>just</i> look at static metrics like min, 1% low, average and maximum framerates to determine game performance - they don't show you the whole picture. </li><ul><li>Nowadays, we should be looking at the smoothness of the per-frame presentation. You can do this by adding simplistic numbers like the standard deviation of the frame-to-frame variance... or you can plot nice graphs of the frametime distribution during the benchmark run, treated with the natural log (in order to normalise the results from the extremes).</li></ul><li>The differences are <i>pretty small</i>... when taking everything into account. Optimising RAM timings and speed is the sort of thing people who are obsessed with an activity will do. I did enjoy seeing synthetic benchmark numbers go up until I realised that, after looking at all the data, it was all pointless anyway. You get more out of your time by buying the best memory IC at a decent speed (DDR4 3600 or 3800) and spending more money on your CPU and GPU and overclocking them than you do from optimising your lower quality RAM. </li><ul><li>Of course, you probably wouldn't have known what was low or high quality RAM when you bought it! I didn't.</li></ul></ul><div>So, with that summary of conclusions out of the way, let's head into the final entry in this series - raytracing.<span><a name='more'></a></span></div></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">It's a game of two halves...</span></h3><div><br /></div><div>We only have two competitors in the desktop space and, really, though more might be preferable, the walled gardens of Apple and ARM advocates do not entice me... nor are their low-power efficiencies scaling as well as x64 does. So, we're stuck with two competitors who, seemingly, want to rip each other's throats out.</div><div><br /></div><div>It really is a strange dichotomy when looking at the CPU (AMD vs Intel) and GPU (AMD vs Nvidia) arenas but it's clear from the pricing that we have <i>really </i>good competition in the former and really poor competition in the latter.</div><div><br /></div><div>You can make the arguments that "researchers" and "miners" and other high-return/investment portfolios are subsuming the "relatively low margin" gamer market... and that prices are increasing and that "inflation is real".</div><div><br /></div><div>As I've pointed out before, the long and short of it is that no other computer hardware segment is experiencing these forcings* as the GPU is. So, it's either <i>not a thing</i> or it's a highly specific thing that, for some reason, no one is specifically talking about... and they're not. No one is. 
Not even Jensen - it's just that things are getting more expensive and we should expect increased prices for increased performance.</div><div><br /></div><div>These arguments ring hollow for me...</div><blockquote><div style="text-align: justify;"><b><i><span style="color: #274e13;">*That's a hold-over from my climate research days...</span></i></b></div></blockquote><div style="text-align: justify;">Anyway, remember - these are my custom Spider-man benchmarks running with raytracing enabled (max settings and max LOD, etc), scaling over system RAM speed and then tightening the timings down, in order to test whether tightened timings and/or faster RAM speeds actually result in better performance.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Yes, there are limitations on this study - it's all mid-range parts. At the high-end, you <i>will</i> most likely see performance gains when doing optimisations... the issue here is that the VAST majority of people will see the results at the high end and spend hours, days or <i>inordinate</i> amounts of time trying to improve their own systems, chasing metrics (such as memory latency and bandwidth as defined by synthetic benchmarks, like AIDA64), and then <i>just play</i>, thinking that their overall game experience is better than it was before.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is the issue I have with those people who espouse memory tuning and optimisation for gaming: they spit and spout with very little data to back themselves up...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I've covered that, mostly; now we're looking at the raytracing performance of systems that are essentially "identical"... (as always, the data I used and collected is <a href="https://docs.google.com/spreadsheets/d/1Vsk0DI3SMw9S8me8eO99vezdTYRC1cPhkE5BoJyz2Ts/edit?usp=sharing">found here</a>)</div>
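<div style="text-align: justify;"><br /></div><div style="text-align: justify;">Before the results: for anyone wanting to reproduce the treatment, this is roughly how the headline numbers and the "smoothness" metric fall out of a frametime log - a minimal sketch assuming a plain text file of per-frame times in milliseconds (the file name is a placeholder, not my actual pipeline):</div><div style="text-align: justify;"><br /></div><pre>import numpy as np

ft = np.loadtxt("frametimes_ms.txt")        # placeholder: one frametime (ms) per line

avg_fps  = 1000.0 / ft.mean()
low_1pct = 1000.0 / np.percentile(ft, 99)   # 1% low fps ~ 99th percentile frametime
jitter   = np.diff(ft).std()                # std dev of frame-to-frame variation
log_ft   = np.log(ft)                       # ln-treated distribution, for plotting</pre>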
<div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h4 style="text-align: justify;"><span style="color: #274e13;">DDR4 3200...</span></h4><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEieREZN-hv029rlQBJ3euiMlumzaaDhTXijhhYAKszrbXkX4rX_SIu3_XngH85ggvzCGpOXE1fdkVS_k4LCDFMRQXoS1I3OL1BB3SvFMlWCxWxk0ipFyaRvtkq_1UfSnKNHhx7QdR5E7Q43uQIY_KA5IHpzq0Co00rxvPX2aogEyOKEDXE4MWr9VUy8/s954/RT_Intel_3200.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="434" data-original-width="954" height="293" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEieREZN-hv029rlQBJ3euiMlumzaaDhTXijhhYAKszrbXkX4rX_SIu3_XngH85ggvzCGpOXE1fdkVS_k4LCDFMRQXoS1I3OL1BB3SvFMlWCxWxk0ipFyaRvtkq_1UfSnKNHhx7QdR5E7Q43uQIY_KA5IHpzq0Co00rxvPX2aogEyOKEDXE4MWr9VUy8/w640-h293/RT_Intel_3200.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>CL16, with some tightening wins... but not the lowest latency.</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">We start with what is essentially the baseline of DDR4 system performance in 2022 and, arguably, for the last few years. The Intel system doesn't reach peak "performance" (i.e. best framerate, best sequential frame presentation) at the lowest latency but, really, there's not a lot of difference between all the memory timings here - <a href="https://hole-in-my-head.blogspot.com/2022/12/analyse-this-does-ram-speed-and-latency.html">as was the case last time</a>. What is obvious is that enabling RT causes a lot more system stress... but while that extra stress does affect min, max and average framerates, it doesn't materially affect sequential per-frame presentation. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Yes, sure - the per-frame presentation is <i>less</i> stable when running RT but NOT so much worse than it was with RT disabled. In fact, we can see that raytracing is stymieing and normalising any memory timing "optimisations", whatever the subtimings used.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFak4ja74wRxma1jTOo1c66zl0mguhBWocu1V_0yNQ867f7oFt-FeKIOfA2h9Yfz3D5njJtYeeT0dW8cYK-I8P9KlwmwXJ2t_skpJzHC2-312PRPf5wbUofh9YLh2-Y2i02s1Q_iOTQBi9-_zFf9Plh-ZC_hH7TLdIwtBz_ozAyg8wOImlwT2wun-P/s1010/RT_AMD_3200.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="435" data-original-width="1010" height="276" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFak4ja74wRxma1jTOo1c66zl0mguhBWocu1V_0yNQ867f7oFt-FeKIOfA2h9Yfz3D5njJtYeeT0dW8cYK-I8P9KlwmwXJ2t_skpJzHC2-312PRPf5wbUofh9YLh2-Y2i02s1Q_iOTQBi9-_zFf9Plh-ZC_hH7TLdIwtBz_ozAyg8wOImlwT2wun-P/w640-h276/RT_AMD_3200.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Lower is better for Ryzen...</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The opposite is true for the AMD system. The 5600X definitely runs better with the lowest memory latency and total bandwidth throughput - though it's not a straight road to "better". Some subtiming optimisations result in worse performance... 
and, in all honesty, the difference between all tested settings <i>is tiny</i>.</div><div style="text-align: justify;"><br /></div><h4 style="text-align: left;"><span style="color: #274e13;">DDR4 3600...</span></h4><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQWLs0CwrBH25eiVsNbENNCnpgcyZUs9p0KgbH-KtKGZAW-EMh2HT-gtIFjJwz6rapG-_qeR3_f3xHbIK_Wqf3xCb1mDwcUskmpVuVvgMh9Ug1mPIRpnzsUikQLzMOnUIwASVmWCDjrpqIORXKbd6ICbrTZphLR72O_tcBL4u__ceavEbifowXCcvS/s1008/RT_Intel_3600.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="433" data-original-width="1008" height="274" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQWLs0CwrBH25eiVsNbENNCnpgcyZUs9p0KgbH-KtKGZAW-EMh2HT-gtIFjJwz6rapG-_qeR3_f3xHbIK_Wqf3xCb1mDwcUskmpVuVvgMh9Ug1mPIRpnzsUikQLzMOnUIwASVmWCDjrpqIORXKbd6ICbrTZphLR72O_tcBL4u__ceavEbifowXCcvS/w640-h274/RT_Intel_3600.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Once again, with any sort of sub-timing optimisation, the performance essentially normalises around a central point (approximately 13 ms per frame). There's no benefit to the i5 having any sort of optimisation beyond the very minimal.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIZJQj6AOFtw9CjKJMCL7lrWNAiiXPQ5JI0xKR7gGaLuaq03aAY9echou3MAhi0YNl9HP6N8tNs_J9mg7XfbyiRmnuxO-lLSITJ6XmiwBghoBT_3HRhS_nzRYT7kuyZJoWrKtUbTOsIX4s0D_P_mEb5ElrRS-7qiRNRz06VKtDGLOjnaHn8oZloKkZ/s1008/RT_AMD_3600.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="434" data-original-width="1008" height="276" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIZJQj6AOFtw9CjKJMCL7lrWNAiiXPQ5JI0xKR7gGaLuaq03aAY9echou3MAhi0YNl9HP6N8tNs_J9mg7XfbyiRmnuxO-lLSITJ6XmiwBghoBT_3HRhS_nzRYT7kuyZJoWrKtUbTOsIX4s0D_P_mEb5ElrRS-7qiRNRz06VKtDGLOjnaHn8oZloKkZ/w640-h276/RT_AMD_3600.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div><div style="text-align: justify;">For the R5, the progression from worst to best again correlates with lower latency - again, the differences are small, with the frametime hovering around the mid-point of 14 ms. Unfortunately, it appears I didn't really think to test worse memory settings - I just tested stock Patriot specs versus optimised. If I'd realised when I did all this testing last year (unfortunately, this was right at the start of my memory optimisation testing journey) I would have input some CL16 and CL18 settings to see the improvement*... 
which we <i>can</i> see at our next speed.</div><blockquote><div style="text-align: justify;"><b><i><span style="color: #274e13;">*This limitation to the dataset doesn't apply to the Intel testing since I was more experienced by that point and had refined my testing methodology...</span></i></b></div></blockquote><div style="text-align: justify;"><br /></div><h4 style="text-align: justify;"><span style="color: #274e13;">DDR4 3800...</span></h4><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitQgLkZO4q-Qu9PD6Rw1dZOJKGk8-f0Zip22gUmiy7qCKUUtbyg_YdHWzNYewVaqC3icxIu7holW8vT62E28iuQnsHjgwt_x4lyqsX13zkp0kliyzl8mgPiisbYv_zA0ULpVHuHixujWEQujX-fJTJWDWBivao3UkHvEYiOwzNdISAe42REyNiojYF/s1010/RT_Intel_3800.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="435" data-original-width="1010" height="276" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitQgLkZO4q-Qu9PD6Rw1dZOJKGk8-f0Zip22gUmiy7qCKUUtbyg_YdHWzNYewVaqC3icxIu7holW8vT62E28iuQnsHjgwt_x4lyqsX13zkp0kliyzl8mgPiisbYv_zA0ULpVHuHixujWEQujX-fJTJWDWBivao3UkHvEYiOwzNdISAe42REyNiojYF/w640-h276/RT_Intel_3800.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>This is the first time we're testing at a 1:2 controller-to-memory frequency ratio...</b></td></tr></tbody></table><br /><div style="text-align: justify;">The process repeats itself for the i5... but an improvement in per-frame presentation (aka smoothness) can be seen. I didn't mention it for the prior two memory speeds, but there the best smoothness was achieved at settings which did not produce the lowest latency or highest bandwidth... This is the first time that hasn't held true for the Intel chip: here, the lowest latency gives the best result, and that might be tied to the fact that we've had to switch up into Gear 2 mode, where the memory controller operates at half the RAM frequency.</div>
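<div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a quick aside - when I say "lowest latency" in these comparisons, I mean the real first-word latency in nanoseconds, not the CAS number printed on the box. The standard conversion is simple enough to sketch (the speed/CL pairs below are just examples for illustration, not my exact test configurations):</div><div style="text-align: justify;"><br /></div><pre>def first_word_latency_ns(cas_latency, transfer_rate_mts):
    # CL is counted in memory-clock cycles and the memory clock is half
    # the transfer rate for DDR - hence the factor of 2000.
    return 2000.0 * cas_latency / transfer_rate_mts

for speed, cl in [(3200, 14), (3600, 16), (3800, 15), (4000, 19)]:
    print(f"DDR4-{speed} CL{cl}: {first_word_latency_ns(cl, speed):.2f} ns")

# DDR4-3200 CL14: 8.75 ns
# DDR4-3600 CL16: 8.89 ns
# DDR4-3800 CL15: 7.89 ns
# DDR4-4000 CL19: 9.50 ns</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This is only the CAS portion of the picture - subtimings and the controller's Gear mode sit on top of it - but it goes some way to explaining why a tighter CL at a given speed can matter more than the headline frequency.</div>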
<div style="text-align: justify;"><br /></div><div style="text-align: justify;">I can imagine that latency is more important in this mode.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzWh5FxrnH-_zq42q_qRg5kox2aNKt9VIljEKsNMBgadqBLQ1NgWyi0I5jbXUK5NRKjfcdVqVeKwmFgbhvHmDT7pN5Weqt7Jm6SMjwoLMbh2nS_mutPl-MpS2ZMhvEO4VUFrm_BwIS1S59xGNcBEICyRqAwScw3PMHDT9BkexiS8Jt_lO1nB4EgJYD/s1006/RT_AMD_3800.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="435" data-original-width="1006" height="276" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzWh5FxrnH-_zq42q_qRg5kox2aNKt9VIljEKsNMBgadqBLQ1NgWyi0I5jbXUK5NRKjfcdVqVeKwmFgbhvHmDT7pN5Weqt7Jm6SMjwoLMbh2nS_mutPl-MpS2ZMhvEO4VUFrm_BwIS1S59xGNcBEICyRqAwScw3PMHDT9BkexiS8Jt_lO1nB4EgJYD/w640-h276/RT_AMD_3800.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The R5 had no issue keeping controller and memory frequency at a 1:1 ratio...</b></td></tr></tbody></table><br /></div><div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This time, however, the AMD system shows us a relatively* large change across the RAM subtiming optimisations that I tested. Not only does the difference between the settings with the highest bandwidth and lowest latency give us a visual indication that performance is improving, but we can also see the average fps for the whole benchmark moving from 68 to 72 fps at CL15. If I were one to believe that stating percentages is useful - that's a roughly 6% improvement! Unfortunately, I am not. This level of performance difference is not noticeable for any human playing a game.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"></span><blockquote><span style="color: #274e13;">*I said relative, not large! </span></blockquote></i></b></div><div style="text-align: justify;">That said, moving from CL19 to CL15 netted us an average 5 fps improvement, and moving to the best setting (no.2 to no.8) an average 9 fps improvement, with better lows to boot. That's a 14% improvement (for those wanting that relationship). The issue here is that good quality RAM integrated circuit dies will already ship with the lower CAS latency setting in their XMP profiles... so we're not going to be talking about that as an improvement; we're back to the average 4 fps mentioned above.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This just goes back to my summary in the intro to this article - buy the RAM with better ICs; there's no need to optimise. How do you learn which those are? That's a different and more difficult story.</div>
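<div style="text-align: justify;"><br /></div><div style="text-align: justify;">Since I grumbled about percentages above, here's the arithmetic behind why I don't find them meaningful at these framerates: converting the fps figures into the per-frame time actually saved. (A tiny sketch - the 68 and 72 fps values are the averages from the graph above.)</div><div style="text-align: justify;"><br /></div><pre>def frametime_ms(fps):
    # Average time spent presenting each frame at a given framerate.
    return 1000.0 / fps

before, after = 68.0, 72.0
saved = frametime_ms(before) - frametime_ms(after)
print(f"{before:.0f} fps = {frametime_ms(before):.2f} ms per frame")
print(f"{after:.0f} fps = {frametime_ms(after):.2f} ms per frame")
print(f"Saved per frame: {saved:.2f} ms")  # ~0.82 ms</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">A headline ~6% uplift works out to less than a millisecond per frame - which is why I maintain that nobody is going to <i>feel</i> it.</div>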
<div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h4 style="text-align: justify;"><span style="color: #274e13;">DDR4 4000...</span></h4><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbBCWd-8pQLedgl44kpmt-c5j9j3ZaPisDQGZKsnNWJ3DZBo2uuAnH9UzIZbUvtS67bqN8dsAc1Y6b8HvgIi4pNjJh_qUd41XZ2frdARyh8mlwtARkOq5PEWxNMIkeQ_FwUuki7fyNlHtkDdA37lyXzfBsmLC8JmBbPgK58s-PhuyrxWoMSJAbf6uN/s1008/RT_Intel_4000.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="433" data-original-width="1008" height="274" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbBCWd-8pQLedgl44kpmt-c5j9j3ZaPisDQGZKsnNWJ3DZBo2uuAnH9UzIZbUvtS67bqN8dsAc1Y6b8HvgIi4pNjJh_qUd41XZ2frdARyh8mlwtARkOq5PEWxNMIkeQ_FwUuki7fyNlHtkDdA37lyXzfBsmLC8JmBbPgK58s-PhuyrxWoMSJAbf6uN/w640-h274/RT_Intel_4000.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>The 12400 wasn't really able to scale past DDR4 3800...</b></td></tr></tbody></table><br /><div>Now, although we're still operating under the Gear 2 memory regime, I found myself more limited in what the CPU was capable of in terms of memory timings - hence the reduced number of benchmarks. The best result is not at the lowest latency settings and, if you want to be pedantic about it, we're losing an average 3 fps (dropping from 80 to 77 fps - OMG!)... but, when looking at the stability of the presentation and the worst frametime numbers, the lowest latency configuration performs worse in both of those aspects.</div><div><br /></div><div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjf7ojEX-ZkyJewz5ecWMpHxr8nu8getxZMwO3A_Ictv1OPAtxiB3YKzsWNW0MWzhk8LszBNnHZfVruo9lw5-cffQoMagEk__uiryL9utwKyO1PunZZPul_0QWpZNcbZ55w5wvQKaQdFcKHzrdIGLT-zSKSz3weixAUlGnaIHzgUdWuGD8YzeAtPAOz/s1004/RT_AMD_4000.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="432" data-original-width="1004" height="276" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjf7ojEX-ZkyJewz5ecWMpHxr8nu8getxZMwO3A_Ictv1OPAtxiB3YKzsWNW0MWzhk8LszBNnHZfVruo9lw5-cffQoMagEk__uiryL9utwKyO1PunZZPul_0QWpZNcbZ55w5wvQKaQdFcKHzrdIGLT-zSKSz3weixAUlGnaIHzgUdWuGD8YzeAtPAOz/w640-h276/RT_AMD_4000.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Meanwhile, the 5600X was steaming ahead with the flexibility of its memory controller...</b></td></tr></tbody></table><br /><div><br /></div><div>For this memory speed, I forced a 1:2 ratio at the worst settings (CL19, no.1), thinking that it would show that Ryzen was better off in the 1:1 controller:memory frequency regime but, in reality, this performed better than 1:1 (no.2)! Something I did <i>not</i> expect. </div><div><br /></div><div>This is something that maybe I can explore in future, because the prevailing wisdom has always been that 1:1 is preferable over a 1:2 ratio... Moving on from that result, the lowest latency and highest bandwidth once again gave the best results. 
So, pretty much par for the course.</div><div><br /></div><div><br /></div><div><br /></div><h4 style="text-align: left;"><span style="color: #274e13;">DDR4 4200...</span></h4><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Here's where I'm running into the limits of the memory controllers on my chips. I'm posting the results of DDR4 4200 and 4400 for completeness and for the final comparison, rather than for any particularly interesting insights...</div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjECsPq3BxTsDfqYuMyFZNkrN6EA-E1T5QWRq0OiLoaoDNdBZ4RRmi4QIsO6pUQsucoIebkuLl2IgBk6giY9h_1eje3ehlWwcZWI4fc3Sptg5t8l0RcTqRGDQ45235MXviCELbm1_XoTF_ctk-IEBz6tRDFBOJkHq6TGg7BNPilqhYN0f90XMrOWM4U/s1006/RT_Intel_4200.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="431" data-original-width="1006" height="274" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjECsPq3BxTsDfqYuMyFZNkrN6EA-E1T5QWRq0OiLoaoDNdBZ4RRmi4QIsO6pUQsucoIebkuLl2IgBk6giY9h_1eje3ehlWwcZWI4fc3Sptg5t8l0RcTqRGDQ45235MXviCELbm1_XoTF_ctk-IEBz6tRDFBOJkHq6TGg7BNPilqhYN0f90XMrOWM4U/w640-h274/RT_Intel_4200.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKaNnhGMNri29nyNVIICFZnpfQTC3It5xoviTEXfsVH5s5KN1ctAr8IzrvxWbCX0mkUF3PsoUzGSgIoBr1FC9neKAocPAGYP1iw-zrtXhjo0SGnFM-hH3xg3uGvERSZ_Z_OhisOhiLXdQVY4gQKN_435t6e0Blh8zBobJ8-MpkdZHFVW1d-AUSB76C/s1006/RT_AMD_logarithmic_4200.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="433" data-original-width="1006" height="276" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKaNnhGMNri29nyNVIICFZnpfQTC3It5xoviTEXfsVH5s5KN1ctAr8IzrvxWbCX0mkUF3PsoUzGSgIoBr1FC9neKAocPAGYP1iw-zrtXhjo0SGnFM-hH3xg3uGvERSZ_Z_OhisOhiLXdQVY4gQKN_435t6e0Blh8zBobJ8-MpkdZHFVW1d-AUSB76C/w640-h276/RT_AMD_logarithmic_4200.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Even operating at a 1:2 ratio, I wasn't able to push the memory timings down further and, in all honesty, I wasn't really interested in running higher voltages in order to do so...</b></td></tr></tbody></table><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h4 style="text-align: justify;"><span style="color: #274e13;">DDR4 4400...</span></h4><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigZZvPxiS-2VC_6vFeJI2gwmQXM0rfmx1IxKZfjuFMZDIY3IVcVtKW4mfeeUYwOdFHjLvhk8EDZHwRjyosmKAXnXl75HKZrjoISyoH7v-jVlES4vkv8Wiz8nAOJVFzVnPnrOYiglhxwh0VSpn-D-6Ah5ZAy26LwvTyzGHq8Xt0lKE4dVzspZnN4h7F/s1014/RT_Intel_4400.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="435" data-original-width="1014" height="274" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigZZvPxiS-2VC_6vFeJI2gwmQXM0rfmx1IxKZfjuFMZDIY3IVcVtKW4mfeeUYwOdFHjLvhk8EDZHwRjyosmKAXnXl75HKZrjoISyoH7v-jVlES4vkv8Wiz8nAOJVFzVnPnrOYiglhxwh0VSpn-D-6Ah5ZAy26LwvTyzGHq8Xt0lKE4dVzspZnN4h7F/w640-h274/RT_Intel_4400.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">As pointed out in the spreadsheet, I wasn't able to get the Patriot RAM running at its stock 4400 settings on the 5600X again. I'm not sure what happened to the system stability, but this was after all of the messing around in the BIOS for all the other results. It's entirely possible that something "broke" in the BIOS and the entire setup just wasn't having it any more.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now, let's get to the interesting bit!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">The Best of the Best (RT EDITION)...</span></h3><div style="text-align: justify;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_EwBzyAb-JbuNUnq1hVRBKMgTobXXOmUmYZmbO9p_C5Wx4V0Sdw_bOO-lruVxz3MlHWvY9Xe3YLdCZAkHv_FkD_51isNch0gSoXzchxTPAxsx37ns531WJgQnRnuzN1-d-dHzDZXnsq5DXt7m8_Fv4pokKTG9mDSvy4GjmJyAL0paykSXlUYwAmO9/s1008/RT_Intel_comparison.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="432" data-original-width="1008" height="274" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_EwBzyAb-JbuNUnq1hVRBKMgTobXXOmUmYZmbO9p_C5Wx4V0Sdw_bOO-lruVxz3MlHWvY9Xe3YLdCZAkHv_FkD_51isNch0gSoXzchxTPAxsx37ns531WJgQnRnuzN1-d-dHzDZXnsq5DXt7m8_Fv4pokKTG9mDSvy4GjmJyAL0paykSXlUYwAmO9/w640-h274/RT_Intel_comparison.jpg" width="640" /></a></div><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As with all the other benchmark runs, we don't see a lot of variation across all the tested speeds, or between Gear 1 and 2... Sure, I can see that the DDR4 3800 optimised RAM is the best... but it's so close as to be unnoticeable by any human actually using the system for gaming.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There was more variation when testing with ray tracing disabled... but, even then, the 3800 memory still came out best.</div>
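<div style="text-align: justify;"><br /></div><div style="text-align: justify;">For transparency, "best" in these head-to-head comparisons boils down to something like the following logic - rank on average frametime, but treat results within a small tolerance as a tie and break the tie on presentation stability. This is only a sketch of how one could formalise it: the tolerance value, the field layout and the numbers are all assumptions for illustration, not my actual measurements.</div><div style="text-align: justify;"><br /></div><pre>TOLERANCE_MS = 0.3  # assumed threshold, roughly run-to-run variation

configs = [
    # (name, average frametime in ms, mean frame-to-frame delta in ms)
    ("DDR4-3200 optimised", 14.6, 0.9),
    ("DDR4-3800 optimised", 14.4, 1.2),
]

def better(a, b):
    # A clear win on average frametime if the gap exceeds the tolerance...
    if abs(a[1] - b[1]) > TOLERANCE_MS:
        return min(a, b, key=lambda c: c[1])
    # ...otherwise prefer the steadier presentation (smaller deltas).
    return min(a, b, key=lambda c: c[2])

print(better(*configs)[0])  # a tie on average, so the steadier run wins</pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With the made-up numbers above, the two kits tie on average frametime, so the steadier DDR4 3200 run would take it - which mirrors the kind of close call discussed below.</div>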
<div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibeesULpm-L5I66BEbg4npd-8cB36Y4Sbgx7gKPaFlMPNva9MG3kO5IResjIlRkHhqm50mbm8jXRLM3rF8RhxiJIzc6HdLmPTzCAroU7TRNgSHR8hVE94la8zU9d6Z1EF7eNvhs4V65302GCBNEH9fSiBS-nAJ3_tr5Kos0z59EMP_TunIqREmiYBL/s1006/RT_AMD_comparison.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="432" data-original-width="1006" height="274" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibeesULpm-L5I66BEbg4npd-8cB36Y4Sbgx7gKPaFlMPNva9MG3kO5IResjIlRkHhqm50mbm8jXRLM3rF8RhxiJIzc6HdLmPTzCAroU7TRNgSHR8hVE94la8zU9d6Z1EF7eNvhs4V65302GCBNEH9fSiBS-nAJ3_tr5Kos0z59EMP_TunIqREmiYBL/w640-h274/RT_AMD_comparison.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"></td></tr></tbody></table><br /><div style="text-align: justify;">The 5600X, on the other hand, shows quite a bit of difference. We're talking an average difference of 9 fps - still not something I believe most people would be able to tell, but perhaps the difference between stable 60-ish performance and performance that drops below that (depending on the graphics card, of course!).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Once again, DDR4 3800 is best here - but it's a close call between that and DDR4 3200! Yes, sure, we gain 2 fps... and the lows on the DDR4 3200 are potentially momentary dips of another 10 fps (which is categorically worse!)... but the frametime presentation is actually very good for the DDR4 3200.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Looking back at the non-RT testing, we had the same stand-off, and DDR4 3800 won that time as well, though more decisively...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;"><b>Conclusion...</b></span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, what have we further learned from today's additional data?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">DDR4 3800 at CL15 is the best for both mid-range platforms, in both non-RT and RT workloads. However, the actual difference between optimised sub-timings and stock when using a good IC (i.e. Samsung B-die for DDR4) is minimal at best. We're talking a couple of fps in the average and a few more in the lows...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">On Intel, or at least the 12400, latency and bandwidth do not appear to be very important, and neither is the Gear mode.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">On Ryzen, or at least the 5600X, latency and bandwidth are important when comparing RAM at a given speed*... 
but, looking at the comparison between the best of the best, the <i>best</i> result does not correlate with the RAM with the lowest latency or highest bandwidth.</div><div style="text-align: justify;"><b><i><span style="color: #274e13;"><blockquote>*Though the differences between results are pretty tiny!</blockquote></span></i></b></div><div style="text-align: justify;">From my testing, it appears that just buying XMP CL15 or CL16 DDR4 3800 RAM will give you the best result for mid-range and lower chips on both Ryzen 5000 and Intel 12th gen. Going higher and trying to optimise further will either result in actual performance losses or in worse smoothness - which static numbers such as the average and 1% lows will not show you.</div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If you're in the market for one of the higher-end CPUs, reviews have shown that going with DDR5 is the best option for getting performance and, for Intel, going as high as possible in frequency really gives a big benefit.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">If I ever get a DDR5 motherboard and RAM for the 12400, I'll take a look into the scaling for that CPU and return to this series. As it stands, we've come to the end of our journey.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">With regards to performance testing methodology, I am not finished with that. I feel like I'm onto something, and I will continue to analyse newer games as they are released, refining the approach over time.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-7560610393342650347.post-68221185056277586052023-01-04T20:53:00.002+00:002023-01-04T20:53:32.625+00:00Looking back at 2022 and predictions for 2023...<div style="text-align: left;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDQJJyAYzI7E3-EK-kC2cTUPKuGu6Glg6H6OfXVH_4AMN65kMAy7P7k4GYu1GozuY1eUV61P635hWLkIgmrNyKW1v7fur_H_PGp0A8a9kryvq77ojbWbSdWHOBB4hYUNeOTo1fez7SF7rasY1mofTSbIsIAq-pqp7x2wp4Pc-TsQfb-lk7BK8gMfbF/s1420/Happy%20birthday!.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="1078" data-original-width="1420" height="486" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDQJJyAYzI7E3-EK-kC2cTUPKuGu6Glg6H6OfXVH_4AMN65kMAy7P7k4GYu1GozuY1eUV61P635hWLkIgmrNyKW1v7fur_H_PGp0A8a9kryvq77ojbWbSdWHOBB4hYUNeOTo1fez7SF7rasY1mofTSbIsIAq-pqp7x2wp4Pc-TsQfb-lk7BK8gMfbF/w640-h486/Happy%20birthday!.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>... etc.</b></td></tr></tbody></table><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I'm not going to lie, 2022 kicked my ass in terms of work. I just didn't have the energy or time to properly dedicate to this blog and, even though I had strong opinions on many events that occurred during the year, I just wasn't able to put my thoughts down onto paper (so to speak). 
Additionally, I didn't play too many games; instead, I dedicated a lot of my free time to doing some <a href="https://hole-in-my-head.blogspot.com/2022/08/analyse-this-performance-of-spider-man.html">hardware testing</a>, in order <a href="https://hole-in-my-head.blogspot.com/2022/10/analyse-this-does-ram-speed-and-latency.html">to increase</a> <a href="https://hole-in-my-head.blogspot.com/2022/11/analyse-this-does-ram-speed-and-latency.html">my understanding</a> of <a href="https://hole-in-my-head.blogspot.com/2022/12/analyse-this-does-ram-speed-and-latency.html">that hardware</a> and the ways it can affect gaming experiences in the mid-range.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Unfortunately, that testing is <i>way</i> more time-consuming than just doing analysis or quick opinion pieces, but I do feel that I have improved the way I am able to analyse data outputs from game testing - and this is something that I can apply going forward, now that I have worked out the methodology to a greater extent.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In addition to this, the majority of the big hardware releases have happened this past year and there really isn't that much for me to be excited about for 2023, so my predictions may be a little weak for this coming year... </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">But, the show must go on, so...</div><div style="text-align: justify;"><span><a name='more'></a></span></div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">2022 Recap...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><a href="https://hole-in-my-head.blogspot.com/2021/12/looking-back-at-2021-and-predictions.html">Last year</a> I made the following predictions. Let's see how they turned out!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">Intel's Arc will be underwhelming in price, though not in performance. Will not appreciably impact availability of discrete graphics cards for consumers.</span></b></li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Arc was expensive, across the board, and the cards did not appreciably impact availability of dGPUs for consumers <i>at all</i>. However, Arc has <i>also</i> been underwhelming in performance. Yes, sure, it does seem like they are improving the situation <a href="https://www.pcworld.com/article/1424865/intel-arc-driver-performance-a-month-checkup.html">with regards to driver optimisation</a>, but the initial launch was not representative of that improvement...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Sure, in the end, Arc might end up being okay in terms of performance... but I can't claim that as an accurate prediction.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: 2/3 Correct.</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">The i5-12400K/F CPUs will be more expensive than prior X400K/F level CPUs. 
The bargain prices of the 10400K/F and 11400K/F really are too good to be true.</span></b></li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This prediction is a little complicated because, yes - at release - those parts were only marginally more expensive than the prior generations'. However, over time, with Intel's revenue woes and the stuff that's going on with inflation, these parts are now in the ballpark of where I was originally envisaging them. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">These bulk-sale products really affect Intel's bottom line - I believe more so than their i7 and i9 products... and, in fact, you can see that those parts are more competitively priced relative to their AMD counterparts.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Wrong... but eventually right.</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">AMD's v-cache will not make an appearance on the 6-core CPU.</span></b></li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There's very little to say about this other than that many people were predicting 12- or 16-core variants, and a few were predicting 6-cores, too. I just didn't see this as a realistic event.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Correct.</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">I don't expect a clock speed bump for Zen 3D SKUs.</span></b></li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I was correct on this (in fact, there was a slight clockspeed regression), though the reason for it is not clear to this day...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The official word from AMD is that <a href="https://www.notebookcheck.net/AMD-reveals-surprising-reason-why-the-Ryzen-7-5800X3D-is-not-overclockable.608776.0.html">it is a voltage limit</a>, not a heat limit (as I had suspected). However, increasing clocks at a given voltage increases heat output, and increased voltage at the same clocks results in even more heat generated. So, to my mind, the two reasons are still intertwined and not separate, regardless of what AMD says. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Their "<a href="https://www.hardwaretimes.com/amd-ryzen-7000-cpus-are-built-to-run-at-95c-24x7-without-affecting-lifespan-or-reliability/#:~:text=AMD%3A%20Ryzen%207000%20CPUs%20are,Lifespan%20or%20Reliability%20%7C%20Hardware%20Times">95 C as standard</a>" operating temperature for their Ryzen 7000 series CPUs does not address my original issue with this logic - stacking silicon on active silicon will necessarily further worsen thermals on any chip, and I can see this being an issue going forward with any Ryzen 7000 product...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Correct.</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">I predict that DirectStorage will be much ado about nothing in 2022.</span></b></li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There's not much to say about this, other than <a href="https://hole-in-my-head.blogspot.com/2022/03/directstorage-again.html">I was right</a>. No game released with this API/feature. In fact, the game that was touted to be the first to market with it enabled has been delayed until the 24th of January, so we will see how things pan out at the end of this month. Also, I'm <a href="https://sherief.fyi/post/directstorage-poor-pc-port/">not the only one</a> who is critical of bringing this technology to the PC space...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Correct.</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">The Radeon RX 7950 XT (or whatever the full-die SKU is called [RX7950 XT?] will not be 3x or even 2x the RX 6900 XT.</span></b></li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Typos in my prediction notwithstanding, I was correct in this one. Despite <a href="https://hole-in-my-head.blogspot.com/2022/05/analyse-this-dissecting-new-rdna-3.html">analysing the "leaks"</a> from famous leakers, which tipped my own expectations towards 2x the performance of the RX 6900 XT, the end reality is <a href="https://www.techpowerup.com/gpu-specs/radeon-rx-6900-xt.c3481">more like 1.4x</a> <a href="https://www.techspot.com/review/2588-amd-radeon-7900-xtx/">- 1.5x</a>. The RX 7000 cards are disappointing to me for a few reasons: </div><div style="text-align: justify;"><ul><li>The reference designs have <a href="https://youtu.be/26Lxydc-3K8">some issues with heat</a>, with large differentials between the average GPU temperature and the hotspots on the package.</li><li>All the talk of <a href="https://www.techspot.com/review/2588-amd-radeon-7900-xtx/">double-pumping</a> the compute units in the new design, with twice the number of shaders, has not resulted in much of a performance uplift at all. The increased number of CUs gives a 1.2x theoretical performance uplift and the increase in game clock should give a 1.13x performance increase, for a total 1.35x increase over the RX 6900 XT... which is <a href="https://www.techspot.com/review/2588-amd-radeon-7900-xtx/">what is observed on average at a resolution of 1440p</a>. 
Sure, at 4K, the uplift is closer to 1.5x... but that's a far cry from at least double... and it means that doubling the shader cores nets around a 15% improvement... absolutely atrocious!</li><li>Power efficiency is <i>absolutely terrible</i> and power scaling is also terrible - and handled terribly.</li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Correct, though I wish it weren't!</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">Radeon 7000 and Geforce 40 series will both be announced at the tail-end of the year. Nvidia will announce first. However, only Nvidia will have a proper product launch in 2022 for this series. AMD's will be a paper launch, with real availability in Jan 2023.</span></b></li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This was a <u style="font-style: italic;">REALLY</u> close one. AMD almost didn't make it and, given the delay for the release of their partner cards, if AMD hadn't had decent supply for their reference designs, this would have been a good call. However, they did (<i>just!</i>), so I was wrong.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Wrong - but SO close!</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">GPU availability won't appreciably improve in 2022.</span></b></li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This was just plain wrong. GPU availability improved a lot during the middle of the year, before the RTX 30 and RX 6000 series cards dried up in late October/November, and <i>availability</i> of the new cards has been quite good... the problem being that people aren't really buying them, due to their super high prices, and, thus, they are remaining in stock... 
with the exception of the RTX 4090 and RX 7900 XTX.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><i><span style="color: #274e13;">Verdict: Our survey says: "Nuh-UH!"</span></i></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3HnEAWIZyTtbzg7x9S4Cn4yz7uXNvbpIecq4CyfxUZIEDMSJztPUSFyMRRlKnFfTzcrWKcgL4N5nJH-u_JzGakM3ptCxDsSzm-zLQv9D1VQPPdXpmh0Nd61ddVsbU6bB0lUYk2X85UVQRkvIDVZO9pZmll1zS4F7tTdui9paJwrsdlpPGfYYzybHK/s1920/Header.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3HnEAWIZyTtbzg7x9S4Cn4yz7uXNvbpIecq4CyfxUZIEDMSJztPUSFyMRRlKnFfTzcrWKcgL4N5nJH-u_JzGakM3ptCxDsSzm-zLQv9D1VQPPdXpmh0Nd61ddVsbU6bB0lUYk2X85UVQRkvIDVZO9pZmll1zS4F7tTdui9paJwrsdlpPGfYYzybHK/w640-h360/Header.jpg" width="640" /></a></div><div><br /></div><div><br /></div><div><b>Our Summary says: a 6:4 ratio of right to wrong...</b></div><div><b><br /></b></div><div><blockquote><b><i><span style="color: #274e13;">That's better than 50:50!!</span></i></b></blockquote></div><div><br /></div>I BELIEVE that's an improvement over the previous years...<br /><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Predictions for 2023...</span></h3><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I've been trying to follow along with every crazy thing that has been happening, but it's been SO crazy that it's difficult. But let's put these out there:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">This graphics card generation is a lost generation. There will be ZERO cards that consumers or reviewers consider actually good value... (Look, even the 4090 is not good value!)</span></b></li></ul></div><div style="text-align: justify;">Look, the RX 7900 XTX, XT, RTX 4080, RTX 4070 Ti and - really, if we're honest - the RTX 4090 are all poor value for the performance at MSRP... and we KNOW we are not getting them at their MSRP. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">As a result, I doubt that any cards further down the stack will be considered good value for money either. It just doesn't seem possible... the top-tier cards have done such a poor job, and it seems like both AMD and Nvidia so want to distance themselves from the prior generation cards, that I just cannot see any well-priced cards this generation.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I should have been prepared for this because I was pretty much <a href="https://hole-in-my-head.blogspot.com/2022/07/inflation-reality-check.html">predicting this entire situation</a>:</div><div style="text-align: justify;"><br /></div><blockquote><div style="text-align: justify;"><b><i><span style="color: #274e13;">"All of a sudden, we're no longer getting the performance of the last generation's top-tier card for half the price, we're getting it for around 60% or two-thirds of the price... and that logic continues to scale down. 
A 7500 XT goes from 6500 XT's €200 to €350 for approximately an RX 5700 XT's performance... maintaining the performance per unit currency for that tier - which we also observed in the RX 6000 series prices too (though with less VRAM)."</span></i></b></div></blockquote><div style="text-align: justify;">And, really, let's not kid ourselves - the 3090 Ti was priced ridiculously, because they could... we should not be taking its "MSRP" into account when comparing the following generation of cards. The RTX 3080 or 3080 Ti are more credible options when thinking about the pricing structure... and does the RTX 4070 Ti give us the 3080 Ti's performance at half the price? </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">No, it's at around 60% of the price. Never mind the fact that we moved "equivalent performance" up by a whole half tier!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Good job, Duoae... *sighs*</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">Nvidia video Super Resolution will be a BIG thing...</span></b></li></ul></div><div style="text-align: justify;">I <i>seriously</i> do not know why this hasn't been done before. Maybe it was held back for a time when Nvidia would need it? This is tech that should have been available on day 1 of DLSS availability and, if not then, on day 1 of FSR 2.0 availability. In fact, AMD should have promoted it first, using FSR 2.0.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">At any rate - this has the potential to be a huge deal in more ways than just streaming video... "free" upscaling of video played through the Chrome browser has the potential to give the masses upscaling technology for their old video files... I am really looking forward to this.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">There will be no "pro" consoles for either Playstation or Xbox this year... Xbox Series S will continue to be a thorn in developers' sides...</span></b></li></ul></div><div style="text-align: justify;">Seriously, whoever thought of the Series S should be commended... and simultaneously committed, because it was a terribly brilliantly terrible idea. Sure, for consumers, it's good (in terms of the initial purchase price) but it's bad for both consumers and developers in virtually every other aspect.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">At any rate, from my perspective, Microsoft have abandoned their phone model and are sticking to console generations, like brother Sony intoned at the beginning of the generation...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">DirectStorage will be a flop... again.</span></b></li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I'm sorry - I just can't stop. I have NOT seen a single demonstration or fact that shows that DirectStorage will improve gaming or streaming of assets. There are other tools on the PC table that are not being touched, for some unknown reason. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Do developers require $700+ graphics cards? Sure! $200 PCIe gen 4/5 NVMe drives?! YES! But, oh noes! You can't ask for $120 worth of RAM!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What sort of world do we live in?!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Seriously, the benefits of DS have not been proven - they've shown benefits on SATA devices (shouldn't be possible), HDD devices (shouldn't be possible!) and, on NVMe devices, insubstantial performance benefits (i.e. sub-two-second improvements). What's the big deal?!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Oh! But they can drain our heavily taxed GPU resources! (Because everyone has an RTX 4090 idling at 40% utilisation during gameplay!)</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Yeah, no... I'm sorry - I just don't get it. Yes, reviewers will RAVE about it. They will coo from the treetops about this feature, all the while ignoring the fact that it has minimal benefits (like they've already been doing).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><ul><li><b><span style="color: #274e13;">32 GB of RAM will become standard for the recommended specifications of new AAA PC games...</span></b></li></ul></div><div style="text-align: justify;">This one is a long shot. I have <i>literally</i> zero idea why developers are refusing to use this resource or request it from gamers, instead of more expensive and finite hardware items like GPUs and advanced NVMe drives, which require state-of-the-art motherboards and CPUs to work properly...</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">However, I <i>have to believe</i> that some AAA developers will begin waking up and finally require 32 GB of system RAM instead of 16 GB... it makes so much sense on SO many levels!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><br /></div><h3 style="text-align: justify;"><span style="color: #274e13;">Conclusion...</span></h3><div><br /></div><div>And that, as they say, is a wrap - I just don't have many predictions this year. I look forward to the year ahead in a gaming sense, because so many anticipated titles were shifted into 2023. However, in reality, most of the interesting hardware is already out and there are not very many items expected to be delivered beyond the first weeks of January.</div><div><br /></div><div>Thanks for reading and I hope you had a nice Christmas and New Year!</div>Unknownnoreply@blogger.com0