8 January 2023

Analyse This: Does RAM speed and latency make a difference for gaming...? (Part 4)

 
Uber RAM...

I've looked at the performance of RAM speed over the last few entries and come to a few conclusions:
  • People are wont to misinterpret data - or to draw conclusions from too little of it.
  • On mid-range systems (or below): Pushing RAM to get the lowest possible latency (and system latency) in synthetic tests really does not correlate well with actual game performance...
  • On mid-range systems (or below): Pushing RAM to get the highest possible bandwidth (and system bandwidth) in synthetic tests really does not correlate well with actual game performance...
  • Intel and AMD architectures handle memory access in quite different ways - this may partly explain why Intel has historically had better gaming performance than AMD.
  • On mid-range systems (or below): RAM speed past DDR4 3200 really doesn't matter too much in gaming applications.
    • What DOES matter is the quality of the memory IC!
    • Samsung B-die is well known for its overclocking and latency-reducing ability... but even at the same stock settings as another chip, it shows a marked improvement on both AMD and Intel systems for higher-framerate gaming. No overclocking or tightening of timings required!
  • You cannot just look at static metrics like minimum, 1% low, average and maximum framerates to determine game performance - they don't show you the whole picture.
    • Nowadays, we should be looking at the smoothness of the per-frame presentation. You can do this by computing simple numbers like the standard deviation of the frame-to-frame variance... or by plotting the distribution of per-frame frametimes during the benchmark run, treated with the natural log (in order to normalise the extremes) - see the sketch after this list.
  • The differences are pretty small... when taking everything into account. Optimising RAM timings and speed is the sort of thing people who are obsessed with an activity will do. I did enjoy seeing synthetic benchmark numbers go up, until I realised that, after looking at all the data, it was all pointless anyway. You get more out of your time by buying the best memory IC at a decent speed (DDR4 3600 or 3800) and spending more money on your CPU and GPU (and overclocking them) than you do by optimising lower quality RAM. 
    • Of course, you probably wouldn't have known what was low or high quality RAM when you bought it! I didn't.
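To make that last set of points concrete, here's a minimal Python sketch of the kind of numbers and distributions I'm describing - purely illustrative, and it assumes you've already exported a list of per-frame times (in milliseconds) from whatever frametime capture tool you use:

```python
import numpy as np

def frametime_metrics(frametimes_ms):
    """Illustrative only: static fps metrics plus 'smoothness' numbers
    from a list of per-frame times in milliseconds."""
    ft = np.asarray(frametimes_ms, dtype=float)
    fps = 1000.0 / ft                           # instantaneous fps of each frame

    # The static metrics: min, 1% low, average and maximum framerates
    static = {
        "min_fps": fps.min(),
        "one_pct_low_fps": np.percentile(fps, 1),
        "avg_fps": 1000.0 / ft.mean(),          # average fps over the whole run
        "max_fps": fps.max(),
    }

    # Smoothness: how much the frametime changes from one frame to the next,
    # summarised as the standard deviation of those frame-to-frame deltas
    deltas = np.diff(ft)
    smoothness = {"stddev_frame_to_frame_ms": float(deltas.std())}

    # Natural-log-transformed frametimes: squashes the extreme spikes so the
    # distribution can be plotted and compared sensibly between configurations
    log_ft = np.log(ft)

    return static, smoothness, log_ft
```

Histogram `log_ft` for two RAM configurations on the same axes and you get a much better feel for how tight the per-frame presentation is than any single static number can give you.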
So, with that summary of conclusions out of the way, let's head into the final entry in this series - raytracing.

It's a game of two halves...


We only have two competitors in the desktop space and, really, though more might be preferable, the walled gardens of Apple and the ARM advocates do not entice me... nor are their low-power efficiencies scaling as well as x64 does. So, we're stuck with two competitors who, seemingly, want to rip each other's throats out.

It really is a strange dichotomy when looking at the CPU (AMD vs Intel) and GPU (AMD vs Nvidia) arenas but it's clear from the pricing that we have really good competition in the former and really poor competition in the latter.

You can make the arguments that "researchers" and "miners" and other high-return investments are subsuming the "relatively low margin" gamer market... and that prices are increasing and that "inflation is real".

As I've pointed out before, the long and short of it is that no other computer hardware segment is experiencing these forcings* the way the GPU market is. So, it's either not a thing or it's a highly specific thing that, for some reason, no one is specifically talking about... and they're not. No one is. Not even Jensen - we're just told that things are getting more expensive and that we should expect increased prices for increased performance.

These arguments ring hollow for me...
*That's a hold-over from my climate research days...
Anyway, remember - these are my custom Spider-Man benchmarks running with raytracing enabled (max settings and max LOD, etc.), scaling over system RAM speed and then tightening the timings down, in order to determine whether tightened timings and/or faster RAM speeds actually result in better performance.

Yes, there are limitations on this study - it's all mid-range parts. At the high end, you will most likely see performance gains when doing these optimisations... the issue here is that the VAST majority of people will see the results at the high end and spend hours, days or otherwise inordinate amounts of time trying to improve their own system by chasing metrics (such as memory latency and bandwidth as reported by synthetic benchmarks like AIDA64), and then just play on, assuming that their overall game experience is better than it was before.

This is the issue I have with those people who espouse memory tuning and optimisation for gaming: they spit and spout with very little data to back themselves up...

I've covered that, mostly, now we're looking at the Raytracing performance of systems that are essentially "identical"... (as always, the data I used and collected is found here)



DDR4 3200...


CL16 with some tightening wins... but it's not the lowest latency configuration.


We start with what is essentially the baseline of DDR4 system performance in 2022 and, arguably, for the last few years. The Intel system doesn't reach peak "performance" (i.e. best framerate, best sequential frame presentation) at the lowest latency but, really, there's not a lot of difference between all the memory timings here - just as last time. What is obvious is that enabling RT causes a lot more system stress... but while that extra stress does affect the min, max and average framerates, it doesn't materially affect sequential per-frame presentation.


Yes, sure - the per-frame presentation is less stable when running RT, but not dramatically worse than it was with RT disabled. In fact, we can see that raytracing is stymieing and normalising the effect of any memory timing "optimisations".



Lower is better for Ryzen...


The opposite is true for the AMD system. The 5600X definitely runs better with the lowest memory latency and highest total bandwidth throughput - though it's not a straight road to "better". Some subtiming optimisations result in worse performance... and, in all honesty, the difference between all tested settings is tiny.


DDR4 3600...




Once again, with any sort of sub-timing optimisation, the performance essentially normalises around a central point (approximately 13 ms per frame). There's no benefit to the i5 from any optimisation beyond the very minimal.



The R5 again shows that the progression from worst to best does correlate with lower latency - again, the differences are small, with the frametime hovering around the mid-point of 14 ms. Unfortunately, it appears I didn't really think to test worse memory settings - I just tested stock Patriot specs versus optimised. If I'd realised when I did all this testing last year (unfortunately, this was right at the start of my memory optimisation testing journey), I would have included some CL16 and CL18 settings to see the improvement*... which we can see at our next speed.
*This limitation to the dataset doesn't apply to the Intel testing since I was more experienced by that point and had refined my testing methodology...
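(As a quick aside for anyone converting in their head: framerate is just the reciprocal of frametime, so those mid-points translate roughly as below - nothing measured here, just arithmetic.)

```python
# Frametime (ms) and framerate (fps) are reciprocals of each other
print(1000.0 / 13.0)   # a 13 ms frame ~= 77 fps (the i5's central point above)
print(1000.0 / 14.0)   # a 14 ms frame ~= 71 fps (the R5's central point above)
```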

DDR4 3800...


This is the first time we're testing at a 1:2 controller-to-memory frequency ratio...

The process repeats itself for the i5... but an improvement in the per-frame presentation (aka smoothness) can be seen. I didn't mention it for the prior two memory speeds, but there the best smoothness was achieved at settings which did not produce the lowest latency or highest bandwidth... this is the first time that doesn't hold true for the Intel chip - here, the lowest latency gives the best result, and that might be tied to the fact that we've had to switch up into Gear 2 mode, where the memory controller is operating at half the RAM frequency.

I can imagine that latency is more important in this mode.
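For anyone unfamiliar with the Gear terminology, here's a rough sketch of the clock relationships involved - just the arithmetic, nothing measured on my systems:

```python
# Illustrative arithmetic only: how the memory controller clock relates
# to the DDR4 rating under Intel's Gear 1 / Gear 2 modes.
ddr4_rating_mts = 3800                  # "DDR4 3800" = 3800 mega-transfers per second
memclk_mhz = ddr4_rating_mts / 2        # DDR = two transfers per clock -> 1900 MHz

gear1_controller_mhz = memclk_mhz       # Gear 1: controller runs 1:1 with the memory clock (1900 MHz)
gear2_controller_mhz = memclk_mhz / 2   # Gear 2: controller runs at half the memory clock (950 MHz)

# On the Ryzen side, the equivalent 1:1 regime means holding FCLK at that same
# 1900 MHz memory clock - which, as noted below, the 5600X managed without issue.
```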


The R5 had no issue keeping controller and memory frequency at a 1:1 ratio...


This time, however, the AMD system shows us a relatively* large change across the RAM subtiming optimisations that I tested. Not only does the difference between the settings with the highest bandwidth and lowest latency give us a visual indication that performance is improving, but we can see the average fps for the whole benchmark moving from 68 to 72 fps at CL15. If I were one to believe that stating percentages is useful - that's around a 5% improvement! Unfortunately, I am not. This level of performance difference is not noticeable for any human playing a game.
*I said relative, not large! 
Saying that, moving from CL19 to CL15 netted us an average 5 fps improvement, and moving to the best result (no.2 to no.8) an average 9 fps improvement, with better lows to boot. That's a 14% improvement (for those wanting that relationship). The issue here is that good quality RAM integrated circuit (IC) dies will already ship with the lower CAS latency setting in their XMP profiles... so we're not going to count that as an improvement; we're back to the average 4 fps mentioned above.
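For those who do want the percentage relationship spelled out, it's just the relative change against the baseline average - bearing in mind that the rounded fps figures quoted above won't reproduce my stated percentages exactly:

```python
def pct_improvement(baseline_fps, new_fps):
    """Relative improvement of new_fps over baseline_fps, as a percentage."""
    return (new_fps - baseline_fps) / baseline_fps * 100.0

print(pct_improvement(68, 72))   # ~5.9% using the rounded 68 -> 72 fps averages above
```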

This just goes back to my summary in the intro to this article - buy the RAM with better ICs; there's no need to optimise. How do you learn which those are? That's a different and more difficult story.


DDR4 4000...


The 12400 wasn't really able to scale past DDR4 3800...

Now, although we're still operating under the Gear 2 memory regime, I found myself more limited in what the CPU was capable of in terms of memory timings - hence the reduced number of benchmarks. The best result is not at the lowest latency settings and, if you want to be pedantic about it, we're giving up average fps against that configuration - dropping from 80 to 77 (so 3 fps! OMG!) - but when looking at the stability of the presentation and the worst frametime numbers, the lowest latency configuration performs worse in both of those aspects.


Meanwhile, the 5600X was steaming ahead with the flexibility of its memory controller...


For this memory speed, I forced a 1:2 ratio at the worst settings (CL19, no.1) thinking that it would show that Ryzen was better off in the 1:1 controller:memory frequency regime but, in reality, this performed better than 1:1 (no.2)! Something I did not expect. 

This is something that maybe I can explore in future because the prevailing wisdom has always been that 1:1 is preferable over a 1:2 ratio... Moving on from that result, the lowest latency and highest bandwidth once again gave the best results. So, pretty much par for the course.



DDR4 4200...


Here's where I'm running into the limits of the memory controller on my chips. I'm posting the results of DDR4 4200 and 4400 for completeness and for the final comparison, rather than for any particularly interesting insights...



Even operating at a 1:2 ratio, I wasn't able to push the memory timings down any further and, in all honesty, I wasn't really interested in running higher voltages in order to do so...



DDR4 4400...



As pointed out in the spreadsheet - I wasn't able to get the Patriot kit back to its 4400 stock settings on the 5600X. I'm not sure what happened to the system stability, but it was after all of the messing around in the BIOS for all the other results. It's entirely possible that something "broke" in the BIOS and the entire setup just wasn't having it any more.

Now, let's get to the interesting bit!


The Best of the Best (RT EDITION)...




As with all the other benchmark runs, we don't have a lot of variation across all the tested speeds or between Gear 1 and 2... Sure, I can see that the DDR4 3800 optimised RAM is the best... but it's so close as to be unnoticeable by any human actually using the system for gaming.

There was more variation in the testing with ray tracing disabled... but even then, the 3800 memory still came out best.



The 5600X, on the other hand, shows quite a bit of difference. We're talking an average fps difference of 9 fps - still not something I believe most would be able to tell, but perhaps the difference between a stable 60-ish performance and one dropping below that (depending on the graphics card, of course!).

Once again, DDR4 3800 is best here - but it's a close call between that and DDR4 3200! Yes, sure, we gain 2 fps, and the lows on the DDR4 3200 are potentially momentary dips of another 10 fps (which is categorically worse!)... but the frametime presentation is actually very good for the DDR4 3200.

Looking back at the non-RT testing, we had the same stand-off and DDR4 3800 won that time as well, though more decisively...



Conclusion...


So, what have we further learned from today's additional data?

DDR4 3800 at CL15 is the best for both mid-range platforms in both non-RT and RT testing. However, the actual difference between optimised sub-timings and stock when using a good IC (e.g. Samsung B-die for DDR4) is minimal at best. We're talking a couple of fps in the average and a few more in the lows...

On Intel, or at least the 12400, latency and bandwidth do not appear to be very important and neither is the Gear mode.

On Ryzen, or at least the 5600X, latency and bandwidth are important when comparing RAM at a given speed*... but looking at the comparison between the best of the bests, the best result does not correlate to the RAM with the lowest latency or highest bandwidth.
*Though the differences between results are pretty tiny!
From my testing, it appears that just buying XMP CL15 or CL16 DDR4 3800 RAM will give you the best result for mid-range and lower chips for both Ryzen 5000 and Intel 12th gen. Going higher and trying to optimise more will either result in actual performance losses or worse smoothness - which static numbers such as the average and 1% lows will not show you.

If you're in the market for the higher-end CPUs, reviews have shown that going with DDR5 is the best option for getting performance and, for Intel, going as high in memory frequency as possible really gives a big benefit.

If I ever get a DDR5 motherboard and RAM for the 12400, I'll take a look into the scaling for that CPU and return to this series. As it stands, we've come to the end of our journey.

With regards to performance testing methodology, I am not finished with that. I feel like I'm onto something, and I will continue to analyse newer games as they're released and refine the approach over time.
