16 May 2020

Unreal Engine 5 demo and misconceptions...

There have been quite a few articles covering how HDDs simply can't keep up with the moment-to-moment streaming enabled by the PS5 and SX. But, again, this really misses the point of PCs: PCs are engines of brute force. Consoles are beasts designed to streamline and work past limitations through optimisation. Yes, the next-gen consoles are generally much more powerful than the average PC out there right now, but they are still highly optimised pieces of hardware because they are static and will quickly be outclassed by new PC technologies appearing on the near-term horizon.

Yes, the rate of improvement of CPUs and GPUs is quite phenomenal at this point in time!

This is all self-evident, so why am I writing another post so close to the previous one when I'm covering much of the same ground? Well, that's because the Unreal Engine 5 tech demo was released, running on PS5, alongside an interview with key personnel at Epic Games and there is, as far as I understand things, one primary misconception being repeated over and over again. To paraphrase:
You can't fit all of this detail in the scene into memory on a PC - it's not possible.
This is, at best, a misunderstanding of nomenclature... at worst, it's a lie.

Let's pull up the entire quote from Nick Penwarden:
"So, Nanite enabled the artists to build a scene with geometric complexity that would have been impossible before. There are tens of billions of triangles in that scene and we simply couldn't have them all in memory at once. And so, what we end up needing to do is streaming in triangles as the camera is moving throughout the environment. And the I/O capabilities of PlayStation 5 are one of the key hardware features that enable us to achieve that level of realism."
This statement has been used by various outlets to imply that this is some sort of impossible feat that current PCs are incapable of. I don't blame them because it sounds impossible when stated like this. But digging deeper, all is not as it appears...

This boils down to words and how we use them.

Traditionally, a "scene" is a short clip in a TV show or movie, or a static set change covering limited time in a stage play. These two forms are essentially analogous, with film-making allowing for a greater range of exceptions and allowances. A scene could last for a single fight sequence but the sequence itself could be 10 minutes long with multiple camera cuts.

However, a "scene" in game design parlance is basically any element that encompasses a single experience. This can be anything from a simple menu screen or map interface to an entire "level" or world as we would normally think of such a term in game design. The two definitions are completely at odds with each other: the scene where Lumen walks through the canyon is not the entire "scene" of the level.

So, with that knowledge, let's re-read that quote from Nick Penwarden:
So, Nanite enabled the artists to increase the amount of detail contained within a single level and have that information organised in a data structure that allowed more dynamic streaming into video memory from the storage drive faster than was possible on Unreal Engine 4. We are doing the same thing as previously - streaming in new parts of the level as the camera moves around it - but our data management allows that to occur more dynamically in terms of geometric and texture LOD. The new SSD on the PS5 allows us to do that faster on that console.
This is just describing streaming of world data - same as we've seen in Assassin's Creed games, same as we've seen in GTA games and Elder Scrolls games... and same as we've seen in every other AAA single player game in existence. A level doesn't fit entirely into RAM and so the data needs to be managed off of the storage device during play. Why is this being dressed up as something new?

This is a scene that is totally possible on mid-range PCs right now. See Destiny's Mars environment, for example...

Going back to that scene in the canyon - it can absolutely be contained within 8 GB of memory and, as Lumen is moving through the canyon, further detail of the upcoming assets can be streamed into the RAM and then VRAM (on PC) from the storage drive. On PS5 and Xbox Series X, that translates to being moved into the unified memory pool from the SSD. In fact, it's ironic that at 3 mins and 34 secs Lumen has to move through a visually occluding barrier to reach the next section - which traditionally denotes a loading transition. This transition is supposed to be a thing of the past, btw.
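To make the idea concrete, here is a minimal sketch of budgeted asset streaming: assets are pulled from storage when they are needed and the least recently used ones are dropped once a fixed memory budget is full. This is a generic illustration and not how Unreal Engine or Nanite actually manages memory; the class name `StreamingCache`, the asset names and the sizes are all hypothetical.

```python
from collections import OrderedDict


class StreamingCache:
    """Keeps recently needed assets resident within a fixed memory budget.

    Hypothetical illustration only: real engines stream at a much finer
    granularity and prioritise by distance, screen size and LOD.
    """

    def __init__(self, budget_mb: float):
        self.budget_mb = budget_mb
        self.resident = OrderedDict()  # asset name -> size in MB, oldest first

    def used_mb(self) -> float:
        return sum(self.resident.values())

    def request(self, name: str, size_mb: float) -> bool:
        """Mark an asset as needed; return True if it had to be read from storage."""
        if name in self.resident:
            self.resident.move_to_end(name)  # still in use, keep it resident
            return False
        # Evict the least recently used assets until the new one fits.
        # An asset larger than the whole budget is still admitted here;
        # a real streamer would fall back to a lower level of detail.
        while self.resident and self.used_mb() + size_mb > self.budget_mb:
            self.resident.popitem(last=False)
        self.resident[name] = size_mb
        return True


if __name__ == "__main__":
    cache = StreamingCache(budget_mb=1000)
    for asset, size in [("cliff_a", 600), ("cliff_b", 300),
                        ("cliff_a", 600), ("statue", 400)]:
        loaded = cache.request(asset, size)
        print(f"{asset}: {'read from storage' if loaded else 'already resident'}")
```

The storage drive only matters when `request` returns True, which is why reused meshes and textures cost nothing extra in I/O.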

I actually went and signed up for a free account on Quixel and downloaded some free assets. What I found was that an 8K texture (including normal map and material properties) usually ran around 200-300 MB and an asset, like a cliff (i.e. vertex positional data), added an extra 200-300 MB on top of that. That's at the highest fidelity possible and, more than likely, the PS5 is not streaming those assets and textures into memory with that level of fidelity - this is where we start getting into discussions on LOD and MIP levels.

So, yes, you can stream a movie-quality level of detail to your memory but that's not likely to be happening with only 16 GB of total device memory. At that point, Nanite steps in and downsamples textures and assets until they are manageable. I suspect that most assets are 2K or below in resolution in this demo because the camera doesn't get that close to them.
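As a rough guide to how quickly texture size falls with resolution, here is a small sketch. It assumes that size scales with pixel count (so each halving of resolution quarters the size), that the compression format stays the same, and it takes 250 MB as a hypothetical midpoint of the 200-300 MB range I measured for an 8K texture set. The function name is my own.

```python
def size_at_resolution(full_size_mb: float, full_res: int, target_res: int) -> float:
    """Scale a texture set's size by pixel count (each halving = 1/4 the size)."""
    return full_size_mb * (target_res / full_res) ** 2


if __name__ == "__main__":
    # 250 MB is an assumed midpoint for an 8K (8192 px) texture set.
    for res in (8192, 4096, 2048, 1024):
        print(f"{res:5d} px: {size_at_resolution(250, 8192, res):8.2f} MB")
```

Under those assumptions a 2K version of the same texture set is in the region of tens of megabytes, not hundreds.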

Going back to that loading transition, it takes approximately 10 seconds. That's a HUGE amount of time. Let me put that into perspective:

What is happening over that 10 second period?

The Series X can transfer 16 GB of uncompressed data in 6.67 seconds; the PS5 can do it in 2.91 seconds. Over 10 seconds, a SATA 3 SSD can transfer 5 GB, whereas a traditional 7200 rpm HDD can scrape through just over 1 GB. It's not even worth considering NVMe drives in this comparison because PCs and next-gen consoles do not even need to transfer (or display) more than 8-16 GB of data in 10 seconds.
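The arithmetic behind those figures is just size divided by sustained throughput. In the sketch below the console rates are the raw (uncompressed) 2.4 GB/s and 5.5 GB/s figures; the SATA 3 SSD and HDD rates are my assumptions, chosen to match the 10-second totals above, and real drives will vary.

```python
# Sustained sequential throughput in GB/s.
# Console figures are raw (uncompressed); the SATA and HDD figures are
# assumed round numbers consistent with the 10-second totals in the text.
THROUGHPUT_GB_PER_S = {
    "PS5 SSD (raw)": 5.5,
    "Series X SSD (raw)": 2.4,
    "SATA 3 SSD": 0.5,
    "7200 rpm HDD": 0.12,
}


def seconds_to_transfer(size_gb: float, rate_gb_per_s: float) -> float:
    """Time needed to move size_gb at a sustained rate."""
    return size_gb / rate_gb_per_s


def gb_transferred(seconds: float, rate_gb_per_s: float) -> float:
    """Data moved in a fixed time window at a sustained rate."""
    return seconds * rate_gb_per_s


if __name__ == "__main__":
    for name, rate in THROUGHPUT_GB_PER_S.items():
        fill = seconds_to_transfer(16, rate)
        window = gb_transferred(10, rate)
        print(f"{name:20s} 16 GB in {fill:7.2f} s, {window:5.1f} GB in 10 s")
```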

I don't know what's happening in that time frame, but it strongly suggests that the demo is platform agnostic. Looking more closely, the "lock-in" to the transition animation occurs at 3:35 when the focus shift happens and releases around 3:55 when Lumen starts walking - a 20 second period, effectively doubling the previous numbers. Do I believe a SATA 3 HDD can pull the dig hole scene directly after that transition into memory? Bluntly, yes - I think 2.4 GB would be enough to hold that geometry and the reused/repeating textures. This is not a transition to a completely different environment requiring a complete wipe of the RAM.

We get another transition animation from approximately 5:14 to 5:21 where loading likely takes place, enough time for an HDD to load those statues into memory (~800 MB). You'll also note that from this point on, duplicated meshes are used extensively, meaning fewer assets are brought into active RAM from the storage drive. There's another loading transition at 5:48 to 5:51 in order to load the new mesh statue just around the corner and a final loading transition from 7:40 to around 7:52. That's not to mention that 6:17 to around 6:40 appears to be a pre-baked transition, followed by 6:45 to 7:31 where it appears to be mostly controlled camera perspectives and limited "player" interaction.
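For anyone who wants to check my numbers, here is a quick back-of-the-envelope sketch that turns the timestamps of the suspected loading windows into an HDD data budget. The timestamps are read off the video by eye, the labels are my own shorthand, and the 0.12 GB/s rate is an assumed sustained figure for a 7200 rpm drive (it is the rate implied by 2.4 GB over 20 seconds).

```python
HDD_GB_PER_S = 0.12  # assumed sustained rate for a 7200 rpm drive

# (label, start, end) of each suspected loading window, timed by eye.
WINDOWS = [
    ("squeeze-through", "3:35", "3:55"),
    ("statues", "5:14", "5:21"),
    ("statue round the corner", "5:48", "5:51"),
    ("final transition", "7:40", "7:52"),
]


def to_seconds(stamp: str) -> int:
    """Convert an 'M:SS' video timestamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def hdd_budget_gb(start: str, end: str, rate: float = HDD_GB_PER_S) -> float:
    """Data an HDD could read between two timestamps at a sustained rate."""
    return (to_seconds(end) - to_seconds(start)) * rate


if __name__ == "__main__":
    for label, start, end in WINDOWS:
        duration = to_seconds(end) - to_seconds(start)
        print(f"{label:24s} {duration:3d} s -> {hdd_budget_gb(start, end):.2f} GB")
```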

All in all, there's more in-engine cutscene here than an Uncharted game!

The final flight through the city isn't that impressive because most of the meshes and textures are probably lower quality due to the speed at which they're passing the camera and thus not that demanding on I/O. If Oblivion could do it in 2006 on the Xbox 360, then I am not that impressed in 2020 by this portion of the demo...

They did this in 2006 on a console with 512 MB of GDDR3... (okay, so this is probably pre-rendered on the 360)


I haven't seen anything super impressive in this demo. The technology is impressive but I see nothing that is saying "adiós" to the spinning platter hard disc drive... I *AM* excited for what SSDs can bring to the gaming landscape but nothing on show with this demo is purely a result of that.

Think I'm wrong about this? Let me know why I'm wrong in the comments. :)


Dubs said...

"The Series X can transfer 16 GB in 6.67 seconds of uncompressed data, the PS5 can do it in 2.91 seconds"
First of all, if you want to be absolutely correct: You divide 16 / 2.4 and 16 / 5.5. That is almost correct. You have to consider that the data can be loaded on two levels with priority and with the other platform in 6. The actual speed how fast you get 16 GB from the mass storage is actually faster.
The critical part is not getting the data from the mass storage device, but copying the data coming from the mass storage device into the main memory. That is the crucial reason why it is a very bad approach to just use a SATA 3 SSD or any computer for comparison. The real performance differs considerably.

Duoae said...

Hi Dubs,

Yes, that's fine to point it out but I was trying to make the point that this demo doesn't appear to be utilising those speeds in the way it's designed - it's still designed with the old bottlenecks in mind.

What I'd like to see is some sort of inception-esque sequence where literal worlds are being loaded in and destroyed, showing off scenes that are literally impossible using SATA.

Imagine a scene where the player turns the camera around and new worlds are streaming in right there and then (like the train scene in Fallen Order)...

Ger said...

Hi Duoae, as always I hope for a good exchange of opinions...


fybyfyby said...

What seemed revolutionary to me is the fact that the demo used original assets and models without the usual optimizations such as applied normal maps, LOD, etc. This fact is not practically visible. It's about optimizing the work of devs. That was the "wow" for me. I'm really curious how fast it could run with the usual optimizations of assets and models.

This information is "on par" with information from Mark Cerny. It is something that's not as popular to hear as 8K, 120 FPS, HDR, etc. I think we will be surprised by first-party Sony games. UE5 is still a multiplatform engine, so there must be some compatibility between "power tiers". First-party Sony studios don't have to achieve any sort of compatibility. That's a great advantage.

Andrew said...

Epic clarified that there is no loading transition time in the catwalk.

Duoae said...

Hi Ger,

There's not much for me to comment on in Urien's post. He covers the technologies pretty completely. The only exception is Lumen which, according to the developers (via Digital Foundry), is not /just/ ray tracing, and not even really that... sort of.


On the large scale, the engine uses voxel global illumination for the broad detail. (See this video- https://youtu.be/oPsza01RYlo or this GDC talk- https://www.gdcvault.com/play/1022428/The-Technology-of-The-Tomorrow )

Then at medium scale it uses signed distance fields, which I will not attempt to explain, instead pointing you here- https://www.google.com/amp/s/jasmcole.com/2019/10/03/signed-distance-fields/amp/

What's cool about Jason's post is that he appears to have been doing something similar to what Epic are doing, i.e. converting 3D space to simplified representations which enable shadow casting from objects.

Lastly, the engine is using screen space reflections like those used in Crysis 3 and Just Cause 2. This effect is, at this point in time, incredibly cheap to implement: back in 2014, a GeForce 650M was able to perform 25 iterations every 5 milliseconds of ray casting from every pixel visible on-screen at a 1080p resolution. See here: http://casual-effects.blogspot.com/2014/08/screen-space-ray-tracing.html?m=1

Now, I also think that there's some temporal element to this Lumen technology because I was observing delays in lighting bounces (glow) and I think that they are smoothing out hard edges and such with a temporal solution.

Duoae said...

Wait, I think you're misunderstanding me. What I was saying is that the demo allows time for systems which do not have the PS5's SSD and decompression units to be able to load between areas in the levels. Just because it's not done in this particular implementation on PS5 hardware doesn't mean it won't happen.

There's literally no other reason for them to be there.

Duoae said...

In other words, this demo will run on PS4, Xbox One and mobile phones. That was, after all, the whole point of Nanite, right?

fybyfyby said...

I think so.

Duoae said...

I'm really interested to learn how Nanite actually works. Urien at Disruptive Ludens seems to think it generates lower level-of-detail assets to the SSD on the fly and pulls them into and out of RAM... but that seems VERY intensive and wasteful.

I have a suspicion that a base level of detail is baked in to suit the platform of choice - you're not going to store 8K assets on a PS4 when you'll never use them. I think that those are then "down sampled" (for lack of a better term) in a similar manner to Series X's Sampler Feedback Streaming (SFS) technology. So there's no storage of the down-rezzed textures or meshes; they're only stored in RAM.

fybyfyby said...

I hope the streaming possibilities of PS4 and XSX will not lead to more developer laziness and that games will not be double the size of current-gen games.

But I'm also interested in how devs will use these possibilities.