How faster storage is actually improving visuals in next-gen games
The Largest Improvement to Visual Fidelity in Games Since 3D
I've spent the past few weeks trying to convince people that faster storage doesn't just mean less time spent looking at a loading screen, and telling them to think bigger about the possibilities opened up both by abandoning the hard drive and by the incredible hardware-accelerated I/O pipeline in both new consoles.
Before we get started, there are two resources that any reader should be generally familiar with.
The first is a blog post I wrote covering the limitations of HDDs and why game developers were so keen to abandon them, why SSDs don't offer the gaming boost their radically better on-paper performance suggests, and what the next-gen consoles are doing to address the pipeline issue. If you aren't clear on any of those points, have a look here.
The second resource is the May 13th Unreal Engine 5 Tech Demo reveal, linked below.
What benefits did I expect from optimizations for next-gen SSDs and accelerated I/O pipelines?
I had been brainstorming ideas for ways that the next-gen consoles would benefit from this.
No more loading times
The most obvious answer is no more loading times. If you can load data into RAM at 5.5 GB/s before compression, and at 8-9 GB/s with compression depending on how well the data compresses (with a theoretical peak of ~22 GB/s for data that happens to compress particularly well), it stands to reason that you should be able to get a player in-game in a second or two.
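To put some rough numbers on that, here's a back-of-the-envelope sketch. The bandwidth figures are the ones quoted above; the "usable RAM" figure and the HDD rate are assumptions I've picked purely for illustration.

```python
# Back-of-the-envelope: how long it takes to fill a RAM budget at different I/O rates.
# The SSD bandwidths are the PS5 figures quoted above; the 10 GB "usable RAM"
# figure and the ~100 MB/s HDD rate are assumptions for illustration only.

USABLE_RAM_GB = 10  # assumed portion of the console's RAM available to a game

bandwidths_gbps = {
    "Last-gen HDD (assumed ~100 MB/s)": 0.1,
    "PS5 SSD, raw": 5.5,
    "PS5 SSD, typical compressed": 8.5,
    "PS5 SSD, best-case compressed": 22.0,
}

for label, gbps in bandwidths_gbps.items():
    seconds = USABLE_RAM_GB / gbps
    print(f"{label:35s} -> {seconds:6.1f} s to fill {USABLE_RAM_GB} GB")
```

Even the uncompressed rate refills the whole budget in about two seconds, which is where the "player in-game in a second or two" claim comes from.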
Better use of RAM
In the current gen, we spend a lot of time with our RAM full of idle data: anything the player could encounter or see within the time it takes to load it from storage has to already be resident, or the player gets immersion-breaking object pop-in or a loading screen. This is explored pretty thoroughly in my previous post linked above. Making the storage faster means less idle data has to sit in RAM, because we can bring in what we need as we need it. That means more of the RAM buffer can hold data that actually improves what we're looking at and interacting with. The following points explore those details.
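Here's a toy model of that trade-off. Every number in it is a made-up illustration (player speed, data density, buffer window), not a measured figure; the point is only that when storage can't keep up with the player, the difference has to sit idle in RAM.

```python
# Toy model: how much data must be pre-loaded ("idle" in RAM) when streaming
# can't keep pace with the player. All figures are illustrative assumptions.

def resident_buffer_mb(player_speed_m_s, data_per_metre_mb, load_rate_mb_s,
                       safety_seconds=5.0):
    """MB that must already sit in RAM so streaming keeps up with the player."""
    demand_mb_s = player_speed_m_s * data_per_metre_mb   # data consumed per second of travel
    shortfall = max(0.0, demand_mb_s - load_rate_mb_s)   # what storage can't deliver in time
    return shortfall * safety_seconds                    # pre-load several seconds of slack

# Hypothetical open-world numbers: 20 m/s traversal, 50 MB of assets per metre.
print(resident_buffer_mb(20, 50, 100))    # HDD-class: ~4500 MB held idle "just in case"
print(resident_buffer_mb(20, 50, 5500))   # SSD-class: 0 MB, stream it when you need it
```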
More and larger textures
I'm not the only one who's looked at a wall, a frozen lake, a rock face, or some stretch of terrain and noticed obvious repeating textures. Textures are a huge RAM hog, so they end up repeated a lot to save space. With more RAM to spare, we can use more unique textures, and larger ones, so models hold up to closer examination. This doesn't require a more powerful GPU at all; just more available RAM, enabled by faster storage.
More Mocap and better Animations
Motion capture and other animations are another very expensive asset type, which is why many background NPCs reuse the same few generic motions, making the world feel generic and less 'alive'. With more room in RAM for animations, even the less important NPCs in the background can have a large variety of motions that help flesh out the world.
Better storage efficiency
The main concern with all these increases in textures, mocap, and so on is that they also increase the size of games. That's a very real concern, but there are mitigating factors. First, since hard drives need to duplicate data to reduce seek times (described in more detail in my previous article), abandoning them brings immediate savings: dropping those hard-drive optimizations can dramatically reduce a game's size on disk, in extreme cases by half or more. Second, the next-gen consoles include a hardware decompressor, which lets all data on disk be stored compressed and decompressed on the fly as it streams into RAM, without using up CPU resources or losing bandwidth. That has the potential to further cut the game files on disk by half in some cases.
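A quick sketch of how those two savings stack. The duplication factor and compression ratio here are illustrative assumptions consistent with the "half or more" figures above, not measurements from any real game.

```python
# Illustrative only: how de-duplication and hardware decompression stack.
# The 1.8x duplication factor and 2x compression ratio are assumptions.

logical_assets_gb = 50        # unique data the game actually contains (assumed)
hdd_duplication_factor = 1.8  # extra copies laid out to cut HDD seek times (assumed)
compression_ratio = 2.0       # data stored compressed, decompressed in hardware (assumed)

hdd_install_gb = logical_assets_gb * hdd_duplication_factor
ssd_install_gb = logical_assets_gb / compression_ratio

print(f"HDD-era install:  {hdd_install_gb:.0f} GB")   # 90 GB
print(f"Next-gen install: {ssd_install_gb:.0f} GB")   # 25 GB
```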
Better world creation
It's very common in video games to have long tunnels, elevators, stairs, and the like in streamed worlds that effectively act as loading screens, put there to add distance between two areas and create an opportunity to discard the data for the place you're leaving and load in the data for the place you're headed. While it's sometimes nice to see Garrus and Tali interacting on a long elevator ride, it soon gets tiring and immersion-breaking when you know the place you're headed isn't actually 50 floors beneath you. Faster storage frees game developers from these tricks, letting them stream in assets more readily without subjecting the player to awkward non-loading-screen loading sequences that are hardly better.
Faster travel through worlds
The most famous example of this is Spider-Man on PS4: in a 2019 tech demo, Mark Cerny showed that Spider-Man had a maximum speed at which he could travel through the city, because that was the rate at which data could be loaded from disk. Moving any faster would mean running into unloaded game objects in a game-breaking experience. On the PS5, it was shown that a jet could speed through the city with no loading issues.
And more.
You may well be able to think of examples I haven't mentioned in this article (there are many); this is simply the list I came up with. I've been using these examples to tell people that time spent staring at loading screens isn't even the biggest reason to want faster storage, and that they need to think bigger. And even though I'd call everything above a significant improvement, when I saw Unreal Engine 5's tech demo I was stunned to realize that I still wasn't thinking big enough. Everything I've described so far is a genuine benefit of fast storage. But it goes so much further than that.
Let's build some anticipation as we cover some basics you'll need to know.
In order to understand what's going on in the demo, it's important to understand how models are composed, along with something called LoDs (Levels of Detail) on models and mipmaps on textures.
Level of Detail (LoD)
The LoD system arose over 20 years ago, not long after the first games with proper 3D models composed of triangles (artists sometimes call them polys, flat polygons that are themselves made of triangles). Game developers wanted more objects on screen and further into the distance, but GPUs can only render so many triangles, and adding more objects blew past that budget. To get around this, developers started making lower-detail copies of their models that could be swapped in on the fly when an object was far away, using fewer GPU resources. This is an intensive, often manual process that has endured into 2020 and beyond, sometimes with entire jobs largely dedicated to creating LoDs for assets made by other artists in the company.
The major drawback of the LoD system, other than the additional storage required for the LoDs (they're stored separately) and the time-consuming process already described, is the immersion-breaking popping that is often visible when a model transitions from one LoD to another either too early or too late.
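For the non-programmers, here's a minimal sketch of the traditional distance-based swap described above. The mesh names, triangle counts, and distance thresholds are all made up; real engines also fade or add hysteresis between levels to hide the pop.

```python
# Minimal sketch of traditional distance-based LoD selection.
# Meshes and thresholds are invented for illustration.

LODS = [
    (0, 15.0,          "hero_rock_LOD0.mesh"),  # ~60k tris, used up close
    (1, 40.0,          "hero_rock_LOD1.mesh"),  # ~15k tris
    (2, 100.0,         "hero_rock_LOD2.mesh"),  # ~3k tris
    (3, float("inf"),  "hero_rock_LOD3.mesh"),  # a few hundred tris, far away
]

def pick_lod(distance_to_camera: float) -> str:
    """Return the pre-authored mesh to draw at this camera distance."""
    for _, max_distance, mesh in LODS:
        if distance_to_camera <= max_distance:
            return mesh
    return LODS[-1][2]

print(pick_lod(12.0))   # hero_rock_LOD0.mesh
print(pick_lod(75.0))   # hero_rock_LOD2.mesh
```

Every entry in that table is a separate asset an artist had to make and that has to ship on disk, which is exactly the cost described above.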
Texture Mipmaps
I will admit straight away that this is the topic I understand the least, but I can cover what I do know. Conceptually, mipmaps are similar to LoDs, but they're generated from the full-resolution texture, producing progressively lower-resolution versions of the original.
These textures, or mips, are referred to by their mip number: the full texture is Mip 0, and Mip 1 has exactly one quarter of the pixels, being half the width and half the height. While the image used here isn't a typical texture, I chose it because it makes it very easy to see how the image is duplicated at lower and lower quality.
The process of choosing the right mip sounds conceptually easy, but it's computationally tricky. It involves sampling textures to see how many texels are visible per pixel of screen resolution, so you don't waste memory on detail you can't actually see, and you want to do it with as few samples as possible, because sampling costs resources too. Again, my knowledge on this subject is tenuous at best (if you're a graphics programmer, please save me in the comments, haha). The point is that faraway things use lower-quality textures, and the quality of the algorithm can have a dramatic effect on performance and RAM use.
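Here's a heavily simplified sketch of the idea: pick the mip whose texels map roughly one-to-one onto screen pixels. Real GPUs do this per pixel from screen-space derivatives rather than per object, so treat this as a conceptual illustration only.

```python
import math

# Simplified sketch of mip selection: choose the mip whose texels map roughly
# one-to-one onto screen pixels. Real GPUs compute this per pixel from
# screen-space derivatives; this whole-object version just shows the idea.

def mip_level(texels_across: int, pixels_across: int, num_mips: int) -> int:
    if pixels_across <= 0:
        return num_mips - 1
    # Each mip halves width and height, so one mip step = 2x fewer texels per axis.
    ratio = texels_across / pixels_across            # texels per screen pixel, one axis
    level = max(0.0, math.log2(max(ratio, 1.0)))
    return min(int(level), num_mips - 1)

# A 4096-texel-wide texture seen at different on-screen sizes (13 mips total):
print(mip_level(4096, 4096, 13))  # 0 -> full-resolution Mip 0
print(mip_level(4096, 1024, 13))  # 2 -> half width/height twice over
print(mip_level(4096, 32,   13))  # 7 -> tiny on screen, tiny mip
```

Get that decision wrong in one direction and textures look blurry; get it wrong in the other and you burn RAM and bandwidth on detail nobody can see.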
Now what was so impressive about the UE5 tech demo?
A bit of background and context for the numbers
With Unreal Engine 4, most games were targeted at console-level hardware. That means character models typically have budgets of between 15k and 60k tris, and entire environments and scenes have a budget of about 1-1.5 million tris. That sounds like a lot, and when the PS4/XBOne era arrived, that number looked crazy next to the low hundreds of thousands the previous generation managed, which was itself a great leap from what came before, and so on. A PS2 using all of its power might have been able to render Aloy from Horizon Zero Dawn. Just Aloy. Alone, with no environment and no animation. How far we have come.
Now, as shown in the demo, the number of tris is in the billions. Yes, you heard me right. Billions. We went from barely a million to over a billion tris in a scene. But how could this be possible? The PlayStation 4 has 1.84 TFLOPS and the PS5 has 10.3 TFLOPS. You don't have to be a math whiz to realize that even though the PS5 does do a bit more work per TFLOP (a useful occasional reminder that TFLOPS is an easy number to quote but doesn't tell the whole story), that's roughly 5.6x the TFLOPS, nowhere near 1000x, so that doesn't make any sense at all. How can the PS5 possibly be running so many tris in the scene?
The magic of Unreal Engine 5, and how it leverages the full hardware of the PS5
I want to start off by saying I'm not an Unreal Engine or Epic fanboy; I don't even have the Epic launcher installed on my PC. And while I don't mean to undermine Epic's serious technological accomplishment, since the software that makes this work is nearly as impressive as the hardware that enables it, they won't be the only ones to come up with similar solutions. I expect to see this sort of technology used in many games across PS5, PC, and XSX.
So, no, the PS5 is not capable of rendering 1000x more tris than the PS4. That would be insane. The real number of tris rendered at any given time in the demo is closer to the 20 million mark (still an incredible increase, but much less than that billion-plus figure). The glory is that, thanks to the delicate marriage between SSD and software, it doesn't have to. The PS5 isn't even capable of holding all the detail for billions of tris worth of models and all their 8K textures in RAM at once, so how can that level of detail appear in a scene?
Remember what I was saying about mipmaps, where the engine uses sampling to decide which resolution of a texture should be used on a model in a given situation? Again, I don't understand how the underlying engine actually accomplishes this, but it now does the same thing with both textures and tris. The engine decides what detail it needs and leaves the rest of the detail in storage, with little loss in performance. This effectively makes mipmaps and LoDs obsolete, since the appropriate level of detail is produced on the fly, automatically, in-engine; the engine is no longer pulling pre-made levels of detail from storage. The SSD is what enables this: storage has to be fast enough that, as you move towards a model, the engine can seamlessly gather more data the moment a higher level of detail is required. It's my opinion that the unprecedented CPU performance also deserves some of the credit for the algorithms involved.
While there are billions of tris in a scene, only maybe 20 million or so are rendered at any time, because most of them are smaller than a pixel. No need for them to be taking up valuable RAM space.
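That "most of them are smaller than a pixel" point is why the rendered count is bounded by the display rather than by the source models. Here's a rough sketch of that reasoning; the triangles-per-pixel budget and the overdraw factor are assumptions I've chosen so the numbers land in the right ballpark, not figures from Epic.

```python
# Rough sketch: the on-screen triangle count is bounded by the display,
# not by the source models. Budget and overdraw factors are assumptions.

width, height = 2560, 1440            # roughly the demo's resolution
pixels = width * height               # ~3.7 million pixels

tris_per_pixel_budget = 1             # roughly one visible triangle per pixel (assumed)
overdraw_and_offscreen_factor = 5     # assumed slack for overdraw, shadows, etc.

rendered_tris = pixels * tris_per_pixel_budget * overdraw_and_offscreen_factor
print(f"~{rendered_tris / 1e6:.0f} million triangles actually drawn")   # ~18 million

source_tris = 1_000_000_000           # the billion-plus source figure quoted above
print(f"~{source_tris / rendered_tris:.0f}x more detail stays on the SSD")
```

Anything finer than that stays on the SSD until the camera gets close enough to need it, which is exactly why the storage speed is the enabling piece.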
Stay tuned for a follow-up article regarding XSX and PC!
I decided that it was best to cut this article short. Here, we've explored what makes the tech running the demo very impressive. Next we're going to examine the tech in PC and XSX (spoilers, it's about the Velocity Architecture) and how well the demo would run on them, as well as what's coming next for PC.
Thank you for reading! If you enjoyed this, feel free to visit my Discord and my Patreon to tell me why I'm wrong, or just to discuss tech with me.
Discord: https://discord.gg/CHfha8V
Patreon: https://www.patreon.com/MeyerTechRants