Hardware-accelerated IO in consoles coming to PC
The prevalence and performance of SSDs
There's been a lot of hype around SSDs for nearly a decade now, touting anywhere from 10x to 100x the read and write speeds of the spinning rust hard drives of old. SSDs have become nearly ubiquitous in the modern PC for several reasons, including shorter boot and load times, better overall system responsiveness, and steadily falling cost. However, the real-world performance improvements we see from SSDs over their Hard Drive counterparts are often nowhere near what is expected. Not only do almost all AAA titles, even recent ones, run fine off a Hard Drive, games rarely load more than twice as fast when we install them on an SSD. Even more concerning is the often imperceptible difference that installing an extremely fast -- and far more expensive -- NVMe SSD makes. But why is this? Why do games take so long to load even when our storage can clearly make up the difference?
Note: If you're already well versed in HDD and SSD tech, how they compare, and what the bottlenecks in loading technology are, feel free to skip down to the section "But what about PC?", as a lot of this may be review for some.
A common misconception about random IOPS and sequential MB/s as they pertain to gaming
A commonly touted but misleading point is that of IOPS (Input/Output Operations Per Second), which relates more to how data is stored on the drive. In any given game or program, there may be many different files, some large and some small. It is more work to locate and read a large number of small files scattered randomly through the storage than a small number of large files, so with a lot of small files it can take longer to transfer the same volume of data. IOPS describes the theoretical peak number of tiny random reads the drive can make per second, whereas the rated bandwidth in MB/s is the theoretical peak throughput when reading one large, contiguous file.
The common argument is that games use many smaller files, and thus their performance relies more on random IOPS than on sequential MB/s. To put that argument to rest, I'll just point you to a little chart here:
Storage Type | 7200rpm Hard Drive | SATA SSD | NVMe SSD
Peak IOPS | ~75-100 IOPS | ~90,000 IOPS | Up to 750,000 IOPS
Peak MB/s | ~80-150 MB/s | ~550 MB/s | Up to 5,000 MB/s
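To get a feel for how the two ratings in the chart interact, here is a rough back-of-the-envelope model in Python. It is only a sketch under a loud assumption: each file costs one random operation (1/IOPS seconds) plus its transfer time at the peak sequential rate. The drive figures are the upper ends of the chart's ranges. The function name `read_time_seconds` and the two workloads are made up for illustration.

```python
# Rough model: every file costs one random operation plus its transfer time.
# Drive figures come from the chart; the model and workloads are assumptions.
DRIVES = {
    "7200rpm HDD": {"iops": 100, "mb_per_s": 150},
    "SATA SSD": {"iops": 90_000, "mb_per_s": 550},
    "NVMe SSD": {"iops": 750_000, "mb_per_s": 5_000},
}


def read_time_seconds(file_count, file_size_mb, iops, mb_per_s):
    """Estimate the seconds needed to read file_count files of file_size_mb each."""
    access_time = file_count / iops                       # one random op per file
    transfer_time = file_count * file_size_mb / mb_per_s  # raw data transfer
    return access_time + transfer_time


if __name__ == "__main__":
    # The same 1,000 MB of data, stored two hypothetical ways.
    workloads = {
        "10 files x 100 MB": (10, 100.0),
        "100,000 files x 0.01 MB": (100_000, 0.01),
    }
    for drive, spec in DRIVES.items():
        for label, (count, size) in workloads.items():
            seconds = read_time_seconds(count, size, spec["iops"], spec["mb_per_s"])
            print(f"{drive:12} {label:24} {seconds:10.2f} s")
```

In this model the hard drive goes from just under 7 seconds on the large files to over 16 minutes on the small ones, while both SSDs finish either workload in a few seconds at most. Whichever rating you think matters more for games, the SSDs are far ahead on both, so neither one explains the modest gains we actually see.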
Why don't we see better performance in load times with better storage?
I wish there was a simple answer for this. The truth is there's a lot to it, but it can be briefly summarized in two main points: pipeline and optimization.
Optimizations for Hard Drives -- Only part of the story
Since we're mainly going to focus on the pipeline in this discussion, I'll first briefly describe what I mean by optimization. The chart above does not even tell the whole story of how much slower Hard Drives are compared to SSDs. You may notice that the HDD's IOPS figure is about 1000x lower than even the slower SATA SSD's. This is largely because of something called seek time. The hard drive is composed of magnetic disks, with data stored across their surface, and the data must be read off them with a little arm. To read the data in any given section, the disk must spin and the arm must be positioned over that location. This takes time, so there's a delay between the request and the read. As such, games not only take care to order their requests to reduce the amount of time spent seeking (reading from locations {1, 3, 4, 6, 7, 9} is faster than reading from {1, 9, 3, 6, 4, 7}), but will often store the same data in multiple locations on the disk to make sure that common data is never physically too far from wherever the arm happens to be when it needs to be fetched. Famously, Spider-Man for PS4 had the same mailbox repeated over 400 times through the game data on disk, as it's seen many times throughout the city.
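The request-ordering point can be checked with a few lines of Python. This is a simplified sketch: it treats the locations as positions along a single line and uses total head travel as a stand-in for seek time, ignoring rotational latency and everything else a real drive does. The function name `head_travel` is my own.

```python
def head_travel(positions):
    """Total distance the arm moves when visiting positions in the given order."""
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))


ordered = [1, 3, 4, 6, 7, 9]
scattered = [1, 9, 3, 6, 4, 7]

if __name__ == "__main__":
    print(head_travel(ordered))            # 8
    print(head_travel(scattered))          # 22
    print(head_travel(sorted(scattered)))  # 8, sorting the requests recovers the short path
```

The same six locations cost nearly three times the travel when requested out of order, which is why engines built for hard drives go to the trouble of sorting their reads.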
The combination of these and other issues I haven't even discussed means that if a game developer wants a game to run off a hard drive, extra care must be taken, which not only increases the size of the game on disk but also requires extra processing time to reduce seek latency. It's also prohibitively complex and error-prone to optimize game data for both SSDs and HDDs, so that is not done.
Pipeline Issues -- The main culprit for slow loading times
The pipeline, however, is the true culprit. The pipeline exists in its current form because if the game is designed to run off an HDD, the storage is the bottleneck, so we have the liberty to do the processing in the pipeline. However, without the HDD holding us back, the pipeline becomes the bottleneck and limits what we can achieve. This was described excellently by Mark Cerny in his GDC talk on the PS5, and he gave us this diagram:
Here, we can see him describing several stages of the pipeline that reduce the speed of the data coming in from the storage into the game engine.
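The bottleneck argument can be put into a toy model: data can only arrive as fast as the slowest link, so the effective rate is the minimum of the storage rate and the pipeline rate. Everything numeric here is an assumption for illustration. The 200 MB/s pipeline limit and the 10 GB level size are invented, and the storage rates are loosely taken from the chart earlier in the article. None of it comes from Cerny's talk.

```python
def load_time_seconds(data_mb, storage_mb_per_s, pipeline_mb_per_s):
    """Load time when the slower of storage and pipeline sets the pace."""
    effective_rate = min(storage_mb_per_s, pipeline_mb_per_s)
    return data_mb / effective_rate


PIPELINE_MB_PER_S = 200  # hypothetical limit of the CPU-side processing pipeline
DATA_MB = 10_000         # hypothetical 10 GB of level data
STORAGE = {"7200rpm HDD": 100, "SATA SSD": 550, "NVMe SSD": 5_000}

if __name__ == "__main__":
    for name, rate in STORAGE.items():
        seconds = load_time_seconds(DATA_MB, rate, PIPELINE_MB_PER_S)
        print(f"{name:12} {seconds:6.1f} s")
```

With these made-up numbers the hard drive takes 100 seconds, the SATA SSD takes 50, and the NVMe SSD also takes 50: a 2x gain from the first upgrade and nothing from the second, which is the pattern described at the top of this article. Raise the pipeline limit to match the drive and the NVMe figure drops to 2 seconds.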
How we overcome the pipeline issue
The solution is found in next-gen consoles. As an SSD was the most-requested feature for both Playstation's and Xbox's next consoles, both were faced with the question of how to make the most of one. After all, if they bring in SSDs but don't improve the pipeline, there are some benefits, sure, but it's not a revolutionary improvement. Playstation was the first to give us details on their solution, in Mark Cerny's GDC presentation:
We can see that the pipeline can now keep up with the storage, but 'how'? 100x faster seems too good to be true when we've experienced such marginal improvements in loading times over the past years. While admittedly we have yet to test these claims, they may not be too far from reality. The PS5 uses specialized processors, creating a Hardware Accelerated Pipeline depicted below:
It's worth noting that the Xbox Series X has also been confirmed to have a hardware accelerated pipeline, just on a slower 2400 MB/s SSD compared to the PS5's 5500 MB/s SSD, and with a pipeline that's designed more specifically for textures, rather than Sony's general purpose pipeline. This article is not making claims about which is better; it only uses the PS5 as an example because they gave us nice slides to work with.
But what about PC?
This sounds great, but unfortunately, games developed for console may literally not be able to run on any modern PC because of the pipeline issue, even if you have the fastest storage available. Currently, even if we were willing to leave HDDs in the dust (it's overdue anyway), we don't have the hardware for the accelerated pipeline, so we would not even be able to make the most of our SATA SSDs, even if games stopped running on HDDs entirely.
Unfortunately, it may be many years before we can have that kind of hardware accelerated IO on PC. It's likely that there's already discussion happening on it, as game developers are not going to be excited about designing games for the blazing fast new consoles and then having to deal with extremely slow PC pipelines (I take care to say that it's the pipeline and not the storage itself that's slow on PC). It's not impossible that someone designs an expansion card that integrates hardware accelerators capable of accelerating the pipeline, but we run into an awkward chicken and egg situation: not enough people have the card to make it worthwhile for devs to make games that use it, and not enough games use it for consumers to bother buying the card. The same issue exists if a motherboard manufacturer decides that they're going to introduce the technology. What happens if multiple motherboard manufacturers all have the same idea but make very different solutions, so that game devs have to pick one or develop for several? We have to wait for an organization to develop a standard and API that all the PC parts manufacturers agree on and use, THEN wait for widespread adoption of this technology before devs will begin making use of it. In the meantime, we get a lot of games that are released on both new-gen consoles, but not on PC.
What a mess.
... Or so I thought.
Hardware Pipeline Accelerators for PC might already be here, in your machine right now.
I admit that this is a long shot, but my point may be true regardless. There are a few things that we do know about the PS5 and Xbox Series X chips. They're using AMD technology for both CPU and GPU; they both use AMD's Zen2 CPU architecture and RDNA2 GPU architecture. Even if you don't know anything about those architectures, what's important is that the RDNA graphics chip architecture was developed by AMD in collaboration with both Microsoft and Sony for the next generation consoles. They both had input. While RDNA2-based graphics cards have yet to hit the market, AMD have said that they are coming. The important takeaway here is that technology that both Sony and Microsoft helped develop for consoles is coming to PC, because AMD owns the technology.
The Zen2 architecture CPUs that are in the next gen consoles, however, are already here in PCs. They're the 3rd generation AMD Ryzen CPUs, for example the Ryzen 7 3700X, whose processing cores are identical in design to the cores in both consoles.
Here is a picture of a Ryzen 7 3700X with the metal Integrated Heat Spreader (IHS) removed (and, for reference, a picture of a 3700X with the IHS on, installed in a motherboard).
There are two 'dies' in this chip. The larger die is called the IOD (Input/Output Die) and the smaller die is called the CCD (Core Complex Die). The CCD contains nothing but the Zen2 CPU cores. The IOD handles I/O for the CPU: PCIe lanes, RAM channels, communication with the chipset, etc.
Now after taking a good look at that chip, go back and look at Mark Cerny's diagram for the Playstation 5 chip. Now, this is a matter of interpretation, but that diagram to me gives the impression that the "I/O complex" is its own die on the package, much like what we see with the Ryzen 3rd gen chips. However, even if it is all integrated into one die like we see on the Xbox Series X, the point remains the same. If both Xbox and Playstation use the same or similar hardware for their hardware accelerated pipelines (Playstation and Xbox would have developed their own pipelines and APIs, and likely collaborated with AMD on the hardware), the IP for that hardware likely either belongs to AMD, or at least AMD may be free to implement those designs in their own IOD for PC.
It's possible that these I/O accelerators may be present in Zen2's IOD, and thus present on any 3rd gen Ryzen desktop chip, such as the 3600 or 3700X. (The mobile 4800H is also Zen2, but it is a single monolithic die with no separate IOD, so the same reasoning doesn't carry over directly.) However, I have no proof that they're there, and honestly it's just as likely that they're not. Even if they're not, there's a strong possibility that AMD can add them to future generation designs.
Having the gaming I/O accelerators on the IOD effectively solves the entire mess I described above. It ensures adoption, as anyone buying or using a CPU of that generation or later, at any price point, would have access to the accelerators. AMD could even license that IP to Intel to allow them to incorporate the same accelerators. It also solves the standardization issue, as one design means there's no muddying of the waters, just one clear path forward. Also, very notably, Microsoft has ALREADY made the API and pipeline that's in use in the Xbox Series X, which they could literally just implement in Windows to allow PC ports of XBSX games.
The path towards a world of video games with no loading times on PC is clear.
Note:
Much of this conclusion is speculation based on analysis, and may not in fact be correct. It could be that AMD never brings the hardware pipeline acceleration to PC, and we have to rely on some other entity to develop and standardize it for PC. However, given the evidence, I believe this is the simplest and most likely path.
Thank you for reading!
If you enjoyed this, feel free to visit my Discord and my Patreon to tell me why I'm wrong and discuss tech with me.
Discord: https://discord.gg/CHfha8V
Patreon: https://www.patreon.com/MeyerTechRants
The gaming consoles give game developers a known quantity, and they go and write their code to hopefully make the best use of the performance available. If I were Sony or Microsoft, I wouldn't want to burden the developer with any extra overhead to maximize SSD performance; I'd hope that it would be provided to them as high performance storage, with the details hidden. Then, when the time comes to "port" the game to PC, they could provide the user an option for legacy mode or optimized mode, maybe with a benchmark tool in the game options to auto-select. Your idea for the I/O die has merit, and I'm not going to dissuade you, because there are so many possible accelerators for it. One must consider what AMD has in store for the replacement of StoreMI. I'm thinking that they will allow the fusing of HDD, SATA SSD, NVMe SSD, and RAM in a single filesystem where block location is constantly optimized. They could also have a "game drive" which fuses NVMe SSD and RAM only. This is where the I/O die idea makes sense, to facilitate on-the-fly compression and decompression and maybe even extra DMA controllers. I would also say that this functionality would need to be leveraged in the enterprise arena before it would be considered. Eliminating the need for developers to know anything about the inner workings of the fast storage, and just presenting it as a storage device, makes developing much easier and porting to PC seamless. If I had enough money to buy a system capable of 128 GB of RAM or more, then I could create a 112 GB+ RAM drive to store the resources of a particular game. At that point I'm gambling that the game detects that I have console-spec storage performance, and I can game on the path where loading elevators don't exist.
Came here from Reddit. This is ridiculous.