Ever wondered why one of the most awaited features of NVIDIA’s newly dawned Maxwell was withheld from it? This question baffled many other readers, and me as well. Digging deep into the technical explanation of why this happened takes time, so here’s a brief yet detailed account of what made NVIDIA do so. The disappearance of the integrated CPU on the GPU die is also explained.
Introduction – Unified Virtual Memory in a nutshell
NVIDIA first announced this feature alongside the then-forthcoming Maxwell architecture. What UVM basically does is join the system’s main memory with the memory the GPU possesses, the VRAM: CPU memory and GPU memory are addressed as one unified space, so a CPU application can benefit from the larger, faster pool of RAM and vice versa (for GPU-oriented programs written with the CUDA SDK). A similar example of NVIDIA’s unified memory approach can be seen in the Tegra K1 “superchip” tablet SoC, which pairs an ARM-based CPU with a 192-core NVIDIA Kepler iGPU. This suggested that, in the near future, desktop UVM implementations could benefit from the addition of an on-board integrated ARM-based CPU. But sadly, this year’s GTC made it clear that UVM will debut on the Pascal GPU microarchitecture and won’t be featured on any Maxwell GPU die.
More about CUDA Programming and UVM
If UVM arrives in desktop GPUs, CUDA programmers will be relieved of the hassle of writing explicit copy operations to move data from CPU memory to GPU memory (i.e. the VRAM), making it more efficient to code in CUDA 6 (the latest revision). The burden of copying the pertinent data between system memory and video memory shifts to the GPU’s architecture, which performs it on the programmer’s behalf, so UVM would also save developer resources. Since maintaining this unification requires a CPU alongside the memory to copy data back and forth, the ARM-based CPU on the Tegra K1 superchip may be the most efficient model for future implementations.
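To see what this saves the programmer, here is a minimal sketch using CUDA 6’s `cudaMallocManaged` Unified Memory API with a hypothetical SAXPY kernel (the kernel and sizes are illustrative, not from NVIDIA’s announcement):

```cuda
#include <cstdio>

// Hypothetical SAXPY kernel used purely for illustration.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;

    // CUDA 6 Unified Memory: one allocation visible to both CPU and GPU,
    // replacing the cudaMalloc + cudaMemcpy round trips of earlier CUDA.
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));

    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }  // CPU writes directly

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);  // GPU reads the same pointers
    cudaDeviceSynchronize();  // required before the CPU touches the results

    printf("y[0] = %f\n", y[0]);  // 3*1 + 2 = 5
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Without Unified Memory, the same program would need separate host and device buffers plus explicit `cudaMemcpy` calls in each direction; the managed allocation lets the runtime handle that movement behind the scenes.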
So why did this happen? We do have an answer
NVIDIA officially declared at GTC that UVM will not be a headline feature of Maxwell; instead it will be another headline feature of Pascal, alongside the already scheduled NVLink and stacked 3D DRAM technologies. As explained above, the current approach requires coupling the VRAM with an integrated CPU (like the ARM Cortex-A15 CPU found on Tegra K1). Here comes the catch: NVLink, which will first feature on Pascal GPUs, delivers system-to-graphics memory coherency both effectively and efficiently, without that extra CPU.
A short explanation of NVLink helps here. In the simplest terms, NVLink is a DMA+ based interconnect that beats the traditional PCIe interface in sheer memory bandwidth, relieving the bandwidth limitations between GPUs as well as between GPU and system memory (SDRAM). It also features second-generation cache coherency, which, as the name suggests, keeps data residing in multiple caches coherent. Cache coherency isn’t an NVIDIA exclusive, though, since it features in other modern architectures too. In short, NVLink provides a more efficient UVM approach: together with the CUDA optimizations it lets programmers use DRAM and VRAM as one single unit, and its five to twelve times higher memory bandwidth delivers the best of both worlds. And since NVLink is a “Pascal exclusive” feature, the near-future Maxwell GPUs will get neither UVM nor the supporting iCPU. Maybe now you know why Maxwell didn’t get UVM.
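To put that multiplier in rough perspective, assuming a PCIe 3.0 x16 baseline of about 16 GB/s per direction (this baseline is my assumption, not stated in NVIDIA’s announcement):

\[
5 \times 16\,\text{GB/s} = 80\,\text{GB/s}, \qquad 12 \times 16\,\text{GB/s} \approx 192\,\text{GB/s}
\]

which is the kind of range NVIDIA has quoted for NVLink’s aggregate bandwidth.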
Conclusion – Future expectations for the UVM and iCPU
When asked about near-future mainstream implementations of NVLink on PC platforms, NVIDIA commented that they don’t plan to bring the feature to the gaming GeForce GTX GPU line-up; they’d rather concentrate on readying the Tesla supercomputing behemoths and server solutions for NVLink, so the dream of having a UVM GPU with NVLink in about a year is completely shattered. This may also be because the chipset manufacturer has to comply with the NVLink criteria, meaning NVIDIA and Intel must work together to develop a suitable solution for this situation; at least that’s what the officials at GTC said when asked about it. In the end the outcome and effectiveness are what matter, so let’s keep our fingers crossed and hope for the best for the mainstream PC sector. Have any questions? Shoot ‘em up in the comments below.