AMD has today revealed new details on VEGA, its high-end GPU architecture for flagship graphics cards. VEGA is built on the fifth generation of Graphics Core Next (GCN), a new iteration of the GPU core's internal architecture, but with a largely new design. The GPUs will continue to use GCN compute units, with several new components added. AMD says this will improve the chip's efficiency when handling complex graphics and general-purpose compute workloads.

AMD VEGA Architecture: All You Need to Know

The idea behind VEGA is to meet emerging challenges. In practice, this means moving beyond the now-stagnant 1080p resolution to increasingly popular resolutions such as 2K and 4K, which require a proportional increase in GPU performance. These workloads also call for optimization at the silicon level: giving the GPU the ability to learn how a 3D application behaves and to optimize itself for the running application. To this end, AMD has added a second tier of ultra-fast memory that works in tandem with the graphics memory.
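To put the jump from 1080p in perspective, the short sketch below counts the pixels per frame at each resolution. It assumes "2K" means 2560×1440 (the usual meaning in gaming) and "4K" means 3840×2160, and it treats the per-pixel cost as constant, which is a simplification.

```python
# Pixels per frame at common resolutions.
# Assumption: "2K" = 2560x1440 (QHD) and "4K" = 3840x2160 (UHD).
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "2K (1440p)": (2560, 1440),
    "4K (2160p)": (3840, 2160),
}

def pixel_count(width: int, height: int) -> int:
    """Number of pixels the GPU must shade per frame."""
    return width * height

base = pixel_count(*RESOLUTIONS["1080p"])
for name, (w, h) in RESOLUTIONS.items():
    pixels = pixel_count(w, h)
    # Relative workload if the per-pixel cost stays constant (a simplification).
    print(f"{name:>12}: {pixels:>9,} pixels, {pixels / base:.2f}x 1080p")
```

4K works out to exactly four times the pixels of 1080p, and 1440p to about 1.78 times, which is why GPU performance has to scale roughly in proportion.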

The GPU also needs expanded geometry-processing capabilities to cope with the increasing realism of today’s photorealistic 3D scenes, which is why AMD has also updated the pixel engine that draws them. According to AMD, VEGA brings major changes in four areas.

Improved Memory Management

The diagrams below illustrate VEGA's completely redesigned memory architecture, which is meant to ensure that data moves smoothly in and out of the GPU and that valuable resources are not wasted searching for data on the host machine.

AMD GPUs already offer plenty of memory bandwidth thanks to wide memory buses; however, AMD believes there is scope to improve the way the GPU juggles data between host memory and local video memory.

Adaptive Fine-grained Data Movement

AMD believes there is a disparity between how much memory applications allocate and how much they actually access. An application may load resources it considers relevant to the 3D scene being processed but not access them all the time. This disparity ties up valuable memory, wastes memory bandwidth, and burns clock cycles moving data that is not needed.
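One way to quantify this disparity is the share of allocated pages that a frame actually touches. The sketch below is purely illustrative: the 4 KB page size, the resource names and sizes, and the access pattern are all hypothetical, not measurements from a real game.

```python
# Illustrative only: hypothetical allocations and a hypothetical access pattern.
PAGE_SIZE = 4 * 1024  # bytes; hypothetical page granularity

def pages_for(size_bytes: int) -> int:
    """Number of pages needed to hold an allocation (rounded up)."""
    return -(-size_bytes // PAGE_SIZE)

# Hypothetical resources an application allocates for a scene (bytes).
allocations = {
    "terrain_textures": 256 * 1024 * 1024,
    "character_meshes": 64 * 1024 * 1024,
    "distant_lod": 128 * 1024 * 1024,
}

# Hypothetical (resource, page index) pairs touched during one frame:
# every 4th terrain page, all character pages, no distant-LOD pages.
touched = {("terrain_textures", p) for p in range(0, 65536, 4)}
touched |= {("character_meshes", p) for p in range(16384)}

allocated_pages = sum(pages_for(size) for size in allocations.values())
utilization = len(touched) / allocated_pages
print(f"allocated pages: {allocated_pages}, touched: {len(touched)}, "
      f"utilization: {utilization:.1%}")
```

With these made-up numbers, fewer than a third of the allocated pages are used in the frame; the rest occupy memory without contributing anything.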

Normally, the graphics driver team works with game developers to minimize this problem and rectify it through game patches and driver updates. AMD says it can instead be corrected at the hardware level, with a feature it calls “Adaptive fine-grained data movement”. This is a comprehensive, pipeline-wide approach to memory allocation that detects how relevant the data is and either moves it into the appropriate physical memory or defers access to it.
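AMD has not described the exact logic the hardware uses, so the following is only a toy policy in the spirit of the description: count accesses per page, move the most frequently used pages into local video memory up to a capacity limit, and leave the rest where they are until they are needed. The function name, the access-count threshold and the capacity are hypothetical.

```python
from collections import Counter

def plan_residency(access_trace, local_capacity_pages, hot_threshold=2):
    """Toy residency policy (hypothetical, not AMD's algorithm).

    access_trace: iterable of page IDs in the order they were accessed.
    Returns (promote, defer): pages to move into local memory, pages left in place.
    """
    counts = Counter(access_trace)
    # Rank frequently touched pages, most frequent first.
    ranked = [page for page, n in counts.most_common() if n >= hot_threshold]
    promote = set(ranked[:local_capacity_pages])
    # Rarely touched pages stay where they are; moving them is deferred.
    defer = set(counts) - promote
    return promote, defer

trace = [1, 2, 1, 3, 1, 2, 4, 5, 2, 6]
promote, defer = plan_residency(trace, local_capacity_pages=2)
print(sorted(promote), sorted(defer))
```

Pages 1 and 2 are each accessed three times and get promoted; pages touched only once are not worth the transfer under this policy.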

High Bandwidth Cache Controller – HBCC

One of the problems that AMD identified with their last-generation GPUs is that some applications need to access more data than is available in their VRAM. Compute applications in particular, along with professional rendering tools, are most susceptible to slowdowns due to these problems. Why? Well, traditionally if a GPU wants to access data outside its VRAM, it must first pause and transfer this data from system RAM or SSD/HDD storage into VRAM before any processing can take place.
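A rough calculation shows why such a pause hurts. The sketch assumes the data crosses a PCIe 3.0 x16 link at its theoretical peak (8 GT/s per lane with 128b/130b encoding) and that a hypothetical 2 GB of data must be copied before work can start. Real transfers are slower, so this is a best case.

```python
# Best-case time to copy data from system RAM into VRAM over PCIe 3.0 x16.
# Assumptions: theoretical link peak, no overhead beyond line encoding.
LANES = 16
TRANSFERS_PER_SEC = 8e9          # 8 GT/s per lane (PCIe 3.0)
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding

def pcie3_bandwidth_bytes_per_sec(lanes: int = LANES) -> float:
    """Peak payload bandwidth in bytes/s (one bit per transfer per lane)."""
    return lanes * TRANSFERS_PER_SEC * ENCODING_EFFICIENCY / 8

def copy_time_seconds(num_bytes: float) -> float:
    return num_bytes / pcie3_bandwidth_bytes_per_sec()

data = 2e9  # hypothetical 2 GB of data that is not yet in VRAM
t = copy_time_seconds(data)
frame_time = 1 / 60  # one frame at 60 fps
print(f"peak bandwidth: {pcie3_bandwidth_bytes_per_sec() / 1e9:.2f} GB/s")
print(f"copy time: {t * 1000:.0f} ms, about {t / frame_time:.1f} frames at 60 fps")
```

Even at the theoretical peak, the copy takes roughly 127 ms, or more than seven frames at 60 fps, during which the GPU cannot process that data.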

Vega changes this by including a High Bandwidth Cache Controller. The memory hierarchy now starts with a large, fast cache one level above the traditional L2 cache. This High Bandwidth Cache is formed by the HBM2 memory stacks seated on the interposer, the silicon substrate that connects the GPU die to its memory stacks, and it is managed by a new block on the GPU that AMD calls the High Bandwidth Cache Controller (HBCC).

The HBCC has direct access to the other memory along the pipeline, including video memory, system memory and more. Vega supports a virtual address space of 512 TB, far larger than any on-board VRAM. The GPU uses the HBCC to buffer and smooth the movement of data between the host machine and the GPU. AMD says this approach means the GPU spends fewer resources fetching irrelevant data and makes much better use of memory bandwidth.
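The 512 TB figure translates directly into address bits. Assuming binary units (1 TB = 2^40 bytes), the sketch below computes how wide a byte address must be to cover that space and compares it with a hypothetical card carrying 8 GB of on-board memory.

```python
# Width of a byte address covering Vega's 512 TB virtual address space.
TB = 2 ** 40  # assumption: binary terabytes (TiB)
GB = 2 ** 30

virtual_space = 512 * TB
# Bits needed to number every byte from 0 to virtual_space - 1.
address_bits = (virtual_space - 1).bit_length()
local_vram = 8 * GB  # hypothetical 8 GB of on-board memory

print(f"virtual address bits: {address_bits}")
print(f"virtual space / VRAM: {virtual_space // local_vram:,}x")
```

512 TB is 2^49 bytes, so the addresses are 49 bits wide, and the space is 65,536 times larger than the hypothetical 8 GB of local memory.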

The reason for such a large virtual address space is the same as on a CPU. Memory pages can be allocated more efficiently by the GPU's memory management unit, which manages the mapping from virtual to physical addresses and can also move pages of memory between storage tiers, much like the paging file on Windows.
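The paging analogy can be made concrete with a toy model. A page table records which tier each virtual page lives in, a small fast tier (standing in for on-board memory) holds the most recently used pages, and a miss migrates the page in while evicting the least recently used page to a slower tier. The class name, tier names, capacity and the LRU replacement policy are assumptions for illustration; this is not a description of Vega's actual policy.

```python
from collections import OrderedDict

class TieredPageTable:
    """Toy virtual-memory model: a small fast tier backed by a large slow tier.

    Hypothetical illustration only; tier names and LRU eviction are assumptions.
    """

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()   # pages in the fast tier, oldest first
        self.location = {}          # page table: virtual page -> tier name
        self.hits = 0
        self.migrations = 0

    def access(self, vpage: int) -> str:
        if vpage in self.fast:
            self.fast.move_to_end(vpage)   # mark as most recently used
            self.hits += 1
            return "fast"
        # Miss: migrate the page into the fast tier, evicting the LRU page if full.
        if len(self.fast) >= self.fast_capacity:
            victim, _ = self.fast.popitem(last=False)
            self.location[victim] = "slow"
        self.fast[vpage] = True
        self.location[vpage] = "fast"
        self.migrations += 1
        return "migrated"

table = TieredPageTable(fast_capacity=3)
for page in [0, 1, 2, 0, 3, 0, 4, 1]:
    table.access(page)
print(table.hits, table.migrations, table.location)
```

Page 0 is reused often enough to stay resident, while pages 2 and 3 end up demoted to the slow tier. An operating system's paging file follows the same basic pattern.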

In addition, the VEGA architecture supports NVRAM. This means the GPU can interact directly with the NAND flash or 3D XPoint memory of an SSD attached over PCIe, which helps when working with huge data sets. A “Network” port allows graphics card manufacturers to add network PHYs directly to the card, which would help render farms. AMD is thus preparing common silicon for a range of uses, from consumer cards to professional render farms.

All this is backed by HBM2 memory, which offers eight times the maximum capacity per stack and twice the bandwidth per pin of the HBM1 memory that debuted with the Radeon R9 Fury X. In theory, up to 32 GB of memory can be used across four stacks, eliminating HBM1's 4 GB limit (1 GB per stack).
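The HBM1 and HBM2 figures follow from per-stack arithmetic. Each stack has a 1024-bit interface. The sketch assumes peak per-pin rates of 1 Gbps for HBM1 and 2 Gbps for HBM2, and maximum capacities of 1 GB (HBM1) and 8 GB (HBM2) per stack.

```python
# Peak bandwidth and capacity of an HBM configuration.
# Assumptions: 1024-bit interface per stack, peak per-pin rates of 1 Gbps (HBM1)
# and 2 Gbps (HBM2), and max capacities of 1 GB (HBM1) and 8 GB (HBM2) per stack.
BUS_BITS_PER_STACK = 1024

def stack_bandwidth_gb_s(gbps_per_pin: float) -> float:
    """Peak bandwidth of one stack in GB/s."""
    return BUS_BITS_PER_STACK * gbps_per_pin / 8

def config(stacks: int, gbps_per_pin: float, gb_per_stack: int):
    """Return (total bandwidth in GB/s, total capacity in GB)."""
    return stacks * stack_bandwidth_gb_s(gbps_per_pin), stacks * gb_per_stack

hbm1 = config(stacks=4, gbps_per_pin=1.0, gb_per_stack=1)
hbm2 = config(stacks=4, gbps_per_pin=2.0, gb_per_stack=8)
print(f"HBM1, 4 stacks: {hbm1[0]:.0f} GB/s, {hbm1[1]} GB")
print(f"HBM2, 4 stacks: {hbm2[0]:.0f} GB/s, {hbm2[1]} GB")
```

The HBM1 case reproduces the Fury X's 512 GB/s and 4 GB. The HBM2 numbers are theoretical maximums, and a shipping card may use fewer stacks or lower clocks.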

Next-Generation Geometry, Compute and Pixel Engines

AMD has improved the geometry-processing stage found in previous generations. The new-generation programmable geometry pipeline offers more than twice the peak throughput per clock. VEGA now supports primitive shaders in addition to the existing vertex and geometry shaders. AMD has also improved the way work is distributed among the geometry, compute and pixel engines.

A primitive shader is a new type of low-level shader that gives developers more freedom to specify which shader stages they want to use, and it runs them faster because it is decoupled from the traditional DirectX shader model. AMD has also added the ability for the graphics driver to identify in advance where a game uses the full DirectX shader pipeline, which can then be replaced with a single primitive shader for better performance.
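AMD has not published the primitive shader programming model in detail, so the following is only a conceptual sketch of why merging stages can help. A single pass transforms each triangle's vertices and immediately discards back-facing or zero-area triangles, instead of handing them through separate vertex and geometry stages first. The 2D input, the function names and the counter-clockwise front-face convention are all assumptions.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Triangle = Tuple[Vec2, Vec2, Vec2]

def transform(v: Vec2, scale: float, offset: Vec2) -> Vec2:
    """Stand-in for the vertex stage: a simple scale-and-translate to screen space."""
    return (v[0] * scale + offset[0], v[1] * scale + offset[1])

def signed_area(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Twice the signed area; positive when a->b->c is counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def fused_primitive_pass(tris: List[Triangle], scale: float, offset: Vec2) -> List[Triangle]:
    """Hypothetical fused stage: transform and cull in one pass over the primitives."""
    survivors = []
    for tri in tris:
        a, b, c = (transform(v, scale, offset) for v in tri)
        # Cull back-facing (clockwise) and degenerate (zero-area) triangles early,
        # so they never reach later pipeline stages.
        if signed_area(a, b, c) > 0:
            survivors.append((a, b, c))
    return survivors

tris = [
    ((0, 0), (1, 0), (0, 1)),   # counter-clockwise: kept
    ((0, 0), (0, 1), (1, 0)),   # clockwise (back-facing): culled
    ((0, 0), (1, 1), (2, 2)),   # collinear (zero area): culled
]
print(len(fused_primitive_pass(tris, scale=2.0, offset=(10.0, 10.0))))
```

Only one of the three triangles survives, so downstream stages receive a third of the work in this example.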

With VEGA, AMD improved the functionality of its CUs, now called NCUs (Next-Generation Compute Units), adding support for simple 8-bit operations in addition to the 16-bit (FP16) operations introduced with Polaris and the conventional single- and double-precision floating-point operations supported by older generations. Support for 8-bit operations lets developers simplify their code: if their data fits within 8 bits, each NCU can process up to 512 such operations per clock cycle.
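The 512 figure follows from simple per-NCU arithmetic. The sketch assumes 64 ALU lanes per NCU, counts a fused multiply-add as two operations, and assumes narrower values are packed into each 32-bit lane (two 16-bit or four 8-bit values).

```python
# Peak operations per clock for one compute unit at different precisions.
# Assumptions: 64 ALU lanes, FMA counted as 2 ops, values packed into 32-bit lanes.
LANES = 64
OPS_PER_FMA = 2
REGISTER_BITS = 32

def ops_per_clock(bits: int) -> int:
    values_per_lane = REGISTER_BITS // bits   # 1 x 32-bit, 2 x 16-bit, 4 x 8-bit
    return LANES * OPS_PER_FMA * values_per_lane

for bits in (32, 16, 8):
    print(f"{bits:>2}-bit: {ops_per_clock(bits)} ops/clock")
```

Halving the precision doubles the number of values per register, giving 128, 256 and 512 operations per clock at 32, 16 and 8 bits.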

AMD also introduced a new feature called “Rapid Packed Math”, which packs pairs of 16-bit operations into 32-bit registers so that both are executed in a single clock. Thanks to these improvements, the VEGA NCU can perform more operations per clock cycle than the previous generation, and it is also designed to run at higher clock speeds. AMD has also introduced lossless compression algorithms that save memory bandwidth. In addition, AMD improved the pixel engine with a new-generation binning rasterizer, which saves clock cycles.
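Rapid Packed Math operates on pairs of 16-bit values stored side by side in one 32-bit register. Python has no packed-FP16 instruction, so the sketch below only emulates the data layout with the standard `struct` module (format `e` is IEEE 754 half precision). Two halves are packed into one 32-bit word, added lane by lane, and repacked. On the GPU, both lanes would be added by a single instruction; here they are two separate Python additions. The function names are hypothetical.

```python
import struct

def pack_half2(lo: float, hi: float) -> int:
    """Pack two FP16 values into one 32-bit word (lo in bits 0-15, hi in 16-31)."""
    lo_bits, = struct.unpack("<H", struct.pack("<e", lo))
    hi_bits, = struct.unpack("<H", struct.pack("<e", hi))
    return (hi_bits << 16) | lo_bits

def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its two FP16 lanes."""
    lo, = struct.unpack("<e", struct.pack("<H", word & 0xFFFF))
    hi, = struct.unpack("<e", struct.pack("<H", word >> 16))
    return lo, hi

def packed_add(a: int, b: int) -> int:
    """Lane-wise FP16 add of two packed words, rounding each result to FP16."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)

x = pack_half2(1.5, -2.0)
y = pack_half2(0.25, 4.0)
print(unpack_half2(packed_add(x, y)))
```

The result holds both sums, 1.75 and 2.0, in one 32-bit word. This is why FP16 code can reach twice the FP32 operation rate when the hardware processes both lanes at once.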

Finally, AMD has changed the GPU's cache hierarchy in a way that improves application performance. Alongside the geometry pipeline and the compute engine, the pixel engine's ROPs, which have their own L1-level caches, are now linked to the L2 cache.