When it comes to PC components, ASUSTeK is one of the best-known names in the market. It was founded in 1989 in Taiwan, and ever since its founding, Asus has seen phenomenal growth and diversification in its business lines. When it comes to Asus, the first associated name that comes to mind is ROG, or Republic of Gamers. The ROG brand was introduced in 2006 and focuses on mainstream gamers and enthusiasts, with products ranging from motherboards and graphics cards to peripherals. ROG is now the pinnacle of the Asus product lineup. Strix is a newer addition to the ROG lineup, and here is what Asus says about it: “ROG Strix is the newest recruit into the Republic of Gamers. A series of specialized gaming gear designed for the rebel in all of us, Strix exemplifies ROG’s premier performance, innovative technology, and leading quality, but with its own confident and dynamic attitude. Featuring bold designs and bright colors, this exciting new series possesses a spirit of fierce individualism that charges every gaming experience with thrilling energy. ROG Strix equips players with the necessary speed and agility to dominate their game. A new generation of force has arrived. Join the Republic and experience the power of ROG Strix.” Asus has sent me their Strix GeForce RTX 2080 and 2080 Ti graphics cards for review. Since this is my first time with the RTX, I will give a brief introduction to the Turing architecture before moving on to the main content.
The word “reflection” has been under the spotlight recently in the PC and gaming communities. Much was hyped before the launch of Nvidia’s new generation of gaming graphics cards. Since the launch of Pascal in 2016, gamers had been expecting a new generation from Nvidia in 2018, and Nvidia has kept the trend going. But this time around it is different: say goodbye to GTX and hello to RTX. People were not anticipating a change in branding, as the topic of discussion had always been the numbering nomenclature of the then-upcoming graphics cards. With RTX, Nvidia has introduced new technological advancements in graphics processing and taken them to a new level in hardware. Ray tracing is at the heart of the new RTX graphics cards based on the Turing architecture. Real-time ray tracing was quite challenging for previous generations: it is so computationally intensive that it was simply not possible on a single GPU, yet it has clear benefits when it comes to rendering closer to reality. Now, Nvidia has provided dedicated RT cores to up the game, and it has taken them 10 years of research to bring this technology to the consumer front. Another significant addition is the Tensor Core, again a hardware-based implementation, which takes advantage of AI (Artificial Intelligence) based on DNNs (Deep Neural Networks) to enable a new sampling technique called DLSS (Deep Learning Super Sampling). Add advanced shading techniques, and the list can’t be complete without mentioning the new GDDR6 memory. I have jumped straight to the new additions, yet we also have a new core architecture here, labeled Turing. In a nutshell, Turing is a major architectural leap forward, using new hardware-based accelerators, a hybrid rendering approach combining rasterization and ray tracing, and AI and simulation to bring lifelike cinematic effects and realism to gaming.
So, what is ray tracing? What do AI and deep neural networks have to do with graphics processing in gaming? What is the new architecture? It is time to take a look at the Turing architecture to answer these questions. Please keep in mind that I will be brief here; if you want to read more, I will leave links at the end of this introduction for detailed study. Also note that the coming paragraphs focus on the gaming architecture, excluding Tesla. Let’s start with the key ingredients of the Turing architecture:
- New Core Architecture using new Streaming Multiprocessors
- RT (Ray Tracing) cores
- Tensor Cores
- GDDR6 Memory
- Advanced Shading Techniques
- Second Generation Nvidia NVLink
- USB-C and VirtualLink
Following chips are based on Turing:
Here, TU102 is the fully enabled implementation of Turing, whereas TU104 and TU106 are scaled-down versions of the full Turing architecture. The naming convention for the graphics cards is as follows:
- RTX 2080Ti is based on TU102
- RTX 2080 is based on TU104
- RTX 2070 is based on TU106
Unlike the past trend, in which the Ti card was released later down the road, after the cut-down versions, this time we are seeing the RTX 2080 and RTX 2080 Ti launched right at the start, whereas the RTX 2070 is planned for launch in late October 2018.
Here are the key highlights of the Turing:
The high-end TU102 GPU includes 18.6 billion transistors fabricated on TSMC’s 12 nm FFN (FinFET NVIDIA) high-performance manufacturing process. The GeForce RTX 2080 Ti Founders Edition GPU delivers the following exceptional computational performance:
- 14.2 TFLOPS of peak single-precision (FP32) performance
- 28.5 TFLOPS of peak half-precision (FP16) performance
- 14.2 TIPS concurrent with FP, through independent integer execution units
- 113.8 Tensor TFLOPS
- 10 Giga Rays/sec
- 78 Tera RTX-OPS
Turing Architecture and SM Design
Let’s take a look at Turing’s new core architecture. At its heart is the TU102 GPU, which packs 6 GPCs (Graphics Processing Clusters), 36 TPCs (Texture Processing Clusters), and 72 Streaming Multiprocessors (SMs). Each GPC packs 6 TPCs and a dedicated raster engine. Each TPC, in turn, packs 2 SMs, where each SM has 64 CUDA cores, 8 Tensor Cores, a 256 KB register file, 4 texture units, and 96 KB of L1/shared memory. This shared memory has a very interesting feature which I will describe shortly. Here is a graphical representation of the TU102 GPU. TU102 includes two NVLink x8 links, each capable of delivering up to 25 GB/s in each direction, for a total aggregate bidirectional bandwidth of 100 GB/s.
Having described the components and sub-components configuration, here is the summed up implementation of the TU102 GPU:
- 4608 CUDA Cores
- 72 RT Cores
- 576 Tensor Cores
- 288 Texture Units
- 96 ROPs
- 6144 KB L2 Cache
- 144 FP64 units [two per SM]
- Twelve 32-bit GDDR6 memory controllers (384-bit total)
Each memory controller has 8 ROP units and 512 KB L2 cache. Here is a pic of the actual die.
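The per-SM and per-controller figures above multiply out to the chip-wide totals listed; as a quick sanity check in Python:

```python
# Sanity-check the TU102 totals from the per-unit configuration described above.
gpcs = 6
tpcs_per_gpc = 6
sms_per_tpc = 2
sms = gpcs * tpcs_per_gpc * sms_per_tpc   # 72 SMs

cuda_cores = sms * 64          # 64 CUDA cores per SM
tensor_cores = sms * 8         # 8 Tensor Cores per SM
texture_units = sms * 4        # 4 texture units per SM

mem_controllers = 12           # twelve 32-bit GDDR6 controllers
rops = mem_controllers * 8     # 8 ROP units per controller
l2_kb = mem_controllers * 512  # 512 KB of L2 per controller

print(sms, cuda_cores, tensor_cores, texture_units, rops, l2_kb)
# 72 4608 576 288 96 6144
```

Every number matches the summed-up implementation above, which is a good way to keep the GPC/TPC/SM hierarchy straight.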
The key difference between Turing and Pascal comes from the concurrent execution of FP32 and INT32 operations, which was not possible in Pascal. The Turing architecture features a new SM design that incorporates many of the features introduced in the Volta GV100 SM architecture. Two SMs are included per TPC, and each SM has a total of 64 FP32 cores and 64 INT32 cores. In comparison, the Pascal GP10x GPUs have one SM per TPC and 128 FP32 cores per SM. The Turing SM supports concurrent execution of FP32 and INT32 operations and independent thread scheduling similar to the Volta GV100 GPU. Each Turing SM also includes eight mixed-precision Turing Tensor Cores and one RT Core.
As mentioned above, TU104 and TU106 are scaled-down versions of the fully enabled Turing architecture, so it is time to take a quick look at them before coming back to the main architecture discussion. The full TU104 chip contains six GPCs, 48 SMs, and eight 32-bit memory controllers (256-bit total). In TU104, each GPC has a raster unit and 4 TPCs. Each TPC has a PolyMorph Engine and two SMs. Each SM has the new RT Core, 64 CUDA cores, a 256 KB register file, 96 KB of L1 data cache/shared memory, and 4 texture units. The full TU104 chip contains 13.6 billion transistors and includes 3072 CUDA Cores, 384 Tensor Cores, and 48 RT Cores. TU104 also supports second-generation NVLink. One x8 NVLink link is included, providing 25 GB/s of bandwidth in each direction (50 GB/s total).
The GeForce RTX 2070, based on the Turing TU106, is designed to deliver the best performance and energy efficiency in its class. Don’t take it lightly for being the entry-level card in the gaming RTX lineup: it is by no means cheap. Most of the key new features of the Turing architecture are supported by TU106, including the RT Cores, Turing Tensor Cores, and all of the architectural changes made to the Turing SM. TU106 does not offer NVLink or SLI support. The TU106 GPU has 3 GPCs, 36 SMs, and eight 32-bit memory controllers (256-bit total). In TU106, each GPC has a raster unit and 6 TPCs. Each TPC has a PolyMorph Engine and 2 SMs. Each SM in TU106 has the new RT Core, 64 CUDA cores, a 256 KB register file, 96 KB of L1 data cache/shared memory, and 4 texture units. The full TU106 GPU contains 10.8 billion transistors and includes 2304 CUDA Cores, 288 Tensor Cores, and 36 RT Cores.
Before moving on to the RT and Tensor Cores, the table below compares the Pascal and Turing GPUs.
| GPU Feature | GTX 1070 | RTX 2070 | GTX 1080 | RTX 2080 | GTX 1080 Ti | RTX 2080 Ti |
| --- | --- | --- | --- | --- | --- | --- |
| GPU Base Clock MHz (Reference/FE) | 1506/1506 | 1410/1410 | 1607/1607 | 1515/1515 | 1480/1480 | 1350/1350 |
| GPU Boost Clock MHz (Reference/FE) | 1683/1683 | 1620/1710 | 1733/1733 | 1710/1800 | 1582/1582 | 1545/1635 |
| RTX-OPS (Tera-OPS) (Reference/FE) | 6.5/6.5 | 42/45 | 8.9/8.9 | 57/60 | 11.3/11.3 | 76/78 |
| Rays Cast (Giga Rays/s) (Reference/FE) | 0.065/0.065 | 6/6 | 0.89/0.89 | 8/8 | 1.1/1.1 | 10/10 |
| Peak FP32 TFLOPS (Reference/FE) | 6.5/6.5 | 7.5/7.9 | 8.9/8.9 | 10/10.6 | 11.3/11.3 | 13.4/14.2 |
| Peak INT32 TIPS (Reference/FE) | NA | 7.5/7.9 | NA | 10/10.6 | NA | 13.4/14.2 |
| Peak FP16 TFLOPS (Reference/FE) | NA | 14.9/15.8 | NA | 20.1/21.2 | NA | 26.9/28.5 |
| Peak FP16 Tensor TFLOPS with FP16 Accumulate (Reference/FE) | NA | 59.7/63 | NA | 80.5/84.8 | NA | 107.6/113.8 |
| Peak FP16 Tensor TFLOPS with FP32 Accumulate (Reference/FE) | NA | 29.9/31.5 | NA | 40.3/42.4 | NA | 53.8/56.9 |
| Peak INT8 Tensor TOPS (Reference/FE) | NA | 119.4/126 | NA | 161.1/169.6 | NA | 215.2/227.7 |
| Peak INT4 Tensor TOPS (Reference/FE) | NA | 238.9/252.1 | NA | 322.2/339.1 | NA | 430.3/455.4 |
| Memory Size and Type | 8192 MB GDDR5 | 8192 MB GDDR6 | 8192 MB GDDR5X | 8192 MB GDDR6 | 11264 MB GDDR5X | 11264 MB GDDR6 |
| Memory Clock | 8 Gbps | 14 Gbps | 10 Gbps | 14 Gbps | 11 Gbps | 14 Gbps |
| Memory Bandwidth (GB/s) | 256 | 448 | 320 | 448 | 484 | 616 |
| Texture Fill Rate (GT/s) (Reference/FE) | 202/202 | 233.3/246.2 | 277.3/277.3 | 314.6/331.2 | 354.4/354.4 | 420.2/444.7 |
| L2 Cache Size | 2048 KB | 4096 KB | 2048 KB | 4096 KB | 2816 KB | 5632 KB |
| Register File Size/SM | 256 KB | 256 KB | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size/GPU | 3840 KB | 9216 KB | 5120 KB | 11776 KB | 7168 KB | 17408 KB |
| TDP (Reference/FE) | 150/150 W | 175/185 W | 180/180 W | 215/225 W | 250/250 W | 250/260 W |
| Transistor Count | 7.2 Billion | 10.8 Billion | 7.2 Billion | 13.6 Billion | 12 Billion | 18.6 Billion |
| Die Size | 314 mm² | 445 mm² | 314 mm² | 545 mm² | 471 mm² | 754 mm² |
| Manufacturing Process | 16 nm | 12 nm FFN | 16 nm | 12 nm FFN | 16 nm | 12 nm FFN |
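One relationship worth pulling out of the table above: memory bandwidth follows directly from the per-pin data rate and the bus width. A quick check (the 352-bit figures assume the 1080 Ti/2080 Ti ship with 11 of the 12 32-bit controllers enabled):

```python
# Memory bandwidth (GB/s) = per-pin data rate (Gbps) x bus width (bits) / 8 bits-per-byte.
def bandwidth_gbs(rate_gbps, bus_bits):
    return rate_gbps * bus_bits / 8

print(bandwidth_gbs(14, 256))  # RTX 2070 / RTX 2080 (256-bit GDDR6): 448.0
print(bandwidth_gbs(14, 352))  # RTX 2080 Ti (352-bit GDDR6):         616.0
print(bandwidth_gbs(11, 352))  # GTX 1080 Ti (352-bit GDDR5X):        484.0
print(bandwidth_gbs(10, 256))  # GTX 1080 (256-bit GDDR5X):           320.0
```

All four results match the bandwidth row of the table, confirming that the generational gain comes from the jump to 14 Gbps GDDR6 rather than a wider bus.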
Each Turing SM is partitioned into four processing blocks, each with 16 FP32 cores, 16 INT32 cores, two Tensor Cores, one warp scheduler, and one dispatch unit. Each block includes a new L0 instruction cache and a 64 KB register file. The four blocks share a combined 96 KB L1 data cache/shared memory. Traditional graphics workloads partition the 96 KB L1/shared memory as 64 KB of dedicated graphics shader RAM and 32 KB for texture cache and register file spill area. Compute workloads can divide the 96 KB into 32 KB of shared memory and 64 KB of L1 cache, or 64 KB of shared memory and 32 KB of L1 cache. Turing’s SM also introduces a new unified architecture for shared memory, L1, and texture caching. This unified design allows the L1 cache to double its hit bandwidth per TPC compared to Pascal, and to be reconfigured to grow larger when shared memory allocations are not using all of the shared memory capacity. This dynamic behavior of the L1 cache is a handy implementation: the Turing L1 can be as large as 64 KB, combined with a 32 KB per-SM shared memory allocation, or it can shrink to 32 KB, allowing 64 KB to be used as shared memory, as required. Turing’s L2 cache capacity has also been increased.
Another key improvement area in Turing is the concurrent processing of FP and INT instructions, which was previously not possible. Modern shader workloads typically have a mix of FP arithmetic instructions, such as FADD or FMAD, with simpler instructions such as integer adds for addressing and fetching data, or floating-point compare or min/max for processing results. In previous shader architectures, the floating-point math datapath sat idle whenever one of these non-FP-math instructions ran. Turing adds a second parallel execution unit next to every CUDA core that executes these instructions in parallel with floating-point math. The chart below shows that the mix of integer-pipe versus floating-point instructions varies, but across several modern applications, we typically see about 36 additional integer-pipe instructions for every 100 floating-point instructions. Moving these instructions to a separate pipe translates to an effective 36% additional throughput possible for floating point.
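That 36% figure follows from simple issue-slot accounting, sketched below:

```python
# With ~36 integer instructions per 100 FP instructions, a single shared pipe must
# issue all 136 serially. With a separate parallel INT pipe, the two streams issue
# concurrently, so the same work takes only max(100, 36) = 100 FP-pipe slots.
fp_instrs, int_instrs = 100, 36
serial_slots = fp_instrs + int_instrs        # 136 slots on a shared pipe
concurrent_slots = max(fp_instrs, int_instrs)  # 100 slots with a parallel INT pipe
speedup = serial_slots / concurrent_slots      # 1.36x
print(f"{(speedup - 1) * 100:.0f}% additional FP throughput")  # 36%
```

This is an idealized upper bound, of course; real shaders only approach it when the instruction mix matches this ratio.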
According to Nvidia, the overall technological advancements in the SM have enabled Turing to achieve a 50% improvement in delivered performance per CUDA core.
Tensor Cores and DLSS
Here comes another key difference between Turing and Pascal, and in fact all previous architectures: Turing features Tensor Cores. As mentioned above, each GPC has 6 TPCs and each TPC has 2 SMs, with each SM packing 8 Tensor Cores, for a total of 576 Tensor Cores in fully unlocked Turing. What is a Tensor Core, and what benefit does it bring to the overall architecture? By incorporating Tensor Cores, Nvidia has opened an altogether new horizon for game developers and gamers, as we are literally talking about neural networks and Artificial Intelligence (AI) here. But what does that have to do with gaming? This question is probably bothering you by now. Well, the first thing to note is that these cores bring INT8, INT4, and FP16 precision modes for inferencing workloads that can tolerate quantization. The introduction of Tensor Cores into Turing-based GeForce gaming GPUs makes it possible to bring real-time deep learning to gaming applications for the very first time. Turing Tensor Cores accelerate the AI-based features of NVIDIA NGX Neural Services that enhance graphics, rendering, and other types of client-side applications. The key NGX AI features are:
- Deep Learning Super Sampling
- AI InPainting
- AI Super Rez
- AI Slow-Mo
At the heart of the Turing Tensor Cores is matrix computation. Tensor Cores accelerate the matrix-matrix multiplication at the heart of neural network training and inferencing functions. Turing Tensor Cores particularly excel at inference computations, in which useful and relevant information can be inferred and delivered by a trained deep neural network (DNN) based on a given input. Examples of inference include identifying images of friends in Facebook photos, identifying and classifying different types of automobiles, pedestrians, and road hazards in self-driving cars, translating human speech in real time, and creating personalized user recommendations in online retail and social media systems.
Each Tensor Core can perform up to 64 floating-point fused multiply-add (FMA) operations per clock using FP16 inputs. The eight Tensor Cores in an SM perform a total of 512 FP16 multiply-accumulate operations per clock, or 1024 total FP operations per clock. The new INT8 precision mode works at double this rate, or 2048 integer operations per clock. This is a significant boost to the speed of matrix computations. Nvidia has used these neural graphics functions as a means to deliver a new sampling technique called DLSS.
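These per-clock figures also multiply out to the peak Tensor TFLOPS in the comparison table. A quick check, using the RTX 2080 Ti FE's 68 enabled SMs (derivable from the table's 17408 KB per-GPU / 256 KB per-SM register file figures) and its 1635 MHz boost clock:

```python
# Per-SM Tensor throughput as described above, and the resulting chip-level peak.
fma_per_tensor_core = 64            # FP16 FMAs per Tensor Core per clock
tensor_cores_per_sm = 8
fp_ops_per_sm = fma_per_tensor_core * tensor_cores_per_sm * 2  # FMA = 2 FLOPs -> 1024
int8_ops_per_sm = fp_ops_per_sm * 2                            # INT8 runs at 2x -> 2048

# RTX 2080 Ti FE: 68 SMs, 1635 MHz boost clock (from the comparison table).
sms, boost_ghz = 68, 1.635
peak_tensor_tflops = fp_ops_per_sm * sms * boost_ghz / 1000
print(fp_ops_per_sm, int8_ops_per_sm, round(peak_tensor_tflops, 1))
# 1024 2048 113.8
```

The result lands exactly on the 113.8 Tensor TFLOPS the table quotes for the 2080 Ti FE.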
Since the focus here is gaming and game-related features, I will concentrate on DLSS only. Sampling sits at the very core of graphics processing. In modern games, rendered frames are not displayed directly; rather, they go through a post-processing image enhancement step that combines input from multiple rendered frames, trying to remove visual artifacts such as aliasing while preserving detail. For example, Temporal Anti-Aliasing (TAA), a shader-based algorithm that combines two frames using motion vectors to determine where to sample the previous frame, is one of the most common image enhancement algorithms in use today. However, this image enhancement process is fundamentally very difficult to get right. Nvidia has addressed this image analysis and optimization problem, which has no clean algorithmic solution, through AI. Deep learning has achieved super-human ability to recognize dogs, cats, birds, etc. from looking at the raw pixels in an image. In this case, the goal is to combine rendered images, based on looking at raw pixels, to produce a high-quality result: a different objective, but using similar capabilities. The deep neural network (DNN) developed to solve this challenge is called Deep Learning Super-Sampling (DLSS). DLSS produces a much higher quality output than TAA from a given set of input samples, and Nvidia has leveraged this capability to improve overall performance. DLSS not only generates much higher quality output, it does so with roughly half the shading work: whereas TAA renders at the final target resolution and then combines frames, DLSS renders faster at a lower input sample count and then infers a result at the target resolution. The much faster raw rendering horsepower of the RTX 2080 Ti, combined with the performance uplift from DLSS and Tensor Cores, enables the RTX 2080 Ti to achieve 2x the performance of the GTX 1080 Ti.
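For context, here is a minimal, illustrative sketch of the exponential history blend at the heart of TAA-style temporal accumulation. Real implementations also reproject the history with motion vectors and clamp it against neighboring colors; this toy version omits all of that:

```python
def taa_blend(history, current, alpha=0.1):
    """Blend the (reprojected) history pixel with the current frame's pixel."""
    return [(1 - alpha) * h + alpha * c for h, c in zip(history, current)]

# Over many frames the accumulated pixel converges toward the static scene color,
# averaging out per-frame jittered aliasing.
history = [0.0, 0.0, 0.0]          # RGB history buffer for one pixel
for _ in range(50):
    history = taa_blend(history, [1.0, 0.5, 0.25])
print([round(v, 2) for v in history])  # approaches [1.0, 0.5, 0.25]
```

The blur and ghosting problems mentioned later arise precisely because this blend trusts the reprojected history; when motion vectors are wrong, stale colors get mixed in.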
The key to a neural network is the training process, which keeps increasing its learning capacity based on the given inputs, making it smarter and more accurate over time. The same goes for DLSS, which gets the opportunity to learn how to produce the desired output from large numbers of super-high-quality examples. To train the network, Nvidia collects thousands of “ground truth” reference images rendered with the gold-standard method for perfect image quality: 64x supersampling (64xSS). 64x supersampling means that instead of shading each pixel once, the shading is done at 64 different offsets within the pixel and the outputs are combined, producing a resulting image with ideal detail and anti-aliasing quality. That is the first phase. In the second phase, they gather matching raw images using normal rendering. Now comes the training process: the idea is to train the DLSS network to match its processed output to the 64xSS original. DLSS goes through each input, produces an output, and measures the difference between that output and the 64xSS target. It then adjusts for the differences between the processed output and the target through a process called back-propagation. Think of the above as a single iteration; with repeated iterations, DLSS learns on its own how to produce a final image that closely matches the target 64xSS image. That is not all, as the real beauty of this network lies in doing all this while avoiding issues like blurring, disocclusion, and transparency, which are common problems with TAA-based rendering. Nvidia calls the capability described above the standard DLSS mode. They have also provided a second mode, called DLSS 2X.
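The train, compare, and correct loop described above can be illustrated with a toy gradient-descent example. This single-parameter stand-in is purely conceptual and has nothing to do with Nvidia's actual DLSS network beyond the shape of the loop:

```python
# Toy illustration of the training loop described above: repeatedly compare the
# output against the "ground truth" target, measure the error, and nudge the
# parameter to reduce it (back-propagation in miniature).
def train(raw_input, target, lr=0.1, iterations=200):
    weight = 0.0                       # the entire "network": output = weight * input
    for _ in range(iterations):
        output = weight * raw_input
        error = output - target        # difference from the target (e.g. the 64xSS image)
        grad = 2 * error * raw_input   # gradient of the squared error w.r.t. weight
        weight -= lr * grad            # gradient step, i.e. one back-propagation update
    return weight

w = train(raw_input=0.5, target=1.0)
print(round(w * 0.5, 3))  # after training, output closely matches the target: 1.0
```

A real DNN repeats exactly this compare-and-correct cycle, just over millions of parameters and thousands of image pairs instead of one weight and one number.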
In this mode, the DLSS input is rendered at the final target resolution and then combined by a larger DLSS network to produce an output image that approaches the level of 64x supersampled rendering: a result that would be impossible to achieve in real time by any traditional means. The picture below shows DLSS 2X mode in operation, providing image quality very close to the reference 64x supersampled image.
As mentioned above, DLSS handles the sampling task more intelligently and efficiently while also avoiding issues like blurring and disocclusion. Blurring is in fact quite common in traditional sampling renderers, particularly in the case of multi-frame image enhancement. The picture below illustrates one of the challenging cases for multi-frame image enhancement: a semi-transparent screen floats in front of a background that is moving differently. TAA tends to blindly follow the motion vectors of the moving object, blurring the detail on the screen. DLSS is able to recognize that changes in the scene are more complex and combines the inputs in a more intelligent way that avoids the blurring issue.
Another important aspect of this hardware-level AI support is that it frees the main GPU cores to process the tasks at hand more efficiently and effectively, while all the sampling-related processing is handled by the dedicated Tensor Cores.
Turing Ray Tracing and RT Cores
We saw quite a hot debate in the gaming community on ray tracing when Nvidia announced the Turing architecture. The challenge is that ray tracing in REAL TIME could not be handled by a single GPU in the past, as it is a computationally intensive rendering technique that realistically simulates the lighting of a scene and its objects. Turing GPUs can render physically correct reflections, refractions, shadows, and indirect lighting in real time. While NVIDIA’s GPU-accelerated Iray® plugins and OptiX ray tracing engine have delivered realistic ray-traced rendering to designers, artists, and technical directors for years, high-quality ray tracing effects could not be performed in real time. Similarly, current NVIDIA Volta GPUs can render realistic movie-quality ray-traced scenes, but not in real time on a single GPU. Due to its processing-intensive nature, ray tracing has not been used in games for any significant rendering tasks. Instead, games that require 30 to 90+ frames/second have relied on fast, GPU-accelerated rasterization techniques for years, at the expense of fully realistic-looking scenes. It took Nvidia 10 years of research to put hardware-based real-time ray tracing capability on a single GPU at gamers’ disposal. This hardware is what we know as RT Cores (RT stands for Ray Tracing), combined with NVIDIA RTX software technology. This does not mean that Nvidia has abandoned rasterization, if that is what you are thinking. Rather, they have implemented the raster engine and RT Cores to work simultaneously and cooperatively for better rendering of the scene, making it as close to real life as possible: a hybrid approach in which a combination of ray tracing (which is still computationally intensive) and rasterization is used to produce high-quality rendering.
With this approach, rasterization is used where it is most effective, and ray tracing is used where it provides the most visual benefit versus rasterization, such as rendering reflections, refractions, and shadows. Hybrid rendering combines ray tracing and rasterization in the rendering pipeline to take advantage of what each does best. Although rasterization-based rendering produces good quality, it has its limitations. For example, rendering reflections and shadows using only rasterization requires simplifying assumptions that can cause many different types of artifacts: static lightmaps may look correct until something moves, rasterized shadows often suffer from aliasing and light leaks, and screen-space reflections can only reflect objects that are visible on the screen. These artifacts detract from the realism of the gaming experience and are costly for developers and artists to try to fix with additional effects. On the other hand, rasterization and z-buffering are much faster at determining object visibility and can substitute for the primary ray-casting stage of the ray tracing process. Ray tracing can then be used for shooting secondary rays to generate high-quality, physically correct reflections, refractions, and shadows. Nvidia expects many developers to use hybrid rasterization/ray tracing techniques to attain high frame rates with excellent image quality. Alternatively, for professional applications where image fidelity is the highest priority, they expect to see ray tracing used for the entire rendering workload, casting primary and secondary rays to create amazingly realistic rendering.
Turing GPUs not only include dedicated ray tracing acceleration hardware, but also use an advanced acceleration structure. Essentially, an entirely new rendering pipeline is available to enable real-time ray tracing in games and other graphics applications using a single Turing GPU.
While Turing GPUs enable real-time ray tracing, the number of primary or secondary rays cast per pixel or surface location varies based on many factors, including scene complexity, resolution, other graphics effects rendered in a scene, and of course GPU horsepower. Do not expect hundreds of rays cast per pixel in real-time. In fact, far fewer rays are needed per pixel when using Turing RT Core acceleration in combination with advanced denoising filtering techniques. NVIDIA Real-Time Ray Tracing Denoiser modules can significantly reduce the number of rays required per pixel and still produce excellent results. Real-time ray tracing of selected objects can make many scenes in games and applications look as realistic as high-end movie special effects, or as good as ray-traced images created with professional software-based non-real-time rendering applications. Ray-traced reflections, ray-traced area light shadows, and ray-traced ambient occlusion can run on a single Quadro RTX 6000 or GeForce RTX 2080 Ti GPU delivering rendering quality nearly indistinguishable from movies.
Please note that you need a ray-tracing-enabled API to take full advantage of ray tracing on Turing. Nvidia is reportedly working with game developers to bring RT-enabled games; Battlefield V and Shadow of the Tomb Raider are among the first titles that will be RT enabled. Turing ray tracing hardware works with NVIDIA’s RTX ray tracing technology, NVIDIA Real-Time Ray Tracing Libraries, NVIDIA OptiX, the Microsoft DXR API, and the upcoming Vulkan ray tracing API. Users will experience real-time, cinematic-quality ray-traced objects and characters in games at playable frame rates, or visual realism in professional graphics applications that was impossible in real time with prior GPU architectures.
Turing GPUs can accelerate ray tracing techniques used in many of the following rendering and non-rendering operations:
- Reflections and Refractions
- Shadows and Ambient Occlusion
- Global Illumination
- Instant and off-line lightmap baking
- Beauty shots and high-quality previews
- Primary rays for foveated VR rendering
- Occlusion Culling
- Physics, Collision Detection, Particle simulations
- Audio simulation (ex., NVIDIA VRWorks Audio built on top of the OptiX API)
- AI visibility queries
- In-engine Path Tracing (non-real-time) to generate reference screenshots for tuning real-time rendering techniques and denoisers, material composition, and scene lighting.
As mentioned above, each SM has a single RT Core, for a total of 72 RT Cores in a fully unlocked Turing TU102 (68 of which are enabled on the RTX 2080 Ti). RT Cores accelerate Bounding Volume Hierarchy (BVH) traversal and ray/triangle intersection testing (ray casting). They perform visibility testing on behalf of threads running in the SM, working together with advanced denoising filtering, a highly efficient BVH acceleration structure developed by NVIDIA Research, and RTX-compatible APIs to achieve real-time ray tracing on a single Turing GPU. RT Cores traverse the BVH autonomously, and by accelerating traversal and ray/triangle intersection tests, they offload the SM, allowing it to handle other vertex, pixel, and compute shading work. Functions such as BVH building and refitting are handled by the driver, while ray generation and shading are managed by the application through new types of shaders. Without hardware acceleration, BVH traversal would have to be performed by shader operations, taking thousands of instruction slots per ray cast to test successively smaller bounding boxes in the BVH until finally hitting a triangle, whose color at the point of intersection then contributes to the final pixel color (or, if no triangle is hit, a background color may be used to shade the pixel). It is a computationally intensive process, making real-time ray tracing impossible on GPUs without hardware-based acceleration.
The RT Cores in Turing process all the BVH traversal and ray-triangle intersection testing, saving the SM from spending thousands of instruction slots per ray, which could be an enormous number of instructions for an entire scene. The RT Core includes two specialized units: the first performs bounding-box tests, and the second performs ray-triangle intersection tests. The SM only has to launch a ray probe; the RT Core performs the BVH traversal and ray-triangle tests and returns a hit or no-hit to the SM. The SM is largely freed up to do other graphics or compute work.
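The bounding-box test performed by the first unit can be sketched in software with the classic slab method. This is an illustrative sketch only; the actual RT Core implementation is not public:

```python
def ray_hits_aabb(origin, inv_dir, box_min, box_max):
    """Slab-method ray vs. axis-aligned bounding box test (one BVH node visit).

    inv_dir holds 1/direction per axis (use a huge value for near-zero components).
    """
    t_near, t_far = 0.0, float("inf")
    for o, inv_d, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t1 = (lo - o) * inv_d        # entry/exit distances for this axis's slab
        t2 = (hi - o) * inv_d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far           # the ray hits only if all three slabs overlap

# Ray from the origin along +x: hits a box spanning x in [1, 2] around the axis,
# but misses a box offset to y in [2, 3].
print(ray_hits_aabb((0, 0, 0), (1.0, 1e30, 1e30), (1, -1, -1), (2, 1, 1)))  # True
print(ray_hits_aabb((0, 0, 0), (1.0, 1e30, 1e30), (1, 2, -1), (2, 3, 1)))   # False
```

In a software traversal, a test like this runs for every BVH node a ray visits, which is exactly the per-ray instruction cost the RT Core's fixed-function units eliminate.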
Here is another key element that many gamers may not know. The basic performance indicator for any graphics card is FPS, i.e., frames per second; gamers tend to relate performance to FPS, and the higher the better. Ray tracing, however, is not measured in FPS but in Giga Rays per second. Turing ray tracing performance with RT Cores is significantly faster than ray tracing on Pascal GPUs. Pascal achieves approximately 1.1 Giga Rays/s doing ray tracing in software, spending roughly 10 TFLOPS per Giga Ray, whereas Turing can do 10+ Giga Rays/s using RT Cores, running ray tracing about 10 times faster.
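The relationship between the two metrics is simple division; using the numbers quoted above:

```python
# Rough cost of software ray tracing on Pascal: ~10 TFLOPS of shader work per Giga Ray.
pascal_fp32_tflops = 11.3           # GTX 1080 Ti peak FP32 (from the comparison table)
tflops_per_gigaray = 10             # approximate software ray tracing cost
pascal_gigarays = pascal_fp32_tflops / tflops_per_gigaray
print(round(pascal_gigarays, 2))    # ~1.13, in line with the quoted ~1.1 Giga Rays/s

turing_gigarays = 10                # RTX 2080 Ti with RT Cores
print(round(turing_gigarays / pascal_gigarays, 1))  # ~8.8x; Nvidia rounds this to ~10x
```

In other words, a Pascal flagship spends essentially its entire FP32 budget to reach ~1.1 Giga Rays/s, while Turing's RT Cores deliver roughly ten times the rays with the shader cores left free.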
Here are some pictures showing the rendered scenes from various games with and without ray tracing.
In above picture, one can see realistic reflections on the car from an off-screen explosion in the RTX ON scene. Such reflections are not possible with screen-space reflections without ray tracing, as in the RTX OFF scene.
The scene in the above picture shows another issue with non-ray-traced reflection algorithms. In this case, with RTX OFF, a reflection is partially present, but missing for the portion of the scene that is visible through the gunsight. With RTX ON, the scene looks correct.
With RTX OFF, there are no shadows cast from the sparklers being held by the children in the above picture, so they look like they are floating above the surface. With RTX ON, the shadows are correct.
Turing Memory Architecture and Display
Now that we have looked at the major architectural changes in Turing over previous generations, it is time to look at the memory architecture and display features before concluding this Turing introduction and moving on to the main content of the Asus Strix RTX 2080. Turing improves main memory, cache memory, and compression architectures to increase memory bandwidth and reduce access latency; generally, with any memory system or subsystem, the focus is on increasing bandwidth and reducing access time (latency). Improved and enhanced GPU compute features help accelerate both games and many computationally intensive applications and algorithms. New display and video encode/decode features support higher-resolution and HDR-capable displays, more advanced VR displays, increasing video streaming requirements in the datacenter, 8K video production, and other video-related applications.
As display resolution increases, it takes a toll on the memory system: larger displays need more bandwidth and more capacity to maintain computational speed and the highest possible frame rates. Turing is the first architecture to use GDDR6 memory; Pascal used GDDR5 and GDDR5X. GDDR6 is the next big advance in high-bandwidth GDDR DRAM design. Enhanced with many high-speed SerDes and RF techniques, the GDDR6 memory interface circuits in Turing GPUs have been completely redesigned for speed, power efficiency, and noise reduction. This new interface design comes with many new circuit and signal training improvements that minimize noise and variations due to process, temperature, and supply voltage. Extensive clock gating is used to minimize power consumption during periods of lower utilization, resulting in a significant overall power efficiency improvement. Turing's GDDR6 memory subsystem delivers 14 Gbps signaling rates and a 20% power efficiency improvement over the GDDR5X memory used in Pascal GPUs. Achieving this speed increase required end-to-end optimizations. Using extensive signal and power integrity simulations, NVIDIA carefully crafted Turing's package and board designs to meet the higher speed requirements. One example is a 40% reduction in signal crosstalk, one of the most severe impairments in large memory systems. To realize speeds of 14 Gbps, every aspect of the memory subsystem was carefully tuned, and every signal in the design was optimized to provide the cleanest memory interface signaling possible.
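The 14 Gbps signaling rate translates directly into the card's peak memory bandwidth. A quick sketch, assuming the 256-bit interface used on the RTX 2080:

```python
# Peak GDDR6 bandwidth = per-pin rate x interface width / 8 bits-per-byte
signaling_rate_gbps = 14   # Gbps per pin (Turing GDDR6)
bus_width_bits = 256       # memory interface width (RTX 2080 / TU104)

bandwidth_gb_s = signaling_rate_gbps * bus_width_bits / 8
print(f"Peak memory bandwidth: {bandwidth_gb_s:.0f} GB/s")  # 448 GB/s
```

This matches the 448 GB/s figure quoted for the Strix RTX 2080 later in the review.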
The size and speed of the L2 caches have been enhanced in the Turing GPUs. The TU102 GPU ships with 6 MB of L2 cache, double the 3 MB of L2 cache offered in the prior-generation GP102 GPU used in the TITAN Xp. TU102 also provides significantly higher L2 cache bandwidth than GP102. Like prior-generation NVIDIA GPUs, each ROP partition in Turing contains eight ROP units, and each unit can process a single color sample. A full TU102 chip contains 12 ROP partitions for a total of 96 ROPs. NVIDIA GPUs utilize several lossless memory compression techniques to reduce memory bandwidth demands as data is written out to frame buffer memory. The GPU's compression engine has a variety of algorithms which determine the most efficient way to compress the data based on its characteristics. This reduces the amount of data written out to memory and transferred from memory to the L2 cache, and reduces the amount of data transferred between clients (such as the texture unit) and the frame buffer.
In order to meet the ever-increasing demand from gamers to play at the highest possible resolutions, even 8K, the Turing GPUs include an all-new display engine designed for the new wave of displays, supporting higher resolutions, faster refresh rates, and HDR. Turing supports DisplayPort 1.4a, allowing 8K resolution at 60 Hz, and includes VESA's Display Stream Compression (DSC) 1.2 technology, providing higher compression that is visually lossless. With DisplayPort 1.4a, per-lane bandwidth can be as high as 8.1 Gbps.
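Why 8K at 60 Hz needs DSC becomes clear from the link budget. A sketch using the standard DisplayPort figures (the 8b/10b line-coding overhead and 24 bpp color depth are assumptions from the DP spec, not from Asus's material):

```python
# DisplayPort 1.4a link budget vs. raw 8K60 pixel data
lanes = 4
raw_per_lane_gbps = 8.1       # HBR3 signaling rate per lane
encoding_efficiency = 8 / 10  # DP 1.4a uses 8b/10b line coding

effective_gbps = lanes * raw_per_lane_gbps * encoding_efficiency
print(f"Effective link bandwidth: {effective_gbps:.2f} Gbps")  # 25.92 Gbps

# Uncompressed 8K60 at 24 bpp, ignoring blanking intervals:
pixel_gbps = 7680 * 4320 * 60 * 24 / 1e9
print(f"Raw 8K60 pixel data: {pixel_gbps:.1f} Gbps")  # ~47.8 Gbps
# ~47.8 Gbps of pixel data over a ~25.9 Gbps link is why visually
# lossless DSC compression is required for 8K60 over a single cable.
```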
Turing GPUs can drive two 8K displays at 60 Hz with one cable for each display. 8K resolution can also be sent over USB-C. Turing's new display engine supports HDR processing natively in the display pipeline. Tone mapping has also been added to the HDR pipeline. Tone mapping is a technique used to approximate the look of high dynamic range images on standard dynamic range displays. Turing GPUs also ship with an enhanced NVENC encoder unit that adds support for H.265 (HEVC) 8K encode at 30 fps. The new NVENC encoder provides up to 25% bitrate savings for HEVC and up to 15% bitrate savings for H.264. Turing's new NVDEC decoder has also been updated to support decoding of HEVC YUV444 10/12b HDR at 30 fps, H.264 8K, and VP9 10/12b HDR. Turing improves encoding quality compared to prior-generation Pascal GPUs and compared to software encoders. The picture below shows that on common Twitch and YouTube streaming settings, Turing's video encoder exceeds the quality of the x264 software-based encoder using the fast encode settings, with dramatically lower CPU utilization.
USB-C and VirtualLink
Nvidia has not forgotten VR headsets while designing Turing and has enhanced the architecture to support the industry's new VR standard, VirtualLink. Using a single USB-C cable, Nvidia is able to reduce the cable clutter of the current generation. Supporting VR headsets on today's PCs requires multiple cables between the headset and the system: a display cable to send image data from the GPU to the two displays in the headset, a cable to power the headset, and a USB connection to transfer camera streams and read back head-pose information from the headset. This cumbersome approach is inconvenient for gamers. To address this issue, Turing GPUs are designed with hardware support for USB Type-C™ and VirtualLink™. VirtualLink is a new open industry standard backed by leading silicon, software, and headset manufacturers and led by NVIDIA, Oculus, Valve, Microsoft, and AMD. It has been developed to meet the connectivity requirements of current and next-generation VR headsets. VirtualLink employs a new alternate mode of USB-C, designed to deliver the power, display, and data required to drive VR headsets through a single USB-C connector. It simultaneously supports four lanes of High Bit Rate 3 (HBR3) DisplayPort along with a SuperSpeed USB 3 link to the headset for motion tracking. In comparison, standard USB-C supports only four lanes of HBR3 DisplayPort OR two lanes of HBR3 DisplayPort plus two lanes of SuperSpeed USB 3. In addition to easing the setup hassles present in today's VR headsets, VirtualLink will bring VR to more devices.
With Turing, there is a new approach to multi-GPU processing and scaling. While Nvidia users are accustomed to the word SLI, which indicates a system with more than one graphics card, Turing introduces the NVLink design, which is something new for gamers. Prior to the Pascal GPU architecture, NVIDIA GPUs used a single Multiple Input/Output (MIO) interface as the SLI Bridge technology to allow a second (or third or fourth) GPU to transfer its final rendered frame output to the primary GPU that was physically connected to a display. Pascal enhanced the SLI Bridge with a faster dual-MIO interface, improving bandwidth between the GPUs and allowing higher-resolution output and multiple high-resolution monitors for NVIDIA Surround. Turing TU102 and TU104 GPUs use NVLink instead of the MIO and PCIe interfaces for SLI GPU-to-GPU data transfers. The Turing TU102 GPU includes two x8 second-generation NVLink links, and Turing TU104 includes one x8 second-generation NVLink link. Each link provides 25 GB/sec peak bandwidth per direction between two GPUs (50 GB/sec bidirectional bandwidth). The two links in TU102 provide 50 GB/sec in each direction, or 100 GB/sec bidirectionally. Two-way SLI is supported with Turing GPUs that have NVLink, but 3-way and 4-way SLI configurations are not supported. Please note that the RTX 2070 (TU106) does not have NVLink functionality, hence SLI on RTX 2070 cards won't be possible. Compared to the previous SLI bridge, the increased bandwidth of the new NVLink bridge enables advanced display topologies that were not previously possible.
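The NVLink bandwidth figures above follow a simple pattern: 25 GB/s per direction per x8 link, doubled for bidirectional totals. A quick sketch:

```python
# NVLink bandwidth per Nvidia's figures: 25 GB/s each way per x8 link
per_link_per_direction_gb = 25

for name, links in [("TU104 (RTX 2080)", 1), ("TU102 (RTX 2080 Ti)", 2)]:
    one_way = links * per_link_per_direction_gb
    print(f"{name}: {one_way} GB/s per direction, {2 * one_way} GB/s bidirectional")
# TU104: 25 GB/s per direction, 50 GB/s bidirectional
# TU102: 50 GB/s per direction, 100 GB/s bidirectional
```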
Source: I am thankful to Nvidia for providing me the Turing literature. The above text is sourced from Nvidia's literature. If you are interested in reading more about it, here is the link.
Having discussed the Turing architecture, it is time to take a look at our very first RTX-based graphics card from Asus. I will be taking the Asus Strix GeForce RTX 2080 O8G edition for a spin today. This card is based on the Turing TU104 GPU, a smaller chip than the fully enabled TU102. The TU104 GPU incorporates all of the new Turing features found in TU102, including the RT Cores, Turing Tensor Cores, and the architectural changes made to the Turing SM. The full TU104 chip contains six GPCs, 48 SMs, and eight 32-bit memory controllers (256-bit total). In TU104, each GPC includes a raster unit and four TPCs. Each TPC contains a PolyMorph Engine and two SMs. Each SM includes the new RT Core. Like TU102, each SM also includes 64 CUDA Cores, a 256 KB register file, 96 KB of L1 data cache/shared memory, and four texture units. The full TU104 chip contains 13.6 billion transistors and includes 3072 CUDA Cores, 384 Tensor Cores, and 48 RT Cores. TU104 also supports second-generation NVLink. One x8 NVLink link is included, providing 25 GB/sec of bandwidth in each direction (50 GB/sec total bandwidth).
The Asus Strix GeForce RTX 2080 O8G edition retains the design of the Strix cards introduced with the Pascal generation, with the fans receiving a worthy upgrade. Aura RGB lighting is on board, and there are two fan headers and an RGB header as well. The card uses Asus MaxContact technology, allowing 2X more contact with the GPU for better thermal performance. This is a 2.7-slot design with emphasis on the cooling department. The major difference in terms of cooling comes from the new axial-tech fans with an IP5X rating for better performance and improved acoustics. These cards are produced using Asus Auto-Extreme Technology. The frame is reinforced to prevent torsion and lateral bending of the PCB. Another key feature differentiating this card from the previous generation is the Dual BIOS: these cards carry two BIOSes on the board, selected via a switch located on the top side of the PCB. P and Q modes are designated for these BIOSes. P mode is the performance mode, with an emphasis on better cooling to gain more performance, while Q mode focuses on silent operation, where the fans run at much lower RPM at the cost of thermal performance. This card has a base clock of 1515MHz with a boost clock of 1890MHz (OC Mode) and 8GB of GDDR6 using Micron chips.
Product: ROG Strix GeForce RTX 2080 O8G
Price: Rs.140,000/- [At the time of the review]
The front side of the packing box has ROG eye and Republic of Gamers printed on the top left followed by the ROG Strix Gaming Graphics Card text. The main background has ROG eye logo printed in multiple colors. GeForce RTX 2080 is printed at the bottom right. Asus AURA Sync, OC Edition, and 8GB GDDR6 info labels are printed at the bottom left side. There is a picture of the graphics card on the left side.
The top side of the packing box has GeForce RTX 2080 printed in the white and green colors. OC edition and 8GB GDDR6 are printed at the bottom. The right side has ROG brand logo and name printed.
The backside of the packing box has the ROG brand logo and name printed on the top left side, followed by the ROG Strix Gaming Graphics Card and GeForce RTX 2080 text. This card carries a limited 3-year warranty. There are 6 pictures in the center focusing on the salient highlights of the card, like MaxContact Technology, Auto-Extreme Technology, AURA Sync compatibility, Axial-Tech fans, Dual BIOS, and GPU Tweak II. The main specifications and key features are printed on the left side.
The left and right sides are identical. There is a ROG brand logo and name printed on the top. ROG Strix Gaming Graphics Card, OC edition, and 8GB GDDR6 text is printed in the middle. The lower portion has a green background with GeForce RTX 2080 printed in white.
The bottom side of the packing box has the minimum system requirements printed in 15 different languages. There is a sticker pasted on the right side showing the Part No, Serial No, EAN, and UPC labels and info. The requirements are:
- 650W or greater power supply.
- PCIe Compliant motherboard with dual-width graphics slot.
- 5GB of free disk space.
- 8GB System memory (16GB recommended)
- Microsoft Windows 7 x64/Microsoft Windows 10 x64 (April 2018 Update or later)
- 2x 8-pin PCIe connectors
There is a cardboard box inside the main packing box. It has Strix printed on the top cover. Opening it reveals a black Styrofoam pad placed on top, and there is a container in the middle with the Asus name printed in gold. The user guide and installation disk are inside this container. Removing this top layer reveals the graphics card wrapped inside an anti-static cover. Two ROG branded Velcro strips are also included.
- 1x Asus Strix GeForce RTX 2080 O8G graphics card
- 2x Asus ROG branded Velcro Hook and Loop
- 1x Quick Guide
- 1x Installation disk
Design and Features
It is time to take a closer look at the design of the graphics card before proceeding to the testing. The Asus ROG Strix GeForce RTX 2080 O8G is a beautifully designed graphics card. It carries the same shroud design as was introduced with the release of the Pascal generation cards. It is a 2.7-slot design, yet with aesthetically pleasing looks and feel to it. Aura Sync adds a subtle touch when in operation and speaks for itself. This design really complements the ROG series motherboards from Asus. The dimensions of the graphics card are 11.8×5.13×2.13 inches or 29.97×13.04×5.41 cm. The card uses the PCIe 3.0 bus interface. It packs 8GB of GDDR6 memory rated at 1750MHz on a 256-bit bus with 448 GB/s of bandwidth. The base clock of the card is 1515MHz in all modes. The default mode is Gaming Mode with an 1860MHz boost clock; OC Mode raises the boost clock to 1890MHz. Please note that you will need to install GPU Tweak II to access these modes; the BIOS switch has nothing to do with them. Interestingly enough, this card has 2944 CUDA Cores whereas the fully enabled TU104 chip has 3072 CUDA Cores. The maximum supported digital resolution is 7680×4320. The card draws power through two 8-pin connectors. It packs 64 ROP units and 184 TMUs. The pixel fillrate is 98.9 GP/s and the texture fillrate is 284.3 GT/s; the texture fillrate appears low compared to Nvidia's stated 314.6 GT/s, likely because the two figures are computed at different clock speeds.
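Fillrates are simply functional units multiplied by clock speed, which explains the gap between the quoted 284.3 GT/s and Nvidia's 314.6 GT/s. A sketch of where the numbers likely come from; the clock values below are my back-calculated assumptions, not Asus-published figures:

```python
# Fillrate = units x clock; clocks chosen to reproduce the quoted figures
rops, tmus = 64, 184  # ROP units and TMUs on the RTX 2080

def fillrate(units, clock_mhz):
    """Giga-operations per second for a unit count at a given clock."""
    return units * clock_mhz / 1000

clock_quoted = 1545     # MHz (assumed) - reproduces the card's listed fillrates
clock_reference = 1710  # MHz - Nvidia's reference RTX 2080 boost clock

print(f"Pixel fillrate:    {fillrate(rops, clock_quoted):.1f} GP/s")     # ~98.9
print(f"Texture fillrate:  {fillrate(tmus, clock_quoted):.1f} GT/s")     # ~284.3
print(f"At reference boost: {fillrate(tmus, clock_reference):.1f} GT/s")  # ~314.6
```

So the "low" texture fillrate is an artifact of which clock the calculation uses, not of missing hardware.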
Let’s dig deep into the design elements of this card and explore its might and beauty. This card has a stylish cooler shroud that differentiates the Strix cards from others. The cooler shroud is made of hard plastic. The top and bottom cutouts on the cooler have LEDs which can be controlled with the AURA Graphics Card software available on the Asus website. The central fan has Asus branding printed in white on its hub, whereas the other two fans have the ROG Eye printed in their centers. With its curves, edges, and grooves, Asus has not only maintained the typical Strix looks it is known for but has given the user what could be described as one of the most stunning designs.
Asus has taken a different approach (much needed for Turing) with the ROG Strix GeForce RTX series cooler design. They have increased the width of the fin stack, adding 20% more surface area over the previous generation of Strix cards and making this a 2.7-slot design. This gives more heatsink surface area for effective heat dissipation across the complete surface.
Asus MaxContact is an industry-first GPU cooling technology that features an enhanced nickel-plated copper plate that makes direct contact with the GPU. This plate is 10 times flatter than traditional plates. MaxContact utilizes precision machining to provide a surface that makes up to 2X more contact with the GPU than traditional heat spreaders, resulting in improved thermal transfer. This card uses a single heatsink with aluminum fins and five 8mm (thickness not confirmed) nickel-plated copper heatpipes. There are two nickel-plated copper plates: one makes contact with the GPU and the other with the MOSFETs/VRM of the card. The heat pipes terminate at the front.
The Asus Strix GeForce RTX 2080 O8G has three 90mm fans with an axial-tech design. The central fan has an Asus-branded sticker in its center; the left and right fans have ROG-branded stickers. These fans carry an IP5X certification, meaning they are more dust resistant, which improves their reliability and lifespan. The previous generation of Strix cards had the wing-blade design. This time around, Asus has come up with Axial-Tech fans delivering up to 27% more airflow and 40% more static pressure. This was necessary because the width of the heatsink has been increased; consequently, stronger fans with higher static pressure and airflow were needed. Asus has reduced the size of the fan hub to allow for longer blades and added a barrier ring that increases structural integrity and downward air pressure through the heatsink. These fans use the Asus 0dB technology. Please note that, due to the dual BIOS design, 0dB works under Q-Mode only: the fans don't spin until the temperature exceeds 55°C. If you want to enable 0dB technology in P-Mode, use GPU Tweak II to enable it. The left and right fans are grouped to be controlled jointly, whereas the middle fan can be controlled separately using GPU Tweak II.
As mentioned above, among the key differentiating design features of this card over the previous generation is the Dual BIOS implementation. The Asus Strix GeForce RTX 20xx cards come with two BIOSes. To differentiate the two, they are labeled P-Mode and Q-Mode. P-Mode focuses on performance with adequate cooling over acoustics, whereas Q-Mode is focused on silent operation, which comes at the cost of thermal performance. I have tested the thermal performance of the graphics card under both modes, which can be checked in the testing section. There is a switch on the top side of the PCB; P-Mode is on the left side and Q-Mode is on the right side. The default is P-Mode. Another important observation is that once Windows is loaded, switching to the other BIOS will not take effect until the PC is restarted. The picture below highlights the effects on temperature and acoustics in both modes. Asus's in-house testing shows the graphics card running 30% cooler in P-Mode than in Q-Mode. Similarly, the graphics card was 25% quieter in Q-Mode than in P-Mode.
Another key design feature is the LED on/off button located on the backside of the graphics card. This allows users to turn the RGB lighting completely on or off at will. This was not possible in the previous design; it seems Asus has taken note of user feedback. The feature does have a limitation, as it enables/disables all the lighting zones on the card at once. There are three zones: one on the ROG Eye located on the backplate, one on the top side, and one on the shroud itself. There is no control over individual zones through this button. Builders/gamers who prefer a stealth look will appreciate this feature.
Let’s take a look at the top side of the graphics card. STRIX is printed on the lower left part of the shroud. GeForce RTX is printed on the upper part of the shroud, opposite the STRIX. The fins have a straight, not angular, design. The shroud does not fully cover the fin stack, which is a must for effective heat dissipation. The “Republic of Gamers” brand name and logo are on the top left side of the shroud; they have LEDs underneath and light up during operation. Asus has implemented a reinforced frame in this generation of Strix cards, which increases structural integrity 3X by using a metal brace mounted to both the backplate and the I/O shield. This metal brace prevents excessive torsion and lateral bending of the PCB.
The card requires two 8-pin power connectors to power it up. Both connectors have LEDs beneath them to indicate their status. Static white indicates normal power delivery; static red indicates a power-related issue.
Let’s have a look at the top front side of the graphics card. The shroud's end does not fully cover the heatsink; the terminating ends of the 5 heat pipes are visible. Underneath we have two PWM fan headers. ASUS FanConnect II features two 4-pin, hybrid-controlled headers that can be connected to both PWM and DC system fans for optimal system cooling. Normally there is no way to make PC chassis fans regulate their speeds based on the graphics card's temperature. Asus has taken care of this particular situation in their Strix cards, as up to two fans can be connected and controlled based on the graphics card's requirements. The connected fans reference both the GPU and CPU, operating automatically based on whichever has the higher temperature. One fan power connector and the RGB LED power connector are visible on the left side. There is a 4-pin RGB header with a 12V GRB pin format. The user can connect a supported RGB LED strip to the graphics card as well. This comes in handy on an Asus AURA Sync enabled motherboard; in that scenario, think of it as adding one more header at the user's disposal. The connected fans can be controlled using GPU Tweak II. This end of the shroud extends over the PCB and the heatsink, which adds to the looks of the card from the front side and gives the impression of one complete design.
On the back side of the graphics card, we have the same metal backplate as on the previous generation. It has lines printed in a pattern to signify the Strix concept. We have a large ROG Eye on a white background, which acts as a diffuser. This section is implemented with RGB LEDs and really adds to the cool looks of the card when in use. We can see the two 8-pin power connectors. There are what seem to be soldered overclocking tweaking points on the left side of the power connectors. One of the screws on the GPU bracket is covered with a white sticker; peeling or tearing it would void the warranty, though recently the warranty terms have been redefined, dropping this requirement in the US region. I am not sure if this has been done worldwide yet. There is a sticker pasted on the bottom right side with the serial number of the card. The LED on/off button is located under the NVLink connector. As mentioned in the introduction, Turing-based graphics cards implement NVLink, which enables multi-GPU configurations at a much higher bandwidth of up to 50 GB/s bidirectional for TU104 and 100 GB/s for TU102 GPUs. Though I did not open the card, it appears there is no thermal pad between the backplate and the PCB.
The rear side has the I/O shield for the outputs. It is not silver; Nvidia has opted for a black I/O shield on their RTX cards, which definitely adds to the overall look and feel of the graphics card. We have two HDMI 2.0b ports, two DisplayPort 1.4 ports, and a USB Type-C port. This configuration allows the user to enjoy immersive virtual reality experiences anytime without having to swap cables, by keeping a VR device connected alongside other displays at the same time. This implementation allows for better cable management as well.
The bottom side of the card clearly shows the two fin stacks of the cooler. Thermal pads have been used on the possible points of contact between the PCB and the cooler. The PCB color is black. The visible thermal pad seems to have slipped through QC, as it is already torn.
Asus graphics cards are produced using Auto-Extreme Technology, an industry-exclusive, 100% automated production process that incorporates premium materials to set a new standard of quality. Auto-Extreme Technology ensures consistent graphics card quality as well as improved performance and longevity. It allows the soldering to be done in a single pass, reducing thermal strain on the components and avoiding the use of harsh cleaning chemicals. The end result is less environmental impact, lower manufacturing power consumption, and a more reliable product.
The Asus Strix GeForce RTX 2080 O8G has a 10+2 power phase design using Super Alloy Power II components. These components enhance efficiency, reduce power loss, and achieve sustained thermal levels. Asus uses SAP II capacitors with a 2.5X extended lifespan (over 90,000 hours, longer than standard capacitors), SAP II chokes that help reduce buzzing, SAP II DrMOS for lower temperatures and increased power efficiency, and SAP II POSCAPs to maximize overclocking headroom.
Featuring Aura RGB Lighting on both the shroud and the backplate, ROG Strix graphics cards are capable of displaying millions of colors and six different effects for a personalized gaming system. ROG Strix graphics cards also feature ASUS Aura Sync, RGB LED synchronization technology that enables complete gaming system personalization when the graphics card is paired with an Aura-enabled gaming motherboard. There are 6 modes which the user can configure and select for the color effect:
- Static mode. A single color of user’s choice would remain lit.
- Breathing mode would fade in and out the user’s selected color.
- Strobing mode flashes the user’s selected color.
- Music Effect mode produces pulses of the user’s selected color in time with the music.
- Color Cycle mode cycles through the color spectrum automatically.
- GPU Temperature will change the color depending upon the load and the temps under the loads.
GPU Tweak II
Asus has designed comprehensive software to control and monitor their graphics cards, known as GPU Tweak II. It has the typical red and black color theme representing ROG’s traditional colors, though in recent times ROG has deviated from the red/black combo and is setting yet another tradition when it comes to the brand’s colors.
The main window of the software shows the main indicators, which are: –
- VRAM Usage
- GPU Speed
The red bar on these circles shows the corresponding value of the indicator. On top, we have the model number of the graphics card on the left side, with three buttons to its right: Home, Info, and Tools.
The Home button is the default and can be clicked at any time to bring the main window back on screen. The Info button shows the graphics card's specs with a built-in GPU-Z implementation. The Tools button has XSplit Gamecaster, AURA Graphics Card, and ROG FurMark buttons to launch the corresponding apps.
Below the model number, we have a triangle featuring a blend of the factors an end user would want most: Performance, Coolness, and Silence. An optimal combination of these three is what Strix is all about: one can have the utmost performance with exceptional cooling, yet silent operation. Our card boosted to 1995MHz with the fans at a low sound level and 67°C temps (P-Mode), clearly indicating what Asus has achieved here. The red span within the triangle varies with each profile, showing how the card balances all three under the respective profile.
Next, we have 4 profiles: OC Mode, Gaming Mode, Silent Mode, and My Profile. Gaming is the default mode, with a base clock of 1515MHz and a boost clock of 1860MHz. OC Mode has a base clock of 1515MHz and a boost clock of 1890MHz. My Profile allows the user to create a custom profile based on their own settings. This can be done in Professional Mode, where all the settings like Voltage Control, Power Level, Base Clock, Memory Clock, and Fan Speed can be configured. The fan can be set to Auto or a custom fan curve.
The Monitoring window can be activated by clicking the Monitoring button at the bottom left of the main window. It shows all the critical variables for monitoring; Min, Max, and Current values are shown on the graph. The user has the option to monitor only the desired variables. The Monitoring window can be detached from the main window by clicking once on the chain button between the two windows. This is also where the user can control any fans connected to the graphics card's fan headers.
The Gaming Booster option is at the bottom of the main window. Clicking it opens a new window with three options: Visual Effects, System Services, and System Memory Defragmentation. Visual Effects reduces Windows visual flair like animations and animated themes to avoid the performance hit these settings can cause. System Services allows stopping unneeded services to boost performance. System Memory Defragmentation helps restore wasted memory space and boosts application handling.
In essence, this software has everything a user could dream of to monitor and control their graphics card. Plus, the interface is easy to understand, and once you have launched it, it will get you going.
The following test bench setup was used:
- Intel i7 8700k @ 5.0GHz using 1.350V
- Asus Strix Z390-E Gaming
- Ballistix Elite 4x4GB @ 3000MHz
- Deepcool Castle 240 AIO
- Thermaltake TP RGB 750W PSU
- HyperX 120GB SSD
- Seagate Barracuda 2TB for games
The following games have been tested:
- Battlefield 1 [DX11, DX12]
- DOOM [Vulkan]
- Grand Theft Auto V
- Metro Last Light Redux
- Far Cry 5
- Assassin’s Creed Origin
- The Witcher 3
- Rise of the Tomb Raider [DX11, DX12]
- Shadow of the Tomb Raider [DX11, DX12]
- Wolfenstein II: The New Colossus [Vulkan]
- Middle Earth: Shadow of War
- Ashes of Singularity: Escalation [DX11, DX12]
Software information is as under:
- MSI Afterburner v4.50
- HWInfo 64 v 5.88-3510
- Unigine Superposition
For the GeForce GTX graphics cards, Nvidia's 398.36 drivers were used, and for the GeForce RTX graphics cards, Nvidia's 411.70 drivers were used. Microsoft Windows 10 x64 version 1607 was used. All reported framerates are averages. Previously, Unigine Heaven and Valley were part of our testing, but they have been dropped in favor of Superposition.
Let’s take a look at the performance graphs.
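The percentage gains quoted in the sections below follow the standard relative-gain formula. A quick sketch; the FPS numbers in the example are made up for illustration, not measured results:

```python
def percent_gain(new_fps: float, old_fps: float) -> float:
    """Relative performance gain of new_fps over old_fps, in percent."""
    return (new_fps - old_fps) / old_fps * 100

# Hypothetical example: RTX 2080 at 90 FPS vs GTX 1080 at 70 FPS
print(f"{percent_gain(90, 70):.2f}% gain")  # 28.57% gain
```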
Battlefield 1 DX11
On 1080P, there is a 28.6% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 34.15%. At 4K, the performance gain is 42.39%. The Asus Strix GeForce GTX 1080Ti OC was in a close performance range with the Asus Strix GeForce RTX 2080.
Battlefield 1 DX12
On 1080P, there is a 34.25% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 41.16%. At 4K, the performance gain is 39.24%. Once again, we are seeing close quarters between the GTX 1080Ti and RTX 2080 at 4K resolution, but lower resolutions show the RTX 2080 to be a clear winner.
On 1080P, there is a 0.15% performance gain over the GeForce GTX 1080. The gain is marginal, as all the cards are hitting near the 200 FPS mark, so the real performance differentiation comes from higher-resolution testing, particularly 4K. On 1440p, the performance gain is 12.74%. At 4K, the performance gain is 31.74%. The GTX 1080Ti and RTX 2080 were neck-to-neck, and at 4K the GTX 1080Ti took a lead.
Wolfenstein II: The New Colossus
On 1080P, there is a 36.11% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 44.52%. At 4K, the performance gain is 46.44%. RTX 2080 has a lead over the GTX 1080Ti in this game.
Metro Last Light Redux
On 1080P, there is a 38.77% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 45.45%. At 4K, the performance gain is 54.54%. This game has shown a good improvement. Once again, the GTX 1080Ti and RTX 2080 were neck-to-neck with just 1 FPS difference.
Grand Theft Auto – V
On 1080P, there is a 23.32% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 21.85%. At 4K, the performance gain is 27.68%. GTX 1080Ti has taken a lead over the RTX 2080 in this game on all resolutions.
Far Cry 5
On 1080P, there is a 26.05% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 33.33%. At 4K, the performance gain is 34.88%. RTX 2080 has a marginal lead over the GTX 1080Ti by 3 FPS at 1080P and 1440P. Both were tied at 4k resolution.
Middle Earth: Shadow of War
On 1080P, there is a 36.89% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 40.84%. At 4K, the performance gain is 47.50%. The RTX 2080 has a marginal lead over the GTX 1080Ti by 4 FPS.
Assassin’s Creed Origins
On 1080P, there is a 20.45% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 31.34%. At 4K, the performance gain is 38.46%. RTX 2080 has a marginal lead over the GTX 1080Ti.
Rise of the Tomb Raider DX11
On 1080P, there is a 22.77% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 32.23%. At 4K, the performance gain is 32.88%. At 1080P the RTX 2080 has a marginal lead over the GTX 1080Ti, whereas at higher resolutions the GTX 1080Ti has a marginal lead over the RTX 2080.
Rise of the Tomb Raider DX12
On 1080P, there is an 18.19% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 27.29%. At 4K, the performance gain is 51.72%. At 1080P the GTX 1080Ti has an almost 5 FPS lead over the RTX 2080. This lead was maintained at higher resolutions, though it was marginal.
Shadow of the Tomb Raider DX11
On 1080P, there is a 20.22% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 29.68%. At 4K, the performance gain is 38.23%. Once again the RTX 2080 and GTX 1080Ti were in close quarters, running neck and neck.
Shadow of the Tomb Raider DX12
On 1080P, there is a 32.29% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 37.50%. At 4K, the performance gain is 39.39%. Here too the RTX 2080 and GTX 1080Ti were neck and neck.
The Witcher 3
On 1080P, there is a 26.08% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 39.64%. At 4K, the performance gain is 54.82%. Again the margin between the RTX 2080 and GTX 1080Ti is small, with the RTX 2080 taking the lead at higher resolutions.
Ashes of the Singularity – Escalation DX11
On 1080P, there is a 20.17% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 23.02%. At 4K, the performance gain is 27.51%. Again the RTX 2080 and GTX 1080Ti were neck and neck.
Ashes of the Singularity – Escalation DX12
On 1080P, there is a 34.78% performance gain over the GeForce GTX 1080. On 1440p, the performance gain is 23.80%. At 4K, the performance gain is 30.03%. The RTX 2080 shows a better improvement under DX12. Here, the GTX 1080Ti has taken a marginal lead over the RTX 2080.
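The per-game percentages quoted above are simple relative deltas between average frame rates. As a minimal sketch (the FPS values below are hypothetical, for illustration only, not my measured results), this is how such figures are derived:

```python
def pct_gain(new_fps, old_fps):
    """Percentage performance gain of the newer card over the older one."""
    return (new_fps - old_fps) / old_fps * 100

# Hypothetical example: 81 FPS on the new card vs 60 FPS on the old one.
print(round(pct_gain(81, 60), 2))  # 35.0
```

The same formula applies at every resolution; only the measured FPS pair changes.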
The Asus ROG Strix GeForce RTX 2080 O8G is a factory overclocked graphics card; the O in O8G denotes the overclocked edition. In Gaming mode, the base clock is 1515MHz, the same as Nvidia's reference design, and in OC mode the base clock is 1515MHz as well. The boost clock in Gaming mode (the default) is 1860MHz, which is 60MHz above Nvidia's FE design. In OC mode, the boost clock is 1890MHz, which is 90MHz above the FE design.
Out of the box, the graphics card was boosting to 1995MHz in OC mode and 1980MHz in Gaming mode, thanks to Nvidia's GPU Boost. Overclocking the RTX card proved challenging. I started with the memory overclock first, without touching the core clock. The voltage was set to 100% in GPU Tweak II and the power limit was increased to 125% with an 88°C temperature limit.
I managed to get +76MHz on the core clock and +275MHz on the memory clock. Keep in mind that this was done with the fans on Auto. With the overclock, the maximum boost was 2040MHz, though the clock never settled, fluctuating continuously with 2010MHz being the lowest; it all depends on the cooling solution, which is quite adequate on this graphics card. Even with the overclock, the gains are marginal. Here is the result of the synthetic benchmark with the overclock applied:
This graphics card was continuously hitting the power limit (not the thermal limit). The maximum power limit is 125% with an 88°C thermal limit; by default the power limit is set at 100% and 83°C. I observed the boost clock throttling down to 1875MHz whenever the card hit the power limit. Mind you, the temperature was 68°C at the time, so thermals had nothing to do with it. With the power limit raised to 125%, the card hit the power limit far less frequently, and the maximum drop was to 1915MHz. I would suggest gamers/users set the power limit to 125% at all times, regardless of overclocking.
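The throttling behaviour described above, with the boost clock stepping down whenever the board exceeds its power budget, can be sketched as a simple feedback rule. This is purely illustrative: Nvidia's actual GPU Boost algorithm is proprietary, and the wattages and the linear power model below are hypothetical assumptions, not measured values.

```python
def boost_clock(target_mhz, power_draw_w, power_limit_w, step_mhz=15):
    """Illustrative model only: step the clock down until the board fits
    its power budget. Assumes (simplistically) that power draw scales
    linearly with clock speed."""
    clock = target_mhz
    while power_draw_w * (clock / target_mhz) > power_limit_w and clock > 0:
        clock -= step_mhz  # GPU Boost adjusts clocks in small discrete steps
    return clock

# Hypothetical numbers: a 1995MHz target that overshoots a 225W budget
print(boost_clock(1995, 250, 225))  # 1785
```

Raising the power limit (the 100% to 125% change above) widens the budget, so the clock settles closer to its target, which matches the smaller drop observed at 125%.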
The graphics card was tested with a 10-minute Furmark run at native resolution with 8x MSAA in full screen. For ease of reference, the ambient temperatures are also mentioned. Thermal testing was done in both P-Mode and Q-Mode, with a minimum of 30 minutes of idling ensured between runs. 79°C was hit under the stress test in Q-Mode (keep the ambient temperature in mind), and 67°C in P-Mode. Using Q-Mode will have an impact on the boost clocks: given the dynamic nature of GPU Boost, the card clocks down as the temperature rises. This is how it was with Pascal as well.
I have tested the graphics card in P and Q modes to check for any performance loss. For this purpose, Battlefield 1 was run at Ultra settings in DX11 at 4K. Here are the results:
| Mode | Minimum Boost Clock | Maximum Temperature | FPS |
|------|---------------------|---------------------|-----|
Effective from this review, I will be using HWiNFO64 to record the power consumption of the graphics card. HWiNFO64 appears to measure the total power draw of the graphics card, not just the GPU. The graph below shows the power draw of the graphics card only, not of the whole PC. The reading for the Nvidia GeForce GTX 1080 FE seems sketchy, as the chip itself has a TDP of 180W. To measure idle power draw, all background apps were closed and the system was left idling for 30 minutes. Battlefield 1 in DX11 at 4K at Ultra settings was used to measure the in-game power draw of the graphics card.
As the summer season is still here, there is environmental noise beyond my control, which would easily invalidate sound meter testing. The card was tested on an open-air test bench with me sitting close by. To my ear, the graphics card was silent in Q-Mode, which is damn impressive, though it came at the cost of the 79°C maximum temperature. P-Mode is still not that audible: with the room's fan powered off and the AIO fans set at 40% of their speed, the whole room was almost silent, and I had to get much closer to the graphics card to hear the fans under the stress test. Asus has definitely done a great job in this department.
The Asus ROG Strix GeForce RTX 2080 O8G is the first RTX card on my test bench. The card is based on the Turing TU104 GPU. Its dimensions are 11.8×5.13×2.13 inches, or 29.97×13.04×5.41 cm. The card uses the PCIe 3.0 bus interface. It packs 8GB of GDDR6 from Micron rated at 1750MHz on a 256-bit bus for 448 GB/s of bandwidth. The base clock is 1515MHz in all modes. The default mode is Gaming Mode with a 1860MHz boost clock; OC Mode raises the boost clock to 1890MHz. Note that you will need to install GPU Tweak II to access these modes; the BIOS switch has nothing to do with them. Interestingly enough, this card has 2944 CUDA cores, whereas the fully enabled TU104 chip has 3072. The maximum supported digital resolution is 7680×4320. The card draws power through two 8-pin connectors. It packs 64 ROP units and 184 TMUs. The pixel fillrate is 98.9 GP/s and the texture fillrate is 284.3 GT/s; the texture fillrate is low compared to Nvidia's stated minimum of 314.6 GT/s. This card carries all the bells and whistles of the Turing TU104 GPU, including RT cores, Tensor cores, USB Type-C, VirtualLink, NVLink, the new decoder/encoder, and DLSS, with ray tracing sitting at the core of Turing. Unfortunately, we have yet to test the true performance potential of these cards due to the lack of RTX-enabled games and APIs. Clearly, I will be revisiting this review as soon as RT-based synthetic benchmark apps and games are at our disposal.
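The headline memory and fillrate numbers above follow from the standard formulas. A minimal sketch (the 1.545 GHz clock in the fillrate example is implied by the quoted 98.9 GP/s and 284.3 GT/s figures, not an officially stated spec):

```python
def gddr6_bandwidth_gbps(mem_clock_mhz, bus_width_bits):
    """Peak GDDR6 bandwidth: the base memory clock times 8 gives the
    effective per-pin data rate (1750MHz -> 14 Gbps), scaled by the
    bus width in bytes."""
    effective_mtps = mem_clock_mhz * 8
    return effective_mtps * (bus_width_bits / 8) / 1000  # MB/s -> GB/s

def fillrate(units, clock_ghz):
    """Pixel fillrate = ROPs x clock (GP/s); texture fillrate = TMUs x clock (GT/s)."""
    return units * clock_ghz

print(gddr6_bandwidth_gbps(1750, 256))   # 448.0, matching the card's 448 GB/s spec
print(round(fillrate(64, 1.545), 1))     # 98.9 GP/s with 64 ROPs
print(round(fillrate(184, 1.545), 1))    # 284.3 GT/s with 184 TMUs
```

Working backwards, both quoted fillrates imply the same ~1545MHz clock, which sits between the card's base and boost clocks.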
The Asus Strix GeForce RTX 2080 O8G retains the basic design concept from Pascal and improves on it further. This card features a dual BIOS, toggled with a switch located on the top side of the PCB. The two BIOSes are designated P-Mode and Q-Mode. P-Mode focuses on strong cooling for better performance, which may come at a higher noise level than Q-Mode, which aims at silent operation at the user's disposal, at the cost of higher thermals. Once Windows is loaded, switching the BIOS will not take effect until the system is restarted. A dual BIOS always comes in handy when flashing a corrupt BIOS, as we have a nice backup. There is an LED on/off button located on the backside of the card, so the user now has the option to turn the lighting off for a pure stealth look. Asus has introduced new Axial-Tech fans on this graphics card, delivering up to 27% increased airflow and 40% increased static pressure. This is much needed, as the overall thickness of the heatsink has been increased by 20%, hence the more powerful fans. The increased surface area of the heatsink makes the card a 2.7-slot design; keep that in mind for clearance issues with respect to the chassis, along with the card's 11.8-inch length. The heatsink has 5 nickel-plated copper heat pipes, which appear to be 8mm thick, and its middle portion is a bit recessed. There are two nickel-plated copper plates on this heatsink: one makes contact with the GPU and the other with the MOSFETs/VRM. The card uses MaxContact technology, which utilizes precision machining to create a heat-spreader surface that makes up to 2X more contact with the GPU for better heat transfer. The backplate is of the same design as we saw on the previous-generation Strix cards.
This card uses a metal brace as an added strength measure, reinforcing the structure against excessive torsion and lateral PCB bending. It has two 4-pin fan headers for controlling chassis fans according to the graphics card's thermals, plus one 12V RGB-format AURA header. When this graphics card is used with an Asus AURA Sync enabled motherboard, this provides an additional AURA header should one need it.
So, the big question: what is the performance like on the RTX 2080? The graphs are self-explanatory when it comes to the Asus Strix GeForce RTX 2080 O8G competing against the Nvidia GeForce GTX 1080 FE. The performance gain ranges from 20% to 50% depending on the game, with an average of roughly 30% over the GTX 1080. However, the real performance is still to be tested using the new technology and features, which will be done once RTX-enabled games are available; let's hope they improve performance to a level that justifies these prices. Nvidia has not only improved traditional rasterization but also implemented dedicated hardware for ray tracing (RT cores) and artificial intelligence (Tensor cores). DLSS is expected to bring better graphical processing with more efficiency, and ray tracing, a computationally intensive task, now runs on dedicated hardware, freeing the main GPU to focus on more traditional processing. Further, the two are combined in a hybrid rendering approach, with the processing shared depending on the load. The current performance gain comes from rasterization alone; we expect better gains and efficiency once the new technology is put to its full potential. As of now, it is on paper only. I am looking forward to the time when we can test these cards to determine the true performance potential of the Turing-based graphics cards.
The GeForce GTX 1080Ti gave the GeForce RTX 2080 a tough time, and both cards were neck and neck in performance testing, with no significant lead for either card except in a few titles. This card will retail at Rs.140000/- when it launches in the local market. That is too high a price tag to consider, particularly when the last generation's high-end card offers almost the same performance. In my personal opinion, I would wait until the Turing cards can be tested with the new technology to see what performance level they actually bring to the table. Asus is offering a limited 3-year warranty on the Strix GeForce RTX 2080 O8G, which is nice. I am thankful to Asus Pakistan for giving me the opportunity to review the Asus ROG Strix GeForce RTX 2080 O8G.