After the Tesla P100 focused on the professional sector, now meet the Pascal GP100, the high-end silicon that we can find in the gaming segment. The full version of the chip will offer 3840 CUDA cores and 240 TMUs, accompanied by up to 16 GB of HBM2 memory. Eight 512-bit memory controllers form a 4096-bit memory interface, which would give a bandwidth of 720 GB/s. The silicon will be manufactured on a 16nm FinFET process, which is why we can expect a dramatic improvement in performance together with lower energy consumption.
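As a rough sanity check on the memory figures above, the short Python sketch below derives the total bus width from the eight 512-bit controllers and works out the per-pin data rate that a 720 GB/s bandwidth would imply. The function name `implied_pin_rate_gbps` is made up for this illustration, and the calculation assumes bandwidth is simply bus width in bytes multiplied by the per-pin transfer rate.

```python
def implied_pin_rate_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbit/s) implied by a total bandwidth and bus width.

    Assumes bandwidth = (bus width in bytes) * (per-pin transfer rate).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gb_s / bytes_per_transfer


# Figures quoted in the article: eight 512-bit controllers, 720 GB/s.
BUS_WIDTH_BITS = 8 * 512
rate = implied_pin_rate_gbps(720.0, BUS_WIDTH_BITS)

print(f"bus width: {BUS_WIDTH_BITS} bits")
print(f"implied per-pin rate: {rate} Gbit/s")
```

Under that assumption the quoted numbers are consistent with each other: a 4096-bit interface moves 512 bytes per transfer, so 720 GB/s corresponds to roughly 1.4 Gbit/s per pin.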
NVIDIA Announces Pascal GP100 – 3840 CUDA Cores and 16 GB HBM2 Memory
Pascal GP100 has insane clock speeds: 1328 MHz base and 1480 MHz boost, with the TDP now set at 300 W, while AIB partner versions will cross the 1500 MHz barrier. Unfortunately, it will not be released until Q1 2017.
The Pascal GP100 Architecture: Faster in Every Way
With every new GPU architecture, NVIDIA introduces major improvements to performance and power efficiency. The heart of the computation in Tesla GPUs is the SM, or streaming multiprocessor. The streaming multiprocessor creates, manages, schedules and executes instructions from many threads in parallel.
Like previous Tesla GPUs, GP100 is composed of an array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. GP100 achieves its colossal throughput by providing six GPCs, up to 60 SMs, and eight 512-bit memory controllers (4096 bits total). The Pascal architecture’s computational prowess is more than just brute force: it increases performance not only by adding more SMs than previous GPUs, but by making each SM more efficient. Each SM has 64 CUDA cores and four texture units, for a total of 3840 CUDA cores and 240 texture units.
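The totals in this paragraph follow directly from the per-SM and per-controller figures. The Python sketch below reproduces that arithmetic for the full GP100 configuration; the `GpuConfig` class and its field names are hypothetical helpers introduced only for this example, and the even split of SMs across GPCs is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical container for the per-unit figures quoted in the text."""
    gpcs: int
    sms: int
    cores_per_sm: int
    tex_units_per_sm: int
    mem_controllers: int
    controller_width_bits: int

    @property
    def cuda_cores(self) -> int:
        return self.sms * self.cores_per_sm

    @property
    def texture_units(self) -> int:
        return self.sms * self.tex_units_per_sm

    @property
    def bus_width_bits(self) -> int:
        return self.mem_controllers * self.controller_width_bits

    @property
    def sms_per_gpc(self) -> int:
        # Assumes SMs are spread evenly across the GPCs.
        return self.sms // self.gpcs


# Full GP100: six GPCs, 60 SMs, 64 cores and 4 texture units per SM,
# eight 512-bit memory controllers.
full_gp100 = GpuConfig(gpcs=6, sms=60, cores_per_sm=64, tex_units_per_sm=4,
                       mem_controllers=8, controller_width_bits=512)

print(full_gp100.cuda_cores, full_gp100.texture_units,
      full_gp100.bus_width_bits, full_gp100.sms_per_gpc)
```

Changing `sms` to 56 gives 3584 cores and 224 texture units, the figures listed for the Tesla P100 product, which ships with fewer SMs enabled than the full chip.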
Delivering higher performance and improving energy efficiency are two key goals for new GPU architectures. A number of changes to the SM in the Maxwell architecture improved its efficiency compared to Kepler. Pascal builds on this and incorporates additional improvements that increase performance per watt even further over Maxwell. While TSMC’s 16nm FinFET manufacturing process plays an important role, many GPU architectural modifications were also implemented to further reduce power consumption while maintaining high performance.
The following table provides a high-level comparison of the Tesla P100 specifications with those of previous-generation Tesla GPU accelerators.
Tesla Products | Tesla K40 | Tesla M40 | Tesla P100 |
GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) |
SMs | 15 | 24 | 56 |
TPCs | 15 | 24 | 28 |
FP32 CUDA Cores / SM | 192 | 128 | 64 |
FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 |
FP64 CUDA Cores / SM | 64 | 4 | 32 |
FP64 CUDA Cores / GPU | 960 | 96 | 1792 |
Base Clock | 745 MHz | 948 MHz | 1328 MHz |
GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz |
FP64 GFLOPs | 1680 | 213 | 5304 |
Texture Units | 240 | 192 | 224 |
Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 |
Memory Size | Up to 12 GB | Up to 24 GB | 16 GB |
L2 Cache Size | 1536 KB | 3072 KB | 4096 KB |
Register File Size / SM | 256 KB | 256 KB | 256 KB |
Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB |
TDP | 235 Watts | 250 Watts | 300 Watts |
Transistors | 7.1 billion | 8 billion | 15.3 billion |
GPU Die Size | 551 mm² | 601 mm² | 610 mm² |
Manufacturing Process | 28nm | 28nm | 16nm |
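Several rows of the table can be cross-checked against each other. The Python sketch below recomputes the per-GPU core counts, register file sizes and FP64 throughput from the per-SM rows. It rests on two assumptions not stated in the article: each FP64 core retires one fused multiply-add (counted as 2 FLOPs) per clock, and the peak figure uses the highest listed boost clock. The `SPECS` dictionary and `derive` function are names invented for this example.

```python
# Per-SM rows and boost clocks copied from the comparison table.
SPECS = {
    "Tesla K40":  {"sms": 15, "fp32_per_sm": 192, "fp64_per_sm": 64,
                   "boost_mhz": 875, "regfile_kb_per_sm": 256},
    "Tesla M40":  {"sms": 24, "fp32_per_sm": 128, "fp64_per_sm": 4,
                   "boost_mhz": 1114, "regfile_kb_per_sm": 256},
    "Tesla P100": {"sms": 56, "fp32_per_sm": 64, "fp64_per_sm": 32,
                   "boost_mhz": 1480, "regfile_kb_per_sm": 256},
}

# Assumption: one fused multiply-add per core per clock, counted as 2 FLOPs.
FLOPS_PER_CORE_PER_CLOCK = 2


def derive(spec: dict) -> dict:
    """Recompute the per-GPU rows from the per-SM rows."""
    fp64_cores = spec["sms"] * spec["fp64_per_sm"]
    return {
        "fp32_cores": spec["sms"] * spec["fp32_per_sm"],
        "fp64_cores": fp64_cores,
        "regfile_kb": spec["sms"] * spec["regfile_kb_per_sm"],
        # cores * FLOPs per clock * clock in GHz = GFLOPs
        "fp64_gflops": fp64_cores * FLOPS_PER_CORE_PER_CLOCK
                       * spec["boost_mhz"] / 1000,
    }


for name, spec in SPECS.items():
    print(name, derive(spec))
```

Under those assumptions the derived values agree with the table once the FP64 throughput is truncated to whole GFLOPs (for example, about 5304.3 for the Tesla P100 against the listed 5304).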