After the Tesla P100, which is aimed at the professional sector, meet the Pascal GP100, the high-end silicon that we can expect to find in the gaming segment. The full version will offer 3840 CUDA cores and 240 TMUs, accompanied by up to 16 GB of HBM2 memory behind eight 512-bit memory controllers. The resulting 4096-bit memory interface would give a bandwidth of 720 GB/s. The silicon will be manufactured on a 16nm FinFET process, which is why we can expect a dramatic improvement in performance together with better energy efficiency.
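To make the memory figures concrete, here is a minimal sketch of the arithmetic that links them. The bus width and the 720 GB/s figure come from the text above; the per-pin data rate is not quoted in the article and is only derived here, assuming one data pin per bus bit and decimal gigabytes.

```python
# Figures quoted in the article for the full GP100 configuration.
MEMORY_CONTROLLERS = 8
BITS_PER_CONTROLLER = 512
QUOTED_BANDWIDTH_GB_S = 720  # GB/s, as quoted in the article


def bus_width_bits(controllers, bits_per_controller):
    """Total memory interface width in bits."""
    return controllers * bits_per_controller


def per_pin_rate_gbps(bandwidth_gb_s, width_bits):
    """Data rate per pin (Gbit/s) implied by a bandwidth and a bus width.

    Assumption: one data pin per bus bit, 8 bits per byte, decimal units.
    """
    return bandwidth_gb_s * 8 / width_bits


def bandwidth_gb_s(rate_gbps, width_bits):
    """Inverse of per_pin_rate_gbps: bandwidth in GB/s."""
    return rate_gbps * width_bits / 8


width = bus_width_bits(MEMORY_CONTROLLERS, BITS_PER_CONTROLLER)
rate = per_pin_rate_gbps(QUOTED_BANDWIDTH_GB_S, width)
print(f"bus width: {width} bits")
print(f"implied per-pin rate: {rate:.3f} Gbit/s (derived, not quoted)")
```

Under these assumptions the quoted bandwidth works out to roughly 1.4 Gbit/s per pin, which shows that the headline number comes from the very wide interface rather than from a high per-pin speed.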

NVIDIA Announces Pascal GP100 – 3840 CUDA Cores and 16 GB HBM2 Memory


Pascal GP100 has very high clock speeds: 1328 MHz base and 1480 MHz boost, with a TDP of 300 W, while AIB partner versions are expected to cross the 1500 MHz barrier. Unfortunately, it will not be released until Q1 2017.

The Pascal GP100 Architecture: Faster in Every Way

With every new GPU architecture, NVIDIA introduces major improvements to performance and power efficiency. The heart of the computation in Tesla GPUs is the SM, or streaming multiprocessor. The streaming multiprocessor creates, manages, schedules and executes instructions from many threads in parallel.

Like previous Tesla GPUs, GP100 is composed of an array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. GP100 achieves its colossal throughput by providing six GPCs, up to 60 SMs, and eight 512-bit memory controllers (4096 bits total). The Pascal architecture’s computational prowess is more than just brute force: it increases performance not only by adding more SMs than previous GPUs, but by making each SM more efficient. Each SM has 64 CUDA cores and four texture units, for a total of 3840 CUDA cores and 240 texture units.
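The totals in the paragraph above follow directly from the per-SM figures. The sketch below reproduces that arithmetic; the function and constant names are illustrative, and the 56-SM case is taken from the Tesla P100 column of the specification table later in this article.

```python
# Per-chip and per-SM figures quoted in the article for GP100.
GPCS = 6
FULL_SMS = 60
CUDA_CORES_PER_SM = 64
TEXTURE_UNITS_PER_SM = 4
MEMORY_CONTROLLERS = 8
BITS_PER_CONTROLLER = 512


def gpu_totals(sm_count):
    """Return total CUDA cores and texture units for a given SM count."""
    return {
        "cuda_cores": sm_count * CUDA_CORES_PER_SM,
        "texture_units": sm_count * TEXTURE_UNITS_PER_SM,
    }


full_chip = gpu_totals(FULL_SMS)   # full GP100 as described in the text
tesla_p100 = gpu_totals(56)        # SM count listed for Tesla P100 in the table
sms_per_gpc = FULL_SMS // GPCS
memory_bus_bits = MEMORY_CONTROLLERS * BITS_PER_CONTROLLER

print("full GP100:", full_chip)
print("Tesla P100:", tesla_p100)
print("SMs per GPC:", sms_per_gpc, "| memory bus:", memory_bus_bits, "bits")
```

The same per-SM figures give 3840 cores for the full 60-SM chip and 3584 cores for the 56-SM Tesla P100, which is why the two numbers in this article differ.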

Delivering higher performance and improving energy efficiency are two key goals for new GPU architectures. A number of changes to the SM in the Maxwell architecture improved its efficiency compared to Kepler. Pascal builds on this and incorporates additional improvements that increase performance per watt even further over Maxwell. While TSMC’s 16nm FinFET manufacturing process plays an important role, many GPU architectural modifications were also implemented to further reduce power consumption while maintaining high performance.
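As a rough illustration of the performance-per-watt trend, the sketch below estimates peak FP32 throughput per watt from the core counts, boost clocks and TDPs listed in the specification table in this article. It rests on several assumptions: each CUDA core retires one fused multiply-add (2 FLOPs) per clock, the higher of the two K40 boost clocks is used, and TDP stands in for actual power draw. Treat the results as theoretical upper bounds, not measurements.

```python
# Inputs copied from the specification table in this article.
GPUS = {
    "Tesla K40 (Kepler)":   {"fp32_cores": 2880, "boost_mhz": 875,  "tdp_w": 235},
    "Tesla M40 (Maxwell)":  {"fp32_cores": 3072, "boost_mhz": 1114, "tdp_w": 250},
    "Tesla P100 (Pascal)":  {"fp32_cores": 3584, "boost_mhz": 1480, "tdp_w": 300},
}

FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: one fused multiply-add per clock


def peak_fp32_gflops(fp32_cores, boost_mhz):
    """Theoretical peak FP32 throughput in GFLOPS at the boost clock."""
    return fp32_cores * FLOPS_PER_CORE_PER_CLOCK * boost_mhz / 1000


def gflops_per_watt(spec):
    """Peak FP32 GFLOPS divided by TDP (TDP used as a proxy for power)."""
    return peak_fp32_gflops(spec["fp32_cores"], spec["boost_mhz"]) / spec["tdp_w"]


efficiency = {name: gflops_per_watt(spec) for name, spec in GPUS.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.1f} GFLOPS/W (theoretical peak)")
```

With these inputs the estimate rises from about 21 GFLOPS/W for Kepler to about 27 for Maxwell and about 35 for Pascal, which is consistent with the generational trend described above.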

The following table provides a high-level comparison of Tesla P100 specifications with those of previous-generation Tesla GPU accelerators.

| Tesla Products | Tesla K40 | Tesla M40 | Tesla P100 |
| --- | --- | --- | --- |
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) |
| SMs | 15 | 24 | 56 |
| TPCs | 15 | 24 | 28 |
| FP32 CUDA Cores / SM | 192 | 128 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz |
| FP64 GFLOPs | 1680 | 213 | 5304 |
| Texture Units | 240 | 192 | 224 |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB |
| Register File Size / SM | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB |
| TDP | 235 Watts | 250 Watts | 300 Watts |
| Transistors | 7.1 billion | 8 billion | 15.3 billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² |
| Manufacturing Process | 28nm | 28nm | 16nm |
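Several rows of the table above can be derived from other rows, which gives a quick way to sanity-check the figures. The sketch below recomputes the per-GPU core counts, the per-GPU register file size and the FP64 GFLOPs row. It assumes 2 FLOPs per FP64 core per clock (one fused multiply-add), the higher K40 boost clock of 875 MHz, and that the table truncates GFLOPs to whole numbers; the field names are illustrative.

```python
# Per-SM inputs and boost clocks copied from the table above.
TABLE = {
    "Tesla K40":  {"sms": 15, "fp32_per_sm": 192, "fp64_per_sm": 64,
                   "boost_mhz": 875,  "regfile_kb_per_sm": 256},
    "Tesla M40":  {"sms": 24, "fp32_per_sm": 128, "fp64_per_sm": 4,
                   "boost_mhz": 1114, "regfile_kb_per_sm": 256},
    "Tesla P100": {"sms": 56, "fp32_per_sm": 64,  "fp64_per_sm": 32,
                   "boost_mhz": 1480, "regfile_kb_per_sm": 256},
}


def derive(row):
    """Recompute the per-GPU rows from the per-SM rows."""
    fp64_cores = row["sms"] * row["fp64_per_sm"]
    return {
        "fp32_cores": row["sms"] * row["fp32_per_sm"],
        "fp64_cores": fp64_cores,
        # Assumption: 2 FLOPs per core per clock, truncated to whole GFLOPs.
        "fp64_gflops": fp64_cores * 2 * row["boost_mhz"] // 1000,
        "regfile_kb": row["sms"] * row["regfile_kb_per_sm"],
    }


derived = {name: derive(row) for name, row in TABLE.items()}
for name, values in derived.items():
    print(name, values)
```

Under these assumptions the recomputed values match the table for all three GPUs, including the sharp drop in FP64 throughput on the Maxwell-based M40, which has only 4 FP64 cores per SM.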