After the Pascal-based Tesla P100 aimed at the professional sector, now meet the Pascal GP100, the high-end silicon expected to reach the gaming segment. The full version of the chip will offer 3840 CUDA cores with 240 TMUs and will be accompanied by up to 16 GB of HBM2 memory through eight 512-bit memory controllers, for a 4096-bit memory interface that delivers a bandwidth of 720 GB/s. This silicon will also be manufactured on a 16nm FinFET process, which is why we can expect a dramatic improvement in performance while further decreasing energy consumption.

NVIDIA Announces Pascal GP100 – 3840 CUDA Cores and 16 GB HBM2 Memory


The Pascal GP100 runs at insane clock speeds, 1328 MHz base and 1480 MHz in Boost mode, with a TDP now set at 300 W, while AIB partner versions are expected to cross the 1500 MHz barrier. Unfortunately, it will not arrive until Q1 2017.
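As a quick sanity check on the 720 GB/s figure quoted above, the headline bandwidth follows directly from the bus width and the per-pin data rate of the HBM2 stacks. The sketch below is a minimal host-side calculation; the roughly 1.4 Gb/s per-pin rate is back-calculated from the quoted 720 GB/s and 4096-bit numbers, not taken from an official specification.

```cuda
#include <cstdio>

int main() {
    // Figure from the article: a 4096-bit aggregate HBM2 interface.
    // The per-pin data rate is an assumption, chosen so the result lands
    // on the quoted 720 GB/s (roughly 1.4 Gb/s per pin).
    const double bus_width_bits = 4096.0;
    const double per_pin_gbps   = 1.406;   // assumed HBM2 data rate per pin

    // Theoretical bandwidth = (bus width in bytes) * per-pin data rate.
    const double bandwidth_gb_s = (bus_width_bits / 8.0) * per_pin_gbps;
    printf("Theoretical memory bandwidth: %.0f GB/s\n", bandwidth_gb_s);  // ~720
    return 0;
}
```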

The Pascal GP100 Architecture: Faster in Every Way

With every new GPU architecture, NVIDIA introduces major improvements to performance and power efficiency. The heart of the computation in Tesla GPUs is the SM, or streaming multiprocessor. The streaming multiprocessor creates, manages, schedules and executes instructions from many threads in parallel.
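To make that thread-level parallelism concrete, here is a minimal CUDA sketch (illustrative, not from NVIDIA's documentation): the grid of thread blocks is distributed across the SMs, and each SM schedules and executes the threads of its resident blocks. The kernel and launch configuration are assumptions chosen only for the example.

```cuda
#include <cuda_runtime.h>

// Trivial kernel: each thread scales one element of the array.
// Thread blocks are distributed across the GPU's SMs; each SM creates,
// schedules and executes the threads of its resident blocks in parallel.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;               // one million elements
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    // Launch far more 256-thread blocks than there are SMs; the hardware
    // keeps every SM busy by handing it new blocks as earlier ones finish.
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```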

Like previous Tesla GPUs, GP100 is composed of an array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. GP100 achieves its colossal throughput by providing six GPCs, up to 60 SMs, and eight 512-bit memory controllers (4096 bits total). The Pascal architecture’s computational prowess is more than just brute force: it increases performance not only by adding more SMs than previous GPUs, but by making each SM more efficient. Each SM has 64 CUDA cores and four texture units, for a total of 3840 CUDA cores and 240 texture units.
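For reference, several of those figures can be read back at runtime through the CUDA runtime API; the sketch below queries the SM count, memory bus width and L2 cache size via cudaGetDeviceProperties. The 64-cores-per-SM value is hard-coded from the text above, since the runtime does not report cores per SM directly; on a Tesla P100 with 56 SMs this prints 3584 cores rather than the full chip's 3840.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);    // properties of GPU 0

    // The runtime API does not expose cores per SM, so the GP100 figure
    // of 64 FP32 cores per SM quoted above is assumed here.
    const int coresPerSM = 64;

    printf("GPU:              %s\n", prop.name);
    printf("SMs:              %d\n", prop.multiProcessorCount);
    printf("CUDA cores:       %d (assuming %d per SM)\n",
           prop.multiProcessorCount * coresPerSM, coresPerSM);
    printf("Memory bus width: %d bits\n", prop.memoryBusWidth);
    printf("L2 cache size:    %d KB\n", prop.l2CacheSize / 1024);
    return 0;
}
```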

Delivering higher performance and improving energy efficiency are two key goals for new GPU architectures. A number of changes to the SM in the Maxwell architecture improved its efficiency compared to Kepler. Pascal builds on this and incorporates additional improvements that increase performance per watt even further over Maxwell. While TSMC’s 16nm FinFET manufacturing process plays an important role, many GPU architectural modifications were also implemented to further reduce power consumption while maintaining high performance.

The following table provides a high-level comparison of the Tesla P100's specifications with those of previous-generation Tesla GPU accelerators.

Tesla Products             Tesla K40          Tesla M40          Tesla P100
GPU                        GK110 (Kepler)     GM200 (Maxwell)    GP100 (Pascal)
SMs                        15                 24                 56
TPCs                       15                 24                 28
FP32 CUDA Cores / SM       192                128                64
FP32 CUDA Cores / GPU      2880               3072               3584
FP64 CUDA Cores / SM       64                 4                  32
FP64 CUDA Cores / GPU      960                96                 1792
Base Clock                 745 MHz            948 MHz            1328 MHz
GPU Boost Clock            810/875 MHz        1114 MHz           1480 MHz
FP64 GFLOPS                1680               213                5304
Texture Units              240                192                224
Memory Interface           384-bit GDDR5      384-bit GDDR5      4096-bit HBM2
Memory Size                Up to 12 GB        Up to 24 GB        16 GB
L2 Cache Size              1536 KB            3072 KB            4096 KB
Register File Size / SM    256 KB             256 KB             256 KB
Register File Size / GPU   3840 KB            6144 KB            14336 KB
TDP                        235 Watts          250 Watts          300 Watts
Transistors                7.1 billion        8 billion          15.3 billion
GPU Die Size               551 mm²            601 mm²            610 mm²
Manufacturing Process      28nm               28nm               16nm FinFET
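The FP64 figures in the table follow from the standard peak-throughput formula: FP64 cores × 2 FLOPs per clock (one fused multiply-add counts as two operations) × boost clock. The short sketch below reproduces the three GFLOPS entries from the core counts and boost clocks listed above.

```cuda
#include <cstdio>

// Peak throughput = FP64 cores * 2 FLOPs per clock (one FMA) * boost clock (GHz).
static double peak_fp64_gflops(int fp64_cores, double boost_clock_ghz) {
    return fp64_cores * 2.0 * boost_clock_ghz;
}

int main() {
    printf("Tesla K40  : %.0f GFLOPS\n", peak_fp64_gflops(960,  0.875)); // 1680
    printf("Tesla M40  : %.0f GFLOPS\n", peak_fp64_gflops(96,   1.114)); // ~214 (table rounds to 213)
    printf("Tesla P100 : %.0f GFLOPS\n", peak_fp64_gflops(1792, 1.480)); // ~5304
    return 0;
}
```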