NVIDIA today announced a PCI-Express version of its Tesla P100, a slightly less powerful variant of the NVLink-based model optimized for servers. The new GPU will be available in two versions, with 16 GB or 12 GB of HBM2 memory, which brings the total number of Tesla P100 models to three.

The NVIDIA Tesla P100 accelerator is built around the flagship Pascal GP100 GPU. The full chip offers 3840 CUDA cores and 240 TMUs, of which the Tesla P100 has 3584 CUDA cores enabled, and it comes with up to 16 GB of HBM2 memory. Eight 512-bit memory controllers add up to a 4096-bit memory interface, for a bandwidth of up to 720 GB/s. The silicon is manufactured on a 16nm FinFET process, so we can expect a dramatic improvement in performance along with lower power consumption.
The PCI-Express Tesla P100 delivers 9.3 TFLOPS of single-precision performance (vs. 10.6 TFLOPS for the NVLink model) and 4.7 TFLOPS of double-precision performance (vs. 5.3 TFLOPS). The 16 GB model reaches a memory bandwidth of 720 GB/s, while the 12 GB model reaches 540 GB/s. Both cards are passively cooled despite a TDP of 250W. NVIDIA has not said anything about the price of the two new models.
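
The throughput figures quoted above can be roughly reproduced from the core count and the boost clocks listed in the specification table. The sketch below assumes the common convention that each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock, and takes the 1/2 double-precision rate from the table; the function name `peak_tflops` is purely illustrative and not part of any NVIDIA tool.

```python
def peak_tflops(cuda_cores: int, boost_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPS: cores x boost clock x FLOPs per core per clock.

    flops_per_clock=2 is an assumption (one fused multiply-add per core per clock).
    """
    return cuda_cores * boost_mhz * 1e6 * flops_per_clock / 1e12


# Boost clocks (MHz) as listed in the specification table.
BOOST_MHZ = {"Tesla P100 (NVLink)": 1480, "Tesla P100 (PCIe)": 1300}
CUDA_CORES = 3584

for name, boost in BOOST_MHZ.items():
    fp32 = peak_tflops(CUDA_CORES, boost)
    fp64 = fp32 / 2  # 1/2 rate double precision, per the table
    print(f"{name}: FP32 {fp32:.1f} TFLOPS, FP64 {fp64:.1f} TFLOPS")
```

Under these assumptions the lower boost clock alone accounts for the gap between the PCI-Express and NVLink models, since both use the same number of CUDA cores.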

NVIDIA Tesla Series Specification

| | Tesla P100 (NVLink) | Tesla P100 (PCIe, 16GB) | Tesla P100 (PCIe, 12GB) | Tesla M40 |
|---|---|---|---|---|
| Stream Processors | 3584 | 3584 | 3584 | 3072 |
| Core Clock | 1328MHz | ? | ? | 948MHz |
| Boost Clock(s) | 1480MHz | 1300MHz | 1300MHz | 1114MHz |
| Memory Clock | 1.4Gbps HBM2 | 1.4Gbps HBM2 | 1.4Gbps HBM2 | 6Gbps GDDR5 |
| Memory Bus Width | 4096-bit | 4096-bit | 3072-bit | 384-bit |
| Memory Bandwidth | 720GB/sec | 720GB/sec | 540GB/sec | 288GB/sec |
| Half Precision | 21.2 TFLOPS | 18.7 TFLOPS | 18.7 TFLOPS | 6.8 TFLOPS |
| Single Precision | 10.6 TFLOPS | 9.3 TFLOPS | 9.3 TFLOPS | 6.8 TFLOPS |
| Double Precision | 5.3 TFLOPS (1/2 rate) | 4.7 TFLOPS (1/2 rate) | 4.7 TFLOPS (1/2 rate) | (1/32 rate) |
| Transistor Count | 15.3B | 15.3B | 15.3B | 8B |
| Form Factor | Mezzanine | PCIe | PCIe | PCIe |
| Process Node | TSMC 16nm FinFET | TSMC 16nm FinFET | TSMC 16nm FinFET | TSMC 28nm |
| Architecture | Pascal | Pascal | Pascal | Maxwell 2 |
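
The memory bandwidth row follows from the bus width and the per-pin data rate in the rows above it. As a minimal sketch, assuming bandwidth is simply bus width times per-pin rate divided by 8 bits per byte (the helper name `bandwidth_gb_s` is hypothetical):

```python
def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_width_bits * gbps_per_pin / 8


# (bus width in bits, per-pin data rate in Gbps), taken from the table.
MEMORY_CONFIGS = {
    "Tesla P100 16GB (HBM2)": (4096, 1.4),
    "Tesla P100 12GB (HBM2)": (3072, 1.4),
    "Tesla M40 (GDDR5)": (384, 6.0),
}

for name, (width, rate) in MEMORY_CONFIGS.items():
    print(f"{name}: {bandwidth_gb_s(width, rate):.1f} GB/s")
```

With these inputs the two P100 configurations come out at 716.8 GB/s and 537.6 GB/s, which the table rounds to 720 and 540 GB/sec, while the Tesla M40 works out to exactly 288 GB/s.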