You remember this thing, right? The GP100 GPU is the beast that powers NVIDIA’s Pascal-based P100 accelerator. Some might call the GP100 the actual Pascal chip; others might call it a glimpse into Volta, the true successor to Maxwell. Whatever the case may be, NVIDIA currently ships the world’s fastest graphics card for general compute purposes, although for now it is available only inside the DGX-1, a server built and shipped directly by NVIDIA. NVIDIA promised that the P100 Accelerator would ship in PCI-Express form sometime this year, and the company appears to be on track to do that. It just announced the PCI-Express card, the Tesla P100, although a number of changes make it a less compelling product for HPC applications.
To start, the P100 Accelerator has 16GB of HBM2 memory, while the Tesla P100 will ship with only 12GB. That’s not to say that NVIDIA has disabled an entire stack, as PC Perspective says could be possible; rather, NVIDIA is simply taking 1GB of DRAM off each stack to arrive at the final amount. It does this either by disabling access to that memory or by lasering off the section of the memory controller that connects to it. Bandwidth drops to 540GB/s, which is still a lot more than most GPUs available for HPC compute applications offer at this time.
NVIDIA’s NVLINK, an interconnect that runs alongside the PCI-Express bus to create a ring network for the GPUs to communicate over, is also not a feature of the Tesla P100. It has been disabled because only NVIDIA’s DGX-1 server has it built in, and the company wants its board partners to eventually build similar products of their own. NVLINK is a bit like AMD’s XDMA technology in a sense, but you can also consider it a hardware replacement for the SLI bridge, which NVIDIA uses for inter-GPU communication separately from the PCI-Express bus on traditional motherboards.
Raw performance also takes a bit of a hit. NVIDIA has dropped the Tesla P100’s TDP to 250W and its base clock to 1.3GHz, which saves a little over 50W in power draw and heat output. This lowers peak throughput noticeably: from the P100 Accelerator’s 10.6TFLOPS of single-precision and 5.3TFLOPS of double-precision compute, we now have 9.3TFLOPS and 4.7TFLOPS respectively. That still puts it well ahead of any other Tesla product in the same form factor or at a similar price point, but it’s interesting that the company chose to make these changes instead of simply swapping the P100 Accelerator’s circuit board for one in the PCI-Express form factor. Either there are power requirements that limit its ability to do so, or NVIDIA wants to sell these to customers who weren’t happy with the performance of Tesla GPUs based on the Maxwell architecture. Whichever it is, NVIDIA is taking home the bacon with this card.
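For anyone who wants to sanity-check those TFLOPS figures, here is a rough sketch of the usual back-of-the-envelope formula: cores times two floating-point operations per clock (one fused multiply-add) times clock speed. The 3,584-core count and the 1.48GHz clock for the P100 Accelerator are assumptions not stated in this article, as is the half-rate double-precision ratio; only the 1.3GHz figure comes from the text above. The function and constant names are made up for illustration.

```python
# Rough peak-throughput estimate: cores * FLOPs per clock * clock speed.
# Assumptions (not from the article): 3584 FP32 cores, a 1.48 GHz clock for
# the P100 Accelerator, and double precision running at half the FP32 rate.
CUDA_CORES = 3584
FP64_RATIO = 0.5

CLOCKS_GHZ = {
    "P100 Accelerator": 1.48,  # assumed clock
    "Tesla P100": 1.30,        # clock quoted in the article
}


def peak_tflops(cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Return peak single-precision TFLOPS (one FMA counts as 2 FLOPs)."""
    gflops = cores * flops_per_clock * clock_ghz
    return gflops / 1000.0


def summarize() -> dict:
    """Map each board name to (FP32 TFLOPS, FP64 TFLOPS), rounded to 0.1."""
    results = {}
    for name, clock in CLOCKS_GHZ.items():
        fp32 = peak_tflops(CUDA_CORES, clock)
        results[name] = (round(fp32, 1), round(fp32 * FP64_RATIO, 1))
    return results


if __name__ == "__main__":
    for board, (fp32, fp64) in summarize().items():
        print(f"{board}: {fp32} TFLOPS FP32, {fp64} TFLOPS FP64")
```

Under those assumptions the numbers land on the quoted 10.6/5.3 and 9.3/4.7 TFLOPS, which suggests the performance drop is explained by the lower clock alone rather than by any disabled cores.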
If you’re one of the companies that buys parts like this from NVIDIA for compute server farms, you’d better get on the waiting list for one. NVIDIA expects to ship the Tesla P100 in Q4 2016 or Q1 2017 at the latest, and will probably charge something approaching $10,000 per card.