Right up until last night, no-one knew of Nvidia’s plans for the future or their rollout of next-generation products. Apart from some advancements on the Tegra side of things and the tiny Maxwell launch with the Geforce GTX750 and 750 Ti, along with discussions on DirectX 12 support and their commitment to OpenGL and Linux gaming, the company hasn’t had much to say this year – all of this, aside from the Maxwell launch, was largely unsurprising. But the opening discussion by Jen-Hsun Huang at the 2014 annual GPU Technology Conference revealed an incredibly far-reaching plan and the realisation that Nvidia is far from running out of ideas.
Aside from new roadmaps and product ideas and technologies, the company also hinted at a lot of changes in direction in between the lines and I’d like to take you through the announcements as well as discuss the implications of some of them. Follow me after the jump!
Nvidia Geforce NVLink
Right at the start of the two-hour spectacle, Jen-Hsun started talking about the bottlenecks that CPU and GPU manufacturers face today with the PCI-Express standard. While PCI-E is still the preferred method of communicating with graphics cards and other add-in boards, the current 3.0 standard tops out at 985MB/s per lane, with 16 lanes adding up to roughly 16GB/s transfer speeds.
Although PCI-Express 4.0 ramps that up to almost 32GB/s in total, it’s still a long way away, too long for Nvidia to unlock extra memory bandwidth and aim for the performance crown in the 4K era. With the need for SLI to be replaced as well, Nvidia had to create something very similar to AMD’s XDMA engine and they call their solution NVLink.
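For anyone who wants to check those PCI-E numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the published signalling rates (8 GT/s per lane for PCI-E 3.0, 16 GT/s for 4.0) and 128b/130b encoding, none of which came up in the keynote, and the function names are made up purely for illustration.

```python
# Assumed figures (not from the keynote): PCI-E 3.0 signals at 8 GT/s per lane,
# PCI-E 4.0 at 16 GT/s, and both use 128b/130b encoding.
PCIE3_GT_S = 8.0
PCIE4_GT_S = 16.0


def lane_mb_per_s(gigatransfers_per_s, payload_bits=128, encoded_bits=130):
    """Usable one-way bandwidth of a single lane, in MB/s."""
    usable_bits_per_s = gigatransfers_per_s * 1e9 * payload_bits / encoded_bits
    return usable_bits_per_s / 8 / 1e6  # bits -> bytes -> megabytes


def link_gb_per_s(gigatransfers_per_s, lanes=16):
    """Usable one-way bandwidth of a multi-lane link, in GB/s."""
    return lane_mb_per_s(gigatransfers_per_s) * lanes / 1000


pcie3_lane = lane_mb_per_s(PCIE3_GT_S)
pcie3_x16 = link_gb_per_s(PCIE3_GT_S)
pcie4_x16 = link_gb_per_s(PCIE4_GT_S)

print(f"PCI-E 3.0 per lane: {pcie3_lane:.0f} MB/s")
print(f"PCI-E 3.0 x16:      {pcie3_x16:.2f} GB/s")
print(f"PCI-E 4.0 x16:      {pcie4_x16:.2f} GB/s")
```

Under those assumptions the per-lane figure lands just shy of 985MB/s, a 16-lane PCI-E 3.0 link comes in a little under 16GB/s, and PCI-E 4.0 a little under 32GB/s, which lines up with the rounded numbers quoted above.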
NVLink is a communication bus that sits between GPUs and allows for much higher transfer speeds, the ability to program each GPU separately, and transfers in the order of 5-12 times the current limit for PCI-E 3.0 as more GPUs are added into the system. NVLink is most likely going to be a modified PLX chip that will sit on a custom PCB that joins multiple GPU daughterboards, and it will only be compatible with certain systems.
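To put that 5-12x claim into perspective, here is a quick Python sketch that scales the rounded 16GB/s figure for a 16-lane PCI-E 3.0 link. The multipliers come from the keynote claim, but treating them as a straight multiplication of the PCI-E figure is my own assumption and not a published NVLink specification.

```python
# Rounded one-way bandwidth of a 16-lane PCI-E 3.0 link, as quoted in the article.
PCIE3_X16_GB_S = 16.0


def projected_range_gb_s(baseline_gb_s, low_multiplier=5, high_multiplier=12):
    """Scale a baseline bandwidth by the claimed 5-12x range (illustrative only)."""
    return baseline_gb_s * low_multiplier, baseline_gb_s * high_multiplier


low, high = projected_range_gb_s(PCIE3_X16_GB_S)
print(f"Projected NVLink range: {low:.0f} to {high:.0f} GB/s")
```

If the claim holds, that would put NVLink somewhere between 80GB/s and just under 200GB/s depending on how many GPUs are linked together.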
So right off the bat, you can start weeping. NVLink won’t be used for gaming.
It also allows for Nvidia to match AMD in two crucial areas – unified memory pooling and cache coherency. See, AMD’s Hawaii and Bonaire families already support XDMA, which means that not only does Crossfire run over the PCI-Express bus, but the XDMA engine can also address GPUs separately or as a whole and can manage memory pools separately or as one giant pool of togetherness.
Both NVLink and XDMA solve the issues that plague multi-GPU configurations today – that VRAM needs to be mirrored on both GPUs and that the SLI/Crossfire bridge is needed for GPUs to communicate and divide up work given to them from the CPU. The only thing holding AMD back at this point is drivers, whilst Nvidia is still about a year or two away from a commercial implementation of NVLink.
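The mirroring problem is easy to see with a toy model. The Python sketch below is a deliberate simplification and not how Nvidia or AMD actually manage memory: it assumes that mirrored setups are limited by the smallest card, and that a pooled setup can ideally address every card's memory as one pool. The function name and the pair of 6GB cards are hypothetical.

```python
from typing import Sequence


def usable_vram_gb(per_gpu_gb: Sequence[float], pooled: bool) -> float:
    """Simplified model of how much VRAM a multi-GPU setup can actually use.

    Mirrored: every GPU holds the same data, so the smallest card sets the limit.
    Pooled: idealised upper bound where all memory is addressable as one pool.
    """
    if not per_gpu_gb:
        raise ValueError("need at least one GPU")
    return float(sum(per_gpu_gb)) if pooled else float(min(per_gpu_gb))


cards = [6.0, 6.0]  # hypothetical pair of 6GB cards
mirrored = usable_vram_gb(cards, pooled=False)
unified = usable_vram_gb(cards, pooled=True)
print(f"Mirrored: {mirrored:.0f}GB usable, pooled: {unified:.0f}GB usable")
```

In practice the pooled figure is a best case, since how close software gets to it depends on drivers and on how well a workload splits across GPUs.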
The NVLink, IBM and ARM relationship
In an off-the-cuff remark during his keynote, Jen-Hsun mentioned that there would also be NVLink-compatible CPUs, allowing cache coherency between the CPU and GPUs to form a unified compute platform in a very similar manner to AMD’s solution, which also allows for cache coherency with its APUs (in fact, AMD now counts CPU cores and GPU compute units together as “Compute Cores”).
But that raises the question – which CPUs will be NVLink compatible? It turns out that IBM’s POWER processors will be compatible with the NVLink technology, which is expected considering that IBM partnered with Nvidia to develop the technology.
The NVLink module will be a long board that plugs into a PCI-Express slot supported by a PLX switch and supports multiple daughterboards, so already Nvidia is planning to use this for GPU compute purposes and won’t sell this to small companies – they’re targeting corporations and governments here. But that’s not all.
Given the in-house nature of NVLink and the fact that Nvidia is aiming to keep the GPU compute crown at all costs, we’ll also more than likely see a solution with a Tegra processor that will handle all the CPU duties. This makes a lot of sense because not only can Nvidia then sell a complete, supported solution to their smaller customers, they can also customise Tegra extensively to suit a particular purpose.
Pascal, the next-generation GPU architecture
Nvidia glossed right over Maxwell and its imminent launch and jumped straight into Pascal, their next architecture. It’s named after Blaise Pascal (1623-1662), a French mathematician who built one of the first mechanical calculators, clarified the concepts of pressure and vacuum, helped create probability theory and made a significant contribution to the study of the motion and behaviour of fluids.
Pascal’s work directly influences Nvidia’s PhysX simulations with bodies of water and environmental simulations, so this is a particularly fitting tribute.
Pascal is the successor to Maxwell and is scheduled for a 2016 launch. Right there, that means that not only will we see Maxwell launching this year, but there will only be a Maxwell refresh in 2015 to fill in the gaps. Nvidia was supposed to have Volta launching in 2015 but now the timelines have been extended and altered. Nvidia’s first photograph of Pascal hardware was in daughterboard form for the proposed NVLink board for IBM POWER systems.
Pascal will also be the first GPU in the world, if AMD doesn’t beat Nvidia to it, to have 3D memory chips stacked on top of each other to increase bandwidth and reduce the footprint on the PCB. 3D memory stacking isn’t a new idea but it is one that hasn’t really been implemented in consumer products before. In Nvidia’s case, this memory will likely be used as an additional cache similar to Intel’s L4 cache in Crystalwell, which is accompanied by Iris Pro graphics.
Jen-Hsun claims that not only will bandwidth increase exponentially, but memory capacity will also more than double, with as much as a 4x reduction in power consumption.
But Nvidia didn’t talk about consumer applications for Pascal, even though it will reach us eventually. The daughterboard shown is purely to be used for GPU compute purposes in workloads like machine learning, artificial intelligence systems, cryptography, search engines, weather prediction and much more. The chip will probably be run on a 14nm production process but look at how huge it already is in the prototype!
This is going to be a very big, very expensive product and this will be fine for the initial production run because it’ll be sold to governments and companies like Google who would be able to afford it.
The Geforce GTX Titan Z
Nvidia’s hardware reveal for the keynote is the triple-slot, single-blower, aluminium-clad behemoth known now as the Geforce GTX Titan Z. It’s composed of two GTX Titan Black GPUs, which are currently the world’s fastest single GPUs and come with fully functional double-precision FP64 shader cores. It’s a Quadro K6000 on a budget and now two of them are shoved on to one PCB.
It boasts a staggering 5760 CUDA cores, 480 Texture units, 96 ROPs and a total of 12GB of 6.0GHz GDDR5 memory, 6GB to each GPU. The card does use a single radial cooling fan, so heat is pushed to the rear and the front of the card – if you want to use one of these, you’d best have a damn good chassis to cool it. It’s already in production, it will require at least two 8-pin PCI PEG power connectors and it costs a wallet-destroying US $2999.99.
Nvidia is targeting gaming at 4K and higher resolutions as well as professional uses for running simulations and 3D design. Admittedly, this doesn’t hold that much value, considering that you could purchase three GTX Titan Black cards for the same price.
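The value argument is simple arithmetic, and a few lines of Python make it concrete. This sketch only uses the totals and the price quoted above; it assumes both GPUs on the card are identical (so the per-GPU figures are just half the totals) and takes the "three Titan Blacks for the same price" comparison at face value to get an implied single-card price. The variable names are purely illustrative.

```python
# Figures quoted in the article for the Titan Z.
TITAN_Z_PRICE_USD = 2999.99
TITAN_Z_GPUS = 2
TITAN_Z_TOTALS = {"cuda_cores": 5760, "texture_units": 480, "rops": 96, "memory_gb": 12}

# Assumption taken from the article's comparison: three Titan Blacks cost the same.
TITAN_BLACKS_FOR_SAME_PRICE = 3

# Assuming two identical GPUs, each one gets half of every total.
per_gpu = {name: total // TITAN_Z_GPUS for name, total in TITAN_Z_TOTALS.items()}

titan_z_cost_per_gpu = TITAN_Z_PRICE_USD / TITAN_Z_GPUS
implied_titan_black_price = TITAN_Z_PRICE_USD / TITAN_BLACKS_FOR_SAME_PRICE
premium = titan_z_cost_per_gpu / implied_titan_black_price - 1

print(f"Per-GPU resources: {per_gpu}")
print(f"Titan Z cost per GPU: ${titan_z_cost_per_gpu:.2f}")
print(f"Implied Titan Black price: ${implied_titan_black_price:.2f}")
print(f"Premium per GPU: {premium:.0%}")
```

Put another way, each GPU on the Titan Z works out to roughly half as much again as a standalone Titan Black, which is the price you pay for squeezing two of them on to one card.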
Tegra is changing more rapidly than before
Even though Tegra K1 was just announced at CES 2014 and has had very little time to actually gain traction, Nvidia is already thinking of replacing it with a successor codenamed Erista, who in Marvel lore is the unknown son of Logan, better known as Wolverine (if you didn’t know by now, all Tegra devices from version 3 onwards have had codenames derived from comic book super-heroes). Nvidia doesn’t seem to be very picky about whether they support Marvel or DC Comics, but in any case Erista is going to change things quite a bit, shoving a little bit of Maxwell love into the GPU’s core.
This means that not only will we see a huge jump in efficiency for Erista, but we might also see a doubling of performance as Nvidia scales it up to support more than just mobile gaming devices and tablets. Erista could find its way into mobile phones, kiosks and even Android laptops.
The only problem is, though, that there don’t seem to be many big game developers working on titles for the Tegra architecture, which is especially disappointing considering that the GPUs will now be similar.
So that’s probably why one of the biggest announcements that Nvidia had in store was a port of Valve’s Portal to the Nvidia Shield. Shield is still based on Tegra 4 so it’s behind the advancements in Tegra K1 and Erista, but it’s still pretty potent. Moving forward, it might be possible that Valve would port a sizeable part of their lineup of Source-based games to the Shield platform and possibly get a native Steam application going on the device as well.
It fits in so well with their plans for In-Home Streaming and Family Sharing that I would be flabbergasted if they didn’t make a customised Tegra-powered handheld console of their own in the near future.
And that’s a wrap of all the important things that you may have missed from the keynote! Nvidia’s GPU Technology Conference 2014 continues until 27 March.