It's no secret that Nvidia chose to dumb down their launch lineup for Kepler. After all, once the company realised that GK104 would be competitive with AMD's GCN at the highest level, in the form of the Tahiti-based HD7970, they chose to launch the GTX680 with the GK104 core and pocketed the money saved by using a GPU with a smaller die. We all know how that went – even though the HD7970 was king of the hill for a good few months and extended its reign even after the GTX680 launch, its lead was short-lived and the high-end market remains hotly contested. Because Nvidia had normally launched each new GPU family with a large die in the region of 500mm², the GTX680 was a bit perplexing. They fixed that last week though, with the Nvidia Titan.

titan2-620

Two things need to be mentioned here. The first is that Titan is both the card's codename and its retail designation. It doesn't belong to the Geforce 600 or the 700 family; it's firmly outside both lines. What Titan is, is a rebranded Tesla K20 GPU, the same silicon used in the Titan supercomputer built by Cray at the Oak Ridge National Laboratory, which is the inspiration for the card's name. Had Nvidia not gone the power-conscious route and played to the efficiency tune, this would have been Kepler's launch product.

titan_open_angle-view

The second is that Titan is very much a niche product. It's priced the same as the $1000 GTX690 and fits into a similar power envelope. It won't be price-competitive with the HD7970, and two GTX680s will easily outrun it for gaming while saving you at least R2000 on the purchase price. You won't be walking into a retail store and buying one; you'll have to order it from online retailers, because this is pretty much a paper launch. A week or five from now you might see a trickle of these monsters in the country, but they're going to be snapped up by people who desire the exclusivity of Titan, as well as the unlocked compute potential. Ever since the GTX480, Nvidia has been looking for ways to stifle the compute performance of its Geforce-class cards to avoid people choosing the cheaper lineup over its Quadro and Tesla options. My guess is that, with the Cray order done, Nvidia has decided to start selling GK110 to the public to gain public favour as well as sell its existing stock of chips. With that aside, let's take a closer look at it.

titan bare pcb

The first thing you'll notice is the SLI fingers. Yes, Titan is capable of quad-SLI and will probably smoke any other existing solution in terms of raw computational power in that configuration. This is part of Titan's allure – you typically can't buy a Tesla card if you're Average Joe, as you'd need to be part of a company and the waiting time to get your order done and dusted could take weeks. This is a quicker way to market for Nvidia while it waits for more customers for the K20 Tesla. Going a little left, you'll see the GK110 chip in all its glory, surrounded by twelve 2Gb 6GHz GDDR5 memory chips from Samsung. Moving a little left again, Titan uses a 6+2 power phase setup to feed the mammoth chip, but it's not an over-the-top design. Titan may be capable of overclocking, but that isn't Nvidia's goal with the product. Core clocks are 836MHz, boosting up to 876MHz when there's enough available power and the GPU isn't already at 100% utilisation.

titan bare back

Flipping the card over, another twelve 2Gb chips are found around the GPU area, making up Titan's total of 6GB of VRAM. A rather interesting addition is the extra space available for another 8-pin PEG power connector. Titan has a shipping TDP of 250W, but with the 75W available from the PCIe slot, as well as the 75W and 150W from the 6-pin and 8-pin connectors, there's 300W of power to draw from. It's a rather stark contrast to the GTX680, which in most cases only has two 6-pin PEG connectors for a total power budget of 225W, yet mostly sits at its boost limit of just 190W. Titan is much, much bigger. It also ships with a larger 384-bit bus, giving it bandwidth weighing in at 288GB/s. For comparison, most GTX680 cards you'll find out there are capable of around 192GB/s.
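
Those bandwidth figures are straightforward to verify; here's a quick back-of-the-envelope sketch, taking 6GHz as the effective GDDR5 data rate:

```python
def peak_bandwidth_gb_s(bus_width_bits, effective_data_rate_ghz):
    """Peak memory bandwidth = bus width in bytes x effective data rate."""
    return (bus_width_bits / 8) * effective_data_rate_ghz

print(peak_bandwidth_gb_s(384, 6.0))   # 288.0 GB/s for Titan
print(peak_bandwidth_gb_s(256, 6.0))   # 192.0 GB/s for a reference GTX680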

gk110 block diagram

The block diagram for GK110 is massive. The chip measures 551mm² and consists of 2880 CUDA cores. Those are broken up into five graphics processing clusters, each of which houses 576 CUDA cores. Those are in turn grouped into sets of 192 cores, mimicking the design laid out in the GTX680, which I detailed in my Analysis articles on the card last year. For comparison, the GTX680 has 1536 CUDA cores. It packs those into four GPCs housing 384 cores each, linked up to a quad-channel memory controller (Titan has six channels). In pure performance terms, Titan has around 50% more processing power, anti-aliasing capabilities and memory bandwidth. The only drawback to such a large design is that yields won’t always be perfect, hence the disabled shader cluster highlighted in red. But there’s another, much larger difference Titan boasts as well.
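
The arithmetic behind those figures is easy to check with a quick sketch; the one-cluster-disabled configuration is what shipping Titan cards use:

```python
cores_per_cluster = 192              # one SMX-style shader cluster
clusters_per_gpc = 3                 # 576 cores per GPC / 192 per cluster
gpcs = 5

full_gk110_cores = gpcs * clusters_per_gpc * cores_per_cluster
print(full_gk110_cores)              # 2880 on the full die

titan_cores = full_gk110_cores - cores_per_cluster   # one cluster disabled for yields
print(titan_cores)                   # 2688 active CUDA cores on Titan

gtx680_cores = 4 * 2 * cores_per_cluster             # four GPCs of 384 cores each
print(gtx680_cores)                  # 1536
```

At stock clocks (836MHz against the GTX680's 1006MHz base) that works out to roughly 45-50% more raw shader throughput, which is where the 50% figure comes from.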

gk110-gpc-block

As mentioned before, Titan's market is rather niche, but the cards will be snapped up by people who can't afford Tesla cards and still want the ability to play the odd game when they're not number crunching, something you can't easily do on a Quadro. Titan fills that gap by adding 64 double-precision ALUs (the orange ones) to each shader cluster, giving Titan the ability to serve as a compute cruncher for some very business-critical work. Double-precision is the entire sales mantra of the professional lineup, and it's the reason why Nvidia has sought to throttle its Geforce cards so that they have just enough compute performance to be reasonable, but not enough to be competitive with Quadro and Tesla.
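
To put a rough number on what those units buy you, here's a back-of-the-envelope sketch. It assumes the commonly quoted GK110 configuration of 64 FP64 units per cluster with 14 clusters active, and real-world throughput will land below this theoretical peak:

```python
dp_units_per_cluster = 64     # dedicated FP64 ALUs in each shader cluster
active_clusters = 14          # 15 on the full die, one disabled on Titan
clock_ghz = 0.837             # roughly the base clock while full-rate FP64 is enabled

# Each FP64 unit can retire one fused multiply-add (two FLOPs) per cycle
dp_gflops = dp_units_per_cluster * active_clusters * 2 * clock_ghz
print(round(dp_gflops))       # ~1500 GFLOPS of peak double-precision throughput
```

For comparison, GK104's FP64 rate is capped at roughly 1/24 of its single-precision rate, which leaves the GTX680 well under 200 GFLOPS of double-precision.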

Capture

To enable double-precision mode, you need to switch it on in the drivers, after which the card underclocks itself by about 15MHz. On a related note, overclocking is a little different here. Part of it is welcome; the other part is pretty much fluff you could achieve years ago, but now it's built into the Geforce driver. With the GTX680, Nvidia limited overclocking according to specific boost limits: you could raise clock rates so long as the card stayed under its set power draw. That limit was a maximum of 190W, even though the card was capable of 225W with its two 6-pin PEG connectors and hovered around 170W at stock speeds. In the drivers, however, you could raise the power target to 138%, effectively giving it a 235W maximum power draw. With Titan it's not so flexible, sitting at just 106%, giving you only 6% of leeway for boosted speeds. But there's more (less, actually).
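
Those percentages are easier to follow once you see what they're percentages of; a quick sketch using the figures quoted above:

```python
def power_ceiling_w(target_base_w, power_target_percent):
    """Effective power ceiling when the driver's power target slider is raised."""
    return target_base_w * power_target_percent / 100

# GTX680: 170W boost target, slider maxes out at 138%
print(power_ceiling_w(170, 138))   # ~235W, close to the 225W its connectors allow

# Titan: 250W TDP, slider capped at 106%
print(power_ceiling_w(250, 106))   # 265W, comfortably inside the 300W available
```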

The graph on the right was composed by Anandtech and shows how Nvidia has approached overvolting. Yes, you read that right, overvolting is coming back, but this time the driver ties it to clock speed. As the voltage is adjusted, the default clocks go up until you reach 1.16v and around 990MHz. Beyond this level Nvidia considers it unsafe to cross the GHz barrier, warning that using the 1.2v bin reduces the life of the card. Using the 990MHz bin improves performance by about 20%.

However, the 1.2v limit is where the fun ends. You can't go higher than that without hacking the card's firmware. Partners like ASUS, Gigabyte and MSI won't be able to release versions that clock higher than that either – there will be no custom Titan designs, no extra VRM phases and no way for them to adjust the clocks on their own. You can raise the boost profile and/or the default voltage, and that's it. You can buy cards with better coolers, but you'll still be subject to the same speed wall. On the plus side, the card no longer disables boosted states based on power consumption and proximity to the maximum TDP; boost is now regulated according to temperature.
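
The temperature-based regulation can be pictured with a toy loop like the one below. This is purely illustrative and not Nvidia's actual algorithm; the 80°C target and roughly 13MHz clock bins are the commonly reported GPU Boost 2.0 defaults, and the real logic also factors in voltage and power.

```python
def next_boost_clock(clock_mhz, gpu_temp_c, temp_target_c=80,
                     base_mhz=836, max_boost_mhz=876, bin_mhz=13):
    """Toy model of temperature-target boost: step clocks up while the GPU
    runs cooler than the target, step them back down once it runs hotter."""
    if gpu_temp_c < temp_target_c and clock_mhz < max_boost_mhz:
        return min(clock_mhz + bin_mhz, max_boost_mhz)
    if gpu_temp_c > temp_target_c and clock_mhz > base_mhz:
        return max(clock_mhz - bin_mhz, base_mhz)
    return clock_mhz

print(next_boost_clock(836, 70))   # 849 - cool card, clocks climb
print(next_boost_clock(876, 85))   # 863 - hot card, clocks back off
```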

display_oc_2

And another weird improvement, but one that you could already do manually for yourself, is what Nvidia calls Display Overclocking. Essentially, the company thinks that those 60Hz refresh rates could do with a little change, so it allows you to force the monitor into running at a higher-than-supported refresh rate. Most monitors can only go up to 66Hz, which isn't much of an improvement. Some go to 75Hz, and some of those Korean panels can go way beyond 120Hz. It's all much of a muchness, but giving gamers higher V-Sync ceilings might be a welcome benefit, although I have no idea if I'd buy into it. My monitor's only capable of 60Hz.
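
Whether that's worth anything comes down to the per-frame budget V-Sync imposes at each refresh rate; trivial arithmetic, but it frames the benefit:

```python
for hz in (60, 66, 75, 120):
    print(f"{hz}Hz -> {1000 / hz:.1f}ms per frame under V-Sync")
# 60Hz -> 16.7ms, 66Hz -> 15.2ms, 75Hz -> 13.3ms, 120Hz -> 8.3ms
```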

Moving on to in-game performance, most websites still use the average FPS metric to measure and record performance. Because that doesn't detail what actually happens inside each second, sites like TechReport and PC Perspective are pushing other reviewers, as well as both AMD and Nvidia, to start validating GPU performance by measuring frame delivery times and not just FPS averages. What happens inside the second can vary drastically, delivering microstutter with multi-GPU solutions and, occasionally, framerate drops to below 25fps, even if your averages are quite high. I'm using some data from both sites as well as Tom's Hardware, because the rest of the tech world, stuck on FPS metrics, doesn't quite do it right. Because you would never be using a Titan on a 1080p monitor, I'll only concentrate on 2560 x 1440 and 5760 x 1080/1200 results, which are indicative of the native resolutions of a single 27″ or 30″ monitor and three 1080p monitors respectively.
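
The difference between the two approaches is easy to demonstrate with a handful of frame times; here's a minimal sketch (with made-up numbers) of the sort of metric TechReport and PC Perspective report:

```python
# Frame times in milliseconds for a short run of gameplay (made-up numbers):
# mostly smooth ~15ms frames with a periodic 90ms hitch
frame_times_ms = [14, 15, 14, 16, 15, 14, 90, 15, 14, 16] * 6

avg_fps = 1000 * len(frame_times_ms) / sum(frame_times_ms)
print(round(avg_fps))           # ~45fps - looks perfectly playable on average

p99 = sorted(frame_times_ms)[int(len(frame_times_ms) * 0.99)]
print(p99)                      # 99th-percentile frame time of 90ms - the visible
                                # stutter that the average completely hides
```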

border-2560-average border-2560-frot border-5760-average border-5760-frot

Getting a comprehensive overview of the performance you can expect in the games you play requires a little hopping around. Let's start with Borderlands 2. Chances are you've played this game or are planning to own it, and the results are quite something. Titan keeps up pretty well with the GTX690 and is decisively ahead of the GTX680, by around 20-30 frames per second. Frame rates are always above 60 and there's quite a gap between Titan and the GTX680 – not enough for some people to justify the price premium, mind you, but for some it's worth it. However, although Tom's does now show framerates over time, those are still per-second averages and don't show how deep the drops really go. TechReport shows you what they're missing. We'll take a gander at the comparison at 2560 x 1600 for a GTX690 and the Titan.

bl2 frame times

As TR notes, there are plenty of places where Titan is faster than the GTX690 and others where it's slower. There are periods where the GTX690's graph looks more like a small cloud, and that's microstutter – something a single-GPU setup shouldn't be seeing. Borderlands 2 is optimised for Nvidia graphics in any case, but this is the first sign that Titan isn't going to dethrone the dual-GPU king.

fc3-2560-average fc3-2560-frot fc3-5760-average fc3-5760-frot

Far Cry 3 seems to go down swimmingly even with the Ultra Quality preset, although Titan isn't enough to push frame rates above 40. Still, it has a decent lead over the GTX680, although it's not a double-your-performance lead like the GTX690's. Chances are the 384-bit bus isn't enough to keep a game like Far Cry 3 happily fed, whereas the GTX690, with two 256-bit buses to split the load across, plods on like nothing happened. That is, until you hit the triple-monitor resolution. None of the solutions are playable there, not even something as powerful as the GTX690. Your only option at that resolution would be to play at High settings and lower some other performance sappers. But once again, there's more to these graphs than meets the eye.

fc3 frame times

The frame times the GTX690 produces for TechReport at 2560 x 1600 show that, overall, the experience on the Titan would be better because it has far less variance, even though the GTX690 pushes out more frames and higher average frame rates. Titan has only a few hiccups beyond a 40ms frame time, whereas the GTX690's are show-stopping on two occasions. This is probably also a VRAM issue; I'm betting memory isn't being flushed quickly enough to render the next scene, which is about where you get that huge spike. Overall, the Titan keeps up well with its competition. But what happens when you get to possibly the most demanding game of all time?

crysis3-25x14-avgfps crysis3-25x14-frametimes

PC Perspective is the only site I've come across that included Crysis 3 in its testing, and with frame times to boot. Safe to say, though, that you won't be enabling the “Very High” preset in a triple-monitor setup unless you're running two Titans. Even at those detail levels and 2560 x 1440, the game crushes both single and dual-GPU cards. You'd probably have to use two GTX690s just to break the 60fps barrier; that's how intensive/unoptimised this game really is (and I have first-hand experience of that: my Radeon HD6870 needs to be on low settings for the game to run well). But as I've said before, the allure isn't only in games, and there's a very good reason why Nvidia wants prospective buyers to look at Titan rather than the GTX690. Most sites, in fact, delivered their verdict based purely on game performance, without looking at what other benefits GK110 brings to buyers.

anand civ v compute anand brute password anand compute bench

Anandtech is one of the few places that benchmarks the compute and GPGPU performance of cards like the Titan, and its performance is pretty staggering. First off, a nod to the fact that most cards with DirectCompute capabilities perform pretty similarly in Civilisation V. The benchmark Anandtech runs takes a number of buildings in the game and uses GPGPU acceleration to decompress their textures on the fly and apply them to the buildings, which are tessellated. The results show how quickly the decompression and drawing completes – all cards here are now CPU-limited, and things like driver optimisations and improvements to the game engine have allowed the GTX690 and GTX680 to slightly exceed the HD7970's score, which has stayed the same for over a year.

Moving to a GPGPU-accelerated password cracker, however, shows just how fast the Titan is. It's more than three times as fast as the GTX680 and even outpaces the GTX690, probably owing to the extra double-precision ALUs being put to use, which gives Titan a huge advantage. The last bench they perform, System Compute, shows the Titan falling behind both a HD7990 and the HD7970. More than likely, the program doesn't take multiple GPUs into account and is affected by clock speed as much as by the amount of resources available. If Anandtech were to overclock Titan to 1GHz, it would probably command a convincing lead. Additionally, this is a DirectCompute test that doesn't use Nvidia's CUDA software, which could also explain why it's slower. The power usage graphs below come from Tom's testing machine, the setup of which can be found on the first page of their performance review. Compared to the GTX690, Titan is pretty economical.

idle-power idle-temp load-temp load-power

In the end, the numbers really don't speak for themselves, because too many people will look at this and think it's merely a hot gaming graphics card. It's not. It's a hand-me-down from the Tesla market, a card that makes fewer waves there than it could for consumers. Nvidia appears to have realised that all those people who went out and bought the GTX480 for its compute and CUDA capabilities really did just want the card for those things. Nvidia's Quadro lineup is regularly dumbed down, and often you'll see that it's a bit overpriced. That turns people off, and they turn to solutions which don't hinder their performance for either market – and that's AMD. The GCN cores inside the latest HD7000 family really haven't even been stressed that much, as there's a significant amount of software that benefits Nvidia but not AMD – chiefly, the video transcoders that use CUDA. To my knowledge, there are tons of them out there. How many use ATi Stream, or even DXVA? Go Google it; I only found one.

With the rise of OpenCL (the slow, slow rise), Nvidia may realise that CUDA can't hold onto the GPGPU market forever. Adobe's products already support OpenCL, and even Intel, which has sacrificed a lot of QuickSync development opportunities to concentrate on power efficiency, Haswell and Broadwell, is gaining some ground. As much of a success story as CUDA is, it's a closed-off standard that can't run on any other GPU architecture, just like PhysX.

Closed standards are not the way forward.

Titan is therefore sort of a “Hero” product for Nvidia. It's nearly as fast as the dual-GPU GTX690 but isn't exclusively targeted at gamers. It's more than double the price of the GTX680, and it outpaces it as well, if not by the same margin at which it's priced. For those of you who are purchasing a GTX690, though, take stock of what kind of applications you're running and whether or not Titan would be a better fit. Its lower power consumption is certainly a welcome bonus, as the GTX690 requires two 8-pin PEG connectors and won't work in some chassis where space is at a premium and you can't afford extra heat lingering inside. For small builds in chassis like the BitFenix Prodigy or Cooler Master's Elite 120, the Titan is a better fit for the same price.

Compared to AMD's Radeon lineup, it's far from the value that the HD7970 GHz Edition offers gamers, but as previously mentioned, gamers aren't the only target market here – it's made for professionals who like to play games as well. BLOPs 2 while you're on a lunch break, waiting for your render to complete? With a card that's architecturally the same as the Tesla K20X, but costs eight times less? I'd say that's a tempting offer.

Sources: Anandtech, PC Perspective, Tom’s Hardware, TechReport

Discuss this in the forums: Linky