Something that’s stuck with me these past few weeks is a line from Chris Harris’ review of the Toyota GT86 based on half an hour with it on the track. Sometimes, in whatever profession you’re in, you’re present for a significant event that changes the way you and other people look at things. And one of those events, yesterday, was the launch of the GTX660 Ti by Nvidia.
Now, depending on what your current thoughts are on the Kepler architecture, you might have a different viewpoint to me on why this is a rather clever card. Or if the price is right, or if Nvidia are playing their hand properly. I think they’re doing all three pretty well. But crucially, this card changes the way we should think about performance in future and how to achieve it.
Now, it’s not like we’re dealing with a new card here. The heart of the GTX660 Ti most probably started off as a GTX680. GTX680 chips are the ones that Nvidia makes by the bucketload to cover three card families – the faulty chips from the GK104 lineup (full Kepler design architecture) end up on the chopping block as GK104-A (that’s what I call it), which is included in the GTX670. That card, as you well know, is the new high-range performance king, beating down the HD7950 and even the HD7970 with a bit of an overclock, showing little in the way of a performance deficit despite the disabled shader module. Those chips are again binned for use in the GTX660 Ti (hereafter referred to as GK104-B), having shaved off a little bit of their L2 cache to further differentiate the part.
On the surface, you’d have a hard time telling the cards apart just by looking at the specs at a glance and you’d be forgiven for thinking it’s the same card. The important points are the shrunken-down memory bus feeding the GDDR5 RAM and the power consumption, just 20W shy of the GTX670 and more than 60W shy of the GTX570, a much, much slower card than the one you’re looking at today. Given how small the differences are, you’d expect the card to turn in a similar performance to the GTX670, which is more or less right. It’s Nvidia’s contender for the same market space as the HD7870 and the HD7950.
WHY SHARING (YOUR DESIGNS) IS CARING
Given the economies of scale that Nvidia could reach by simply using GK104 chips for all five families (the GTX660 included), it’s easy to see that Kepler is finally coining it for Nvidia. It’s easier for them to just bin and disable chips selectively from their bulk production line at TSMC, rather than have them create several separate designs for each family, a far more costly affair. Nvidia’s GTX580, 570 and 560 Ti 448-core variants all used the same dies, but the company made the more popular GTX560 vanilla, SE and Ti versions use a physically smaller die size, which is why the cards were expensive to make. At least with the 28nm process, Nvidia only has to die-shrink the GK106-based GT640, keeping the rest of the lineup in the mainstream channel the same size.
One problem from the get-go, though, is memory bandwidth. Looking at the spec sheet you’d expect that the card should turn in a poorer performance than the HD7950 and equal the HD7870 at the very least. In truth, and as you’ll see in the benchmarks later, it’s a little more complicated than that. The texture fillrate allows the GTX660 Ti to turn in better performance in games that require more work on the textures and this would apply to cel-shaded ones like Borderlands and games that use some tessellation, like Civilization V and Battlefield 3, with all texture packs and DX11 enhancements installed.
Thanks to improvements in the way that Nvidia’s cards handle texturing using a bindless texture model, it’s no longer an issue how many textures a game applies to a level because the card won’t run out of assigned memory for them. This means that Borderlands 2, a sequel developed with Nvidia’s dev team helping out, should once again favour cards from the green team. Games that rely on higher memory bandwidth like Crysis 2, Metro 2033 and anything on Codemasters’ game engine should allow the card to turn in performance equal to the HD7870, its nearest-price competitor. In compute-heavy games the GTX660 Ti should be almost identical to the GTX670.
The upcoming GTX650, on the other hand, should be a little bit lower than that. On good days, it needs to go toe-to-toe with the HD7870 and mostly mingle with the HD7850. You can expect bandwidth for that card to be lowered to 96GB/s with another shader module disabled, bringing the card to 96 texture units, 24 ROPs and 1152 CUDA cores, with performance around 20% slower. For the vanilla GTX660, I reckon the card’s RAM and core clocks will be lowered and physically limited, dropping performance by 10%, keeping everything else the same. That means that it’d be possible to flash a vanilla GTX660 into a Ti version if both cards are physically the same or reference models, but whether Nvidia would do this or not is the burning question – we’re still waiting on specs for the GTX660 to surface.
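For the curious, here is a rough Python sketch of where those core and texture unit counts come from. It assumes the full GK104 die has eight shader modules with 192 CUDA cores and 16 texture units apiece, which is what the GTX680's 1536 cores and 128 texture units work out to. The function name and the six-module GTX650 entry are purely illustrative, since that card's specs are still my guess.

```python
# Per-module figures, assuming a full GK104 (GTX680) has 8 shader modules,
# 1536 CUDA cores and 128 texture units in total.
CORES_PER_MODULE = 192
TEXTURE_UNITS_PER_MODULE = 16


def shader_config(active_modules: int) -> dict:
    """Return core and texture unit counts for a given number of active modules.

    Hypothetical helper for illustration only.
    """
    return {
        "modules": active_modules,
        "cuda_cores": active_modules * CORES_PER_MODULE,
        "texture_units": active_modules * TEXTURE_UNITS_PER_MODULE,
    }


# The 6-module entry is the speculated GTX650 from the text, not a confirmed spec.
for label, modules in [("full GK104", 8), ("one module disabled", 7), ("two modules disabled", 6)]:
    print(label, shader_config(modules))
```

Knock two modules off the full die and you land on the 1152 cores and 96 texture units quoted above.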
STIFLING RAM PERFORMANCE TO CREATE ANOTHER CONTENDER
But there’s another catch that may hurt compute performance in future. Traditionally, using a smaller bus of 192 bits meant that RAM on the card had to be equally assigned for the chip to work properly – you can’t easily interleave RAM on a GPU by stacking more on a single controller in older architectures. The GTX570, for example, uses a 320-bit bus composed of five 64-bit controllers and only comes with 1280MB of RAM. That’s because Nvidia’s early Fermi designs called for better memory alignment, with each memory controller assigned two 1Gb (128MB) chips to avoid interleaving and possibly choking available bandwidth.
That made it a bit more expensive than modern designs because it meant using ten smaller chips for the GTX570 to reach 1280MB. On the GTX580, Nvidia used twelve chips on a 384-bit bus to total 1.5GB, and this meant that manufacturing the damn things wasn’t just a logistical hassle, but a costly one as well when it came down to the final benchmarks. Once AMD’s high-end cards launched with 2GB of GDDR5 RAM and game developers started targeting cards with that much memory, the GTX580 began to lose out because it was crippled. AMD used only eight 2Gb chips in the Radeon HD6970 with a 256-bit bus, giving it the edge with the extra 512MB on tap.
Having unequal amounts of chips was something that Nvidia’s designers intentionally avoided while taping out Fermi and making it work – today it’s no longer an issue. This changed with the GTX550 Ti because that used a 192-bit bus as well, but Nvidia’s engineers wanted to see how far they could stretch things. Using a combination of 1Gb and 2Gb (256MB) chips, they could land up at 1024MB with chips in all channels without compromising on the final amount of RAM. It did mean that one memory controller was working at full blast while the others were only at half speed, but that’s a sacrifice you need to make when you’re chopping off entire shader modules. This advancement meant that Nvidia could now have unequal amounts of RAM on separate channels, allowing them a little more flexibility in future. Interleaving the RAM without serious performance deficits was next on their list.
For reference, the GTX680 uses a quad-channel memory controller attached to all eight shader modules to saturate the 256-bit bus and have up to eight memory modules attached for a total of 2GB of GDDR5 RAM (some third-party designs have more RAM because they interleave the chips). Because the card maximises memory bandwidth through the use of GDDR5 and not the eternally slow GDDR3, there’s no reason why the GTX680 would need a larger 512-bit bus, or even a 384-bit one.
So for those out there who really, really want that 512-bit card, buy a GTX690. Asking for something technically and financially unfeasible just because you know it’s a bigger number makes no sense, especially when it brings no benefit to you.
With the GTX660 Ti, Nvidia took the same approach with having different amounts of RAM on one controller, allowing it to further use its economy of scale to its benefit, as well as stifle performance intentionally. It’s the first time they’ve tried interleaving and this means that the GTX660 Ti is part of a long-term experiment for the company. Remember when I said that it’s better for the company to make loads of GK104 chips and simplify their production lineup as much as possible? By binning the GPU cores and re-using the same 6GHz Hynix memory in all of their cards based on Kepler, Nvidia can keep production costs down and adjust pricing for the cheaper cards without losing much profit. The only exception to this is the Kepler-based GT640 with the GK106 chip.
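If you want to check the chip counts yourself, the arithmetic is simple enough to sketch in Python. This is only an illustration: it assumes every card splits its bus into 64-bit controllers with two chips hanging off each one (the layout described for Fermi above, and a simplification for the Radeon), and the function name is made up.

```python
def board_memory(bus_width_bits: int, chip_density_gbit: int,
                 chips_per_controller: int = 2,
                 controller_width_bits: int = 64) -> tuple:
    """Return (chip count, total MB) for an evenly populated memory bus.

    Illustrative helper; assumes identical chips on every controller.
    """
    controllers = bus_width_bits // controller_width_bits
    chips = controllers * chips_per_controller
    mb_per_chip = chip_density_gbit * 1024 // 8  # 1Gb = 128MB, 2Gb = 256MB
    return chips, chips * mb_per_chip


print("GTX570:", board_memory(320, 1))  # five controllers, 1Gb chips
print("GTX580:", board_memory(384, 1))  # six controllers, 1Gb chips
print("HD6970:", board_memory(256, 2))  # four controllers, 2Gb chips
```

The denser 2Gb chips are what let AMD reach 2GB with fewer chips on a narrower bus.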
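Using pairings of 2Gb modules, the GTX660 Ti lands up at 2048MB on the dot. Those extra open slots on the two remaining memory controllers will allow third-party manufacturers to add an extra GB of RAM onto the card. What’s interesting is what would theoretically happen if you had to remove those extra 2Gb modules. The 192-bit bus peaks at 144GB/s with 6GHz memory, but most likely only the 1.5GB that can be spread evenly across all three controllers is reachable at that speed. The last 512MB hangs off a single 64-bit controller, so anything stored there would crawl along at around 48GB/s, far too slow for a performance card in this price range. If you had to take away that 512MB, or fill the open slots for an even 3GB, every controller would carry the same amount of RAM and the whole pool could run at the full 144GB/s, although that’s still short of the GTX670’s 192GB/s.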
That’s why the likelihood of Nvidia chopping off another shader module for the GTX650 is so… likely, for want of a better word. Chopping off a module and using a smaller bus width again (128-bit) would be the best way out and the GTX650 could end up with around 96GB/s. You can work out peak bandwidth yourself by multiplying the bus width in bits by the effective data rate of the RAM in GHz and then dividing that by eight, the number of bits in a byte.
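Here is that sum as a small Python sketch, using the usual convention of dividing by eight to turn bits into bytes. The function name is my own, and the 6GHz figure is the effective data rate of the Hynix memory mentioned earlier. Treat the results as theoretical peaks, not what a game will actually see.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_ghz: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: total width of the memory bus in bits
    data_rate_ghz:  effective (quad-pumped) GDDR5 data rate in GHz
    """
    return bus_width_bits * data_rate_ghz / 8  # 8 bits per byte


for label, width in [("256-bit (GTX670/GTX680)", 256),
                     ("192-bit (GTX660 Ti)", 192),
                     ("128-bit (speculated GTX650)", 128),
                     ("single 64-bit controller", 64)]:
    print(f"{label}: {peak_bandwidth_gb_s(width, 6.0):.0f}GB/s")
```

The 128-bit case is where the 96GB/s guess for the GTX650 comes from.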
STILL CLOSER TO A REFERENCE GTX670 THAN ANYTHING ELSE
Initially, no GTX660 Ti on the market will be a reference version. Nvidia chose instead to let third-party manufacturers get their overclocked and tweaked versions out first at around the $300 price point and this works out better for them in the end, since it means that they don’t have to bear the cost of the inevitable price wars while they wait for the market to settle down a bit. The PCB is the standard length of a reference GTX670, another way in which Nvidia saves money by re-using GTX670 parts and specifications. Together with the PCI Express slot, the two 6-pin power connectors can deliver up to 225W of power, but generally the card pushes 160W as its max power consumption under load. Some variants like the MSI Power Edition pictured above specify 190W under load due to heavy overclocking. Most partners will leave their PCBs naked and only MSI covers most of it in an LN2-friendly shroud like this one.
Output-wise, the card has the same ports as the GTX670, including Nvidia’s design decision to use a full-sized HDMI port. Most enthusiasts who will use the card for gaming across three monitors don’t have DisplayPort-packing units – they’re forced to use converters and those are always of the mini-DP variety. I hope Nvidia’s partners choose to put in an adapter for free in the card’s bundle, for those peeps using an Apple Cinema HD monitor for gaming. Once again, gaming across three monitors can be done using the two dual-link DVI ports and the HDMI port.
Please take note that if you opt to use three 30″ screens, you need one on the DisplayPort-out, because HDMI has a resolution cap if you’re using the incorrect cable or don’t have the right screen – the HDMI 1.4a standard does support anything up to quad-HD resolution, but you must have a 1.4a-compatible cable to do it, as well as a capable graphics adapter and a monitor certified to the standard. That aside, what’s going to make this card fun is the dual SLI fingers, meaning you’d be able to pair three together for better gaming performance. If prices come down nicely, you could bag three of these puppies instead of a GTX690.
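The 225W ceiling is easy to reconstruct if you assume the standard PCI Express limits of 75W from the slot, 75W per 6-pin plug and 150W per 8-pin plug. The Python below is just a back-of-the-envelope sketch with made-up names, using the 160W and 190W load figures quoted above.

```python
# Standard PCI Express power limits in watts (assumed, per the usual spec figures).
PCIE_SLOT_W = 75
SIX_PIN_W = 75
EIGHT_PIN_W = 150


def board_power_limit_w(six_pin: int = 0, eight_pin: int = 0) -> int:
    """Maximum in-spec power for a card: slot plus auxiliary connectors."""
    return PCIE_SLOT_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W


limit = board_power_limit_w(six_pin=2)
for label, load_w in [("typical GTX660 Ti", 160), ("MSI Power Edition", 190)]:
    print(f"{label}: {load_w}W of {limit}W, {limit - load_w}W headroom")
```

Even the heavily overclocked cards leave some room before they hit the connector limit.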
WAIT, WHY WOULD I WANT TO DO THAT?
Following from Tom’s Hardware’s extensive research into GPU stuttering, it was found that triple SLI or Crossfire setups didn’t suffer from that jarring issue, because the third GPU was taking on the last bit of load and also syncing things up nicely before the final render on your screen. If you’re sensitive to the issue, don’t go for the GTX690; three of these cards will do you much better. Speaking of benchmarks, let’s have a look at some of those.
BATTLEFIELD 3 AND BATMAN: ARKHAM CITY
Battlefield 3 is a bit of a wonky game to test because in the single-player, the game is mostly GPU-limited. However, the multiplayer is greatly CPU-limited and reliant and some have found it almost impossible to play on high graphical settings without a quad-core CPU thrown into the mix to keep things going. Moving from 1080p, where the card is targeted to run because it’s a mainstream player, to native 30″ resolution (2560 x 1600, or 1600p) shows that it scales well, but you’ll have to hold back on the Ultra settings if you want something playable. Batman: Arkham City tells another story altogether, showing that the weaker memory configuration in the GTX660 Ti brings it down in games that tax the card’s bandwidth, allowing even the GTX580 to beat it when it comes to minimum framerates. The card scales well moving up to 1600p, with the GTX580 drawing up alongside it easily. This is the first indication we get that for some GTX580 owners, the only reason why you’d want to upgrade is for the power savings.
CRYSIS 2 AND DIRT: SHOWDOWN
Right on the money, the GTX580 draws up alongside the GTX660 Ti in games that tax the memory hard like Crysis 2, giving it the legs to perform nearly identically at 1080p and pull ahead at 1600p with the High preset. It’s no mean feat because that’s still playable performance right there and it’s almost two years old now. However, thanks to GPU Boost, which also makes an appearance on the GTX660 Ti, framerates are a little more stable and help towards a smoother experience overall. DiRT Showdown has been erratic for benchmarks until recently, with a patch making the game a lot more CPU-limited. This is evident in the fact that all the cards perform more or less the same, even when scaling from 1080p to 1600p. Oh, there’s a tip for you though. Don’t use Ultra settings in Showdown. Apparently global illumination is a setting that practically makes the game behave just like Crysis, only it won’t spawn any irritating memes.
METRO 2033 AND THE ELDER SCROLLS V: SKYRIM
Metro 2033 is an important benchmark because the sequel, Last Light, will probably use the same visually stunning game engine. The game is taxing on both the memory and texturing subsystems, giving most cards an equal chance of performing at their best. The game does favour Radeon cards but on High settings the GTX660 Ti manages to keep within a hair’s breadth of the more expensive Radeon HD7950. In Skyrim the game is more CPU-limited but the card turns in a great performance, never more than 5fps behind the behemoth GTX580 and the competing HD7870. Skyrim changes into a memory-straining game at higher resolutions but still remains easily playable at 1600p. With the card delivering just about 60fps at 1080p, it bodes well for other RPGs like Diablo III and even Dragon Age II, both being CPU-limited as well to a point.
WHAT ABOUT SLI PERFORMANCE?
In most games that Tom’s Hardware tested, SLI scaling was superb, achieving almost 100% performance when two cards were paired. For the most part average framerates are high, but there’s a knock in performance in the Batman benchmark, suggesting a driver fault because the cards only scored 5fps at their lowest minimum. That’s quite odd, but thankfully the rest of the games don’t display behaviour to those extremes. As I mentioned before, though, the Radeon HD7950 performs better in some games thanks to its better RAM arrangement and resulting higher bandwidth. Both Crysis 2 and DiRT show this well and it’s enough to tell you that if you want to game on multiple monitors, you’d better be doing it on two GTX670s or two Radeon HD7950 cards. Anything less than that will be better suited to use with a 3D screen, or to multi-monitor gaming with detail settings greatly lowered.
This raises the question of SLI-ing GTX660 and GTX650 cards in future – will it be worth it? From a performance standpoint, at least not for the GTX650. It’ll be too weak with a constrained memory system and at that point you’d be better off with one GTX670 anyway. It’s odd because users of Radeon HD7770 and HD7850 cards in Crossfire setups have reported great performance despite using two crippled cards, suggesting that the only thing holding Nvidia’s lower-cost Keplers back is RAM performance. Would a little overclocking help that?
OVERCLOCKING, TEMPERATURES AND POWER CONSUMPTION
The only games that will benefit from overclocking would be the ones that are GPU-limited thanks to their reliance on textures and tessellation and those that strain the memory bandwidth. With Tom’s Hardware’s unit overclocked by 150MHz on the core and 250MHz on the memory for an effective 7GHz, there’s very little difference in the benchmarks at all, suggesting that the card is operating at its very limit already. Temperatures stay in the mid-60°C range, perfect for a card that will live in mainstream chassis and possibly even in some ITX ones as well. Power consumption is also great, easily ducking in under the load a single GTX580 creates and even showing that a decent 700W power supply will be enough for two in SLI. If you’re sticking to one card, you only need a minimum of a 500W power supply to make things work. If you’re upgrading from a GTX260, 275, 280, 460 or 470, or a Radeon HD4870, 5850, 5870, 6850, 6870 or 6950, it’ll be a simple in-place upgrade without any worrying about your power supply requirements.
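If you're wondering how a 250MHz bump turns into "an effective 7GHz", here is a quick Python sketch. It assumes the 6GHz memory runs at a 1500MHz base clock, that GDDR5 moves four transfers per clock, and that the offset is applied to that base clock. Those assumptions and the function names are mine, not something the review spells out.

```python
GDDR5_TRANSFERS_PER_CLOCK = 4  # GDDR5 is quad-pumped relative to its base clock


def effective_rate_mhz(base_clock_mhz: int) -> int:
    """Effective GDDR5 data rate in MHz for a given base memory clock."""
    return base_clock_mhz * GDDR5_TRANSFERS_PER_CLOCK


def peak_bandwidth_gb_s(bus_width_bits: int, rate_mhz: int) -> float:
    """Theoretical peak bandwidth in GB/s (bits to bytes, MHz to GHz)."""
    return bus_width_bits * rate_mhz / 8 / 1000


stock_clock = 1500                 # assumed base clock for "6GHz" memory
overclocked = stock_clock + 250    # the 250MHz memory overclock from the text

for label, clock in [("stock", stock_clock), ("overclocked", overclocked)]:
    rate = effective_rate_mhz(clock)
    print(f"{label}: {rate}MHz effective, "
          f"{peak_bandwidth_gb_s(192, rate):.0f}GB/s on a 192-bit bus")
```

On paper that's a decent jump in peak bandwidth, which makes the flat benchmark results all the more telling.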
CONCLUSION: HOLY CHEESE AND CRACKERS
At $300 (approx R2500) for the promised reference versions and with most overclocked ones going for around $330 (around R2700) this would be a kick-ass card for mainstream buyers…if our local stores could stick to something like that. With most online retailers pricing their cards above R3000, they’re playing the market on their own by putting them against the HD7870 which, in reference form, hovers just over R2800 if you do your homework. Regardless, it’s a great card and the price isn’t anything to complain about – in fact, once the reference versions pop up, there’ll be a very tight price war with AMD dropping their prices on retail HD7870 and HD7950 units just to keep things even.
But while the GTX660 Ti does serve the mainstream market so damn well, there’s still the sub-R2800 bracket to corner. With stock of the GTX560 Ti slowly trickling out and the GTX570 currently still going for around R2900, it may be that uninformed buyers will grab one of the older cards instead of waiting for the inevitable price drops in three months. With a few driver tweaks, the GTX660 Ti may perform even better, prompting Nvidia to raise prices because there’s no serious competition.
So who should upgrade? Well, if you’re still running a GTX-200 series card, do it. Do it now, because you’re just falling further and further behind the curve. The GTX560 replaced most of the remaining GeForce 8800GT and 9800GT cards left over and this will surely do the same for the GTS250, the GTX260, 275, 280 and GTX285, cards that today are just lagging behind unless you’re now playing your games at 720p with no AA. It’s a bit harder for Radeon owners because despite the fact that you can’t turn on gobs of AA for your game, a HD5870 still holds its own in a lot of titles today, the HD6870 even more so (and I own one, so I know).
The same story applies for the GTX580, because as we’ve just seen there’s no real reason to turf it just yet – most games actually run pretty well and without hassle. Perhaps the inevitable GTX-700 series will do more to convince those buyers, but stick with what you have for now. The only counter AMD has left now is a GHz Edition of the HD7870 and an updated BIOS for existing buyers, as well as a price drop to $280. Those of you who own one can only hope.
For everyone else, this is the new mainstream king. All glory to the GTX660 Ti!