So here we are, three weeks on from Nvidia's worldwide launch of their flagship GTX680, and stock is so limited you're lucky to find one at all. There's huge demand for a card from Nvidia that boasts both a better price and better performance than AMD's flagship, and that leaves the previous generation completely in the dust.

In my earlier analyses, I showed you that the GTX680 is unlike any Geforce card we've ever seen from the company. Nvidia has scaled down performance in less important areas and improved things across the board elsewhere, even dropping average power usage a whole 60 watts below the GTX580. This card punches so far above its weight that it matches the dual-GPU Radeon HD6990 and GTX590 in some benchmarks.

If it wasn't so scarce right now, your grandmother would be running one. But we must finish off this three-part saga with a final look at the other things that make the GTX680 a worthy holder of the performance crown, starting with something everyone's asking about: overclocking.

GPU BOOST

Or rather, the apparent lack of needing to. Kepler launches this year with a new feature called GPU Boost. The point of turbo-boosting a chip is that consumers no longer have to settle for the same performance in a single-threaded app. When Intel debuted Turbo Boost (an evolution of the old SpeedStep), users of single-threaded apps like iTunes recorded a marked improvement when the single core in use was dynamically overclocked up to the TDP limit imposed on the chip.

With GPU Boost, the card and its drivers constantly monitor power draw and workload, and dynamically adjust the clock rate to improve performance. For example, when you're in a room or closed-off building in Battlefield 3, the GPU clock will drop back to stock speeds to save on power and heat: you're only drawing as far as you can see in the room, so why waste resources and power using everything at your disposal to render a tiny scene?

By default, Kepler increases voltage automatically according to a strict table tied to clock speed.

Once you're out of the room, your clock rate gets adjusted according to current temperature, thermal limits and power draw, and you'll get better performance outdoors. Well, that's the theory: there are several scenarios where your GPU will already be at its power draw limit and won't be able to overclock itself. When running benchmarks, GPU Boost doesn't kick in for all of them, because the card is already approaching its power draw limit and won't dynamically accelerate itself.
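To make that behaviour a little more concrete, here's a rough C++ sketch of the idea and nothing more: boost while there's power and thermal headroom, otherwise sit at the base clock. The SelectBoostClock logic, the power target and the top boost bin below are my own illustrative assumptions, not Nvidia's actual firmware or driver code.

```cpp
// Hypothetical sketch of GPU Boost-style clock selection. This is not
// Nvidia's real algorithm; it only illustrates "boost while there's
// power and thermal headroom, otherwise stay at the base clock".
#include <algorithm>
#include <cstdio>

struct GpuState {
    double powerDrawW;   // current board power draw in watts
    double tempC;        // current GPU temperature in degrees Celsius
};

struct BoostLimits {
    double powerTargetW = 195.0;  // assumed power target, not an official figure
    double tempLimitC   = 98.0;   // assumed thermal ceiling
    int baseClockMHz    = 1006;   // GTX680 reference base clock
    int maxBoostMHz     = 1110;   // assumed top of the boost table
    int stepMHz         = 13;     // boost moves in small bins
};

int SelectBoostClock(const GpuState& s, const BoostLimits& lim) {
    // No headroom left: the card sits at its base clock.
    if (s.powerDrawW >= lim.powerTargetW || s.tempC >= lim.tempLimitC)
        return lim.baseClockMHz;
    // Otherwise scale the boost with the remaining power headroom, in bins.
    double headroom = (lim.powerTargetW - s.powerDrawW) / lim.powerTargetW;
    int bins = static_cast<int>(headroom * (lim.maxBoostMHz - lim.baseClockMHz) / lim.stepMHz);
    return std::min(lim.baseClockMHz + bins * lim.stepMHz, lim.maxBoostMHz);
}

int main() {
    BoostLimits lim;
    GpuState lightScene{150.0, 70.0};   // plenty of headroom: the clock climbs
    GpuState heavyScene{200.0, 85.0};   // already power-limited: stuck at base
    std::printf("light scene: %d MHz\n", SelectBoostClock(lightScene, lim));
    std::printf("heavy scene: %d MHz\n", SelectBoostClock(heavyScene, lim));
}
```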

However, when overclocking the card yourself, it's worth noting that you can't turn off GPU Boost. If you raise clock speeds to 1.2GHz, the software will automatically assume that you can boost up to 1.3GHz. If you were previously fine at 1.3GHz in stress tests and benchmarks then this should be perfect, but for a card that won't clock that high it will be a problem. On the other end of the scale, if the card thinks you're already at its power limit, it won't dynamically boost your manual overclock. You can get around this in the driver's settings, but Nvidia needs to release an overclocking manual for noobs who won't know about this (or you can just bookmark this column; either way the noob wins). Right now, Nvidia has imposed a limit of 225 watts on the maximum power draw of the GTX680, and third-party manufacturers are already using an 8-pin and 6-pin combo to get higher overclocks.
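The thing to internalise about that 1.2GHz/1.3GHz example is that your manual offset shifts the entire boost table, not just the base clock. A trivial worked example: the reference clocks are real, while the top boost bin and the offset are assumptions for illustration.

```cpp
// Illustration only: a manual clock offset on Kepler shifts the whole boost
// table, so the boost target moves up by the same amount as the base clock.
#include <cstdio>

int main() {
    const int baseClockMHz  = 1006;  // GTX680 reference base clock
    const int boostClockMHz = 1058;  // advertised typical boost clock
    const int maxBoostMHz   = 1110;  // assumed top boost bin
    const int offsetMHz     = 194;   // manual offset to land the base at ~1.2GHz

    std::printf("base:    %d -> %d MHz\n", baseClockMHz, baseClockMHz + offsetMHz);
    std::printf("boost:   %d -> %d MHz\n", boostClockMHz, boostClockMHz + offsetMHz);
    std::printf("top bin: %d -> %d MHz\n", maxBoostMHz, maxBoostMHz + offsetMHz);
    // The base lands at 1200MHz, but GPU Boost will still try to push the card
    // to roughly 1300MHz; that extra ~100MHz is what you have to stability-test
    // for, whether you asked for it or not.
    return 0;
}
```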

DIRECT COMPUTE TAKES A SMALL HIT

Moving along to DirectCompute, I showed you all earlier a single slide (see left) that compared Civilisation V performance against other compute-capable cards. Civilisation V uses DirectCompute to accelerate the decompression of the building textures stored on the hard drive. Nvidia made sure to hold back compute performance a bit, since many enthusiasts who would normally buy a Quadro card went for the GTX570 or GTX580 instead, both offering the same or similar performance for far less money.

In a bid to keep their Quadro lineup attractive, I expect Nvidia to artificially limit compute performance in the GTX690 so that it beats the HD7990 but stays clear of what the Quadro lineup offers. Yes, you can get a Quadro card and still play games with it, but it's not advisable. If you really need the performance, get a board with more PCI Express slots and run two Nvidia cards, or a board with Lucid Logix's Hydra chip and put a Radeon card in for gaming; either way, it's going to achieve the same aim.

ADAPTIVE V-SYNC…ADAPTS!

The next must-have feature Nvidia decided to implement was Adaptive V-Sync. V-Sync works against your high-performance graphics card by artificially limiting the framerate to your monitor's refresh rate, typically 60FPS. Without it, you would see frame tearing on your screen every time you moved the camera from side to side. Early adopters of Eyefinity setups would have noticed a line running down the middle screen every now and then; this was a driver issue that, for some reason, disabled V-Sync on the main screen showing the Windows desktop.

For all owners of Geforce 600-series cards, a new driver feature will be available that allows the card to turn V-Sync off when it isn't needed. With traditional V-Sync, if a frame took a little longer than usual to render, it would be forced to wait until the next screen refresh, and that's where stutter comes from. With Adaptive V-Sync, if you fall under the 60FPS mark the drivers turn V-Sync off to prevent that in-game stuttering; get back to 60FPS and it turns back on to prevent tearing. It's such a simple, welcome improvement that many have no idea why it wasn't implemented long ago (you probably could do it with a customised version of something like FRAPS, but only enthusiasts, game developers/programmers and hard-core gamers are willing to put in that kind of work).
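If you want to picture the decision being made every frame, here's a tiny sketch of it. setVSync() is just a stand-in for whatever the driver does internally; this shows the concept, not Nvidia's code.

```cpp
// Rough sketch of the Adaptive V-Sync decision: keep V-Sync on while you can
// hold the refresh rate, drop it the moment you can't, so a late frame isn't
// held back a whole refresh and turned into stutter.
#include <cstdio>

void setVSync(bool on) {                 // hypothetical stand-in for the driver
    std::printf("V-Sync %s\n", on ? "on" : "off");
}

void adaptiveVSync(double frameTimeMs, double refreshRateHz = 60.0) {
    const double budgetMs = 1000.0 / refreshRateHz;  // 16.67ms at 60Hz
    // Frame finished inside the refresh budget: sync it to avoid tearing.
    // Frame ran long: present it immediately, trading a little tearing
    // for less stutter.
    setVSync(frameTimeMs <= budgetMs);
}

int main() {
    adaptiveVSync(12.0);   // roughly 83FPS frame: V-Sync stays on
    adaptiveVSync(22.0);   // roughly 45FPS frame: V-Sync drops out
}
```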

For the most part, V-Sync won't turn off while you're playing with the GTX680; only really taxing games are likely to activate this feature. Still, it's a welcome addition and it keeps gameplay smooth on the card even a year or two down the line. If you run SLI in any form, you'll have this on all the time to combat micro-stuttering as well, especially when playing on multi-monitor configurations. Along with this, I also wrote earlier that Nvidia has made optimisations to the frame rate of the centre screen in a multi-monitor setup.

Well, with Adaptive V-Sync your middle screen will always run at 60FPS, while the monitors in your peripheral vision run at a slightly lower rate rather than being locked to the same speed. This is again a measure to combat micro-stuttering in multi-monitor and SLI setups and should help gamers enjoy a far better experience this time round. I'm not sure if Nvidia will include this in a driver update for all Geforce cards, but AMD's driver team had better get on this horse pretty quickly.

SOMETHING NEW: BINDLESS TEXTURES

This is something I didn't report on earlier because I wanted to address it here separately. What Nvidia has done with the re-arranging of Kepler is allow more threads to be executed per core clock. In a single clock cycle, the CUDA cores in Kepler's shader modules (the SMX units) chew through 64 threads regardless of dependencies, enabling Kepler to do double the work of Fermi in the same time frame. What I didn't tell you is how games are going to use this to their advantage, so here's how. Bindless texturing is an improvement to the way textures from a pool are applied to polygon models once they've been drawn and put in place. Once the textures in the binding pool are loaded into RAM, they can be assigned to polygon models to give each one its own unique look.

Now let's say you have a room to draw with lots of detailing, and there are 128 different textures to be processed (for reference, the maximum number of textures in a binding pool you could address with Fermi was limited to 128). For each polygon model, you'd assign a set number of textures from the pool because you didn't want to waste resources; giving a new polygon its own texture pool was a huge waste. Fermi could address textures bound to polygons better than AMD's contemporary designs could, and that's the reason why Crysis 1 and 2, both texture-heavy games, performed better on Geforce 400-series cards. It also helped that the GTX470 and GTX480 both featured frame buffers larger than 1GB; memory capacity was a crucial weakness for any card you played Crysis on.

However, there was still a limit of 128 textures per polygon model that developers had to work around. If you wanted more, you'd have to keep a second set of textures ready for binding to polygons that needed to look a little different from the rest. If you were careful, you could have a group of like objects share a pool of 128 textures; this was how Bioshock and Borderlands worked past that limit, and you'll notice it whenever you load a level from the desktop. On lower-end systems, you would watch various textures being loaded in while the game finished initialising: in Borderlands it was your gun and surroundings that loaded last, while both Bioshock games loaded weapon and detail textures, like scratches on the walls, last.
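For the developers among you, this is roughly what the old slot-binding model looks like from the code side. I've used Direct3D 11 purely as an example (its 128 shader resource slots per stage line up with the 128-texture figure above); it's a sketch, not code from any of the games mentioned, and device and view creation are left out.

```cpp
// Sketch of the classic slot-binding model in Direct3D 11: every texture a
// shader can see during a draw has to be bound to one of a fixed number of
// slots first. D3D11_COMMONSHADER_INPUT_RESOURCE_SLOT_COUNT is 128, the hard
// ceiling being discussed above. Illustrative only; setup code omitted.
#include <d3d11.h>
#include <algorithm>
#include <vector>

void DrawWithBoundTextures(ID3D11DeviceContext* ctx,
                           const std::vector<ID3D11ShaderResourceView*>& textures,
                           UINT indexCount) {
    // Anything past 128 textures means re-binding between draws and juggling
    // which textures live in which slots (the "second set" workaround above).
    const UINT slots = std::min<UINT>(static_cast<UINT>(textures.size()),
                                      D3D11_COMMONSHADER_INPUT_RESOURCE_SLOT_COUNT);
    ctx->PSSetShaderResources(0, slots, textures.data());
    ctx->DrawIndexed(indexCount, 0, 0);
}
```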

Your games could look as detailed as this soon enough.

With Kepler and bindless textures, there's no more of this binding crap. The shader module can now reference up to one million textures, allowing polygon models to have loads more detail applied to them. If you wanted to create a scale version of the Sistine Chapel and individually draw and texture each painting, each tiny detail, you'd now have the power to do so without having to work within the constraints of a low memory count or a crappy hardware limitation. And with double the number of available texture units, it'll be drawn and processed in half the time, too.
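On PC, the way this surfaces to developers is through the OpenGL bindless texture extensions (NV_bindless_texture on Kepler, later standardised as ARB_bindless_texture). The sketch below shows the idea under those assumptions: ask the driver for a 64-bit handle per texture, make it resident, and stop worrying about slots. It assumes a context with the extension available and loaded (via GLEW here) and skips error checking.

```cpp
// Sketch of the bindless approach via the OpenGL bindless texture extensions.
// Instead of binding textures to a handful of slots, each texture gets a
// 64-bit handle that shaders can reference directly, so a single draw can see
// thousands of textures. Assumes the extension is present; no error checking.
#include <GL/glew.h>
#include <vector>

std::vector<GLuint64> MakeBindlessHandles(const std::vector<GLuint>& textures) {
    std::vector<GLuint64> handles;
    handles.reserve(textures.size());
    for (GLuint tex : textures) {
        GLuint64 handle = glGetTextureHandleARB(tex);   // no texture unit involved
        glMakeTextureHandleResidentARB(handle);         // driver keeps it GPU-accessible
        handles.push_back(handle);
    }
    // These handles can now be dropped into a uniform or shader storage buffer
    // and indexed freely from the shader, which is where the "up to a million
    // textures" figure comes from in practice.
    return handles;
}
```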

Civilisation V's benchmark that I showed you earlier is a great metric for performance, because games that use lots of textures will have to do the same thing in future. Having a 3GB frame buffer on a card might be a big help, but it's easier to have textures fetched from the hard drive, decompressed and cached into RAM as they're needed. You can address up to a million textures with Kepler, but you'd never really want to.

It's far too much work.
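For what it's worth, the "fetch, decompress, cache on demand" pattern itself is simple enough to sketch. The LoadAndDecompressFromDisk() helper below is hypothetical, standing in for the file read and the DirectCompute decompression step; the cache around it is the part that matters.

```cpp
// Minimal sketch of on-demand texture streaming: keep decompressed textures
// in a RAM cache and only hit the disk (and the decompressor) the first time
// each one is requested. LoadAndDecompressFromDisk() is a hypothetical helper.
#include <string>
#include <unordered_map>
#include <vector>

struct Texture {
    int width = 0, height = 0;
    std::vector<unsigned char> pixels;   // decompressed RGBA data
};

Texture LoadAndDecompressFromDisk(const std::string& path);  // hypothetical

class TextureCache {
public:
    const Texture& Get(const std::string& path) {
        auto it = cache_.find(path);
        if (it == cache_.end()) {
            // First request: read the compressed file and decompress it
            // (the step Civilisation V pushes onto DirectCompute).
            it = cache_.emplace(path, LoadAndDecompressFromDisk(path)).first;
        }
        return it->second;
    }
private:
    std::unordered_map<std::string, Texture> cache_;
};
```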

FXAA AND TXAA: HOW THEY AFFECT YOU

Finally, the biggest development to come from Kepler is the new anti-aliasing algorithm, TXAA. But hang on, didn't I mention FXAA as well? I want you to look at the Samaritan demo below first, running entirely on Unreal Engine 3:

[youtube]RSXyztq_0uM[/youtube]

For those of you with 4Mb lines who are able to watch the demo in full 720p, take note of how the demo has no jagged edges on straight-edged objects. That's FXAA at work, and it works better than regular AA or MSAA (Multi-Sample AA) because it smooths over every line in the demo that needs AA applied to it. Fast Approximate Anti-Aliasing (FXAA) analyses all the pixels in a frame before it gets shown to you on your monitor. Wherever pixels create artificial edges, FXAA smooths them over by blending neighbouring pixels to take out the rough edges. Other AA options like MSAA take extra samples to smooth over edges while the polygons and textures are being melded together, increasing draw and render time because everything has to be worked out in advance at a more complex scale.
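To show what "analyse pixels, find artificial edges, blend them" actually means, here's a deliberately crude CPU-side version of the idea. It is nowhere near the real FXAA shader, which adds sub-pixel tests, edge direction searches and more; it only demonstrates the core edge-detect-then-blend step.

```cpp
// Drastically simplified illustration of the idea behind FXAA: estimate
// luminance, look for strong contrast between neighbouring pixels (an
// "artificial edge") and blend across it. Not the real FXAA algorithm.
#include <algorithm>
#include <cmath>
#include <vector>

struct Rgb { float r, g, b; };

static float Luma(const Rgb& c) {
    // Perceptual luminance estimate; green carries the most weight.
    return 0.299f * c.r + 0.587f * c.g + 0.114f * c.b;
}

// One pass over the image: where horizontal neighbours differ strongly in
// luma, replace the pixel with a blend of itself and those neighbours.
void FxaaLikePass(std::vector<Rgb>& img, int width, int height, float threshold = 0.15f) {
    std::vector<Rgb> src = img;  // read from a copy so the pass is order-independent
    for (int y = 0; y < height; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            const Rgb& left  = src[y * width + x - 1];
            const Rgb& mid   = src[y * width + x];
            const Rgb& right = src[y * width + x + 1];
            float contrast = std::max(std::fabs(Luma(mid) - Luma(left)),
                                      std::fabs(Luma(mid) - Luma(right)));
            if (contrast > threshold) {
                Rgb& out = img[y * width + x];
                out.r = 0.5f * mid.r + 0.25f * (left.r + right.r);
                out.g = 0.5f * mid.g + 0.25f * (left.g + right.g);
                out.b = 0.5f * mid.b + 0.25f * (left.b + right.b);
            }
        }
    }
}
```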

Not even bleeding-edge cards could keep up with S.T.A.L.K.E.R. with Dynamic Lighting enabled.

Many hardware reviewers chose a specific level in S.T.A.L.K.E.R., one with rays of sunlight beaming into a room, as a measure of how much the game could stress your card. S.T.A.L.K.E.R. used dynamic lighting as much as possible in every level, and if you couldn't make it through that specific part of the game at your native resolution without choking, your experience would only degrade from there. As you moved through the room, the game's engine would apply AA to smooth out lines and ridges, going so far as to correct the beams of light themselves. Enabling MSAA created a huge performance hit, because everything had to be redrawn constantly as you moved through the room. It was a good measure of how your GPU coped with applying AA in games and is still used as a performance metric by some sites, with DX11 added on to up the ante and really stress things out.

In Samaritan, three GTX580 cards in SLI were used to run Unreal Engine 3 at its highest levels of detail. Today, a single GTX680 would be able to achieve a similar level of detail, as would AMD's GCN architecture in cards fast enough to run it. FXAA has ended up being better and faster than any form of AA before it, appearing in many mainstream games today following its inclusion in Skyrim. TXAA, meanwhile, will be available to Geforce 500 owners in a future driver update, though there's no word yet on how it will run on AMD's GCN cards.

Temporal Anti-Aliasing (TXAA) is a hardware upgrade to MSAA and works on the same principles. That means FXAA is still faster, right? In some respects, yes, but the slight speed deficit is a decent trade-off for the huge increase in image quality. TXAA has the performance hit of 2x MSAA but delivers image quality closer to 4x MSAA. MSAA works by taking all the lines on the edges of polygons in the game engine, applying textures to the polygon model, and then reducing or filling in pixels where necessary, ensuring a smooth look all round. MSAA works in most games and can even be forced in older versions of game engines that unofficially supported it. But while regular AA only does a single run over the polygon model, MSAA makes several passes over it to make sure things are all smooth and kept relative to each other (hence, Multi-Sample AA).

TXAA improves on this by running a few passes over the object and then altering the light source to make sure there's no aliasing. If Onona reads this, I'm sure she'd pipe up and say that the film industry has used this technique for a while in CG effects. To smooth out and sharpen objects on-screen, TXAA takes out any aliasing it detects just like MSAA does, but then it also does something a little more special: while applying AA, the edges of a moving object are slightly blurred to avoid unnecessary sharpening of an object that doesn't need it.
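The temporal part is the same trick the film guys lean on: blend each new frame with an accumulated history so edges settle down over time. Here's a bare-bones resolve step to show the principle; it's a generic temporal AA sketch, not Nvidia's closed TXAA implementation, and real versions also reproject the history with motion vectors and clamp it to limit ghosting.

```cpp
// Generic temporal resolve sketch: blend the current (aliased) frame into a
// running history so jagged, crawling edges are averaged away over time.
// This illustrates the principle only; it is not Nvidia's TXAA code.
#include <cstddef>
#include <vector>

struct Pixel { float r, g, b; };

// alpha controls how quickly new frames replace the history:
// lower alpha = smoother edges, but more blur on fast-moving objects.
void TemporalResolve(std::vector<Pixel>& history,
                     const std::vector<Pixel>& current, float alpha = 0.1f) {
    const std::size_t n = history.size() < current.size() ? history.size()
                                                          : current.size();
    for (std::size_t i = 0; i < n; ++i) {
        history[i].r = (1.0f - alpha) * history[i].r + alpha * current[i].r;
        history[i].g = (1.0f - alpha) * history[i].g + alpha * current[i].g;
        history[i].b = (1.0f - alpha) * history[i].b + alpha * current[i].b;
    }
}
```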

Oh wait, you've all seen something similar in two games that I know of: Need For Speed Carbon and GTAIV. Both games employed AA to smooth out edges, but both also intentionally blurred objects in motion to make things look smoother and crisper at the same time. Look at the image to the right and you'll see what I mean: the car is in full focus, fully anti-aliased and looking great, while the bridge above is only slightly anti-aliased, with a blurring effect used to make that less noticeable.

Sneaky, huh? Unfortunately, it works, as I'm sure many of you would never have noticed it. Sadly this wasn't possible on consoles, so things will continue to look crappier there until we see what the PS4 and Xbox Durango are capable of. If both use AMD's HD7000-series hardware properly, we'll see FXAA on both as well. TXAA, unfortunately, won't be coming to that party.

And that's that, finally! After a good, in-depth look at what the GTX680 brings to gamers today, it's clear that Nvidia is on the warpath again, not just for the performance crown, but also for graphical dominance and superior image quality. At R5500, the GTX680 not only removes the need to buy a Radeon HD6990 or even a GTX590, since both end up being beaten in many scenarios by a single-GPU card, but also provides remarkable value for those looking for a new card they won't have to upgrade for several years.

TECH: NVIDIA GTX680 ANALYSIS, PART ONE

NVIDIA GTX680 ANALYSIS, PART TWO: PERFORMANCE
