Nvidia’s Mantle-beating driver out in beta later today


AMD’s Mantle has had a lot of media coverage over the past few months for its ability to almost completely eradicate CPU bottlenecks when you’re pairing a high-end GPU with something like a Core i3, a Core i5 or an FX processor. Mantle removes a lot of API bloat and prioritises multi-threaded code, resulting in games that aren’t limited by single-core performance. However, this requires a lot of work and, occasionally, a complete re-engineering of a game engine in order to support the renderer.

Nvidia, not content with letting AMD take the performance crown on unequal grounds, has been working in the shadows to improve performance on GeForce graphics cards, with results that could rival Mantle’s offerings. But there’s a little more to it than that.

The driver version is 337.50 Beta 1 and Nvidia has been quiet about it for most of 2014. While details are scarce, a few graphs that were chucked out at the GPU Technology Conference 2014 held recently in San Jose, California, during Jen-Hsun Huang’s keynote have some interesting implications. The graphs reveal what Nvidia’s work has achieved in optimising their hardware and software drivers for DirectX 11 rendering – it certainly looks very promising for Team Green.

But hold your horses before you jump to conclusions – almost certainly, the increases in performance indicated on these graphs are slightly twisted to suit Nvidia’s agenda. Looking closer reveals two things: one, that CPU performance is indeed increasing slightly with these drivers and two, that the Radeon R9 290X in this scenario is almost certainly running in Quiet mode with the stock fan and is under-performing in Thief.

Nvidia DX11 efficiency improvements

The first example presented is a run-through of Star Swarm and Eidos’ Thief. As expected, the Mantle driver for the R9 290X almost doubles performance in Star Swarm, while the GeForce GTX780 Ti also scales up in performance with new driver iterations. Nvidia claims that with the new R335 driver compared to R334, framerates in their tests jump from an average close to 55fps to more than 65fps, a gain of roughly 18 percent.

The Thief results are a bit interesting because the graph’s axis doesn’t start at zero; instead, it starts at 48fps. You think those numbers are high? Think again – there’s barely a 2fps difference between the Mantle driver and the GTX780 Ti’s score with the R335 driver. In the month-and-a-bit since the release of Thief, Nvidia has done a good job with optimisation, but it’s not as incredible as they’ve painted here. Unless these tests are being run at 4K resolution with medium settings, the cards are actually underperforming at what is assumed to be 1920 x 1080, going by TechSpot’s pre-Mantle results for the game and AnandTech’s own 1920 x 1080 tests with Mantle enabled.
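To see how much a truncated axis can exaggerate a small gap, here is a minimal Python sketch. The two scores are hypothetical, chosen only so that they sit 2fps apart above the 48fps axis start described above; they are not the figures from Nvidia’s slide, and `visual_ratio` is a helper name made up for this illustration.

```python
def visual_ratio(slower_fps, faster_fps, axis_start=0.0):
    """Ratio of the drawn bar lengths when the axis begins at axis_start."""
    if min(slower_fps, faster_fps) <= axis_start:
        raise ValueError("both values must be above the axis start")
    return (faster_fps - axis_start) / (slower_fps - axis_start)


# Hypothetical scores, NOT taken from the slide: 2fps apart, above 48fps.
slower, faster = 52.0, 54.0

true_ratio = visual_ratio(slower, faster)          # axis starts at zero
drawn_ratio = visual_ratio(slower, faster, 48.0)   # axis starts at 48fps

print(f"actual advantage: {100 * (true_ratio - 1):.1f}%")
print(f"advantage as drawn: {100 * (drawn_ratio - 1):.1f}%")
```

With these made-up numbers, a lead of under 4 percent is drawn as a bar that is 50 percent longer, which is why the Thief graph looks more dramatic than the underlying scores.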

Nvidia GPU CPU perf scaling

In his keynote address, Jen-Hsun Huang noted that the reason GPUs saw such drastic increases in performance is that they are add-in boards – increasing overall power doesn’t mean an entire system swap, it’s just another hardware component that can be added in with minimal fuss. And so over time, much of the performance improvement on the desktop came from GPU upgrades because we reached the “good enough” plateau with Intel’s Sandy Bridge family.

In terms of computing power and bandwidth, GPUs scale in performance much more drastically than any other component in the system (aside from SSDs, which hit the SATA speed limit really, really quickly). Bandwidth improves over time in a more relaxed fashion, while GFLOPS performance increases as a result of both smaller manufacturing processes and newer architectures.

It’s not been plain sailing for Nvidia either – the Fermi architecture seemed to result in a stall for the company, showing far less improvement between the GTX480 and the GTX580 than people had expected. Many, many jaws dropped when a GeForce GTX680 was shown to be exactly as fast as a single GTX590 while consuming a third of the power.

Nvidia API efficiency improvements

When the driver releases later today, Nvidia might take some time out to detail their improvements to the R335 drivers. In a slide leaked by Videocardz, Nvidia claims an almost 9-fold increase in draw efficiency (not to be confused with draw calls) and improvements in two other vital areas, both related to the frame buffer. Most likely, these result in both better overall rendering efficiency, based on the way in which the driver passes commands to the GPU, and a lower bandwidth hit for cards in SLI communicating over the SLI bridges. I’m entirely unsure what CB-Set is or what it improves, so if anyone knows, please let me know in the comments below.

Comparisons to what Mantle is doing

Battlefield 4 at 2560x1440: observed FPS and frame time plots

Recently, Battlefield 4 saw an update which enabled a few rendering options for the Mantle version of the game. That included the ability to overlay the necessary bands of colour on the left of the screen in a manner that makes frame information accessible to Nvidia’s FCAT software for deeper analysis. Previously this wasn’t possible because the colour overlay was injected into the rendering process at a set point in DirectX and the tool was incompatible with FCAT.

Now, look at what Mantle does when applied to PC Perspective’s test rig fitted with the Radeon R9 290X. In single-card mode, performance increases only very slightly overall, but the gain is definitely there. That’s what happens when you improve on the already powerful Core i7-3960X. Had it been a Core i7-4770K or an i5-4670K, the differences would be a bit bigger.

But watch what happens when Mantle is applied to a Crossfire pair of the R9 290X. Not only does performance scale when compared to single-card use, it also results in a frame time graph that is mostly flat and gameplay that would be super-smooth. In fact, the flat line is a result of the experiments that DICE are running with the Mantle renderer. You can have the renderer prioritise the game for low frame variance or high frame rates or something in between the two – presumably the second and third options haven’t been tested by PC Perspective just yet.
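The trade-off between high frame rates and low frame variance can be made concrete with a small Python sketch. The frame times below are invented for illustration and are not PC Perspective’s measurements; `summarise` is a hypothetical helper. It shows two runs with the same average frame rate, only one of which would produce a flat frame time graph.

```python
from statistics import mean, pstdev


def summarise(frame_times_ms):
    """Average FPS and frame time spread for a list of frame times in ms."""
    avg_ms = mean(frame_times_ms)
    return {
        "avg_fps": 1000.0 / avg_ms,                      # frames per second
        "frame_time_stdev_ms": pstdev(frame_times_ms),   # 0 means a flat line
    }


# Invented data: both runs average 16ms per frame.
steady = [16.0] * 6        # every frame takes the same time
uneven = [8.0, 24.0] * 3   # fast and slow frames alternate

for name, run in (("steady", steady), ("uneven", uneven)):
    stats = summarise(run)
    print(f"{name}: {stats['avg_fps']:.1f}fps average, "
          f"{stats['frame_time_stdev_ms']:.1f}ms frame time deviation")
```

Both runs report the same average frame rate, but the second swings between short and long frames, which is the kind of behaviour that shows up as stutter even when the FPS counter looks healthy.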

With the defaults applied, what is implemented is something very similar to the behaviour of V-Sync, but without the increased input lag and resulting drop in player performance. That is very similar to what G-Sync offers, but G-Sync requires new technology that isn’t widely available and isn’t cheap.

That’s what Nvidia is up against – they don’t have their own form of Mantle, they can’t influence Microsoft’s DirectX renderer directly (at least not now) and they can’t go any further than driver improvements. It’s an unfair match against Mantle and I suspect that this is how it’s going to be for the next two years until DirectX 12’s eventual launch with Windows 9 levels out the playing field once more.

AMD can offer incredibly low frame latency and consistent frame delivery, something that will be an important part of how they face off against Nvidia’s G-Sync and improve 3D VR performance. Maxwell’s overall performance and efficiency may be higher under DirectX, but the two rivals’ approaches to smooth frame delivery are drastically different and will be the key factor defining their offerings to gamers until the launch of Windows 9.

Moving forward from the launch of the R335 drivers in WHQL form, this is going to be a very interesting battle between the two giants. Make sure you have lots of popcorn ready.

Sources: PC Perspective, Computerbase.de (via Videocardz)