There’s this word in the English language that I love and consider to be a favourite: context. Context: the circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood. Merely looking at something and dismissing it without understanding it, or without taking it in the context of the object, event or idea you’re considering or observing, means that you could miss the point entirely about what’s really going on, or what the big picture is.
So take into context the following: Nvidia’s Geforce GTX750 Ti is twice as fast as the GTX550 Ti and is twice as efficient. It is also as fast as the GTX480 and needs four times less energy to operate at the same level of performance. It sometimes punches out the GTX650 Ti Boost and at no point does it ever exceed power usage of 70 Watts.
Mobile first – A first for Nvidia
The GTX750 Ti, together with the standard non-Ti version, is the world’s first GPU based on Nvidia’s Maxwell architecture. Maxwell is the successor to Kepler and it’s an impressive one considering the absurd gains Nvidia has managed to achieve. It ditches the monolithic design of Kepler and shaves down the design of the shader modules – now called a Streaming Multiprocessor Module. All in all, it’s a much more efficient design than Kepler and a much cheaper one to make as well.
It’s also a sign of the times that this is the first GPU with the Maxwell architecture and not a high-end card. You see, this is the same GPU that’s going into laptops and ultraportables. It’ll be in the Lenovo Y500 series at some point this year and you can bet that many laptops in the midrange segment between R8000 and R10000 will have this exact GPU inside.
In a way, this is Nvidia finally taking Intel’s lead and making their products for the most lucrative markets first. Desktop graphics shipments may have gone up last year, but overall the market share for new desktop hardware continues to shrink. This was possibly their plan right from the moment they realised that the GTX680 based on GK104 was powerful enough to counter AMD’s mighty Radeon HD7970. AMD, no longer focused on outright efficiency, wasn’t going to be much of a hindrance to their existing market share.
Maxwell is Kepler 2.0
Compared to Kepler, Maxwell isn’t too different but it also isn’t too similar. In each shader module it packs 128 CUDA cores and subdivides these cores into groups of 32. It then divides the specialised units inside what used to be Kepler’s SMX and allocates them to a separate cluster inside the SMM. CUDA cores and their needed load/store units are now completely separate from each other, only sharing a few critical resources like cache, texture units and double-precision shaders.
A lot of sites may not put the reasons behind the changes too plainly, but Maxwell is designed for sheer efficiency. In the past, SMX modules in Kepler would be told to share a programming load. Even if workloads could have been done by the full GPU and in a much quicker time, it used up a lot more energy to utilise the entire GPU for smaller jobs. In addition, not only was the rest of the hardware idle for the majority of the time, it was also consuming power while doing so.
For Maxwell, then, Nvidia decided that things needed to change. Putting smaller numbers of shaders into each SMM and then dividing them again logically means that developers programming for Maxwell specifically or in CUDA generally can allocate workloads to separate SMMs and even to separate CUDA groups. Overall, the same processes can use up more logical blocks and allow the rest of the GPU to spool down and not use power at all.
Put more simply, a workload on Kepler that would take up 50% of GPU time and utilise 100% of a single SMX’s resources now takes up 100% of GPU time on Maxwell and only 50% of the resources of a single SMM. In both scenarios the program executes in the same time with the same result, but running it on Maxwell results in a 50% reduction in energy consumption and heat.
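The arithmetic behind that claim can be sketched in a few lines of Python. This is a toy model rather than Nvidia’s own figures: it assumes a block draws a fixed amount of power while it is powered and nothing at all while it is gated off, and the values of `WATTS_PER_BLOCK`, `JOB_SECONDS` and `BLOCKS_NEEDED` are made up purely to keep the sums easy. The only number taken from the article is the four blocks of 32 cores that make up a 128-core module.

```python
def energy_joules(powered_blocks: int, watts_per_block: float, seconds: float) -> float:
    """Energy = number of powered blocks x power per block x time."""
    return powered_blocks * watts_per_block * seconds


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
BLOCKS_PER_MODULE = 4    # 128 cores split into groups of 32
WATTS_PER_BLOCK = 2.5    # assumed draw of one powered block
JOB_SECONDS = 10.0       # same wall-clock time in both scenarios
BLOCKS_NEEDED = 2        # the job only has enough work for two blocks

# Coarse-grained: the whole module stays powered while the job runs.
coarse = energy_joules(BLOCKS_PER_MODULE, WATTS_PER_BLOCK, JOB_SECONDS)

# Fine-grained: only the blocks that are doing work stay powered.
fine = energy_joules(BLOCKS_NEEDED, WATTS_PER_BLOCK, JOB_SECONDS)

saving = 1 - fine / coarse
print(f"coarse: {coarse:.0f} J, fine: {fine:.0f} J, saving: {saving:.0%}")
```

Under these assumptions the job takes the same time either way and the saving comes entirely from the blocks that are allowed to switch off. Real hardware also pays for leakage and shared resources, so treat this as an illustration of the idea and nothing more.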
Nvidia says that in their own assessment, a single Maxwell SMM achieves 90% of the performance of a Kepler SMX in a much smaller die area.
There are quite a few other optimisations that Nvidia has made to Maxwell and some of them aren’t done for efficiency either. The NVENC hardware, which is used in game streaming, has been improved to reduce latency during the encode and decode process. There are other improvements too, like better buffer capturing, which is useful with technologies like G-Sync, and better render target captures, which means that Shadowplay can capture higher quality video.
Nvidia has also been working on an H.265 decoder which will be necessary to decode video for the upcoming UltraHD 4K resolution. For the moment it’s not that beefy and Nvidia will rely on a combination of hardware and software to accelerate the decode process, but it’ll be the first Geforce part that’s ready for the 4K revolution.
For plain video buffs who plan on using this in an HTPC, there are also performance improvements to regular H.264 video decoding and lower power use overall. Sadly, Displayport is still at version 1.2 and HDMI is still stuck on 1.4a. Maxwell might be ready for 4K, but only at 30Hz for the interim while monitor manufacturers get a clue.
The reference design is tiny
Nvidia’s reference design for the first Maxwell cards, the GTX750 and the GTX750 Ti, is pretty small and has also been seen on other recent cards in Nvidia’s stable. It’s barely longer than the PCI-Express slot it plugs into and it only slightly overhangs the motherboard. We’ve seen this design before in the GTX650 and GTX650 Ti as well as the GTX650 Ti Boost, the GTX660 and the GTX660 Ti. There are even some custom versions of the GTX670 made specifically for the ITX form factor that are also this size. Last year, the GTX760 also debuted with a reference PCB of the same size.
You’ll notice that Nvidia kept the profile rather low for the card and that’s because it really doesn’t need any bells or whistles. It doesn’t need a six-pin PEG power connector, it doesn’t need a dual-slot cooler to exhaust heat out of the chassis and it doesn’t need a massive fan to keep it cool.
The GTX750 and the GTX750 Ti also don’t sport an SLI finger and in fact don’t enable SLI at all on the desktop. Perhaps we’ll see mobile variants of this GPU with SLI enabled. Interestingly, one of the possibilities with Maxwell that will be a feature in Geforce GRID is also possibly open to consumers as well – Ad-hoc API Shimming in DirectX. What this does, essentially, is allow you to run separate instances of games and allocate them to separate Maxwell GPUs.
I’m not entirely sure that will work and it was only discussed in a single slide at Nvidia’s GTC conference in 2013 but it’s a fun idea either way. It may open up some interesting streaming options for Steam’s In-Home Streaming service a little down the road.
Frames, frames everywhere and not a stutter in sight
In PC Perspective’s game tests, the GTX750 Ti showed some strong placings against its competitors. In Battlefield 4, the GTX750 Ti is mostly faster than the Radeon R7 260X by a few frames per second and 90% of the time will be close enough to be indistinguishable. That’s a good start for Maxwell seeing as AMD’s Bonaire GPU is the mid-range model to beat from the red team.
Bioshock Infinite saw many of the cards struggle and this is because the game is relatively taxing on memory bandwidth. The Radeon R7 265, with its Pitcairn-based GPU and 256-bit memory bus, sees the lowest amount of stuttering and frame variance, while every card with a bus width equal to or smaller than 192 bits struggles somewhat with everything on full blast. Despite this, Maxwell does relatively well for itself, although performance in Infinite could be way better.
Crysis 3 is up next and shows us a completely different story. Every GPU performs well within its memory limits and the game tends to focus instead on raw shader power. This allows the GTX750 Ti a slightly larger lead over the Radeon R7 260X but it’s now much further behind its main competition, the R7 265, which is a much beefier chip. At 1080p with Ultra settings, it’s mostly unplayable anyway. Medium settings would be a much better representation of the card’s ability.
GRID 2 is more lenient and all of the cards perform well enough to make the experience playable. Once again the GTX750 Ti falls behind the R7 265 and matches the R7 260X most of the time. It is, however, much faster than the GTX650 Ti.
Metro: Last Light is a really taxing title just like the original Crysis and here the GTX750 Ti retains the placing with the R7 260X, but outperforms it in some parts of the benchmark. Lightening the load would help a little and the only cards providing playable performance here are the GTX660 and the R7 265. All are better than the slide show that is the GTX650 Ti.
The final set of results in PC Perspective’s review comes from Skyrim and most of the cards do very well at maximum settings. The GTX750 Ti sticks closely to the R7 260X once again but sees fewer lag spikes throughout the benchmark compared to other Geforce GPUs. The R7 260X and 265 both see very little frame variance and all of the cards stay well above the 30 fps mark for playability.
Overall, the GTX750 Ti doesn’t place too well in the performance stakes considering that the Radeon R7 265 is the same price but manages a lead of 10% or more in most benchmarks. The GTX650 Ti Boost is missing from these benchmarks; it is EOL and Nvidia is no longer shipping those chips to partners, though it would be the equal of the GTX750 Ti shown here today.
In all honesty, the placement of the card and the price makes it a bit of a hard sell, but that’s before you look at power consumption.
Under load, PC Perspective’s test system with the GTX750 Ti draws the lowest amount of power and shaves slightly more than 30 Watts from its nearest competitor, the Radeon R7 260X. Being on average 50% faster and 20W under the GTX650 Ti is also a good achievement and Nvidia has a real winner on their hands. Paired with something less beefy, like a mini-ITX motherboard, Intel’s Core i3-4130 and 8GB of RAM, that could bring total system power consumption below 160W – well within the limits of a good pico power supply.
Pair that with the much smaller cooler and you also net lower operating temperatures, less system noise and a cleaner setup all-round because you don’t need that extra power cable. It’s almost as if Nvidia designed the first Maxwell card to be perfect for ITX gamers and HTPC users, because the GTX750 Ti ticks all the boxes that most people would need for those uses. In laptops, it would be unparalleled.
Mining and Per-watt performance is interesting
In fact, the significance of what you’re seeing here doesn’t really sink in until you consider two metrics that have begun to be considered in most thorough reviews – performance-per-watt and the hash rate for the card running a Litecoin mining application.
According to TechPowerUp’s results, with the GTX750 Ti normalised to 100%, it’s far ahead of any of the other, more expensive cards in terms of how much performance it delivers when compared to energy usage. Its closest competitor, the R7 260X, isn’t even in the same league, dropping some 45% in the rankings. If that means a score of around 55%, it needs roughly 1.8 times the energy of the GTX750 Ti to perform the same task with similar results. The GTX650 Ti comes close and ties with AMD’s best mid-range card, the Radeon R9 270X, which is based on the HD7870.
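To make normalised figures like these easier to read, here is a small Python sketch that turns a performance-per-watt score into the relative energy needed for the same amount of work. The function name `relative_energy_for_same_work` and the example scores are hypothetical and are not TechPowerUp’s data; the only assumption is that a score is performance divided by power, expressed relative to a baseline of 100.

```python
def relative_energy_for_same_work(score_pct: float, baseline_pct: float = 100.0) -> float:
    """Return how many times the baseline card's energy is needed for equal work.

    Performance-per-watt is work done per unit of energy, so a card with half
    the score needs twice the energy to finish the same job.
    """
    if score_pct <= 0:
        raise ValueError("score must be positive")
    return baseline_pct / score_pct


# Hypothetical scores for illustration only.
for score in (100.0, 80.0, 50.0):
    factor = relative_energy_for_same_work(score)
    print(f"score {score:.0f}% -> {factor:.2f}x the baseline's energy")
```

The relationship is a reciprocal, so the gap in energy grows faster than the gap in the score: a card at 80% needs 1.25 times the energy, while a card at 50% needs twice as much.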
Where it gets really interesting is when it’s used as a card for mining. In Litecoin, when using CUDAMiner to expose the card’s computational abilities properly, it achieves a similar performance/Watt score to the Radeon R9 270X, which, together with the R9 280X, is one of the better cards for mining because you get returns on it much earlier and the energy cost involved compared to your hash rate isn’t significantly high. In other words, it’s more profitable, especially if you’re doing it in the short term.
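The "returns much earlier" argument comes down to how quickly mining income, minus the electricity bill, pays off the purchase price. A rough Python sketch of that sum follows. Every input is a hypothetical placeholder (card price, daily revenue, wattage and electricity tariff are not measured values), and the helper names `daily_energy_cost` and `breakeven_days` are made up for this example. It also ignores changes in difficulty and coin price, which matter a great deal in practice.

```python
from typing import Optional


def daily_energy_cost(watts: float, price_per_kwh: float) -> float:
    """Cost of running a card flat out for 24 hours."""
    return watts / 1000.0 * 24.0 * price_per_kwh


def breakeven_days(card_price: float, daily_revenue: float,
                   watts: float, price_per_kwh: float) -> Optional[float]:
    """Days until mining income has paid for the card, or None if it never does."""
    daily_profit = daily_revenue - daily_energy_cost(watts, price_per_kwh)
    if daily_profit <= 0:
        return None
    return card_price / daily_profit


# All inputs below are hypothetical placeholders, in whatever currency you use.
days = breakeven_days(card_price=2000.0, daily_revenue=11.44,
                      watts=60.0, price_per_kwh=1.0)
if days is None:
    print("never pays for itself")
else:
    print(f"pays for itself in about {days:.0f} days")
```

A low-power card helps on both sides of the division: it is cheap to buy and the daily energy cost barely dents the revenue, which is why performance per watt matters more here than raw hash rate.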
Along with Mining, there’s also the chance that this can be a very good dedicated Physx card for Nvidia or AMD setups. If you don’t want your main GPU bogged down with Physx calculations being offloaded to it (as opposed to being run mainly on the CPU) this card would improve performance in that scenario with an insignificant energy increase.
Where it fails, where it wins
Where Maxwell falls short is in the price/performance metric and in its readiness for the future. The R7 260X is cheaper and performs equally well while the GTX750 Ti, despite the advances in the NVENC hardware, isn’t ready for 4K at 60Hz and doesn’t support Displayport 1.3 at all. It also isn’t compatible with DirectX 11.2 and it doesn’t even have SLI in any form. It’s a very locked-down card and there’s not a lot of overclocking headroom thanks to Nvidia’s limitations on how far these cards can go. 1.2GHz on the core with GPU Boost is possible, but not with all cards. It’s debatable if it even needs extra power because that would skew the ratio in performance-per-watt assessments.
When it wins, though, it does so quite convincingly. It uses less power than a GTX650 Ti and outperforms it by almost 50%. It’s literally twice as fast as the GTX550 Ti and uses half as much energy. It’s also twice as fast as the previous king in the low-power segment, the Radeon HD7750. If you’re somehow still running a Geforce 8800GT (a museum piece by this point), this card is orders of magnitude faster and more efficient. If you’re on a GTX480, you might want to sit down to contemplate the fact that this tiny, puny card is the equal of what used to be Nvidia’s flagship GPU, with an energy requirement roughly four times lower.
Progress is delicious and simultaneously absurd when it comes to Maxwell.
It is tiny, nippy and scary fast. It’s like dynamite packed into a container the size of a matchbox, but with the explosive force of a limpet mine. Nvidia has outdone itself by producing a GPU architecture that delivers the same performance benefits as a process node jump, but remains on the 28nm production node. Nvidia also doesn’t need to do anything to get Maxwell into laptops as it’s already significantly more efficient than the GT750M currently found in most mid-range devices.
The truly astonishing thing is that it’s possible that, should Nvidia use 28nm for the entire Maxwell range, we would be seeing a GTX760-equivalent card with similar performance and almost half the total energy consumption. A Maxwell-based GTX690 equivalent would run on two 6-pin PEG power connectors, not two 8-pin connectors.
Conversely, this is a big threat to AMD. None of their GPUs can compete on an efficiency basis. Hawaii is particularly in trouble because it’s a huge chip that runs at searing 90° Celsius temperatures and in terms of efficiency, heat generation and noise levels it is beaten by the GTX780 Ti convincingly. Unless AMD has something on the 20nm node in the works, they’re going to be at a big disadvantage in the future.
And that’s one other thing about Maxwell that I haven’t spelled out yet – later this year, a higher-end version of this architecture is rumoured to be made on the 20nm process. This would pile on the efficiency gains even further. Being attacked on the processor side by Intel is one thing, but being pummelled by Nvidia in the efficiency stakes while suffering inflated prices and shortages thanks to cryptocurrency mining is a problem they need to sort out immediately.