AMD’s Trinity launched yesterday, does reasonably well

I say reasonably well because AMD’s Llano and Bulldozer chips have been underwhelming performers thus far. The company decided to design a new architecture that focuses on conserving resources, improving power efficiency and reducing heat generation. It’s a good idea on paper compared to Intel’s approach of keeping pace with Moore’s law using 3D Tri-Gate transistors, but in reality it strangles single-thread performance and requires higher clock speeds to match anything from Intel’s stable. In addition, there’s not a lot to differentiate CPUs from the same family. In gaming benchmarks, AMD’s FX-4100 performs similarly to the FX-8120, and even manages slightly higher frame rates in certain games thanks to the quad-core chip’s higher clock speed.

Left to Right: Llano, Trinity and Phenom, three distant cousins

Yes, despite some people calling AMD’s FX-8120 an octo-core chip, it’s really four Bulldozer modules, each with two integer cores squashed together and forced to share cache, the floating-point unit and bandwidth. Likewise for the quad-core FX-4100, which has two modules and really can’t contend with even Intel’s Sandy Bridge-based Pentiums. For laptops and desktops, AMD promised that Trinity would improve performance by 15% overall and prove a worthy upgrade from the Llano chips of old. Let’s see how it delivers.

While AMD’s low-power Brazos and Llano chips used their integrated graphics (“Sumo”, in Llano’s case) as leverage in the past, Intel quickly caught up with the improved integrated graphics core in Ivy Bridge. As Sandy Bridge filtered down into low-end laptops, AMD’s market and mind share took another dip. Only Brazos-based netbooks and cheap Ultrabook alternatives gained any traction, earning a reputation as capable media consumption devices. Gaming was an option on Llano chips, but as you’ll see later, Intel’s HD4000 graphics actually does really well for itself in comparison.

Trinity’s lineup consists only of one- and two-module Piledriver parts, with two cores per module.
Devastator brings some design wins for AMD.

Trinity’s graphics core, codenamed “Devastator”, is an improved version of the chip inside Llano. One of the biggest problems with Llano was that its graphics core was, essentially, too big for its boots. Devastator is based on AMD’s VLIW4 architecture, first used in the HD6900 series, which is a more streamlined version of the VLIW5 design Llano used. VLIW4 chucks out an ALU (Arithmetic Logic Unit) that often sat idle in Llano’s SIMD units. Most code only keeps up to four ALUs busy at once, so AMD cut the unused one and used the extra space to fit in more texture units and thread processors (similar to Nvidia’s plan to increase the number of CUDA cores in a chip while cutting out the fluff).
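
To put rough numbers on the VLIW4 argument, here’s a minimal Python sketch, assuming a hypothetical shader in which the compiler never finds more than four independent operations to pack into one instruction bundle. The function name vliw_utilization and the sample data are illustrative assumptions, not AMD figures; the point is simply that a fifth slot that never gets filled lowers the share of ALUs doing useful work.

```python
import math


def vliw_utilization(ops_per_group, width):
    """Return the fraction of ALU slots doing useful work.

    ops_per_group: how many independent operations the compiler could pack
    together for each group of instructions (hypothetical input data).
    width: ALU slots per VLIW bundle (5 for VLIW5, 4 for VLIW4).
    A group wider than the bundle spills into extra bundles.
    """
    bundles = sum(math.ceil(ops / width) for ops in ops_per_group)
    useful = sum(ops_per_group)
    return useful / (bundles * width)


# Hypothetical shader: no group ever exposes more than four independent ops.
groups = [4, 3, 4, 2, 4, 3]
print(f"VLIW5 slot utilization: {vliw_utilization(groups, 5):.0%}")
print(f"VLIW4 slot utilization: {vliw_utilization(groups, 4):.0%}")
```

With this made-up data the five-wide bundle keeps 67% of its slots busy and the four-wide bundle 83%: the same work gets done with one fewer ALU per bundle to build and power.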

Devastator also brings a number of improvements that we saw with the release of the HD7000 series. It supports up to four display outputs using DisplayPort (Sumo was limited to two). The fourth monitor has to be daisy-chained, but that’s pretty impressive for a notebook processor. Each output also has its own audio path, so if you drag a playing video in VLC between monitors that have their own speakers, the sound follows the video from one screen to the next. It’s a really impressive piece of tech and will be useful to people who hook their laptop up to an HDTV and expect things to work without fiddling. Display grouping is also supported, letting you group displays according to their position in front of you.

Trinity improves on the original Bulldozer design but remains on the 32nm process. That has implications for die size and power consumption, but AMD is confident it has things under control with this new generation of processors. AMD is also banking on more apps and software making use of OpenCL, which allows acceleration on just about any graphics core regardless of make or model. The biggest flaw of Bulldozer and Llano is that floating-point work is still mainly performed on the CPU cores, and when they launched, OpenCL was nowhere near the polish or potential it has today.

For those of you who are still confused, OpenCL is a framework that allows software to execute code on the CPU and GPU simultaneously, cutting down on completion time and the resources required to accelerate the process. Adobe’s Photoshop and the wider CS5 suite famously support GPU-accelerated editing options when using certain Nvidia graphics cards. OpenCL aims to remove limits like those and open heterogeneous computing to anyone who needs it.
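
OpenCL itself doesn’t decide how to divide a job; the host program chooses which device gets which work. As a rough illustration of why running on the CPU and GPU together cuts completion time, here’s a minimal Python sketch that splits a batch of work items in proportion to each device’s throughput. It does not use the OpenCL API, and the function names, device names and relative speeds are hypothetical.

```python
def partition_work(n_items, throughput):
    """Split n_items across devices in proportion to their relative throughput.

    throughput maps a device name to an integer relative speed
    (hypothetical values). The returned shares always sum to n_items.
    """
    total = sum(throughput.values())
    shares = {dev: n_items * speed // total for dev, speed in throughput.items()}
    # Integer division can leave a few items over; give them to the fastest device.
    fastest = max(throughput, key=throughput.get)
    shares[fastest] += n_items - sum(shares.values())
    return shares


def completion_time(shares, throughput):
    """Devices run in parallel, so the job finishes when the slowest one does."""
    return max(shares[dev] / throughput[dev] for dev in shares)


speeds = {"cpu": 1, "gpu": 3}  # hypothetical: GPU assumed 3x faster on this job
split = partition_work(1000, speeds)
print(split)                                   # {'cpu': 250, 'gpu': 750}
print(completion_time(split, speeds))          # 250.0
print(completion_time({"cpu": 1000}, speeds))  # 1000.0, CPU alone
```

Splitting the work by throughput means both devices finish at the same moment, which is where the time saving over a CPU-only run comes from.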

The next thing to note is that Piledriver improves everywhere Bulldozer cut corners to get to market as quickly as possible. There are lots of small improvements, from the way code is handled to better branch prediction and faster, more optimised L2 cache. It’s still a bunch of Bulldozer modules clustered together, but they’re a lot more efficient and robust. Thanks to these changes, Piledriver cuts power consumption by as much as 20%, which is great given that it’s designed for laptops and low-power Ultrabook designs. The floating-point weakness of Llano has also been addressed, but not by much; AMD is still putting its eggs in the OpenCL basket.

The last notable improvement is AMD’s change to the way Trinity uses Turbo Core, its answer to Intel’s Turbo Boost. When the chip identifies a GPU-intensive workload, it underclocks the processor and raises the clock speed on the GPU. Likewise, for a CPU-intensive load the GPU is underclocked and the CPU’s speed is raised to as much as 3.2GHz, as seen in Cinebench. It’s an interesting idea, and it allows Trinity to stay within its 35W power envelope. What does this mean for battery life? In theory, you shouldn’t have to worry about how long it will last, regardless of the workload you’re doing.
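
The power-shifting idea can be pictured as two units drawing from one shared budget. The minimal Python model below is a hypothetical sketch, much simpler than AMD’s real Turbo Core logic (which isn’t detailed here): when combined demand exceeds the 35W envelope, the busier unit gets priority and the other is held at an assumed 5W floor. The function names and the floor value are illustrative assumptions; on the real chip, a bigger power share is what allows a higher clock speed.

```python
TDP_WATTS = 35.0   # Trinity's mobile power envelope, as quoted above
FLOOR_WATTS = 5.0  # hypothetical minimum kept for the less busy unit


def _favour(primary, secondary, budget, floor):
    """Give the busier unit as much as it asks for while leaving the other
    unit its floor (or its own demand, if that is lower)."""
    primary_share = min(primary, budget - min(secondary, floor))
    return primary_share, budget - primary_share


def split_power(cpu_demand, gpu_demand, budget=TDP_WATTS, floor=FLOOR_WATTS):
    """Return (cpu_watts, gpu_watts) without exceeding the shared budget."""
    if cpu_demand + gpu_demand <= budget:
        return cpu_demand, gpu_demand  # both fit: no throttling needed
    if cpu_demand >= gpu_demand:
        cpu, gpu = _favour(cpu_demand, gpu_demand, budget, floor)
    else:
        gpu, cpu = _favour(gpu_demand, cpu_demand, budget, floor)
    return cpu, gpu


# Hypothetical demands in watts, not measured figures.
for name, cpu_w, gpu_w in [("Cinebench-style CPU load", 30, 15),
                           ("game-style GPU load", 10, 32),
                           ("light browsing", 8, 6)]:
    cpu, gpu = split_power(cpu_w, gpu_w)
    print(f"{name}: CPU {cpu:.1f}W, GPU {gpu:.1f}W")
```

With these made-up numbers, the CPU-heavy case ends at 30W for the CPU and 5W for the GPU, the game-style case flips that split, and the light load is left alone because it already fits inside the envelope.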

This approach will be interesting for overclockers on the desktop if it’s implemented there as well. You might have to turn it off completely to keep settings stable, or you could overclock to find the chip’s operating limits and then keep boost on while raising default clock speeds, allowing the chip to speed up to its maximum stable speed for certain workloads. If we could raise the default power envelope and run our desktops at a nominal 2GHz, rising to 3.6GHz on all cores for multi-threaded CPU-intensive workloads, that could be a far better option for power consumption than Intel’s Turbo Boost, which only raises clocks up to a certain point before hitting the maximum power envelope.

In general, performance has improved in all areas and we see this in the benchmarks below. Anandtech’s tests showed Trinity gaining a small lead over Llano in PCMark 7 while achieving roughly half the computational power of Intel’s Ivy Bridge-based Core i7-3720QM. That scenario plays out in the rest of the benchmarks too, with Trinity beating Llano and almost catching up to Sandy Bridge dual-core chips in the Productivity benchmark. The Creativity test shows that the optimisations to Piledriver do their best to improve performance when creating content. However, while Trinity improves hugely over Llano in the Computation test, it still falls very far short of the baseline set by Intel’s Core i3-2367M.

It’s an improvement nonetheless, and for mobile users the battery savings implications are huge. Moving onto battery life tests, Anandtech found a notable improvement: Trinity gains nearly an extra hour in the idle test with the screen at medium brightness, achieving nearly nine hours off the wall. It’s interesting to see how battery life actually drops with Ivy Bridge; even with a bigger battery, Intel’s Core i7-3720QM achieves roughly half the battery life of the Core i7-2637M. In the browsing test, Trinity takes a huge leap forward in efficiency and loses under two hours of battery life compared with idle. The Ivy Bridge i7, however, barely loses half an hour, suggesting that the processor underclocks itself severely to let the GPU take on all of the web browser’s rendering load. In the video playback test, things take a dive to just under four hours’ running time, but Trinity still compares favourably with Ivy Bridge.

Gaming, however, is where we see the real improvements in Trinity. In Batman: Arkham Asylum, the Devastator GPU draws up alongside Llano’s Sumo chip, delivering playable performance at medium details. Intel’s HD4000 graphics in the Ivy Bridge processors also provides playable performance and, once overclocked, may even break the 60fps barrier. Battlefield 3 is a disappointment, though it is a texture-heavy game; Trinity still beats Ivy Bridge, even though Intel’s more expensive solution has a lot more power to throw at the game. Nvidia’s GT630M is the only solution providing a playable frame rate here, which shows that EA’s latest cash cow is GPU-limited. Civilisation V shows Trinity shooting to the top thanks to its improved compute and tessellation abilities, easily besting the GT640M. However, settings had to be turned down to low here, demonstrating Civilisation V’s ability to bring stronger systems to their knees.

Moving onto DiRT3, Ivy Bridge and Trinity square off again at 40fps with medium details. DiRT3’s EGO engine is also used in other Codemasters racing games like F1 2010/2011 and Race Driver: GRID. Skyrim again shows the 1-2 finish, but with barely any performance difference between the two chips; I doubt many gamers will notice the 5.8fps gap in-game. Nvidia’s GT640M looks to be the gamer’s choice here, almost always reaching 60fps. Portal 2 is processor-limited, and surprisingly shows Trinity with a huge lead over Ivy Bridge. This is seen again in Total War: Shogun 2, with Trinity delivering nearly double the performance of the HD4000 inside the Ivy Bridge processor.

So what should we take away here? Firstly, Trinity almost always beats Intel’s Core i7-2820QM with HD3000 graphics in games, and is now the baseline for acceptable gaming at medium details and native resolution on 15.6″ screens. Ivy Bridge may end up performing marginally better, but that chip combination costs manufacturers nearly $300 more to implement. Should you find a nice Ultrabook alternative with the Trinity A10-4600M, that’s your best choice by a mile if you spend a lot of time gaming or at LANs. In fact, compared to Ivy Bridge, AMD’s price/performance ratio shoots through the roof.

It’s only in very specific scenarios that you’d be better off with an Intel chip, especially if your workload is almost exclusively content creation. Even then, if OpenCL attains the kind of lofty support AMD hopes for, there’ll be no reason to buy an Intel quad-core processor in a notebook over Trinity unless your application makes use of Intel’s Quick Sync, which really can’t be matched in the notebook segment. For all other workloads, Trinity performs very well and the Turbo Core feature is implemented nicely.

So while AMD can celebrate a win over Intel here, what does it mean for consumers? Ultrabooks are approaching the $800 price point, and the same notebook with Trinity could save you $200. However, given Intel’s history of forcing out its only competitor by asking suppliers to stick with its chips even when they’re not as good, I’m guessing there’s little chance you’ll see an AMD-based Ultrabook alternative in your local Game, Makro or even Incredible Connection. HP’s Sleekbook range is the first I’ve seen with a focus on Trinity, and it remains to be seen how other manufacturers approach AMD’s platform.

But certainly, we’ll see some interesting designs. Given the 35W TDP, who could say no to a Trinity-powered Ultrabook alternative with an 11.6″ screen to rival the MacBook Air at half the price? How about Alienware putting this in its M11x? Or a 15.6″ R5000 Makro bargain with Windows 7 Home Basic and the A6-4455M processor? There’s a good chance AMD’s next Bulldozer refresh will bump up performance by another 15%, so things are looking up for the underdog once again. Shove an SSD into the chassis and you’ll be good to go. Things are even more promising for desktop users, as Piledriver for sockets FM2 and AM3+ is still coming later this year.

I only hope AMD can capitalise on the small lead it now has to improve its standing with customers and suppliers. After all, it’s the consumer they’re serving, and what better way to serve that consumer than to give them options?

Source: Tom’s Hardware, Anandtech, Hardware Canucks