On Tuesday this week, AMD invited members of the press to tune into a virtual press briefing about the Zen architecture. Aside from what we already know from an earlier reveal, this briefing went in a lot deeper than everyone expected, going right down into the heart of what made it tick, and revealing some interesting details about the chip’s optimisation and performance courtesy of Jim Keller, who designed the architecture. Zen is a ground-up reboot, something like the contrast between the last Hulk movie and the first Iron Man – it’s completely different, and brings AMD back into the CPU race once more.
I won’t bore you with all the nitty-gritty details about the Zen briefing – you can find detailed analysis elsewhere on the internet if you’re that way inclined – and besides that, I believe my audience isn’t usually composed of developers or programmers. Instead it’s you, the gamers and pro-consumers, who look at these sorts of things and make decisions based on how buying hardware affects your wallet or experience. I’d like to present these changes using that frame of mind, because Zen is still very interesting even without the more technical details being unearthed. It might even be a winner, sales-wise, if not a contender for the performance crown in some key areas.
If you haven’t been following the news lately, or read my earlier write-up on Zen, this slide serves to bring you back up to speed. Basically, we’re looking at a chip that improves the number of instructions per clock (IPC) it can chew through by more than 40%, a massive jump from AMD’s latest architecture, Excavator. Excavator is a derivative of Bulldozer and, when comparing cores at identical clock speeds, is itself about 30% faster than Bulldozer.
Taking into account all the lessons learned over the years, as well as production process improvements, we’re looking at a core that’s almost 70% faster than Bulldozer, and may be almost double Bulldozer’s performance if it can ramp up the clock speeds high enough. All of this, at the same energy per cycle as Excavator, adds up to a really impressive jump in performance – basically larger than anything AMD or Intel has achieved in the last decade in a single generation.
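For anyone who wants to check the maths, here is a rough back-of-the-envelope sketch in Python. It takes the two same-clock figures quoted above at face value (about 30% for Excavator over Bulldozer, and AMD's "more than 40%" for Zen over Excavator) and multiplies them together. Simply adding the two percentages gives the roughly 70% mentioned above, while compounding them lands a little higher. The `clock_uplift` value is purely a made-up assumption to show how a modest clock bump would get close to double; it is not a figure AMD has given.

```python
def compound_speedup(*gains):
    """Multiply fractional gains (0.30 means +30%) into one overall factor."""
    factor = 1.0
    for gain in gains:
        factor *= 1.0 + gain
    return factor


# Same-clock figures quoted in the article, taken at face value.
excavator_over_bulldozer = 0.30
zen_over_excavator = 0.40

same_clock = compound_speedup(excavator_over_bulldozer, zen_over_excavator)
print(f"Zen vs Bulldozer at the same clock: about {same_clock:.2f}x")

# Hypothetical clock speed increase, for illustration only.
clock_uplift = 0.10
with_clock = compound_speedup(
    excavator_over_bulldozer, zen_over_excavator, clock_uplift
)
print(f"With an assumed {clock_uplift:.0%} clock bump: about {with_clock:.2f}x")
```

Real-world results will depend heavily on the workload, so treat these as ballpark numbers rather than a prediction.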
In 2017, Zen is expected to replace every other CPU architecture AMD currently sells to their customers. It’s going to replace the socket AM1-based Jaguar family, headed up by the quad-core Athlon 5370, and the basis of the processors inside the Xbox One and PlayStation 4. It will eventually replace Excavator as well. Zen will also eventually go into servers and desktops starting in 2016, and into laptops and NUC-like computers starting in 2H 2017. It will replace AMD’s semi-custom silicon offerings for consoles and enterprise designs, and it’ll go into APUs for the desktop where discrete graphics aren’t needed.
Nothing stops the Zen train from taking over. It’s what they’re counting on to make them more competitive in the marketplace, and they’re all in on this play. There’s no backing out now, and the time for a change in plans flew by in 2013 already. That’s probably a little scary for anyone who isn’t in the engineering department.
We’re learning a new word today: CCX. It stands for “CPU complex”, which is somewhat analogous to the modules present in Bulldozer. However, unlike Bulldozer, the execution units in a CCX are not shared between the cores. Each core has its own L1 and L2 cache, and each core has its own hooks into the L3 cache mounted in the middle of the complex. Thus, one CCX comes with 8MB of L3 cache connected to and shared between all four cores.
Multiple CCX units can be combined into a single package, and there’s a custom data fabric that connects them up together, like PCI-Express for processors. AMD’s designs top out at eight CCX modules, for a total of 32 cores, 64 threads, and a massive 64MB of L3 cache.
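To see how those totals fall out of the building blocks, here is a small illustrative Python sketch. The `CCX` class and `package_totals` function are names I've made up for this example; the numbers inside them (four cores per CCX, two threads per core, 8MB of shared L3) come straight from the figures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCX:
    """One CPU complex, using the figures quoted in the article."""
    cores: int = 4
    threads_per_core: int = 2  # simultaneous multithreading
    l3_mb: int = 8             # L3 shared by all cores in the complex


def package_totals(ccx_count, ccx=CCX()):
    """Add up cores, threads and L3 cache for a package of identical CCXs."""
    return {
        "cores": ccx_count * ccx.cores,
        "threads": ccx_count * ccx.cores * ccx.threads_per_core,
        "l3_mb": ccx_count * ccx.l3_mb,
    }


for count in (1, 2, 8):
    totals = package_totals(count)
    print(
        f"{count} CCX: {totals['cores']} cores, "
        f"{totals['threads']} threads, {totals['l3_mb']}MB L3"
    )
```

The eight-CCX case reproduces the 32 cores, 64 threads, and 64MB of L3 quoted above, and the two-CCX case describes an eight-core desktop part.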
The unfortunate part of this design is that it also includes L3 – it takes up an enormous amount of space on its own, easily over 30% of the available die space. L3 cache has always been an expensive addition to any design, and despite the improvements in the production process, it’s always going to take up a lot of space. I mean, just look at this Zen die shot to the right! The L3 cache in this dual CCX arrangement takes up more space than two Zen cores.
But the complaints about the cache size on the die are probably going out the window now that Zen is also a full system-on-chip. The South Bridge is integrated into it, finally, which means that there’s no need for the chipset to be integrated onto the motherboard itself. That’s nice because now system features can be prioritised over spending time implementing chipset features.
In addition to this, the CCX design also has one big positive for Zen APUs in the future – if you had to chop out one CCX in the die shot above, you could package that as a quad-core, hyper-threaded CPU, or fill the now-empty space with GCN cores to make an APU. Nothing else needs to change, since the data fabric between the CCX units is generic and apparently doesn’t mind that the cores on the other side are graphics cores.
That raises many more questions than we have answers for now, though. For example, how many GCN clusters can we expect to go into a package as a replacement for a CCX? If it’s more than one, is it addressed as a Crossfire pair? How much power needs to be allocated for them? Earlier this year I asked AMD if multi-chip designs were on the cards for GCN in the future and they confirmed that they were, but I’d like more details about Zen APUs at this stage. Multi-chip modules are the future as we see slower ramp-ups to better and smaller production processes, and if this is the direction AMD is going in, then both their consumer and semi-custom designs are going to make them a lot of money.
To close out this briefing, Zen represents a major shift in thinking at AMD. The first sign of that is the addition of simultaneous multithreading (SMT), which is similar in concept to Intel’s hyper-threading, but not a like-for-like adoption of the technology. AMD has in the past assumed that the future of software would move to clustered multithreading (CMT) scenarios that would have been better served by Bulldozer and its derivatives, and we may one day see those applications appear to vindicate all the hard work they did with those chips. It’s quite possible for someone to build a custom Linux kernel that implements CMT by default, with custom applications that use it to maximum benefit, but that’s a lot of work for very little gain. By contrast, getting things up and running on Intel’s Core architecture, which has been working in more or less the same way for a decade, is a lot simpler to manage and develop for.
That’s where Zen is going – it’s mimicking some of the things that Intel’s Core does, but it adds some original spins to the way things work. It stands a better chance of outpacing Intel in workloads that aren’t 7-Zip or x264 encoding, and it clearly borrows from Jim Keller’s work on mobile chips to promote efficiency. Just about every part of the chip can be power or clock gated to save on energy, and there’s even the scenario where entire CCX units can be turned off. It’s a smart design, but not one that’s very easy to understand. We’ll have to wait until AMD or its industry partners, like David Kanter, get to dive in deeper to explain why it’s shown to be beating a Core i7-6900K in a Blender benchmark, on a render mode that’s traditionally made AMD lose performance when run on the CPU.
AMD is expected to launch Zen close to the end of 2016 on the socket AM4 platform. It will be complemented by Bristol Ridge (based on Excavator) while it lacks an APU offering, but that will be retired halfway through 2017. No pricing or product configurations are available yet, but be prepared to pay a lot of money to own a Zen-based PC in the final weeks of 2016 – general availability is expected only in 2017, with socket AM4 motherboards coming in a little before that.