I was writing other things about half an hour before starting this, but my concentration broke when I received a Google Alert for one of the keywords I had set up: “Zen architecture performance”. It turns out AMD was all sneaky-sneaky this week, luring journalists into a conference room in San Francisco on the evening of 17 August, while they were in town attending the Intel Developer Forum. AMD has done the same thing before: a few years back it invited journalists to a preview of its Bulldozer architecture during an IDF conference, but back then it also made puns about it by handing out super-hot chilli-flavoured potato chips. This time, they were a bit more serious than that.
First off, let’s recap quickly. You might have heard about Zen for the last few months, but perhaps you don’t have the full picture of what it’s about, or why it exists. Here are some points to consider:
- Zen is a ground-up new design led by Jim Keller and his design team at AMD.
- Zen is not Bulldozer, although it takes some things learned from Bulldozer’s failures to improve performance and design efficiency, and streamline throughput.
- If it’s similar to anything from AMD’s stables, Zen is more like the Jaguar cores found inside the PS4 and Xbox One. Architecturally, it’s as big of a shift as Pentium 4 to Core was back in the day.
- It’s built on a 14nm FinFET process. Older CPUs and APUs from AMD were on 28nm planar processes, so there are inherent savings in power draw as well as heat production.
- Zen scales. There are chip packages in the works that accommodate a 32-core, 64-thread design on the new server sockets. Bulldozer could never do that on a single package.
- With Zen comes a new socket, AM4, as well as a completely new chipset supporting all the latest connectivity features.
- Zen’s desktop launch is expected early in Q1 2017. Mobile versions for notebooks follow in Q2 2017. Four- and eight-core versions with simultaneous multi-threading (AMD’s counterpart to Intel’s Hyper-Threading) are expected to be available at launch.
- Socket AM4 boards supporting DDR4, Zen, and the Carrizo-based Bristol Ridge family come out in Q4 2016. Alternatively, there’s also room for a September release, but no-one in the media knows how likely this is to happen.
- Zen is expected to be 40% faster than Carrizo at the same clock speed, and possibly able to keep up with Intel’s Broadwell Core i7 family at the high-end. Zen isn’t targeting Intel Skylake or Kaby Lake products.
- Zen is only the first generation of products. A successor, Zen+, is already in the pipeline, and may close the gap to whatever Intel has out in 2018.
It’s funny looking over those bullet points because, up to now, that’s all that anyone outside of AMD or its partners’ labs knew. Zen has been such a well-kept secret that it’s quite astonishing to see no credible leaks from WCCF Tech, Guru3D, or Videocardz filling in the gaps. It’s possible that Intel has a good idea of where Zen will land in the market, but it also has only sketchy and incomplete information to go on. We may all be in for a surprise if AMD pulls this off.
AMD doesn’t provide any concrete timelines with regard to Zen’s launch next year, or even when its successor is supposed to appear. The lack of a time scale on this graph suggests that they’re playing it as close to the chest as possible – even the expected performance uplift from Zen+ is a guess at best (and it’s almost certain that they have Zen+ designs working already, either in early silicon or in a chip simulation). It’s clear that AMD also wants to leave the Bulldozer legacy behind – it doesn’t even mention the in-between architecture updates on this slide. That’s unfortunate, because the engineering teams worked a lot of magic to bring performance up to where it is today with Carrizo/Excavator, where the Athlon X4 845 occasionally betters the Intel Core i3-6100 in some synthetic benchmarks.
For Zen, AMD also claims that there’s a large efficiency gain on the table. It doesn’t put a number on this, and there’s good reason for that. AMD is not revealing clock speeds or final TDPs for any Zen processors, and probably won’t be doing so until closer to the launch window. In fact, putting any number in there to talk about “energy per cycle” may even give Intel an idea of how close Zen is to the performance of Broadwell and Skylake. Simple maths may give the game away.
The confusing thing about this slide is that the promised energy per cycle is listed as being at the same level as Excavator. That may turn out to be different in real-life scenarios, because I don’t think AMD is taking process advancements into account here. If this ends up being correct, however, then Zen’s efficiency may be quite interesting. It might use the same amount of energy for a job as Excavator but deliver 40% more performance or, alternatively, it could be configured to deliver identical performance in roughly 29% fewer cycles (1/1.4 of the original), and so with correspondingly less energy. The way the graph is drawn seems to imply that all of the efficiency gains come purely from the IPC improvements. What does that mean for mobile parts? I guess we’ll have to wait until next year to find out.
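That trade-off is easy to sanity-check with a few lines of arithmetic. The sketch below is only a back-of-the-envelope model: it assumes energy per cycle really is identical to Excavator and that the whole gain is the claimed 40% IPC uplift, and it ignores the voltage drop that usually comes with lower clocks. The function names are mine, not AMD's. Under those assumptions, identical performance costs about 29% less energy rather than a full 40% less.

```python
def relative_energy_for_same_work(ipc_gain):
    """Energy for a fixed job relative to the old core.

    Assumption: energy per cycle is unchanged, so energy scales with
    the number of cycles the job takes.
    """
    return 1.0 / (1.0 + ipc_gain)


def relative_work_for_same_energy(ipc_gain):
    """Work done on a fixed energy budget relative to the old core."""
    return 1.0 + ipc_gain


ipc_gain = 0.40  # AMD's claimed uplift over Excavator
energy = relative_energy_for_same_work(ipc_gain)
print(f"Same work: {energy:.1%} of the energy ({1 - energy:.1%} saved)")
print(f"Same energy: {relative_work_for_same_energy(ipc_gain):.0%} of the work")
```

Real silicon would likely do better than this at reduced clocks, since lower frequency normally allows lower voltage, which the model leaves out.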
It should be noted, though, that the last time AMD had to choose between less power with identical performance or more performance with identical power (the jump from Kaveri to Godavari), it chose to save energy rather than make its product faster. That may well be the choice they make with Zen, too, leaving extra performance up to enthusiasts who decide to mess around with clock speeds and multipliers.
AMD didn’t reveal everything about Zen’s architecture, but we have enough now to make some guesses about performance. Core to the improvements is a new micro-operation queue with its own operation cache, which helps single-thread performance. In Bulldozer, the micro-op cache was often shared between modules, and sometimes code would stall in the pipeline because the cached information had to be fished out of another cache and then copied, adding extra steps into the cycle. A lot of Bulldozer’s problems were related to cache performance, something that’s been dogging AMD as far back as the Athlon II family. No matter how much development they put into other areas, caching and cache bandwidth have always been an issue for AMD’s CPU architectures. Now, before decoding an instruction, the core will first look in the op cache to see if it has already decoded that instruction recently and, if so, reuse the stored micro-ops.
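To make the op cache idea concrete, here is a toy model in Python. It is a sketch of the general technique only: AMD hasn't published the size, indexing, or replacement policy of Zen's op cache, so the `OpCache` class, its least-recently-used eviction, and the `decode` stand-in are all assumptions for illustration.

```python
from collections import OrderedDict


def decode(raw_instruction):
    """Stand-in for the expensive decode stage (hypothetical format)."""
    return [f"uop:{part}" for part in raw_instruction.split()]


class OpCache:
    """Toy LRU cache mapping instruction addresses to decoded micro-ops."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> list of micro-ops
        self.hits = 0
        self.misses = 0

    def fetch(self, address, raw_instruction):
        if address in self.entries:
            # Hit: skip the decoder and reuse the stored micro-ops.
            self.hits += 1
            self.entries.move_to_end(address)
            return self.entries[address]
        # Miss: decode, store the result, evict the oldest entry if full.
        self.misses += 1
        micro_ops = decode(raw_instruction)
        self.entries[address] = micro_ops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return micro_ops


cache = OpCache(capacity=4)
loop_body = {0x10: "load r1", 0x14: "add r1 r2", 0x18: "jump 0x10"}
for _ in range(3):  # run the same small loop three times
    for address, instruction in loop_body.items():
        cache.fetch(address, instruction)
print(cache.hits, cache.misses)  # 6 3
```

The first pass through the loop misses on every instruction. Every later pass hits, which is why tight loops are where a cache like this should pay off most.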
I’m not educated enough, or well-read enough in whitepapers, to point out any other obvious changes in performance from this small amount of information alone, but there is an interesting point about “instruction level parallelism” that AMD notes here. Basically, the idea is that at an instruction level, the CPU can look at the code it’s supposed to run and figure out whether some operations can be done in parallel or have to run in order. Processors based on in-order execution don’t run parallel code in parallel, if that wasn’t obvious from their name already, while out-of-order processors will run code in any order and even in parallel, so long as there are no dependencies; where there are, the instructions being depended on run first.
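A small scheduling sketch shows what that buys you. This is a textbook-style toy, not a model of Zen: it assumes every instruction takes one cycle, that only true read-after-write dependencies matter, and that the core can issue up to `issue_width` ready instructions per cycle. The `schedule` function and the six-instruction program are made up for the example.

```python
def schedule(program, issue_width):
    """Issue instructions cycle by cycle, respecting data dependencies.

    `program` is a list of (dest_register, [source_registers]) in program
    order. Returns one list of instruction indices per cycle.
    """
    producer = {}  # register -> index of the instruction that writes it
    deps = []      # deps[i] -> indices instruction i must wait for
    for index, (dest, sources) in enumerate(program):
        deps.append([producer[s] for s in sources if s in producer])
        producer[dest] = index

    issued_at = {}  # instruction index -> cycle it issued in
    pending = list(range(len(program)))
    cycles = []
    while pending:
        cycle = len(cycles)
        # Ready means every producer issued in an earlier cycle.
        ready = [i for i in pending
                 if all(issued_at.get(d, cycle) < cycle for d in deps[i])]
        issued = ready[:issue_width]
        for i in issued:
            issued_at[i] = cycle
        pending = [i for i in pending if i not in issued]
        cycles.append(issued)
    return cycles


program = [
    ("a", []),          # 0: a = load x
    ("b", []),          # 1: b = load y
    ("c", ["a", "b"]),  # 2: c = a + b
    ("d", []),          # 3: d = load z
    ("e", ["d"]),       # 4: e = d * 2
    ("f", ["c", "e"]),  # 5: f = c + e
]
print(len(schedule(program, issue_width=1)))  # 6 cycles, one at a time
print(schedule(program, issue_width=3))       # [[0, 1, 3], [2, 4], [5]]
```

With room to issue three instructions per cycle, the same six instructions finish in three cycles instead of six, because the three loads don't depend on each other.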
Where branch prediction comes in is at conditional jumps: the core guesses which way the branch will go so that it can keep fetching and scheduling instructions beyond it. Speculative execution then runs those instructions before it’s determined whether they’re actually needed, but it’s not always efficient, because a wrong guess means throwing that work away. Now take that knowledge you’ve just gained and look at the other improvements – AMD can fit more instructions in their scheduler window, so there’s more independent work to pick from and more to gain from good branch prediction. They can also allocate more execution resources to relieve any bottlenecks in the pipeline while code is being analysed and run. All of this points to higher single-threaded performance, but whether it’s actually going to produce any benefits is another story. This might all be low-hanging fruit that solves some bottlenecks and not others, and AMD may end up using simultaneous multi-threading to brute-force their way through heavy workloads.
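For a feel of how a branch predictor guesses, here is the classic two-bit saturating counter from the textbooks. To be clear, this is not Zen's predictor, which AMD hasn't described in this preview; it is just the simplest scheme that shows the idea, and the class and function names are mine.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 0-1 predicts not taken, 2-3 predicts taken."""

    def __init__(self):
        self.counter = 1  # start at weakly "not taken"

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Nudge the counter towards the real outcome, clamped to 0..3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes):
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # speculative work past this branch is discarded
        predictor.update(taken)
    return misses


# A loop branch: taken nine times, falls through once, and the loop runs twice.
loop_branch = ([True] * 9 + [False]) * 2
print(count_mispredictions(loop_branch), "mispredictions out of", len(loop_branch))
# 3 mispredictions out of 20
```

The predictor gets the first branch wrong while it warms up and then misses only at each loop exit, so most of the speculative work is kept.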
Here’s something interesting to think about, though. AMD may have been ready to talk about Zen months ago, but held their announcement back. The above slide, ostensibly shown off during the 2016 AMD financial analyst day held in May, was pulled from any and all press releases, and it’s only shown up online as part of leaks which at the time were declared fake, or that qualified as your “risky click of the day”. Take note that nothing in this slide has actually changed compared to the information we’re receiving today. Why was the decision taken in May to put a detailed architecture preview on hold until August? I’ve no idea. Three months isn’t a long time to wait and they could have talked about this at any time.
However, it’s possible that AMD thought Intel’s Kaby Lake would be out sooner, potentially aiming for a mid-year launch. It seems as if Intel has also run into some launch timing issues or concerns, and Kaby Lake now has no release date set for this year. It could still happen, but it’s unlikely. Perhaps they’re content selling through Skylake stock until they have enough Kaby Lake processors available to flood the market as they usually do.
With the looming threat of Kaby Lake hanging over AMD’s head, they chose to set up their preview later, closer to the HotChips presentation that was inevitable anyway, and sneakily used the opportunity of having basically every high-ranking tech journalist in San Francisco attending IDF at the same time to dole out a Zen preview as well. It’s a bold move, but it’s one they can make now that they’re confident that Kaby Lake won’t steal the limelight. It’s an interesting situation, to say the least.
There is also this cropped and labeled image of a rumored Zen die that popped up in the background of AMD’s website, also in May 2016. This is now a confirmed hit, and that’s really what Zen looks like with the covers pulled back on the packaging. There’s still a lot of stuff to be discussed and revealed about the processor before it’s launched, and we may yet find out if the die areas labeled “GMI” really do allow hooking up multiple processors using AMD’s Freedom Fabric interconnect (as alluded to in the slide above). This is all very exciting and interesting, but remember the golden rule: “Wait for benchmarks!” This could turn into a revival for AMD, or it could be another flop like the first Phenom family.
But it’s probably not going to flop. Hell, it was designed by Jim Keller! Short of Alan Moore appearing on stage and giving Zen his blessing, this is the best chance AMD stands of making a comeback.
We’ll have to wait until the HotChips presentation, taking place less than a week from today, to see how much all of these decisions benefit Zen. Until then, we have naught but this quick overview of the presentation that AMD gave, in secret, to the press in San Francisco. Some of the comparisons to the Broadwell-E family are cool, but also keep in mind that if AMD’s slide detailing efficiency is accurate, Zen is probably beating Broadwell-E in the Blender test while also having a lower TDP and a clock speed that isn’t final, while the Broadwell-E chip is underclocked from its base 3.2GHz speed. Zen is really interesting architecturally, and AMD is going to be spending a lot of time fending off accusations that it’s just copying an Intel Core design (it sort of does, but not really?), but it’s definitely a step in the right direction.
Even if it doesn’t break Intel’s stranglehold on the market, even if it doesn’t bring AMD immediate overnight success, it’s still going to be a winner for consumers. Competition in the CPU space is sorely lacking, and you only have to look at what’s on the shelves in your nearby PC stores, or at online retailers, to see that AMD’s presence locally could be better.