It’s been over two years now since we first heard that AMD was planning a new CPU microarchitecture, and the time has finally come to unveil Ryzen. If you haven’t seen the leaks of today’s slides, or found any of the super-early, possibly slightly buggy reviews, then this is exactly the article to be reading right now. Today we’ll talk about Ryzen and how parts of its architecture work. A detailed discussion of its boosting mode will be available separately, along with a third article about cooling and motherboard choices, because those have just gotten a little trickier for system builders, but only if you don’t read today’s articles. Follow me!
Also, please excuse the use of the placeholder slides I received from AMD, rather than the originals that should have been sent to reviewers earlier. This is a measure AMD took to reduce the possibility of slides leaking out early, although by the looks of things that happened anyway. If I receive the cleaned-up ones in time, I might switch them out.
A quick recap on Ryzen
Let’s have a quick refresher on Zen and Ryzen. It is a clean-sheet design created by Jim Keller and his former team inside AMD’s engineering department. It does away with the Bulldozer design completely, and implements only the bare minimum of shared design elements. One Zen quad-core module is called a CPU Core Complex, or CCX for short. Each CCX carries four cores with their requisite L2 and L3 cache levels, along with the logic needed for enabling SMT, or simultaneous multithreading.
AMD can weave up to four of these together into what will eventually be their Naples server platform, which boasts up to 32 cores and 64 threads (16 cores on each chip in a dual-socket system), which is quite impressive.
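To make the core and thread arithmetic explicit, here is a minimal Python sketch. The `CCX` class and `totals` helper are hypothetical names for illustration; the four-cores-per-CCX and two-threads-per-core figures come from the text, and the Naples configuration simply follows the description above rather than any confirmed server layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCX:
    """One CPU Core Complex: four cores, each running two SMT threads."""
    cores: int = 4
    threads_per_core: int = 2

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core


def totals(ccx_per_chip: int, sockets: int, ccx: CCX = CCX()) -> tuple:
    """Return (cores, threads) for a system built from identical CCXs."""
    n = ccx_per_chip * sockets
    return ccx.cores * n, ccx.threads * n


# Summit Ridge desktop die: two CCXs in a single socket
print(totals(ccx_per_chip=2, sockets=1))  # (8, 16)
# Naples as described above: four CCXs per chip, two sockets
print(totals(ccx_per_chip=4, sockets=2))  # (32, 64)
```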
Looking at a Summit Ridge die shot, composed of two CCX modules, you can also see which of the other interesting parts belong to each CCX and which are shared. To the top left are the two memory controllers, which are linked to each CCX. On the top right and bottom left is what is currently assumed to be the Infinity Fabric logic that links everything together, allowing for transparent communication between the CCXs. For a single-CCX design, we might see this die effectively cut in half.
The rest is all Greek to me. Where the chipset logic, SATA controllers, and on-die USB 3.1 support sit is a mystery, and it will be great to see someone eventually create a labelled die shot of this work of art so that I can appreciate it more.
Zen is designed to hold back the flow of data as little as possible. Its branch predictor has been upgraded with neural-network-based branch prediction, which improves how accurately the CPU can guess the outcome of upcoming branches in the instruction stream using historical data. It has a large op cache, which stores already-decoded instructions (micro-ops) that the core runs frequently in a particular workload, so they don’t have to be decoded again. It can schedule more instructions for each core, it retires instructions faster (retirement being the final step, where a completed instruction’s results are committed), and it has a beefed-up floating point unit built around 128-bit units, although the way it works is a bit nuanced. Zen does not execute 256-bit floating point instructions natively: each one is broken down into two 128-bit operations, and the core has to process both halves to complete the instruction.
This is done to reduce the complexity of the design and save on power, because if you don’t need to issue a 256-bit instruction to the FPU, you only need to put one of the FPU units to work. By contrast, Intel’s FPU can execute a 256-bit instruction natively, so on 256-bit vector code it can get through roughly twice as much work per clock cycle as a Zen FPU. It will be interesting to see how Ryzen handles floating point workloads like AVX2 instructions, but my gut feeling tells me AMD has figured out how to close the gap with Intel by spreading that work across the FPUs of more cores.
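The splitting described above can be illustrated with a toy Python model. It treats a 256-bit register as eight 32-bit float lanes and emulates one 256-bit add as two 128-bit halves. `add_128`, `add_256_split` and `ops_needed` are hypothetical names, and the model only counts operations: it is not cycle-accurate for Zen or for Intel’s cores, whose real throughput also depends on how many FP pipes each core has.

```python
from typing import List, Tuple

LANES_128 = 4  # a 128-bit register holds four 32-bit floats
LANES_256 = 8  # a 256-bit (AVX) register holds eight


def add_128(a: List[float], b: List[float]) -> List[float]:
    """Stand-in for one 128-bit floating point add: four lanes at once."""
    assert len(a) == len(b) == LANES_128
    return [x + y for x, y in zip(a, b)]


def add_256_split(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Emulate a 256-bit add as two 128-bit halves.

    Returns the eight-lane result and the number of 128-bit operations used.
    """
    assert len(a) == len(b) == LANES_256
    low = add_128(a[:LANES_128], b[:LANES_128])
    high = add_128(a[LANES_128:], b[LANES_128:])
    return low + high, 2


def ops_needed(n_256bit_instrs: int, unit_width_bits: int) -> int:
    """Operations needed to run n 256-bit instructions on units of the given width."""
    return n_256bit_instrs * (256 // unit_width_bits)


a = [float(i) for i in range(8)]
b = [1.0] * 8
result, ops = add_256_split(a, b)
print(result, ops)            # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0] 2
print(ops_needed(1000, 128))  # 2000 on 128-bit units
print(ops_needed(1000, 256))  # 1000 on native 256-bit units
```

The result is identical either way; what changes is that half the native width means twice as many operations for the same 256-bit work, while 128-bit code is unaffected.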
Zen is also no longer plagued by a poor cache design. Bulldozer’s caches were slow, and Zen’s redesigned cache hierarchy should help tremendously for workloads that fit in the caches rather than spilling into main memory. The microarchitecture is designed for low-power operation, and it can power-gate almost every non-critical part of the chip to save on power. You’ll see later on that this works to AMD’s benefit: Zen is more efficient clock-for-clock and core-for-core than Intel’s Broadwell-E platform.
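Going back to the branch predictor mentioned earlier: AMD hasn’t detailed how Zen’s neural branch predictor works internally, but the best-known design in this family is the perceptron predictor described by Jiménez and Lin. The Python sketch below is a simplified textbook version of that idea, not AMD’s implementation. Each branch gets a small vector of weights, the prediction is the sign of a weighted sum over recent branch outcomes, and the weights are nudged whenever the guess is wrong or not confident enough. The class name, table size and history length are assumptions for illustration.

```python
class PerceptronPredictor:
    """Simplified textbook perceptron branch predictor (after Jimenez and Lin).

    Illustrative only: this is not AMD's design, whose details are not public.
    """

    def __init__(self, history_len: int = 8, table_size: int = 64):
        self.table_size = table_size
        # One weight vector per table entry: a bias weight plus one weight per history bit
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history of recent outcomes, oldest first: +1 = taken, -1 = not taken
        self.history = [-1] * history_len
        # Training threshold used in the original perceptron predictor paper
        self.theta = int(1.93 * history_len + 14)

    def _output(self, pc: int) -> int:
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc: int) -> bool:
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc: int, taken: bool) -> None:
        """Train on the real outcome, then shift it into the global history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction, or when the output was not confident enough
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, xi in enumerate(self.history):
                w[i + 1] += t * xi
        self.history = self.history[1:] + [t]


# A loop branch that is taken three times, then falls through, over and over
predictor = PerceptronPredictor()
pattern = [True, True, True, False]
correct_late = 0
for step in range(2000):
    taken = pattern[step % 4]
    if step >= 1600 and predictor.predict(pc=0x400) == taken:
        correct_late += 1
    predictor.update(pc=0x400, taken=taken)
print(f"{correct_late}/400 correct over the last 400 branches")
```

A simple counter-based predictor would keep mispredicting the loop exit, while the perceptron learns that the outcome four branches back decides the next one.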
Finally, Zen exceeds every expectation AMD had of it on paper, which is surprising to me. They clearly had everything riding on this architecture working well, and it does to the tune of being 52% faster clock-for-clock than Excavator, their previous architecture. Considering that Excavator was already 30% faster than Bulldozer in IPC improvements, this puts AMD close to a 100% improvement over Bulldozer in its initial design, and close, if not equal, to Intel’s Broadwell-E architecture.
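IPC gains across generations multiply rather than add, which is why 30% and 52% land close to a doubling rather than at 82%. A minimal sketch of that arithmetic in Python, using the figures quoted above (`compound` is just an illustrative helper):

```python
def compound(*gains: float) -> float:
    """Combine successive IPC gains (0.52 means +52%) into one overall gain."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


excavator_over_bulldozer = 0.30  # figure quoted in this article
zen_over_excavator = 0.52        # AMD's clock-for-clock claim
overall = compound(excavator_over_bulldozer, zen_over_excavator)
print(f"Zen vs Bulldozer: +{overall:.1%}")  # Zen vs Bulldozer: +97.6%
```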
Let me put this into perspective for you. Skylake is only 5% faster than Broadwell when it comes to IPC improvements, and Kaby Lake has no IPC improvement over Skylake. Both get most of their performance boosts from running higher clock speeds. Zen is basically less than 10% away from matching Skylake in single-core workloads at the same clock speed. We have an unofficial saying in the industry for moments like these: “That is some good shit right there!”
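The IPC-versus-clock-speed argument rests on a rough rule of thumb: single-threaded performance scales with instructions per clock multiplied by clock speed. This Python sketch applies it. The 0.90 ratio stands in for “less than 10% away” from the text, while the 4.2GHz and 4.5GHz clocks in the second example are illustrative values, not specific products. Real workloads also depend on memory, cache and boost behaviour, so treat this as an approximation.

```python
def relative_performance(ipc_ratio: float, clock_a_ghz: float, clock_b_ghz: float) -> float:
    """Estimate chip A's single-thread performance relative to chip B.

    ipc_ratio is A's instructions-per-clock divided by B's. Uses the rough
    model performance ~ IPC x clock speed, ignoring memory and cache effects.
    """
    return ipc_ratio * (clock_a_ghz / clock_b_ghz)


# Zen at 90% of Skylake's IPC, both at the same clock: 10% behind
print(relative_performance(0.90, 4.0, 4.0))
# Same IPC (as with Kaby Lake vs Skylake), so a higher clock does all the work
print(relative_performance(1.00, 4.5, 4.2))
```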
Ryzen in the market
AMD briefly told me how things were going to work out for Ryzen in various market segments, and it’s clear that they’re aiming to be aggressive in every possible way. There will be three segments to the Ryzen family: R7, R5, and R3, which cover the enthusiast, high performance, and mainstream markets. Within these segments, each chip’s model number contains a digit that tells you which performance level it sits on: 7 and 8 for enthusiasts, 4 to 6 for high performance parts, and 3 and below for mainstream processors, which have yet to be detailed. The last two digits are used to indicate speed bumps or different SKUs in the future, but as we saw with Polaris’ naming scheme, AMD never really ended up using these.
The power suffixes AMD provided are interesting, though a bit confusing. The “X” suffix presumably marked Ryzen CPUs with AMD’s new Extended Frequency Range (XFR) turbo boost, but we later learned that XFR is enabled on all Ryzen processors. Instead, the “X” suffix indicates a larger XFR window than normal; on non-X chips, this window is 50MHz above the top boost clock. “G” and “S” denote desktop Ryzen chips with a GCN graphics core attached, which will be used for Raven Ridge APUs. AMD is also shoving Ryzen into laptops as we speak, readying the platform for a mobile launch later this year, and “H”, “U” and “M” are reserved for those parts.
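The naming scheme described above lends itself to a small parser. This Python sketch is purely illustrative: `parse_model` and `RyzenModel` are hypothetical names, the performance-level bands and suffix meanings follow this article, and treating the leading digit as the generation (the “1” in 1800X marks the first Ryzen generation) follows AMD’s convention rather than anything stated above. “R5 1400” is used only as a sample string.

```python
import re
from typing import NamedTuple, Optional

# Suffix meanings as described in this article
SUFFIXES = {
    "": "standard desktop part",
    "X": "desktop part with a larger XFR window",
    "G": "desktop part with GCN graphics (Raven Ridge APU)",
    "S": "desktop part with GCN graphics (Raven Ridge APU)",
    "H": "mobile part",
    "U": "mobile part",
    "M": "mobile part",
}

# Segment, then generation digit, performance-level digit, two SKU digits, optional suffix
MODEL_RE = re.compile(r"^(R[357])\s+(\d)(\d)(\d{2})([A-Z]?)$")


class RyzenModel(NamedTuple):
    segment: str
    generation: int
    performance_level: int
    tier: str
    sku_digits: str
    suffix: str


def tier_for(level: int) -> str:
    """Map the performance-level digit to the market tier named by AMD."""
    if level >= 7:
        return "enthusiast"
    if level >= 4:
        return "high performance"
    return "mainstream"


def parse_model(name: str) -> Optional[RyzenModel]:
    """Split a name like 'R7 1800X' into its parts, or return None if it doesn't fit."""
    m = MODEL_RE.match(name.strip())
    if m is None or m.group(5) not in SUFFIXES:
        return None
    segment, gen, level, sku, suffix = m.groups()
    return RyzenModel(segment, int(gen), int(level), tier_for(int(level)), sku, suffix)


for name in ("R7 1800X", "R7 1700", "R5 1400"):
    model = parse_model(name)
    print(name, "->", model.tier, "/", SUFFIXES[model.suffix])
```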
XFR and Precision Boost
This was, by far, the slide that AMD spent the most time on in the presentation, and it includes a lot of hidden gems that people might not be aware of. Let’s start off with the base clock, the brown line at 3.6GHz. In an R7 1800X, this is the default clock speed for all cores when not thermally throttled, which is quite good, and on par with the default clock speed of the Intel Core i7-6900K. If your chip is thermally throttled for whatever reason, be it the workload or the cooling capacity, it will automatically downclock itself to 3.2GHz, an 11% reduction in clock speed. If you see benchmarks today where single-core results are around 11% lower than in other reviews using better coolers, this is why.
Above the base clock speed is the all-core boost clock of 3.7GHz. This is how the R7 1800X earns its place right next to Broadwell-E, and why it closes the performance gap in AMD’s early benchmarks: it was running approximately 2.7% faster than the base clock speed of a Core i7-6900K. From there, if you have a workload that uses two cores or fewer, the chip activates its boost speed of 4.0GHz for both cores and their associated threads, effectively creating a 4.0GHz dual-core, four-thread processor. This is how the R7 1800X is going to maintain performance parity with Intel’s Skylake processors in lightly threaded games: by beating the average clock speed of the Core i5-6600K.
Above the regular turbo boost is the XFR range, which nets you an extra 100MHz of clock speed so long as you have a good cooler. XFR, again, only kicks in with two or fewer active cores, so reaching it with all eight cores, even on an 1800X, will be a lofty goal. This is why the 1800X is still behind a Core i7-7700K in single-core benchmarks, but not by much: the 7700K’s 4.5GHz is a 9.8% higher clock speed than the 1800X’s 4.1GHz (put the other way, 4.1GHz is 8.9% lower), and in real-world terms, Kaby Lake is 7% ahead in benchmarks using a single core.
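Pulling the R7 1800X figures from this section together, here is a hypothetical Python model of which clock the chip targets. `BoostTable`, `target_clock` and `pct_change` are names invented for illustration, and the rules are a simplification of the behaviour described here, not AMD’s actual Precision Boost algorithm. The `pct_change` helper also shows why the direction of a percentage comparison matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoostTable:
    """Clock figures for the R7 1800X as described in this article (GHz)."""
    base: float = 3.6            # default all-core clock
    all_core_boost: float = 3.7  # boost with more than two cores active
    two_core_boost: float = 4.0  # boost with two or fewer cores active
    xfr_bonus: float = 0.1       # X-series XFR window (non-X chips: 0.05)
    throttled: float = 3.2       # clock when thermally throttled


def target_clock(t: BoostTable, active_cores: int, good_cooler: bool, is_throttled: bool) -> float:
    """Throttling wins; otherwise two-core boost (+XFR with a good cooler), else all-core boost."""
    if is_throttled:
        return t.throttled
    if active_cores <= 2:
        return t.two_core_boost + (t.xfr_bonus if good_cooler else 0.0)
    return t.all_core_boost


def pct_change(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100


t = BoostTable()
print(target_clock(t, active_cores=1, good_cooler=True, is_throttled=False))  # 4.1
print(target_clock(t, active_cores=8, good_cooler=True, is_throttled=False))  # 3.7
print(round(pct_change(t.base, t.throttled), 1))  # -11.1: throttling from the base clock
print(round(pct_change(4.1, 4.5), 1))             # 9.8: 4.5GHz relative to 4.1GHz
print(round(pct_change(4.5, 4.1), 1))             # -8.9: 4.1GHz relative to 4.5GHz
```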
There is more to this discussion than I can add in this article, and an extended look at Precision Boost, along with AMD’s frequencies, is available in a separate article today. I wrote that article because all the pieces of the puzzle seem to fit except one: the stock performance of the R7 1700 against the Core i7-7700K.
One last thing before we move on – XFR is not limited to just the X-series chips, and it is not limited to just the X370 family of motherboards. It works on all processors and all motherboards, and the boost clocks are the same on an A320 motherboard as they are on an X370 full-ATX behemoth.
A whole new ecosystem of motherboards and chipsets
While I’ve revealed and discussed some of the chipset features and functions before, the latest press briefing revealed some new details we weren’t aware of until now. To recap, at launch we will see motherboards with the X370, B350, and A320 chipsets from AMD on socket AM4. The table above details the chipset options that you get by default from the processor itself, and anything extra on the motherboard will come from add-on chips made by third-party manufacturers. However, the line between features provided by the CPU and those provided by the third-party chipset is very blurred, so motherboard choice is equally important if you want to get the most out of the new platform. The chipsets that govern extra USB 3.1 and SATA connectivity will typically come from ASMedia, although other, possibly better, solutions will become available as time passes.
Of note is that no motherboard at launch will support NVMe RAID through the M.2 slots. There just isn’t the required logic to make this work, plain and simple. It is possible that a third-party chipset might be able to handle two x4 PCIe NVMe drives in RAID, but not only would that be expensive, the chipset would also have to be very, very well architected.
The new information that I was talking about earlier is the full specification of the X300 and A300 motherboard chipsets. These are primarily designed for small-form-factor products like mini-ITX, although AMD told me that nothing technically prevents motherboard manufacturers from putting this chipset on, say, a mATX or ATX motherboard. This presents some interesting possibilities. A vendor like ASRock might decide to build a budget mATX board based on the X370 chipset and price it among Intel’s cheaper mATX Z270 motherboards. However, if they want to be super-aggressive with taking over the mainstream market, they could alternatively build a mATX/mini-ITX/FlexATX X300 chipset motherboard, priced similarly to cheap A320 motherboards, with the option to overclock any socket AM4 processor.
It’s easy to predict how it would be laid out as well. An mATX X300 series motherboard would have one PCIe 3.0 x16 slot at the top, with two SATA 3.0 ports from the CPU, one PCIe 3.0 x2 M.2 slot for solid state drives, no SATA express port, and a second PCIe 3.0 x4 slot for a second graphics card or a PCIe NVMe solid state drive. Compared to a similar board with the A320 chipset, you lose USB 3.1 Gen 2 support, two SATA ports, and the ability to use RAID 10. That is a compromise I’m sure many budget-minded enthusiasts would be happy to live with.
With that in mind, the slide above is far more interesting than AMD made it seem during the press call. X300 is intended for smaller or cheaper motherboards without a lot of space for routing traces, and it’s cheaper to implement than the regular X370 chipset. Intel has nothing to match a product stack that is fully unlocked on both the processor and the motherboard side.
AMD also noted how memory support would work by default on their platform, and this is why pre-ordering those motherboards on 25 February was a risky proposition. If you’re filling all four DIMM slots, you’re limited to either 2133MHz or 1866MHz, depending on whether your RAM modules have chips on one side of the PCB or on both sides. The same applies when filling two DIMM slots, with slightly higher frequencies on offer. Keep in mind that this will change as AMD works on improving the performance of its integrated memory controller, and that it is also highly motherboard-dependent.
Vendors like ASUS and Gigabyte are already working on DDR4-3400 support on their X370 and B350 motherboards, and other vendors are putting in the work required to increase the baseline frequencies advertised by AMD here. It’s not a train smash by any means because Intel’s B250 and H270 Kaby Lake motherboards are limited to DDR4-2400 support.
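AMD’s default memory limits can be expressed as a simple lookup. In this Python sketch, the four-DIMM figures come from AMD’s guidance as described above; the two-DIMM figures (2667MHz single-sided, 2400MHz double-sided) are AMD’s published launch numbers rather than something stated in this article, so treat them as an assumption. “Single-sided” and “double-sided” are used the way the article uses them, as a stand-in for single-rank and dual-rank modules, which is usually but not always the same thing. The function names are hypothetical, and board vendors are already pushing past these defaults.

```python
# Default supported DDR4 speeds (MHz) by (DIMMs installed, module type).
# Four-DIMM values are from the article; two-DIMM values are an assumption
# based on AMD's launch guidance.
DEFAULT_SPEED = {
    (2, "single-sided"): 2667,
    (2, "double-sided"): 2400,
    (4, "single-sided"): 2133,
    (4, "double-sided"): 1866,
}


def max_default_speed(dimms: int, module: str) -> int:
    """Look up the platform's default memory speed limit."""
    try:
        return DEFAULT_SPEED[(dimms, module)]
    except KeyError:
        raise ValueError(f"no default figure for {dimms} x {module} DIMMs") from None


def effective_speed(kit_rating: int, dimms: int, module: str) -> int:
    """A kit runs at the lower of its own rating and the platform default."""
    return min(kit_rating, max_default_speed(dimms, module))


print(effective_speed(3200, 4, "double-sided"))  # 1866
print(effective_speed(3200, 2, "single-sided"))  # 2667
```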
Finally, gaming performance
While Neo’s review of Ryzen and an X370 motherboard is delayed somewhat by early teething issues, you can take a look at the kind of performance AMD expects from the R7 1800X, 1700X, and R7 1700, taken with a pinch of salt because the Intel results were run with the separately available stock cooler. For 4K gaming, AMD pitted the R7 1800X against the Intel Core i7-6900K, with both systems running an NVIDIA GeForce GTX Titan XP, and you can see that in some games they are neck-and-neck, with the Ryzen system eking out some small wins over Broadwell-E. In the 99th percentile results, which show roughly the lowest frame rate the system holds for 99% of frames, the R7 1800X has a small lead in four of the games tested, while the average results show Broadwell-E leading in only one benchmark.
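For readers wondering how a 99th percentile figure is derived, a common approach is to record the time each frame takes, find the frame time that 99% of frames match or beat, and convert it to frames per second. AMD doesn’t say exactly how its numbers were calculated, so this Python sketch only illustrates the general method, using the nearest-rank percentile definition (one of several in use). The function names and sample data are hypothetical.

```python
import math
from typing import List, Tuple


def percentile_nearest_rank(values: List[float], pct: float) -> float:
    """Smallest sample such that at least pct% of samples are at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def fps_summary(frame_times_ms: List[float]) -> Tuple[float, float]:
    """Return (average fps, 99th percentile fps) from per-frame render times."""
    avg_fps = 1000 * len(frame_times_ms) / sum(frame_times_ms)
    # The 99th percentile frame time is the slow end; converting it gives the low fps figure
    p99_fps = 1000 / percentile_nearest_rank(frame_times_ms, 99)
    return avg_fps, p99_fps


# Hypothetical run: 98 smooth frames at 10ms, two stutters at 40ms
frames = [10.0] * 98 + [40.0] * 2
avg, p99 = fps_summary(frames)
print(f"average {avg:.1f} fps, 99th percentile {p99:.1f} fps")  # average 94.3 fps, 99th percentile 25.0 fps
```

The average barely moves, but two slow frames out of a hundred drag the 99th percentile figure down to 25fps, which is why it is a better indicator of stutter than the average alone.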
At 1440p running a GeForce GTX 1080, the R7 1700X squares off against the Core i7-6800K quite well, but is on a less even footing than the 1800X is against its rival. Intel’s lead in these benchmarks is small, and perhaps easily overcome with a slight overclock on the R7 1700X. The results for the R7 1700 against the Core i7-7700K are very interesting: they are very, very close in the GPU-reliant benchmarks, with the 1700 only trailing significantly in GTA V, a CPU-intensive game. The low clock speed of the R7 1700 is a detriment in this area, but the rest of the results bode well for it: in GPU-limited scenarios, there’s no reason to pick the Kaby Lake processor over the Ryzen chip, or vice versa. They’re both largely going to be on an equal footing.
Bringing this brief recap to a close, this chart just about sums it up in one image. In one generation, AMD has managed to massively improve its standing in the market, and when it comes to multi-threaded performance per dollar, the numbers are quite scary. For the same amount of money, AMD offers up to 60% more performance than competing Intel processors, and it invalidates the Core i7-6900K by a significant margin for gamers and professionals who don’t need all that PCIe connectivity or massive amounts of memory bandwidth. The Ryzen 7 family is good value for money, and it’s going to be difficult for Intel to counter this with price drops alone, because the X99 and Z270 platforms are not cheap by any means. AMD can get away with using cheaper motherboards based on, say, the B350 chipset, and still offer the same performance with a greatly reduced platform cost.
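The value argument boils down to performance per dollar, and including the motherboard widens the gap when one platform is cheaper. The scores and prices in this Python sketch are made-up round numbers chosen only to show the arithmetic; they are not benchmark results or real prices, and the helper names are hypothetical.

```python
def perf_per_dollar(score: float, cpu_price: float, board_price: float = 0.0) -> float:
    """Benchmark score per dollar spent, optionally including the motherboard."""
    return score / (cpu_price + board_price)


def advantage_pct(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a / b - 1) * 100


# Hypothetical chips at the same CPU price, chip A scoring 60% higher
cpu_only = advantage_pct(perf_per_dollar(1600, 500), perf_per_dollar(1000, 500))
# Same chips, but A sits on a cheaper board ($100 vs $250, also hypothetical)
platform = advantage_pct(perf_per_dollar(1600, 500, 100), perf_per_dollar(1000, 500, 250))
print(f"CPU only: +{cpu_only:.0f}%, whole platform: +{platform:.0f}%")
```

With the board included, the same 60% performance lead becomes a 100% lead in performance per dollar in this made-up example, which is the effect cheaper B350 boards would have.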
That leaves one wondering what Intel will do next. The last time this happened, Intel ceded market share to AMD for a year and kept its Pentium Extreme Edition processor at its default price point, and people still bought it despite AMD’s Athlon 3200+ offering identical performance for less money. That may turn out to be the case here as well, but Intel is in a precarious position at the moment. It has no new architecture to debut for the consumer market next year, because Cannon Lake, which was intended for consumer sockets, depends on the 10nm process. Intel has no 10nm process to boast about either, because that has been delayed and will come to datacenter and enterprise products first.
Regardless of the rumours about “Xeon Gold” processors, or a fabled X299 chipset coming in the second half of this year, the reality is that AMD has caught Intel by surprise, and there is very little Intel can do now to stop AMD regaining its market share.