Tech: AMD’s GCN architecture explained

We’re almost two months on from the launch of the AMD Radeon HD 7970, and AMD has been keeping details of the lower-priced models a secret. While that doesn’t help system builders who’re wondering which GPU they should go for, everyone should first stop and have a look at the new architecture AMD has been touting. GCN, or Graphics Core Next, has the potential to send existing Radeons running to hide in shame with its notable performance benefits. Not to mention that Nvidia is already sweating away to finish development of its next GeForce series.

So what does GCN bring to you, the system builder/home user/power user?

Firstly, it needs to be said that the current VLIW4 (Very Long Instruction Word) architecture in the top-end Radeon HD 6000 series cards isn’t performing to full capacity. In fact, if I’m honest, neither were the preceding two generations: threads that were lined up in the older architecture had to wait until the ones they depended on had finished. Much like Bulldozer’s current issue, threads weren’t moved onto compute units that were idle or had just finished their workload, which meant that many clock cycles could be spent finishing a single line of code. With GCN, that’s no longer the case.

With code that has dependencies, developers can now have those linked threads parked separately. For example, if you have 24 threads ready for processing and six of them are dependent on each other, the old designs would have finished half the threads first. If thread 6 was dependent on the outcome of thread 14, it couldn’t be finished concurrently and had to be parked in a core, waiting for a result. If thread 8 finished early, nothing could replace it in the current clock cycle while other code was still running, so resources would go to waste.

GCN changes this by first working through the dependencies in order, checking which threads need others to finish beforehand. If threads 1 and 3 have no dependencies but threads 6 and 7 require outcomes from 14 and 15, the compute unit addresses threads 1, 3, 14 and 15 in the first clock cycle. Now there’s no wasted space, and all threads are completed in fewer cycles and in a much less complex manner. Developers who had to spend hours optimizing code for VLIW4 can code in a much more logical manner, because the more intelligent GCN design sorts out the code for you.
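To make that example concrete, here is a rough Python sketch of the idea. It is a toy model, not how the GCN hardware actually schedules work: the `schedule` function, the one-cycle-per-thread rule and the four-slot `width` are all assumptions made purely for illustration. Each cycle it simply picks the threads whose dependencies have already finished, up to the number of free slots.

```python
from typing import Dict, List, Set


def schedule(threads: List[int], deps: Dict[int, Set[int]], width: int) -> List[List[int]]:
    """Return the list of threads issued in each cycle.

    Toy assumptions (not taken from AMD documentation): every thread takes
    exactly one cycle, and a thread may only be issued once every thread it
    depends on has finished in an earlier cycle.
    """
    if width < 1:
        raise ValueError("width must be at least 1")

    pending = sorted(threads)
    done: Set[int] = set()
    cycles: List[List[int]] = []

    while pending:
        # A thread is ready when all of its dependencies are already done.
        ready = [t for t in pending if deps.get(t, set()) <= done]
        if not ready:
            raise ValueError("circular or unsatisfiable dependency")

        # Fill the available slots with ready threads instead of stalling.
        issued = ready[:width]
        cycles.append(issued)
        done.update(issued)
        pending = [t for t in pending if t not in issued]

    return cycles


# The example from the text: 6 waits on 14, and 7 waits on 15.
threads = [1, 3, 6, 7, 14, 15]
deps = {6: {14}, 7: {15}}

for cycle, issued in enumerate(schedule(threads, deps, width=4), start=1):
    print(f"cycle {cycle}: {issued}")
# cycle 1: [1, 3, 14, 15]
# cycle 2: [6, 7]
```

With four slots, the first cycle is filled with threads 1, 3, 14 and 15, and the dependent threads 6 and 7 follow in the second cycle, so no slot sits idle waiting on a result.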

What does this mean for the layman? For one, better application performance. While the current crop of apps requiring a Radeon or Nvidia GPU isn’t showing any performance gains over VLIW4, it’s mainly because those apps have been coded to run in a certain way, and they’ll run the same way on GCN anyway. From now on, we can expect more efficient use of the GCN compute units: GPU-accelerated applications will receive a performance boost, games will run better, and GPGPU will see some nice gains as well.

All this doesn’t bode well for the Nvidia camp. Their next game-changer, Kepler, is only slated for an April/May 2012 release, and we’ll be taking that one with a pinch of salt. If their release is anything like the 400 series, general availability will only come two to three months after the release date, and by then AMD will be dropping prices and detailing its mid-to-low-end cards. This year looks like a no-win for Nvidia; let’s hope they make up for their poor start with outstanding performance. After all, the competition will only benefit us in the end!
