Go Google Intel’s scrapped Larrabee project and you’ll wonder why Intel abandoned development so early in the game. It’s possible the company got wind of both AMD’s and Nvidia’s plans for their respective compute products long before those products left the early prototype stage, or it may have realised it could do something very special with the project. Either way, Larrabee was scrapped before birth and there have been few follow-up details since. Intel’s own benchmarks showed a 48-core Larrabee performing quite admirably in a few games, and it could have gained a lot of ground with the right marketing push and developer incentives. It became a running joke for many April Fools’ gags, including one in NAG’s April 2010 issue.

Announced by Intel in 2008, Larrabee was actually intended to slot in between desktop graphics chips and the professional parts, AMD’s FirePro and Nvidia’s Tesla/Quadro, both of which enjoy good uptake in the server and workstation markets. Nvidia’s Tesla in particular is the go-to solution for businesses, universities and professionals looking for strong compute performance and double-precision floating-point maths for things like video renders, software development, real-time computing (necessary for robotics) and crunching large datasets into something more useable. Larrabee was shelved in May 2010. On Tuesday it resurfaced as Intel’s Knights Corner chip, a high-performance, multi-purpose, heterogeneous coprocessor that will be part of the company’s new Xeon Phi line.

Firstly, there are a couple of details to take in. Intel recently acquired part of Cray’s IP, buying out the heterogeneous computing solution that Cray has used in its servers for years, known as the Cray interconnect. The gist of the interconnect is that it links large-scale deployments of processors, or full Cray servers, together, managed by and used to run a single instance of an OS. To a large degree it allows granular control of each server, right down to which processors are powered, which are used to run specific tasks and how much memory may be allocated to each CPU. It’s an incredibly powerful solution in the right hands, and Intel bought it for use in its own server line-up and, subsequently, Xeon Phi. Systems with two or more Xeon Phi processors will be better managed using the Cray interconnect, scaling up very nicely with little to no performance deficit. Although the tech is never mentioned by name, it’s likely the only way information could pass between the CPU and the coprocessor as seamlessly as Intel claims.
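Cray’s interconnect API isn’t public, but the style of programming it serves, many processors cooperating on a single job with explicit control over which one does what, is well captured by standard MPI. Here’s a minimal sketch of that pattern (plain MPI, not Cray’s own layer; the work split is purely illustrative):

```c
/* Minimal MPI sketch of multi-node work splitting -- the kind of
 * single-job, many-processor deployment a fast interconnect like
 * Cray's is built to serve. Standard MPI, not Cray's own API. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which processor am I?  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many are in the job? */

    /* Each rank takes its own slice of the task: granular control
     * over which processor runs what, as described above. */
    long slice_sum = 0;
    for (long i = rank; i < 1000000; i += size)
        slice_sum += i;

    /* The interconnect's job is moving results like this around fast. */
    long total = 0;
    MPI_Reduce(&slice_sum, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum across %d ranks: %ld\n", size, total);

    MPI_Finalize();
    return 0;
}
```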

Secondly, Larrabee’s initial prototypes stopped at 48 cores. Intel was at a loss as to how to overcome the issues it was having with the prototype builds on the 45nm process. It could cram lots of Pentium I-like cores onto a single die, but there were hiccups, particularly with heat. Computing performance was up at a very acceptable 1TFLOPS for double-precision maths, but the project was nowhere near ready for market in 2010. It was never pitched as a gaming card, but it could run Crysis if you asked Crytek for help very nicely.

Thirdly, Larrabee was built from x86 processor cores, and it was actually a branch of the compute project Intel has been working on for the last decade, something the company decided to try on a whim to see if it would work. It was very different from any GPU ever made and most likely wouldn’t have enjoyed the kind of OpenGL and DirectX support it would have needed. Some of the lessons Intel learned with Larrabee were used in putting together its integrated graphics, adding the HD Graphics chip to its desktop and laptop processors and using it to power technologies like QuickSync. Things are really heating up in the graphics department for Intel, and its chips only continue to perform better as time passes. Some of what it learned from Ivy Bridge was concurrently applied to the Larrabee project, resulting in the Xeon Phi coprocessor.

All the lessons learned, experience gained and IP bought have resulted in what could quite possibly be the best all-round multi-threaded compute solution on the planet. The Xeon Phi coprocessor integrates at least 50 Pentium I-like x86 cores on a single die, built on the same 22nm process and tri-gate (3D) transistors seen in Ivy Bridge processors, with double-precision floating-point performance sitting at the original 1TFLOPS goal Intel set back in 2008. Memory is up at 8GB of GDDR5, serviced through either a 256-bit or 512-bit bus. And it comes with its own embedded OS, a customised, instant-on Linux distro that allows full control over the card and all its processors. It is, in fact, a supercomputer on a Radeon HD 7970-sized PCB.
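Those headline figures hang together on a simple back-of-envelope calculation. Assuming 512-bit vector units with fused multiply-add and a clock around 1.25GHz (my assumptions for illustration, not Intel’s published specs), 50 cores land almost exactly on the 1TFLOPS target:

```c
/* Back-of-envelope peak throughput for a hypothetical 50-core part.
 * Every parameter below is an assumption for illustration, not an
 * Intel-confirmed spec. */
#include <stdio.h>

int main(void)
{
    const double cores      = 50;
    const double clock_ghz  = 1.25;  /* assumed clock speed            */
    const double flops_core = 16;    /* 512-bit vectors: 8 doubles,
                                        doubled by fused multiply-add  */
    const double bus_bits   = 512;   /* the wider of the two quoted buses */
    const double rate_gtps  = 5.0;   /* assumed GDDR5 effective rate   */

    /* cores x clock x FLOPs-per-cycle ~= peak double-precision rate */
    printf("peak: %.0f GFLOPS\n", cores * clock_ghz * flops_core);

    /* bus width (bytes) x transfer rate ~= peak memory bandwidth */
    printf("bandwidth: %.0f GB/s\n", bus_bits / 8 * rate_gtps);
    return 0;
}
```

Swap in the narrower 256-bit bus and the bandwidth figure simply halves, which is presumably where the two quoted bus widths come in.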

Where would it fit in?

Intel’s plan is to have it sitting alongside Xeon workstations and servers that need a lot of power. It will mainly contend with higher-priced parallel cards like Tesla and FirePro and could, rather unlike Tesla, fit into just about any computer you run. Coexisting with a Xeon chip, the coprocessor could be called on to calculate things that your regular CPU can’t crunch through on its own, powering down when it’s not needed. The best thing about it is that all the cores are regular x86 processors. Developers and programmers would have a field day getting things to run without having to rely on Nvidia’s CUDA or on OpenCL, the open standard AMD has been pushing for the past few months.
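That x86 familiarity is the whole pitch. In principle, ordinary threaded code like the OpenMP sketch below should recompile for a many-core x86 part without a CUDA or OpenCL rewrite (the dot-product workload here is purely illustrative):

```c
/* Plain C + OpenMP: the same source that runs on a desktop Xeon
 * should, in principle, recompile for a many-core x86 chip like
 * Xeon Phi with no CUDA/OpenCL port. Build with: gcc -fopenmp */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 10000000

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double sum = 0.0;

    for (int i = 0; i < N; i++) {
        a[i] = i * 0.5;
        b[i] = i * 0.25;
    }

    /* One pragma spreads the loop across however many x86 cores
     * exist: four on a desktop, around fifty on a Xeon Phi. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("dot product: %f (threads: %d)\n", sum, omp_get_max_threads());
    free(a);
    free(b);
    return 0;
}
```

The same pragma that spreads a loop over four desktop cores spreads it over fifty, which is exactly the kind of portability a proprietary GPU toolchain can’t offer.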

In its own right, Xeon Phi is a force to be reckoned with. If Intel manages to sell it through the regular channels that Xeon chips appear in, it could grab a fairly large market share right off the bat. Nvidia’s Tesla currently requires its own particular setup and software, and can only be bought through Nvidia-certified OEMs. Guess what? Cray is one of them, so Intel already has a foot in the door. If Intel ever releases cut-down 25-core and 10-core editions, it would easily find a large market of users who need good parallel performance, but not so badly that they’re ready to raid their retirement savings for it.

Why did it take so long to get this far?

Well, mainly because Intel’s ambitions back in 2002 were a little less serious. It still had the idea of using Pentium I cores, but had no way of powering them or fixing the heat issues. In 2008 Larrabee tried to take on both the GeForce and Quadro line-ups at once, but Intel realised it could perform on the graphics front or the parallel computing front, not both at the same time. The introduction of GDDR5 RAM helped shape the final design, and this year brought 22nm processors, PCI Express 3.0 and a tri-gate (3D) transistor design. It may have taken ten years, but it’s a labour of love for Intel and it’s finally going to bring some benefits to the company. It also opens up the massively parallel computing market for Intel and gets in the face of its biggest competitor there, Nvidia.

What does this mean to you, dear reader?

Nothing directly, at least for now. The biggest change Xeon Phi will bring is to server farms: you no longer need a room with eight to ten server racks to make things viable when a single rack with up to eight dual-Xeon servers, each powering up to four Xeon Phi cards, will do. At 1TFLOPS per card, that works out to as much as 32TFLOPS of double-precision compute in one rack. That alone cuts down on support, energy and programming costs, as well as floor space and the need to design a proper server room. Having such a scalable x86 platform also makes a lot of heavy workloads easier: weather prediction, robotics control, servicing a huge number of users through a terminal server, and real-time calculation and prediction, which is helpful when you’re a city planner who has to budget processing performance for automated tasks like traffic lights.

It could also improve performance in things like Apple’s Siri information lookups and speech recognition. Currently, Siri doesn’t actually do its work on the phone: it’s a front-end app that pushes all the hard work to a dedicated server on Apple’s side. Apple’s Siri server farms are huge, but there are times when the load just becomes too much. Expanding a server’s potential with a card like this is far easier than buying extra floor space and adding in more and more racks.

All in all, Intel’s foray into massively parallel computing is something I’ve been waiting on for years. If the blue team plays its cards right, it could eventually bundle an entry-level 50-core Xeon Phi with every high-end Xeon server sold. Intel’s strength in the past hasn’t always been pure computing muscle, but rather a clever marketing team and enough influence to strong-arm a few key industry players. That’s how you make a product succeed: it can be good, it can be better than the competition, but it always needs to be marketed properly. And that’s Intel’s strong point.

Source: Anandtech, Fudzilla
