There has been a spate of Larrabee information during the last two weeks: two GDC talks (slides near the bottom of this page), a prototype library, and an article by Michael Abrash on the Dr. Dobb’s website.
Dr. Dobb’s Journal has been out of print since February, but for many years it was one of the leading software publications. When initially published in 1976 (as Dr. Dobb’s Journal of Computer Calisthenics & Orthodontia) it was the first journal focusing on software development for microcomputers. Michael Abrash wrote many articles for Dr. Dobb’s over the years, including a series on the Quake software renderer in the mid-90s. This series made a great impression on me; when it was published I was considering a career change from microprocessor design to graphics programming. At the time, I was working on Intel’s P55C processor, publicly known as “Pentium with MMX Technology”. This chip was notable both for being the first x86 processor with a SIMD (single instruction, multiple data) instruction set, and for being the last CPU to use the in-order Pentium micro-architecture.
When Michael Abrash wrote the Quake articles, game rendering was 100% software, mostly written in assembly language. Abrash was the uber-game programmer, having worked on DOOM, written the Quake renderer, and published (in addition to his Dr. Dobb’s articles) many influential books about graphics programming, assembly and optimization (the last of which is available online).
Within a few years (around the time I finally made the jump from CPU design to game graphics programming), it seemed to many that graphics hardware and compiler improvements had made software rendering and hand-coded assembly obsolete. This was mirrored by my own experience; I was hired to my first game industry job on the strength of a software rasterization demo (written mostly in assembler) and by the time the game shipped, it required graphics hardware and contained very little assembly (none written by me). Abrash started applying his considerable skills to what he saw as the next unsolved hard problem: natural language processing.
But he couldn’t stay away from graphics for long; when Microsoft started working on the Xbox console he got involved in its design. In the early 2000s, he figured out that there was a market for software renderers after all, mostly due to the mess of caps bits, non-orthogonal feature support, and flaky compliance that characterized low-end graphics hardware at the time (Intel was among the greatest offenders; compounding the problem, its graphics chips sold very well so there were a lot of them out there). With Mike Sartain (another Xbox designer), he wrote Pixomatic, a software renderer published by RAD Game Tools (until then mostly known for the Miles sound library, perhaps the most widely used middleware in the games industry). Of course, he published another series of articles in Dr. Dobb’s about the experience, where he discussed how he made use of SIMD instruction sets such as MMX and SSE when optimizing Pixomatic.
I found this particularly interesting due to my personal involvement with these instruction sets. After working on the first MMX hardware implementation I helped define its successor, which was twice as wide (128 bits instead of 64) and added support for floating-point SIMD. This instruction set was at first called MMX2, then VX, and finally split into two separate instruction sets: SSE and SSE2. By this time SIMD instruction set extensions were becoming quite popular; AMD had their own version called 3DNow!, and PowerPC had the AltiVec instruction set. Intel kept on adding new SIMD extensions: SSE3, SSSE3, SSE4.1, SSE4.2, and AVX.
As Abrash details in the Larrabee article, Larrabee got started when he decided to talk to Intel about some ideas for SIMD instructions to accelerate software rasterization. As a result, Larrabee includes a powerful set of SIMD instructions. Much wider than previous instruction sets (512 bits instead of 128, or 256 in the case of AVX), Larrabee’s instruction set contains several instructions tailored to software rasterization. It is also general enough to allow for automatic code vectorization of a wide variety of loops. Abrash had a key role in the design of the instruction set, bringing software rasterization back into the mainstream.
Besides a good instruction set, Larrabee also needed an efficient hardware design with a large number of cores. Each of these cores needed to be very efficient in terms of performance per watt and per transistor. Since the Larrabee team started out as a skunkworks, they couldn’t afford to design a brand-new core, so they looked at previous Intel cores and chose the old in-order Pentium core (almost the same one I used in the P55C).
What I find fascinating about this story is that Abrash managed to follow rasterization all the way around the Wheel of Reincarnation. This term refers to the common process where a piece of computing functionality is first implemented in software, then moves to special-purpose hardware which gradually becomes more general until it rivals a CPU in complexity, at which point the functionality is folded back into software. It was coined in a 1968 article by T. H. Myer and Ivan Sutherland (the latter is widely considered the father of computer graphics).