Author Archives: Naty

GPU REYES Implementation

Pixar’s Renderman rendering package is based on the REYES rendering pipeline (an acronym for the humble phrase “Render Everything You Ever Saw”). Most film studios use Pixar’s Renderman, and many others use renderers operating on similar principles. A close reading of the original REYES paper shows a pipeline which was designed to be extremely efficient (it had to be, to run on 1980’s hardware!) and produce very high quality images. I have long thought that this pipeline is a good fit for graphics hardware (given some minor changes or an increase in generality), and is perhaps a better fit to today’s dense scenes than the traditional triangle pipeline. A paper to be published in SIGGRAPH Asia this year describes a GPU implementation of the subdivision stages of the REYES pipeline, which is a key step towards a full GPU REYES implementation. They use CUDA for the subdivision stages, and then pass the resulting micropolygons to a traditional rendering pass. Although combining CUDA and traditional rendering in this manner introduces performance problems, newer APIs such as DX11 compute shaders have been designed to perform well under such conditions. Of course, this algorithm would be a great fit for Larrabee.

Anyone interested in the implementation details of the REYES algorithm should also read “How PhotoRealistic RenderMan Works”, which is available as a chapter in the book Advanced Renderman and in the SIGGRAPH 2000 Renderman course notes.

I found this paper on Ke-Sen Huang‘s SIGGRAPH Asia preprint page. Ke-Sen performs an invaluable service to the community by providing links to preprints of papers from all the major graphics-related conferences. This preprint page is all the more impressive when you realize that SIGGRAPH Asia has not even published a list of accepted papers yet!

Gamefest presentations and other links

Christer Ericson points out in a recent blog post that Microsoft has uploaded the Gamefest 2008 slides. These include a lot of relevant information, especially in relation to Direct3D 11. Christer’s post has many other links to interesting stuff – I particularly liked Iñigo Quilez’s slides on raycasting distance fields. Distance fields (sometimes referred to as Euclidean distance transforms, though that properly refers to the process of creating such a distance field) are very useful data structures. As Valve showed at SIGGRAPH last year, they can also be used for cheap vector shapes (the basic form of their technique is a better way to generate data for alpha testing, with zero runtime cost!).

Kayvon Fatahalian on Comparing GPUs

On his new blog, Kayvon Fatahalian has an interesting article which explains how to compare the confusing core counts quoted for new GPUs by companies such as NVIDIA, AMD and Intel. It casts a rare ray of light into an area often obscured by marketing claims.

Disk-Based Global Illumination in RenderMan

In Section 9.1 of our book we discuss Bunnell’s disk-based approximation for computing dynamic ambient occlusion and indirect lighting, and mention that this technique was used by ILM when performing renders for the Pirates of the Caribbean films.

Recently, more details on this technique have appeared in a RenderMan Technical Memo called Point-Based Approximate Color Bleeding, available on Pixar’s publication page. Pixar have implemented an interesting global illumination algorithm based in part on Bunnell’s disk approximation, which is used for transfer over intermediate distances. Spherical harmonics are used to approximate distant transfer and ray-tracing is used for transfer between nearby points. This technique is now built into Pixar’s RenderMan and has been used in over 12 films to date, including Pixar’s own Wall-E. it is interesting to see a technique originating from real-time rendering used in film production; the opposite is much more usual. The paper is worth a close read – perhaps someone will close the loop by adapting some of Pixar’s enhancements into new real-time techniques.

Pixar’s publication page is a valuable resource. The papers span a quarter century, and most of them have been very influential in the field. The first seven papers gave us the Cook-Torrance BRDF, programmable shaders, distributed ray tracing, image compositing, stochastic sampling, percentage-closer filtering, and the REYES rendering architecture (upon which almost all film production renderers are based). the page includes many other important papers as well as SIGGRAPH course notes and Renderman Technical Memos.

SIGGRAPH 2008: Bilateral Filters

The class “A Gentle Introduction to Bilateral Filtering and its Applications” was very well-presented. Bilateral filters are edge-preserving smoothing filters that have many applications in rendering and computational photography. The basic concept was clearly explained, and then various variants, related techniques, optimized implementations and applications were discussed. The full slides as well as detailed course notes are available here. Currently they are from the SIGGRAPH 2007 course; I assume the 2008 slides will replace them soon.

A related technique, which appears to have some interesting advantages over the bilateral filter, was presented in a paper this year, titled “Edge-Preserving Decompositions for Multi-Scale Tone and Detail Manipulation”. It presents a novel edge-preserving smoothing operator based on weighted least squares optimization. The paper and various supplementary materials are available here.

OpenGL 3.0 spec released

The Khronos group has finally released the spec for OpenGL 3.0. More details can be found here.

SIGGRAPH 2008: Beyond Programmable Shading Class

This class was about non-traditional processing performed on GPUs, similar to GPGPU but for graphics. As we discuss in the “Futures” chapter at the end of our book, this is a particularly interesting direction of research and may well represent the future of rendering. The recent disclosures on Direct3D 11 Compute Shaders and Larrabee make this a particularly hot topic.

The full course notes are available at the course web site.

The talk by Jon Olick from id software was perhaps the most interesting. He discussed a sparse voxel octree data structure which is rendered directly using CUDA. This extends id’s megatexture idea to geometry and may very well find its way into id’s next engine in some form.

SIGGRAPH 2008: The Authors Meet

All the work on the book was done remotely, via email and CVS. In fact, I had never met Tomas until this morning at SIGGRAPH. Here you can see all three of us, pleased as punch that the book is finally done. Left to right, Eric, Naty, Tomas:

(Eric here. I guess this is a tradition: Tomas and I didn’t meet until after the first edition was published.)

SIGGRAPH 2008: Advances in Real-Time Rendering in 3D Graphics and Games

I attended the “Advances in Real-Time Rendering in 3D Graphics and Games” class today at SIGGRAPH. This is the third year in a row Natasha Tatarchuk from AMD has organized this class. Each year different game developers as well as people from the AMD demo team are brought in to talk about graphics, and some of the best real-time stuff at SIGGRAPH in the last two years has been in this course.

Unfortunately, due to Little Big Planet crunch, Alex Evans from Media Molecule was unable to give his planned talk and a different speaker was brought in instead. This was a bit of a bummer since Alex’s SIGGRAPH 2006 talk was very good and I was hoping to hear more about his unorthodox take on real-time rendering.

The remaining talks were of high quality, including talks by the developers of games such as Halo 3, Starcraft 2 and Crysis. Unlike previous years, where it took many weeks for the course notes to be available online, the full course notes are already available at AMD’s Technical Publications page – check them out!

Direct 3D Details Part V: Other Features

This grab-bag of a post summarizes the various other features of Direct3D 11 which Microsoft described at Gamefest.

Dynamic shader linkage is supported (similar to the interfaces feature of Cg). This allows for separate light and material shaders to be written and compiled. These are later linked when the shader is set. This offers a solution to the combinatorial explosion resulting from a variety of lights and materials (this explosion, and some other solutions to it, are discussed in section 7.9 of our book).

Two new compressed texture formats have been added. BC6 supports high dynamic range RGB textures, using 1 byte per texel (instead of 6 bytes for an RGB 16-bit float texture). BC7 supports low dynamic range RGB or RGBA textures. It also uses one byte per texel (like DXT5/BC3), but offers significantly higher quality than texture formats available in D3D10. Both formats offer multiple block types (the compression tool selects the appropriate block type based on its content).

The block compression formats in D3D9 and D3D10 are based on the idea that each 4×4 texel block has all its values arranged along a single line, and the bits for each texel encode where on the line it is placed. For example, in DXT1/BC1, a line in RGB space is represented by two RGB endpoints, and each texel gets two bits to select one of four points along the line.

The new D3D11 formats support block types with one, two or even three (in the case of BC7) color lines. There is a tradeoff between the number of lines and the number of points along each line, since each block takes up the same amount of memory.

In principle, a 4×4 block with two color lines would need 16 additional bits per block to determine which line each texel was associated with (even more bits are needed for three color lines). To reduce storage requirements, only a subset of possible line association patterns are supported. The compression tool selects the best association out of this subset for each block.

Direct3D11 also tightens up the texture specifications. Decompression results must be bit-accurate, and subtexel/submip filtering precision is required to be at least 8 bits.

Direct3D11 increases the texture size limits from 8K texels to 16K texels. Note that a 16K x 16K DXT1/BC1 texture takes up 128MB – not many games will have textures this large! In general, D3D11 allows for resources as large as 2GB.

Hardware can optionally support double-precision floats. This was the only optional feature of D3D11 mentioned at Gamefest.

There was a slide listing a bunch of other features without further explanation. Most are a bit mysterious, but I list them here in case someone else is able to puzzle out what they mean:

Addressable Stream Out
Draw Indirect
Pull-model attribute eval
Improved Gather4
Min-LOD texture clamps
Conservative oDepth
Geometry shader instance programming model
Read-only depth or stencil views

This completes my report on Direct3D11 from Gamefest. Check the XNA Presentations Page for the slides and audio – they are not up there yet, but hopefully will be there soon.