SIGGRAPH 2008: Bilateral Filters

The class “A Gentle Introduction to Bilateral Filtering and its Applications” was very well presented. Bilateral filters are edge-preserving smoothing filters that have many applications in rendering and computational photography. The basic concept was clearly explained, and then several variants, related techniques, optimized implementations and applications were discussed. The full slides as well as detailed course notes are available here. Currently they are from the SIGGRAPH 2007 course; I assume the 2008 slides will replace them soon.
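For readers who have not seen the filter before, here is a minimal sketch of the basic idea on a 1D signal. It is my own illustration rather than code from the course; the function name, parameter names and default values are all assumptions. Each output sample is a weighted average of its neighbors, where the weight falls off both with distance and with difference in value, so samples on the far side of an edge barely contribute.

```python
import math

def bilateral_filter_1d(signal, radius=2, sigma_space=1.0, sigma_range=0.1):
    """Edge-preserving smoothing of a 1D signal (a list of floats)."""
    out = []
    for i, center in enumerate(signal):
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            # Spatial weight: nearby samples count more.
            w_space = math.exp(-((i - j) ** 2) / (2.0 * sigma_space ** 2))
            # Range weight: samples with similar values count more, so
            # samples across a strong edge contribute almost nothing.
            w_range = math.exp(-((signal[j] - center) ** 2) / (2.0 * sigma_range ** 2))
            w = w_space * w_range
            total += w * signal[j]
            weight_sum += w
        # weight_sum is never zero: the center sample always has weight 1.
        out.append(total / weight_sum)
    return out

# A noisy step edge: the noise is averaged out, the step stays sharp.
noisy_step = [0.02, -0.01, 0.03, 0.00, 1.01, 0.98, 1.02, 1.00]
smoothed = bilateral_filter_1d(noisy_step)
```

A plain Gaussian blur would drop the range weight and smear the step across several samples; the range term is what makes the filter edge-preserving.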

A related technique, which appears to have some interesting advantages over the bilateral filter, was presented in a paper this year, titled “Edge-Preserving Decompositions for Multi-Scale Tone and Detail Manipulation”. It presents a novel edge-preserving smoothing operator based on weighted least squares optimization. The paper and various supplementary materials are available here.

SIGGRAPH 2008: Beyond Programmable Shading Class

This class was about non-traditional processing performed on GPUs, similar to GPGPU but for graphics. As we discuss in the “Futures” chapter at the end of our book, this is a particularly interesting direction of research and may well represent the future of rendering. The recent disclosures on Direct3D 11 Compute Shaders and Larrabee make this a particularly hot topic.

The full course notes are available at the course web site.

The talk by Jon Olick from id Software was perhaps the most interesting. He discussed a sparse voxel octree data structure which is rendered directly using CUDA. This extends id’s megatexture idea to geometry and may very well find its way into id’s next engine in some form.

SIGGRAPH 2008: The Authors Meet

All the work on the book was done remotely, via email and CVS. In fact, I had never met Tomas until this morning at SIGGRAPH. Here you can see all three of us, pleased as punch that the book is finally done. Left to right, Eric, Naty, Tomas:

(Eric here. I guess this is a tradition: Tomas and I didn’t meet until after the first edition was published.)

SIGGRAPH 2008: Advances in Real-Time Rendering in 3D Graphics and Games

I attended the “Advances in Real-Time Rendering in 3D Graphics and Games” class today at SIGGRAPH. This is the third year in a row Natalya Tatarchuk from AMD has organized this class. Each year different game developers as well as people from the AMD demo team are brought in to talk about graphics, and some of the best real-time stuff at SIGGRAPH in the last two years has been in this course.

Unfortunately, due to Little Big Planet crunch, Alex Evans from Media Molecule was unable to give his planned talk and a different speaker was brought in instead. This was a bit of a bummer since Alex’s SIGGRAPH 2006 talk was very good and I was hoping to hear more about his unorthodox take on real-time rendering.

The remaining talks were of high quality, including talks by the developers of games such as Halo 3, Starcraft 2 and Crysis. Unlike previous years, where it took many weeks for the course notes to be available online, the full course notes are already available at AMD’s Technical Publications page – check them out!

Direct3D 11 Details Part V: Other Features

This grab-bag of a post summarizes the various other features of Direct3D 11 which Microsoft described at Gamefest.

Dynamic shader linkage is supported (similar to the interfaces feature of Cg). This allows for separate light and material shaders to be written and compiled. These are later linked when the shader is set. This offers a solution to the combinatorial explosion resulting from a variety of lights and materials (this explosion, and some other solutions to it, are discussed in section 7.9 of our book).
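To make the combinatorial argument concrete, here is a rough analogy in Python (not HLSL, and not the actual Direct3D 11 mechanism). The light and material functions, the `link` helper and the counts are all hypothetical and only illustrate why linking separately written pieces scales better than precompiling every combination.

```python
# Hypothetical light and material "shader fragments", each written once.
LIGHTS = {
    "point": lambda n_dot_l: max(n_dot_l, 0.0),
    "ambient": lambda n_dot_l: 0.2,
}
MATERIALS = {
    "matte": lambda irradiance: 0.8 * irradiance,
    "dark": lambda irradiance: 0.1 * irradiance,
}

def link(light_name, material_name):
    """Combine two separately written pieces when the shader is set."""
    light = LIGHTS[light_name]
    material = MATERIALS[material_name]
    return lambda n_dot_l: material(light(n_dot_l))

def shader_counts(num_lights, num_materials):
    """Return (precompiled permutations, separately compiled pieces)."""
    return num_lights * num_materials, num_lights + num_materials

shade = link("point", "matte")
permutations, pieces = shader_counts(10, 20)  # 200 permutations vs. 30 pieces
```

The gap widens quickly as more light and material types are added, which is the explosion the feature is meant to avoid.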

Two new compressed texture formats have been added. BC6 supports high dynamic range RGB textures, using 1 byte per texel (instead of 6 bytes for an RGB 16-bit float texture). BC7 supports low dynamic range RGB or RGBA textures. It also uses one byte per texel (like DXT5/BC3), but offers significantly higher quality than texture formats available in D3D10. Both formats offer multiple block types (the compression tool selects the appropriate type for each block based on that block's content).

The block compression formats in D3D9 and D3D10 are based on the idea that each 4×4 texel block has all its values arranged along a single line, and the bits for each texel encode where on the line it is placed. For example, in DXT1/BC1, a line in RGB space is represented by two RGB endpoints, and each texel gets two bits to select one of four points along the line.
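As a concrete illustration of the "points along a line" idea, here is a minimal sketch of decoding a single BC1 block. It is a simplification under stated assumptions: only the four-color mode is handled (the three-color-plus-transparent mode is ignored), and the integer rounding may not match any particular hardware decoder. The function names are my own.

```python
import struct

def rgb565_to_rgb888(c):
    """Expand a 16-bit 5:6:5 color to 8 bits per channel."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_bc1_block(block):
    """Decode one 8-byte BC1 block into 16 RGB tuples (row-major 4x4).

    Simplification: only the four-color mode is handled.
    """
    # Two RGB565 endpoints, then 16 two-bit indices packed into 32 bits.
    color0, color1, indices = struct.unpack("<HHI", block)
    e0, e1 = rgb565_to_rgb888(color0), rgb565_to_rgb888(color1)
    # Four points along the line between the two endpoints.
    palette = [
        e0,
        e1,
        tuple((2 * a + b) // 3 for a, b in zip(e0, e1)),
        tuple((a + 2 * b) // 3 for a, b in zip(e0, e1)),
    ]
    # Texel i uses bits 2i and 2i+1 of the index word.
    return [palette[(indices >> (2 * i)) & 0x3] for i in range(16)]

# White and black endpoints; each row of texels uses indices 0, 1, 2, 3.
block = struct.pack("<HHI", 0xFFFF, 0x0000, 0xE4E4E4E4)
texels = decode_bc1_block(block)
```

With these endpoints, the first row decodes to white, black, and the two intermediate grays, all of which lie on the single line between the endpoints.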

The new D3D11 formats support block types with one, two or even three (in the case of BC7) color lines. There is a tradeoff between the number of lines and the number of points along each line, since each block takes up the same amount of memory.

In principle, a 4×4 block with two color lines would need 16 additional bits per block to determine which line each texel was associated with (even more bits are needed for three color lines). To reduce storage requirements, only a subset of the possible line association patterns is supported. The compression tool selects the best association out of this subset for each block.
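The storage argument can be checked with a little arithmetic. In the sketch below, the helper names are mine, and the table size of 64 patterns is purely an illustrative assumption, not a figure given at Gamefest.

```python
import math

TEXELS_PER_BLOCK = 16

def bits_for_free_assignment(num_lines):
    """Bits needed if every texel could pick any line independently."""
    return math.ceil(TEXELS_PER_BLOCK * math.log2(num_lines))

def bits_for_pattern_table(num_patterns):
    """Bits needed to index into a fixed table of predefined patterns."""
    return math.ceil(math.log2(num_patterns))

two_line_bits = bits_for_free_assignment(2)    # 16 bits
three_line_bits = bits_for_free_assignment(3)  # 26 bits
table_bits = bits_for_pattern_table(64)        # 6 bits (hypothetical table size)
```

Every bit saved on the association pattern is a bit that can go to endpoint precision or per-texel indices instead, since the block size is fixed.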

Direct3D 11 also tightens up the texture specifications. Decompression results must be bit-accurate, and subtexel/submip filtering precision is required to be at least 8 bits.

Direct3D 11 increases the texture size limits from 8K texels to 16K texels. Note that a 16K × 16K DXT1/BC1 texture takes up 128MB – not many games will have textures this large! In general, D3D11 allows for resources as large as 2GB.
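The 128MB figure is easy to verify: BC1 stores each 4×4 block in 8 bytes, which is half a byte per texel. The sketch below counts only the top mip level (a full mip chain would add roughly a third), and the helper name is my own.

```python
def texture_bytes(width, height, bytes_per_texel):
    """Size of the top mip level only (no mip chain)."""
    return int(width * height * bytes_per_texel)

MB = 1024 * 1024
SIZE_16K = 16 * 1024

bc1_mb = texture_bytes(SIZE_16K, SIZE_16K, 0.5) / MB       # 8 bytes per 4x4 block
bc6_mb = texture_bytes(SIZE_16K, SIZE_16K, 1.0) / MB       # 1 byte per texel
fp16_rgb_mb = texture_bytes(SIZE_16K, SIZE_16K, 6.0) / MB  # 3 channels x 2 bytes
```

The same texture would take 256MB in BC6 or BC7 and 1536MB as uncompressed 16-bit float RGB, which gets uncomfortably close to the 2GB resource limit.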

Hardware can optionally support double-precision floats. This was the only optional feature of D3D11 mentioned at Gamefest.

There was a slide listing a bunch of other features without further explanation. Most are a bit mysterious, but I list them here in case someone else is able to puzzle out what they mean:

  • Addressable Stream Out
  • Draw Indirect
  • Pull-model attribute eval
  • Improved Gather4
  • Min-LOD texture clamps
  • Conservative oDepth
  • Geometry shader instance programming model
  • Read-only depth or stencil views

This completes my report on Direct3D 11 from Gamefest. Check the XNA Presentations Page for the slides and audio – they are not up there yet, but hopefully will be there soon.

Direct3D 11 Details Part IV: Multithreaded Rendering

Direct3D 10 only allows graphics commands to be issued from a single thread (there is a multithreaded mode, but Microsoft explicitly warns against using it due to its poor performance). In an API such as Direct3D, issuing graphics commands involves a fair amount of CPU overhead. Given the trend towards increasing the number of cores on a processor rather than the performance of a single core, it is desirable to efficiently spread this work among multiple threads.

Direct3D 11 adds the ability to create display lists from multiple threads and execute them from the main rendering thread. In addition, the Device (which creates resources) has been separated from the Context (which issues graphics commands). This enables creating resources asynchronously. Deferred Contexts are used to create display lists and the Immediate Context issues graphics commands to the GPU, including the execution of display lists created on Deferred Contexts.
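The division of labor can be sketched with a toy simulation. This is plain Python, not the Direct3D 11 API: the class and method names below are hypothetical stand-ins, and "submitting to the GPU" is just appending to a list. Worker threads each record a display list on their own deferred context, and only the main thread executes them.

```python
import threading

class DeferredContext:
    """Records commands into a list instead of executing them."""
    def __init__(self):
        self._commands = []

    def draw(self, mesh_name):
        self._commands.append(("draw", mesh_name))

    def finish_command_list(self):
        commands, self._commands = self._commands, []
        return commands

class ImmediateContext:
    """The only context that actually submits work (here: to a log)."""
    def __init__(self):
        self.submitted = []

    def execute_command_list(self, commands):
        self.submitted.extend(commands)

def record_scene_part(meshes, results, slot):
    ctx = DeferredContext()  # one deferred context per worker thread
    for mesh in meshes:
        ctx.draw(mesh)
    results[slot] = ctx.finish_command_list()

scene_parts = [["terrain", "trees"], ["characters"], ["particles", "ui"]]
results = [None] * len(scene_parts)
workers = [threading.Thread(target=record_scene_part, args=(part, results, i))
           for i, part in enumerate(scene_parts)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# Main rendering thread: execute the recorded lists in a fixed order.
immediate = ImmediateContext()
for command_list in results:
    immediate.execute_command_list(command_list)
```

Recording happens in parallel, but the submission order is still decided by the main thread, so the result is deterministic regardless of how the workers are scheduled.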

Unlike the other features in Direct3D 11, multithreaded rendering is not a hardware feature at all. With the appropriate drivers, D3D10 (perhaps even D3D9) hardware will be able to perform multithreaded rendering efficiently (some level of multithreaded performance will be available even without new drivers, but it was unclear what the limitations would be in this case).

Direct3D 11 Details Part III: Compute Shaders & Unordered Memory

GPGPU (General-Purpose computation on GPU) approaches such as NVIDIA’s CUDA have become increasingly popular over the last few years, recently coming full-circle with various non-traditional rendering algorithms (perhaps this should be called GPGPUG?). However, the existing solutions are vendor-specific, often requiring reprogramming even for different GPUs from the same vendor. They also tend not to “play well” with the traditional graphics pipeline. For example, on GeForce 8000-series GPUs using CUDA there is a large delay when switching between CUDA and traditional graphics rendering.

Direct3D 11 introduces a new kind of shader called a Compute Shader. A compute shader is invoked as a regular array of threads. The threads are divided into groups. Each group has 32KB of memory shared among the threads in the group. Thus the threads can use partial results computed by other threads in the same group, improving performance. Threads can also perform random-access reads and writes to graphics resources such as textures, vertex arrays or render targets. These memory accesses are unordered, although various synchronization instructions exist to impose ordering when needed.
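To show why group-shared memory helps, here is a sequential Python simulation of one common pattern, a tree reduction inside each thread group. It is only a sketch of the programming model, not compute shader code; the group size of 8 and the function names are assumptions made for the example.

```python
GROUP_SIZE = 8  # threads per group (hypothetical value for this example)

def run_group(group_input):
    """Simulate one thread group doing a tree reduction in shared memory."""
    shared = list(group_input)  # each thread stores its own value
    stride = GROUP_SIZE // 2
    while stride > 0:
        # Every thread with index < stride adds a partner's partial result.
        # Finishing the inner loop plays the role of a group-wide barrier.
        for thread_id in range(stride):
            shared[thread_id] += shared[thread_id + stride]
        stride //= 2
    return shared[0]  # thread 0 ends up holding the group total

def dispatch(data):
    """Split the data into groups and run each group independently."""
    assert len(data) % GROUP_SIZE == 0
    return [run_group(data[i:i + GROUP_SIZE])
            for i in range(0, len(data), GROUP_SIZE)]

group_sums = dispatch(list(range(16)))  # two groups of eight values
```

Each group needs only log2(GROUP_SIZE) steps to produce its total, and the per-group totals can then be combined in a second, much smaller pass. An average-luminance computation would follow the same shape.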

Pixel shaders can also perform random-access (unordered) writes. This allows them to write data structures such as linked lists that can then be processed by a compute shader, or vice-versa (pixel shaders have always had the ability to perform random access reads via texture lookups).
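Here is a toy Python model of the linked-list idea. The post does not describe the actual mechanism, so treat the details as assumptions: I use a per-pixel head index plus a shared node pool, with comments marking where a real implementation would presumably need atomic operations. All names are hypothetical.

```python
WIDTH, HEIGHT = 4, 4
END_OF_LIST = -1

head = [END_OF_LIST] * (WIDTH * HEIGHT)  # per-pixel index of the first node
nodes = []                               # shared pool of (depth, color, next)

def pixel_shader_write(x, y, depth, color):
    """Prepend a fragment to the pixel's list (stands in for an unordered write)."""
    pixel = y * WIDTH + x
    new_index = len(nodes)               # stands in for an atomic counter increment
    nodes.append((depth, color, head[pixel]))
    head[pixel] = new_index              # stands in for an atomic exchange

def compute_shader_resolve(x, y):
    """Walk one pixel's list and return its fragments sorted front to back."""
    fragments = []
    index = head[y * WIDTH + x]
    while index != END_OF_LIST:
        depth, color, index = nodes[index]
        fragments.append((depth, color))
    return sorted(fragments)

pixel_shader_write(1, 2, 0.7, "red")
pixel_shader_write(1, 2, 0.3, "green")
pixel_shader_write(0, 0, 0.5, "blue")
resolved = compute_shader_resolve(1, 2)
```

Because the lists are sorted at resolve time, the order in which fragments arrive does not matter, which fits the unordered nature of the writes.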

Several examples of compute shaders were shown at Gamefest, performing post-process operations such as finding the average luminance of a render target, or computing a luminance histogram (both used in tone mapping). For these operations, a 2X speedup was quoted over the best performance possible using pixel shaders.

Compute shaders can also perform operations such as computing summed-area tables and fast Fourier transforms significantly faster than traditional GPU methods. Microsoft is looking into providing library functions to perform such operations.
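As a reminder of what a summed-area table is, here is a plain serial CPU reference in Python. It is not a GPU implementation and says nothing about how a compute shader would parallelize the work; it just shows the data structure and the constant-time box sum it enables. The function names are mine.

```python
def summed_area_table(image):
    """Return sat where sat[y][x] is the sum of image[0..y][0..x] (inclusive)."""
    height, width = len(image), len(image[0])
    sat = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            sat[y][x] = row_sum + (sat[y - 1][x] if y > 0 else 0)
    return sat

def box_sum(sat, x0, y0, x1, y1):
    """Sum over the inclusive rectangle using at most four table lookups."""
    total = sat[y1][x1]
    if x0 > 0:
        total -= sat[y1][x0 - 1]
    if y0 > 0:
        total -= sat[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += sat[y0 - 1][x0 - 1]
    return total

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sat = summed_area_table(image)
lower_right = box_sum(sat, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

Building the table is a prefix sum along rows and then columns, which is the kind of operation where threads sharing partial results within a group should pay off.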

Microsoft speculated that algorithms such as A-buffer rendering and ray tracing could also be performed efficiently, but they don’t have any hard performance numbers for those.

Larrabee

Solid information about Intel’s new Larrabee architecture came out a few days ago; the Level of Detail blog has a good set of links. The major news is that Intel’s SIGGRAPH paper is now available for download from ACM’s Digital Library. Unfortunately, not everyone has access to this site’s resources (it costs money to subscribe). My contribution to the cause:

http://softwarecommunity.intel.com/UserFiles/en-us/File/larrabee_manycore.pdf

Thanks to Tom Forsyth for the link.

I’m excited by Larrabee not because of any particular technical feature (though I’m entirely savoring the paper itself, reading two pages a day at lunch), but rather by the fact that it opens up a whole new ecosystem for implementing graphics algorithms. Regardless of whether Larrabee wins or loses in the long run, it will have a huge effect in increasing our knowledge by helping us explore different hardware and software designs for rendering.