
Exploiting coherence at GDC 2009

A few months back, I wrote a blog post discussing techniques which exploit coherence, either spatial (like multiresolution rendering) or temporal (like reprojection caching).

Both of these were represented at GDC this year. Jeremy Shopf presented a talk on Mixed Resolution Rendering, and the ambient occlusion technique described in Rendering Techniques in Gears of War 2 (available on the GDC Vault site) made use of both methods: the ambient occlusion factors were rendered at a downsampled resolution, and reprojection caching was used to reduce temporal aliasing. This is the first use of reprojection caching I have seen in a shipping game.
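To make the combination concrete, here is a minimal sketch of how downsampled AO and reprojection-based temporal filtering might fit together. All names and the blend weight are my own assumptions, not Epic's actual implementation:

    // Sketch: half-resolution AO combined with temporal reprojection.
    // Hypothetical helpers stand in for texture fetches and matrix math;
    // the history weight is a made-up tuning value.

    struct Vec2 { float x, y; };
    struct Vec4 { float x, y, z, w; };

    float SampleHalfResAO(Vec2 uv);                  // this frame's noisy AO
    float SamplePrevAO(Vec2 uv);                     // last frame's filtered AO
    Vec4  ReprojectToPrevClip(Vec2 uv, float depth); // uses previous view-proj

    float TemporalAO(Vec2 uv, float depth)
    {
        float currentAO = SampleHalfResAO(uv);

        // Reverse reprojection: where was this surface point last frame?
        Vec4 prevClip = ReprojectToPrevClip(uv, depth);
        Vec2 prevUV = { prevClip.x / prevClip.w * 0.5f + 0.5f,
                        prevClip.y / prevClip.w * 0.5f + 0.5f };

        // No usable history if the point was off-screen last frame.
        if (prevUV.x < 0.0f || prevUV.x > 1.0f ||
            prevUV.y < 0.0f || prevUV.y > 1.0f)
            return currentAO;

        // Blending toward the history value damps frame-to-frame flicker.
        const float historyWeight = 0.9f; // assumed; would be tuned per title
        return historyWeight * SamplePrevAO(prevUV) +
               (1.0f - historyWeight) * currentAO;
    }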

In my previous blog post, I was skeptical of reprojection approaches, since it seemed to me that, as optimization methods, they did not address the worst case (where the camera angle changes abruptly). Using such approaches to improve quality instead (as Epic did) makes more sense.

More GDC conference links

More material from GDC is coming online each day. We have already mentioned the tutorial slides, as well as Intel's page. GDC's Vault site has video which is only available to registered attendees (except for sponsored sessions), but the slide decks are available to everyone. NVIDIA recently put up a new page with their material – even the material previously available from GDC's own sites is worth getting from here, since the versions on NVIDIA's page are significantly more up to date. The videos for NVIDIA's sponsored sessions are free for everyone and are linked from the NVIDIA page as well.

Lots of OpenGL and OpenCL stuff is available on the Khronos web site, and Jeremy Shopf and Jim Tilander have their respective slides up as well. A Google search for '"GDC 2009" slides' should turn up more as time goes by.

Connections: Larrabee, Michael Abrash, Intel, Dr. Dobb’s and me

There has been a spate of Larrabee information during the last two weeks: two GDC talks (slides near the bottom of this page), a prototype library, and an article by Michael Abrash on the Dr. Dobb's website.

Dr. Dobb's Journal has been out of print since February, but for many years it was one of the leading software publications. When initially published in 1976 (as Dr. Dobb's Journal of Computer Calisthenics & Orthodontia) it was the first journal focusing on software development for microcomputers. Michael Abrash wrote many articles for Dr. Dobb's over the years, including a series on the Quake software renderer in the mid-1990s. This series made a great impression on me; when it was published, I was considering a career change from microprocessor design to graphics programming. At the time, I was working on Intel's P55C processor, publicly known as "Pentium with MMX Technology". This chip was notable both for being the first x86 processor with a SIMD (single instruction, multiple data) instruction set and for being the last CPU to use the in-order Pentium micro-architecture.

When Michael Abrash wrote the Quake articles, game rendering was 100% software, mostly written in assembly language. Abrash was the uber-game programmer, having worked on DOOM, written the Quake renderer, and published (in addition to his Dr. Dobb's articles) many influential books about graphics programming, assembly, and optimization (the last of which is available online).

Within a few years (around the time I finally made the jump from CPU design to game graphics programming), it seemed to many that graphics hardware and compiler improvements had made software rendering and hand-coded assembly obsolete. This was mirrored by my own experience; I was hired into my first game industry job on the strength of a software rasterization demo (written mostly in assembler), and by the time the game shipped, it required graphics hardware and contained very little assembly (none written by me). Abrash started applying his considerable skills to what he saw as the next unsolved hard problem: natural language processing.

But he couldn't stay away from graphics for long; when Microsoft started working on the Xbox console he got involved in its design. In the early 2000s, he figured out that there was a market for software renderers after all, mostly due to the mess of caps bits, non-orthogonal feature support, and flaky compliance that characterized low-end graphics hardware at the time (Intel was among the greatest offenders; compounding the problem, its graphics chips sold very well, so there were a lot of them out there). With Mike Sartain (another Xbox designer), he wrote Pixomatic, a software renderer published by RAD Game Tools (until then mostly known for the Miles sound library, perhaps the most widely used middleware in the games industry). Of course, he published another series of articles in Dr. Dobb's about the experience, in which he discussed how he made use of SIMD instruction sets such as MMX and SSE when optimizing Pixomatic.

I found this particularly interesting due to my personal involvement with these instruction sets.  After working on the first MMX hardware implementation I helped define its successor, which was twice as wide (128 bits instead of 64) and added support for floating-point SIMD.  This instruction set was at first called MMX2, then VX, and finally split into two separate instruction sets: SSE and SSE2.  By this time SIMD instruction set extensions were becoming quite popular; AMD had their own version called 3DNow!, and PowerPC had the AltiVec instruction set.  Intel kept on adding new SIMD extensions: SSE3, SSSE3, SSE4.1, SSE4.2, and AVX.

As Abrash details in the Larrabee article, Larrabee got started when he decided to talk to Intel about some ideas for SIMD instructions to accelerate software rasterization.  As a result, Larrabee includes a powerful set of SIMD instructions.  Much wider than previous instruction sets (512 bits instead of 128, or 256 in the case of AVX), Larrabee’s instruction set contains several instructions tailored to software rasterization.  It is also general enough to allow for automatic code vectorization of a wide variety of loops.  Abrash had a key role in the design of the instruction set, bringing software rasterization back into the mainstream.
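To make the rasterization connection concrete, here is a minimal sketch of the kind of loop such instructions accelerate, written with 4-wide SSE intrinsics since Larrabee's instructions aren't publicly available; the idea is the same, just four pixels per iteration instead of sixteen:

    #include <xmmintrin.h> // SSE intrinsics

    // Evaluate a triangle edge function E(x, y) = A*x + B*y + C for four
    // horizontally adjacent pixels at once; a pixel is inside the edge
    // when E >= 0. Returns a 4-bit mask with one bit per passing pixel.
    int EdgeMask4(float A, float B, float C, float x, float y)
    {
        __m128 px = _mm_setr_ps(x, x + 1.0f, x + 2.0f, x + 3.0f);
        __m128 py = _mm_set1_ps(y);
        __m128 e  = _mm_add_ps(
                        _mm_add_ps(_mm_mul_ps(_mm_set1_ps(A), px),
                                   _mm_mul_ps(_mm_set1_ps(B), py)),
                        _mm_set1_ps(C));
        return _mm_movemask_ps(_mm_cmpge_ps(e, _mm_setzero_ps()));
    }

A rasterizer computes three such masks (one per triangle edge) and ANDs them to get coverage for a run of pixels; Larrabee's 16-wide registers would let the same loop test a 4×4 pixel block per iteration.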

Besides a good instruction set, Larrabee also needed an efficient hardware design with a large number of cores. Each of these cores needed to be very efficient in terms of performance per watt and per transistor. Since the Larrabee team started out as a skunkworks, they couldn't afford to design a brand-new core, so they looked at previous Intel cores, and the old in-order Pentium core (almost the same one I used in the P55C) was the one chosen.

What I find fascinating about this story is that Abrash managed to follow rasterization all the way around the Wheel of Reincarnation. This term refers to the common process whereby a piece of computing functionality is first implemented in software, then moves to special-purpose hardware which gradually becomes more general until it rivals a CPU in complexity, at which point the functionality is folded back into software. It was coined in a 1968 article by T. H. Myer and Ivan Sutherland (the latter is widely considered the father of computer graphics).

2009 Conference Paper Preprints

The ever-amazing Ke-Sen Huang already has paper pages up for I3D 2009 and Eurographics 2009. Both conferences are currently in that twilight zone where the authors have been notified (and are putting notifications and preprints on their web pages) but the official paper list has not yet been published.

There are already several interesting papers there: Approximating Dynamic Global Illumination in Image Space (available here) extends the popular SSAO (screen-space ambient occlusion) technique to support directional occlusion and single-bounce diffuse reflection. Automatic Linearization of Nonlinear Skinning (available here) introduces a method to automatically place virtual bones, resulting in quality similar to dual quaternion skinning but using traditional linear skinning. Multiresolution Splatting for Indirect Illumination (available here) speeds up reflective shadow maps by using a multiresolution data structure. Bounding volume hierarchies are important for many algorithms (including ray tracing), so a method to rapidly construct them on the fly is useful. Such a method is detailed in Fast BVH Construction on GPUs (paper web page here). The final paper has a somewhat self-explanatory title: Temporal Glare: Real-Time Dynamic Simulation of the Scattering in the Human Eye (available here).

Two papers lack preprints as of yet but have particularly interesting titles, and I look forward to reading them: Soft Irregular Shadow Mapping: Fast, High-Quality, and Robust Soft Shadows and Real-Time Fluid Simulation using Discrete Sine/Cosine Transforms.

Exploiting temporal and spatial coherence

Exploitation of temporal and spatial coherence is among the most powerful tools available to a graphics programmer. Several recent papers explore this area. Accelerating Real-Time Shading with Reverse Reprojection Caching (GH 2007, available here) uses reverse reprojection to reuse values cached from previous frames. An Improved Shading Cache for Modern GPUs (GH 2008, available here) analyzes the performance characteristics of this technique and proposes some efficiency improvements.
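The core lookup is simple in outline; this sketch (with assumed names and a made-up depth tolerance, not the papers' exact formulation) shows the two failure cases that turn a lookup into a cache miss:

    #include <cmath>

    struct Vec3 { float x, y, z; };
    struct CacheSample { float value; float depth; };

    // Hypothetical helpers: transform by the *previous* frame's
    // view-projection matrix, and fetch from last frame's cache buffer.
    void PrevViewProject(const Vec3& worldPos, float& u, float& v, float& depth);
    CacheSample FetchCache(float u, float v);

    bool LookupCache(const Vec3& worldPos, float& cachedValue)
    {
        float u, v, depth;
        PrevViewProject(worldPos, u, v, depth);

        // Miss 1: the point was off-screen last frame.
        if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f)
            return false;

        // Miss 2: the point was occluded last frame, so the cached value
        // belongs to some other surface (a disocclusion).
        CacheSample s = FetchCache(u, v);
        const float kDepthTolerance = 1e-3f; // assumed threshold
        if (std::fabs(s.depth - depth) > kDepthTolerance)
            return false;

        cachedValue = s.value; // hit: reuse last frame's shading
        return true;
    }

On a miss, the pixel shader recomputes the value from scratch, which is what makes abrupt camera changes expensive.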

Such caching schemes involve analyzing each pixel shader to find appropriate values to cache. Care must be taken to use values which are expensive to compute but have low directional dependence. Automated Reprojection-Based Pixel Shader Optimization (to be published at SIGGRAPH Asia 2008, available here) proposes a method to automate this process. Another option is to apply reprojection caching to a specific, well-defined case like shadow mapping. This is discussed in Pixel-Correct Shadow Maps with Temporal Reprojection and Shadow-Test Confidence (EGSR 2007, paper web page). This paper was also mentioned in our book.

Personally, I'm a bit skeptical of reprojection caching techniques, since whenever the view changes abruptly the cache will be completely invalidated, resulting in performance dips. Many applications can't use acceleration techniques that don't help worst-case performance. Applications with restricted camera motion may certainly benefit. Enhancing these techniques with fallbacks that degrade quality (instead of performance) during abrupt camera motion may make them more generally applicable.

A different approach is discussed in Geometry-Aware Framebuffer Level of Detail (EGSR 2008, available here). Here the idea is to render certain quantities into a lower-resolution frame buffer, using a joint bilateral filter to resample them during final rendering (a sketch of this step follows below). As with the previous technique, care must be taken in selecting intermediate values; they should be both expensive to compute and slowly varying over the screen. This powerful acceleration technique was also used in the paper Image-Based Proxy Accumulation for Real-Time Soft Global Illumination (PG 2007, available here). Variations of this technique have been used in games, with perhaps the most common case being particle rendering, as the Level of Detail blog points out in this interesting post on the subject. The same blog also has insightful posts on many of the papers mentioned here, as well as another related paper.
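Here is a minimal sketch of the joint bilateral resampling step, assuming a 2× downsampled buffer and depth as the edge-stopping signal. Names and the Gaussian falloff constant are assumptions; real implementations typically also include a spatial weight and sometimes a normal-based weight:

    #include <cmath>

    // Hypothetical fetches from the low-resolution buffer: the cached
    // quantity (e.g. an expensive lighting term) and the depth at which
    // it was computed.
    float LowResValue(int lx, int ly);
    float LowResDepth(int lx, int ly);

    // Joint bilateral upsample at full-res pixel (x, y): weight the four
    // nearest low-res taps by how well their depth matches the full-res
    // pixel's depth, so values don't bleed across geometric edges.
    float BilateralUpsample(int x, int y, float centerDepth)
    {
        const int lx = x / 2, ly = y / 2;   // 2x downsampled buffer assumed
        const float sigma = 0.01f;          // assumed depth falloff scale

        float sum = 0.0f, weightSum = 0.0f;
        for (int dy = 0; dy <= 1; ++dy)
            for (int dx = 0; dx <= 1; ++dx)
            {
                float dz = LowResDepth(lx + dx, ly + dy) - centerDepth;
                float w  = std::exp(-(dz * dz) / (2.0f * sigma * sigma));
                sum       += w * LowResValue(lx + dx, ly + dy);
                weightSum += w;
            }
        return weightSum > 0.0f ? sum / weightSum
                                : LowResValue(lx, ly); // all taps rejected
    }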

Face and Skin Papers at SIGGRAPH Asia 2008

Ke-Sen Huang has recently added three papers relating to human face and skin rendering to his excellent list of SIGGRAPH Asia 2008 papers. Human faces are among the hardest objects to render realistically, since people are used to examining faces very closely.

The first two papers focus on modeling the effect of human skin layers on reflectance. The authors of the first paper, "Practical Modeling and Acquisition of Layered Facial Reflectance", work in Paul Debevec's group at the USC Institute for Creative Technologies, which has done a lot of influential work on acquisition of reflectance from human faces (the results of which are now being offered as a commercial product). Previous work used polarization to separate reflectance into specular and diffuse components. Here diffuse is further separated into single scattering, shallow multiple scattering, and deep multiple scattering (using structured light). Specular and diffuse albedo are captured per-pixel. Unfortunately, specular roughness (lobe width) is captured only for each of several regions and not per-pixel; however, since normals are captured at very high resolution, they could presumably be used to generate per-pixel roughness values, which would be useful when rendering at lower resolutions, as we discuss in Section 7.8.1 of Real-Time Rendering. The scattering model is based on the dipole approximation of subsurface scattering introduced by Henrik Wann Jensen and others. NVIDIA has shown real-time rendering of such models using multiple texture-space diffusion passes.
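For reference, the dipole approximation models light scattered below the surface with two point sources, a real one a distance z_r below the surface and a virtual one a distance z_v above it; the diffuse reflectance at distance r from the entry point is then (in Jensen et al.'s notation, as best I recall it)

    R_d(r) = \frac{\alpha'}{4\pi} \left[ z_r (1 + \sigma_{tr} d_r) \frac{e^{-\sigma_{tr} d_r}}{d_r^3} + z_v (1 + \sigma_{tr} d_v) \frac{e^{-\sigma_{tr} d_v}}{d_v^3} \right]

where d_r = \sqrt{r^2 + z_r^2} and d_v = \sqrt{r^2 + z_v^2} are the distances to the two sources, \alpha' is the reduced albedo, and \sigma_{tr} is the effective transport coefficient. The appeal for rendering is that R_d depends only on the entry-exit distance r, which is what makes texture-space diffusion approaches like NVIDIA's practical.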

The authors of the second paper, "A Layered, Heterogeneous Reflectance Model for Acquiring and Rendering Human Skin", have also written several important papers on skin reflectance, focused more on simulating physical processes from first principles. They model human skin as a collection of heterogeneous scattering layers separated by infinitesimally thin heterogeneous absorbing layers. They design their model for efficient GPU evaluation, similar to NVIDIA's approach mentioned above (one of this paper's authors also worked on the NVIDIA skin demo). "Efficient" here is a relative term, since their model is too complex to be real-time on current hardware, and as presented it is probably too complicated for game use. However, ideas gleaned from this paper are likely to be useful for skin rendering in games. The authors also present a protocol for measuring the parameters of their model.

The third paper, "Facial Performance Synthesis Using Deformation-Driven Polynomial Displacement Maps", is also from Debevec's USC group and focuses on animation rather than reflectance. They use the same facial capturing setup, but with different software to capture animated facial deformations instead of reflectance (this too has been turned into a commercial product). This paper is interesting because it extends previous coarse / fine deformation approaches to multiple scales, and uses a novel method to relate the different scales to each other. They use a polynomial displacement map, which uses the same form as Polynomial Texture Mapping (an interesting technique in its own right) but for deformation rather than shading. This method also bears some resemblance to the wrinkle map approach used by AMD for their Ruby Whiteout demo, which they presented at SIGGRAPH 2007.
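For context, the Polynomial Texture Mapping form mentioned above stores six coefficients a_0 through a_5 per texel and evaluates a biquadratic in two parameters:

    f(u, v; s, t) = a_0(u,v) s^2 + a_1(u,v) t^2 + a_2(u,v) s t + a_3(u,v) s + a_4(u,v) t + a_5(u,v)

In PTM, (s, t) is the projected light direction and f is a color or luminance value; in this paper, as I understand it, the inputs are instead the coarse deformation parameters and f is a displacement.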