Category Archives: Reports

2009 Academy Sci & Tech Awards

Oops – I forgot to include Christophe Hery in the point-based color bleeding award below.  This has now been fixed; apologies and congratulations to Christophe.  Many thanks to Margarita Bratkova for pointing out the error!

Last week, the Academy of Motion Picture Arts and Sciences (best known for its annual Academy Awards, or “Oscars”) announced the winners of its 2009 Scientific & Technical Awards. No Awards of Merit (the highest award level) were given this year – those are the ones that come with an “Oscar” statuette and are shown in the Academy Awards telecast (Renderman and Maya have won Awards of Merit in previous years).

Two computer graphics-related Scientific and Engineering Awards were given this year; these are the second-highest award level and come with a bronze tablet:

  • Per Christensen, Michael Bunnell and Christophe Hery for point-based indirect illumination; in an interesting inversion of usual practice, this fast approximate global illumination / ambient occlusion technique started out as a real-time GPU technique and ended up as an offline rendering CPU technique (first used in Pirates of the Caribbean: Dead Man’s Chest, it is now a standard part of Pixar’s Renderman).  A recent SIGGRAPH Asia paper describes a closely related technique.
  • Paul Debevec, Tim Hawkins, John Monos and Mark Sagar for Light Stage and image-based character relighting.  The work done by Paul Debevec and his team at USC’s Institute for Creative Technologies on image-based capture and lighting has been hugely influential, resulting in widespread adoption of light probes, multi-exposure HDR image capture, and many other techniques commonly used in games as well as film.

One of the Technical Achievement Awards (the third level, which comes with a certificate) is also of interest to readers of this blog:

  • Hayden Landis, Ken McGaugh and Hilmar Koch for ambient occlusion.  The pioneering work on ambient occlusion for film production was done by these guys at ILM; first publication was at the Renderman in Production course at SIGGRAPH 2002 (the relevant chapter of the course notes can be found here).  Of course, ambient occlusion is heavily used in real-time applications as well.

In an interesting related development, eight separate Scientific and Engineering Awards and two Technical Achievement Awards were given for achievements related to the digital intermediate process (digital scanning and processing of film data), many of them for look-up-table (LUT) based color correction (LUTs have also been used for color correction in games).  The Academy tends to batch up awards in this way for technologies whose “time has come” (two years ago there were a lot of fluid simulation awards).  Given that another of the Technical Achievement Awards was for a motion capture system, we can see how quickly digital technology has come to dominate the film industry.  As recently as 2005, most of the awards were for things like camera systems; this year only one of the awards (for a lens motor system) was for non-digital technology.

Congratulations to all the winners!

More on Larrabee

I wrote earlier on Larrabee being delayed. A coworker pointed out this article from Jon Peddie Research, who know (and usually charge) more than I do. It makes a plausible case that cancelling this first version of Larrabee was the correct move by Intel, and that the experience gained is not wasted. JPR argues that the high-performance computing market is also high-margin, so needs fewer sales to be profitable. There are other gains from the project to date – anyway, a worthwhile read. I’ll be interested to see what’s next for Larrabee.

The magic of marketing and price differentials is fascinating to me. Books like The Undercover Economist have some entertaining tales of how prices are set. Here’s a marketing story I heard (elsewhere), and it might even be true: HP had two versions of the series 800 workstation in the late 80’s/early 90’s, the only difference being, literally, one bit on a ROM chip. If the bit was set, then HP-UX could not be run on the workstation. Amazingly, the price for this version of the workstation was higher, even though it was seemingly less capable. This version was marketed to hospital administration, which at the time didn’t use HP-UX (so didn’t care); the workstations that could run HP-UX were sold to engineers. HP could honestly say there was a difference between the two workstations, say that one was tailored to hospital admin and the other to engineers, and so justify the price differential. If anyone wants to confirm or deny, great!

Larrabee Chip Delayed/Cancelled

The news for the day is that the current hardware version of Larrabee, Intel’s new graphics processor for the consumer market, has been delayed (or cancelled, depending on what you mean by “cancelled”). Intel is not commenting on possible future Larrabee hardware, so the Larrabee project itself still exists. I don’t see an official press release (yet) from Intel. The few solid quotes I’ve seen (in CNET) are:

“Larrabee silicon and software development are behind where we hoped to be at this point in the project,” Intel spokesperson Nick Knupffer said Friday. “As a result, our first Larrabee product will not be launched as a standalone discrete graphics product,” he said.

along with this:

Intel would not give a projected date for the Larrabee software development platform and is only saying “next year.”

The Washington Post gives this semi-quote:

Intel now plans its first Larrabee product to be used as a software development platform for both graphic and high performance computing, Knupffer said.

See more from The Inquirer, CNET, ZDNet, Washington Post, and the Wall Street Journal. Many more versions via Google News.

In my opinion, Intel has a tough row to hoe: catch up in the field of high-performance graphics, when all they’ve had before is the ~$2 chip low-end GMA series. This series probably has a larger market share in terms of units sold than NVIDIA and AMD GPUs combined (basically, any Intel computer without a GPU card has one), but I assume it makes pennies per unit and by its nature is limited in a number of ways. Markets like high-performance computing, which make the most sense for Larrabee (since it appears to have the most flexibility vs. NVIDIA or AMD’s GPUs, e.g. it’s programmable in C++), are a tiny piece of the market compared to “I just want DirectX to run as fast as possible”. The people I know on the Larrabee team are highly competent, so I don’t think the problem was there. I’d love to learn what hurdles were encountered in the areas of design, management, algorithms, resources, etc. Even the architectural choices of Larrabee are not understood in all their particulars (though we have some good guesses), since it’s unreleased. Sadly, we’re unlikely to know most of the story; writing “The Soul of An Unreleased Machine” is not an inspiring tale, though perhaps a fascinating one.

NVIDIA Jumps on the Cloud Rendering Bandwagon

In January, AMD and OToy announced Fusion Render Cloud, a centralized rendering server system which would perform rendering tasks for film and even games, compressing the resulting video and sending it over the internet.  In March, OnLive announced a similar system, but for the entire game, not just rendering.  Now NVIDIA has announced another cloud rendering system, called RealityServer, running on racks of Tesla GPUs (presumably using Fermi in future iterations).  This utilizes the iray ray tracing system developed by mental images, who also make mental ray (mental images has been owned by NVIDIA since 2007).

The compression is going to be key, since it has to be incredibly fast, extremely low bit rate and very high quality for this to work well.  I’m a bit skeptical of cloud rendering at the moment but maybe all these companies (and investors) know something I don’t…

HPG 2009 Report

I got to attend HPG this year, which was a fun experience.  At smaller, more focused conferences like EGSR and HPG you can actually meet all the other attendees.  The papers are also more likely to be relevant than at SIGGRAPH, where the subject matter of the papers has become so broad that they rarely seem to relate to graphics at all.

I’ve written about the HPG 2009 papers twice before, but six of the papers lacked preprints and so it was hard to judge their relevance.  With the proceedings, I can take a closer look.  The “Configurable Filtering Unit” paper is now available on Ke-Sen Huang’s webpage, and the rest are available at the ACM digital library.  The presentation slides for most of the papers (including three of these six) are available at the conference program webpage.

A Directionally Adaptive Edge Anti-Aliasing Filter – This paper describes an improved MSAA mode AMD has implemented in their drivers.  It does not require changing how the samples are generated, only how they are resolved into final pixel colors; this technique can be implemented on any system (such as DX10.1-class PCs, or certain consoles) where shaders can access individual samples.  In a nutshell, the technique inspects samples in adjacent pixels to more accurately compute edge location and orientation.
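
To make the resolve step concrete, here is a minimal CPU sketch of the core idea (not the paper’s actual filter): estimate the local edge orientation from a luminance gradient, then weight every sample in the 3x3 pixel neighborhood by how far it sits across the edge. All names and constants are illustrative.

```cpp
#include <cmath>
#include <vector>

struct Color { float r, g, b; };

// Per-pixel MSAA data: a sample color plus its subpixel offset.
struct Sample { Color c; float dx, dy; };  // offset relative to pixel center

static float luma(const Color& c) { return 0.299f*c.r + 0.587f*c.g + 0.114f*c.b; }

// Resolve one pixel from its 3x3 neighborhood of sample lists.
// nbr[j][i] holds the samples of the pixel at offset (i-1, j-1).
Color resolveDirectional(const std::vector<Sample> nbr[3][3])
{
    // 1. Estimate edge orientation from the luminance gradient of the
    //    neighboring pixels' plain averages (central differences).
    auto avgLuma = [&](int i, int j) {
        float sum = 0.f;
        for (const Sample& s : nbr[j][i]) sum += luma(s.c);
        return sum / nbr[j][i].size();
    };
    float gx = avgLuma(2,1) - avgLuma(0,1);
    float gy = avgLuma(1,2) - avgLuma(1,0);

    // 2. The edge runs perpendicular to the gradient; weight each sample by
    //    its signed distance across the edge line through the pixel center.
    float len = std::sqrt(gx*gx + gy*gy) + 1e-6f;
    float nx = gx/len, ny = gy/len;            // edge normal
    Color out{0,0,0}; float wsum = 0.f;
    for (int j = 0; j < 3; ++j)
        for (int i = 0; i < 3; ++i)
            for (const Sample& s : nbr[j][i]) {
                float px = (i-1) + s.dx, py = (j-1) + s.dy;
                float d = px*nx + py*ny;       // distance across the edge
                float w = std::exp(-4.f*d*d);  // narrow Gaussian across edge
                out.r += w*s.c.r; out.g += w*s.c.g; out.b += w*s.c.b;
                wsum += w;
            }
    return {out.r/wsum, out.g/wsum, out.b/wsum};
}
```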

Image Space Gathering – This paper from NVIDIA describes a technique where sharp shadows and reflections are rendered into offscreen buffers, upon which an edge-aware blur operation (similar to a cross bilateral filter) is used to simulate soft shadows and glossy reflections.  The paper was targeted for ray-tracing applications, but the soft shadow technique would work well with game rasterization engines (the glossy reflection technique doesn’t make sense for the texture-based reflections used in game engines, since MIP-mapping the reflection textures is faster and more accurate).
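
The edge-aware blur at the heart of the technique can be sketched as a cross bilateral filter guided by the depth buffer. This is a minimal fixed-radius version; the paper additionally varies the kernel size per pixel (based on light size and blocker distance) to get properly widening penumbrae.

```cpp
#include <cmath>
#include <vector>

// Blur a sharp shadow mask with weights driven by a depth "guide" image, so
// the blur does not leak across depth discontinuities.
std::vector<float> crossBilateralBlur(const std::vector<float>& shadow,
                                      const std::vector<float>& depth,
                                      int w, int h, int radius,
                                      float sigmaSpatial, float sigmaDepth)
{
    std::vector<float> out(shadow.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float centerZ = depth[y*w + x];
            float sum = 0.f, wsum = 0.f;
            for (int dy = -radius; dy <= radius; ++dy)
                for (int dx = -radius; dx <= radius; ++dx) {
                    int sx = x + dx, sy = y + dy;
                    if (sx < 0 || sy < 0 || sx >= w || sy >= h) continue;
                    float dz = depth[sy*w + sx] - centerZ;
                    // spatial Gaussian * depth-similarity Gaussian
                    float wgt = std::exp(-(dx*dx + dy*dy) / (2*sigmaSpatial*sigmaSpatial))
                              * std::exp(-(dz*dz) / (2*sigmaDepth*sigmaDepth));
                    sum  += wgt * shadow[sy*w + sx];
                    wsum += wgt;
                }
            out[y*w + x] = sum / wsum;
        }
    return out;
}
```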

Scaling of 3D Game Engine Workloads on Modern Multi-GPU Systems – systems with multiple GPUs used to be extremely rare, but they are becoming more common (mostly in the form of multi-GPU cards rather than multi-card systems).  This paper appears to do a thorough analysis of the scaling of game workloads on these systems, but the workloads used are unfortunately pretty old (the newest game analyzed was released in 2006).

Bucket Depth Peeling – I’m not a big fan of depth peeling systems, since they invest massive resources (rendering the scene multiple times) to solve a problem which is pretty marginal (order-independent transparency).  This paper solves the multi-pass issue, but is profligate with a different resource – bandwidth.  It uses extremely fat frame buffers (128 bytes per pixel).

CFU: Multi-purpose Configurable Filtering Unit for Mobile Multimedia Applications on Graphics Hardware – This paper proposes that hardware manufacturers (and API owners) add a set of extensions to fixed-function texture hardware.  The extensions are quite useful, and enable accelerating a variety of applications significantly (around 2X).  Seems like a good idea to me, but Microsoft/NVIDIA/AMD/etc. may be harder to convince…

Embedded Function Composition – The first two authors on this paper are Turner Whitted (inventor of recursive ray tracing) and Jim Kajiya (who defined the rendering equation).  So what are they up to nowadays?  They describe a hardware system where configurable hardware for 2D image operations is embedded in the display device, after the frame buffer output.  The system is targeted at applications such as font rendering and 2D overlays.  The method in which operations are defined is quite interesting, resembling FPGA configuration more than shader programming.

Besides the papers, HPG also had two excellent keynotes.  I missed Tim Sweeney’s keynote (the slides are available here), but I was able to see Larry Gritz’s keynote.  The slides for Larry’s keynote (on high-performance rendering for film) are also available, but are a bit sparse, so I will summarize the important points.

Larry started by discussing the differences between film and game rendering.  Perhaps the most obvious one is that games have fixed performance requirements, and quality is negotiable; film has fixed quality requirements, and performance is negotiable.  However, there are also less obvious differences.  Film shots are quite short – about 100-200 frames at most; this means that any precomputation, loading or overhead must be very fast, since it is amortized over so few frames (it is rare that any precomputation or overhead from one shot can be shared with another).  Game levels last for many tens of thousands of frames, so loading time is amortized more efficiently.  More importantly, those frames are multiplied by hundreds of thousands of users, so precomputation can be quite extensive and still pay off.  Larry makes the point that comparing the 5-10 hours/frame which is typical of film rendering with the game frame rate (60 or 30 fps) is misleading; a fair comparison would include game scene loading times, tool precomputations, etc.  The important bottleneck for film rendering (equivalent to frame rate for games) is artist time.
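
To make the amortization argument concrete, here is a back-of-the-envelope comparison using made-up but plausible numbers:

```cpp
#include <cstdio>

int main()
{
    // Illustrative numbers only. Film: hours of CPU time per frame, but only
    // ~150 frames per shot and each frame is rendered essentially once.
    double filmSecondsPerFrame = 8.0 * 3600.0;           // 8 hours

    // Game: a 33 ms frame, but add (say) 30 s of level loading and 2 hours
    // of tool-side precomputation, amortized over the frames actually seen.
    double loadSeconds = 30.0, precomputeSeconds = 2.0 * 3600.0;
    double framesPerPlay = 50000.0;                      // ~28 min at 30 fps
    double players = 100000.0;
    double gameSecondsPerFrame = 1.0/30.0
        + loadSeconds / framesPerPlay
        + precomputeSeconds / (framesPerPlay * players);

    std::printf("film: %.0f s/frame, game: %.6f s/frame\n",
                filmSecondsPerFrame, gameSecondsPerFrame);
    // The 2-hour precomputation adds only ~1.4 microseconds per delivered
    // frame -- which is why games can afford enormous offline precomputation
    // and a 150-frame film shot cannot.
}
```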

Larry also discussed why film rendering doesn’t use GPUs; the data for a single frame doesn’t fit in video memory, rooms full of CPU blades are very efficient (in terms of both Watts and dollars), and the programming models for GPUs have yet to stabilize.  Larry then discussed the reasons that, in his opinion, ray tracing is better suited for film rendering than the REYES algorithm used in Pixar’s Renderman.  As background, it should be noted that Larry presides over Sony Pictures Imageworks’ implementation of the Arnold ray tracing renderer, which they are using to replace Renderman.  An argument for replacing Renderman with a full ray-tracing renderer is especially notable coming from Larry Gritz; Larry was the lead architect of Renderman for some time, and has written one of the more important books popularizing it.  Larry’s main points are that REYES has inherent inefficiencies, it is harder to parallelize, effects such as shadows and reflections require a hodgepodge of special-case techniques, and once global illumination is included (now common in Renderman projects) most of REYES’ inherent advantages go away.  After switching to ray tracing, SPI found that they need to render fewer passes, lighting is simpler, the code is more straightforward, and the artists are more productive.  The main downside is that displacing geometric detail is no longer “free” as it was with REYES.

Finally, Larry discussed why current approaches to shader programming do not work that well with ray tracing; they have developed a new shading language which works better.  Interestingly, SPI is making this available under an open-source license; details on this and other SPI open-source projects can be found here.

I had a chance to chat with Larry after the keynote, so I asked him about hybrid approaches that use rasterization for primary visibility, and ray tracing for shadows, reflections, etc.  He said such approaches have several drawbacks for film production.  Having two different representations of the scene introduces the risk of precision issues and mismatches, rays originating under the geometry, etc.  Renderers such as REYES shade on vertices, and corners and crevices are particularly bad as ray origins.  Having to maintain what are essentially two separate codebases is another issue.  Finally, once you use GI then the primary intersections are a relatively minor part of the overall frame rendering time, so it’s not worth the hassle.

In summary, HPG was a great conference, well worth attending.  Next year it will be co-located with EGSR.  The combination of both conferences will make attendance very attractive, especially for people who are relatively close (both conferences will take place in Saarbrücken, Germany).  In 2011, HPG will again be co-located with SIGGRAPH.

ShaderX7

ShaderX7 has been out for a few months now, but due to its size (at 773 pages, it is by far the largest of the series) I wasn’t able to finish going through it until recently.  Here are the chapters I found most interesting.

Deferred lighting approaches

In Section 7.9.2 of Real-Time Rendering, we discussed deferred rendering approaches, including “partially-deferred” methods where some subset of shader properties are written to buffers.  Since publication, a particular type of partially-deferred method has gained some popularity.  There are a few different variants of this approach that are worth discussing.

I3D 2009 Report

I was holding out in hopes that Jeremy Shopf would do a summary of days 2 and 3 of I3D 2009, since he did such a nice piece for day 1. No such luck, so here’s mine. What follows is a brief overview of I3D, mostly covering the papers I cared about most, i.e., those on rendering and maybe a little modeling. The goal is to give you enough information to skim through many papers and decide which you want to read.

There were about 100 attendees. As usual, you can find the paper titles and links to many of them on Ke-Sen Huang’s website.

Day 1:

Pat Hanrahan’s keynote was on do-it-yourself UI. He encouraged people to get their hands dirty and try making some dirt-cheap UI hardware, just to see where it might lead. How to summarize an hour-long talk? “Just Do It”, Arduino, Maker Faire, and this video, which is brilliant.

Multiscale 3D Navigation – The interesting nugget was the idea of rendering a cube map around the viewer to get the depths at pixels, then use this depth data to adjust the near and far planes, adjust the amount of distance traveled when moving forward, and perform collision detection and object avoidance when moving forward.
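
A minimal sketch of how such depth-readback navigation might work; the heuristics and constants here are my own illustration, not the paper’s:

```cpp
#include <algorithm>
#include <vector>

// Given depths read back from a cube map rendered around the viewer, derive
// navigation parameters: clip planes bracketing the visible scene, and a
// travel speed proportional to the free space ahead.
struct NavParams { float nearPlane, farPlane, moveSpeed; };

NavParams deriveNavParams(const std::vector<float>& cubeDepths,   // all faces
                          const std::vector<float>& aheadDepths)  // forward face
{
    auto [minIt, maxIt] = std::minmax_element(cubeDepths.begin(), cubeDepths.end());
    NavParams p;
    p.nearPlane = std::max(0.001f, 0.5f * *minIt); // just inside nearest surface
    p.farPlane  = 2.0f * *maxIt;                   // just beyond farthest surface
    // Scale forward motion by the free space ahead, so the viewer slows down
    // automatically when approaching geometry (a crude collision avoidance).
    float ahead = *std::min_element(aheadDepths.begin(), aheadDepths.end());
    p.moveSpeed = 0.1f * ahead;                    // fraction of distance per step
    return p;
}
```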

Gigavoxels – Make everything voxels. They use an octree at the upper levels, 3D textures at the nodes. Compress. Render.

A Novel Page-Based Data Structure for Interactive Walkthroughs – A “what if our model has 110 million triangles?” paper. Usual idea of textures for far away stuff. Key idea is to divide the scene into coherent chunks that each fit into a disk page, with a k-d tree atop. Lots of preprocessing. Nice result of 20 FPS of the scene at good quality, on a laptop.

Terrain Sketching – UI and algorithms for artist-controlled creation of heightfields. Various ways to create them (silhouette, then “spine”, or vice versa). I liked where he uses spectral noise analysis of real terrain to fill in the artist’s shaky silhouette to make the final result more convincing.

Animation’s not my forte, so I’m skipping reporting that session. That said, Kavan’s paper seems to offer a nice result: his (or any) higher-quality non-linear skinning can be automatically turned into linear skinning, which CAD tools and GPUs support well.

Soft Irregular Shadow Mapping – similar to an earlier paper by Sintorn, now for Larrabee. Wild stuff: take the scene from the eye, take those samples that are seen and transform them into light space as a group. Cells in the light’s “grid” view of these samples are processed. Samples are compared to geometry, and an approximation is made of how much each sample is covered by its occluders. Quite involved, but imaginable that this could be “the future”. Table 3 is particularly valuable to anyone interested in shadows, as it’s a summary of previous work and what features each has (more or less).

Hair Self Shadowing and Transparency Depth … – a different way of quickly creating a deep shadow map (like a shadow buffer, but with an opacity function at each pixel instead of a depth). Interesting use of buckets to count hairs at each depth, an “occupancy map”. Good error analysis & corrections done. Seems to work pretty well.
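
A rough sketch of the occupancy-map idea, simplified to one occupancy bit per depth slice and a fixed per-slice opacity (the paper’s hair counting and error-correction machinery is omitted):

```cpp
#include <bit>
#include <cmath>
#include <cstdint>

// One pixel's occupancy map: a 32-bucket bitmask over the hair volume's
// depth range, where bit i is set if any hair fragment fell into slice i.
struct OccupancyPixel {
    uint32_t mask = 0;
    float    zMin = 0.f, zMax = 1.f;

    int sliceOf(float z) const {
        int s = int(31.f * (z - zMin) / (zMax - zMin));
        return s < 0 ? 0 : (s > 31 ? 31 : s);
    }
    void insertFragment(float z) { mask |= 1u << sliceOf(z); }

    // Transmittance at depth z: count occupied slices in front of z and
    // attenuate by a fixed per-slice opacity.
    float transmittance(float z, float sliceAlpha) const {
        uint32_t inFront = mask & ((1u << sliceOf(z)) - 1u);
        int n = std::popcount(inFront);          // C++20 <bit>
        return std::pow(1.f - sliceAlpha, float(n));
    }
};
```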

Approximating Dynamic Global Illumination in Image Space – SSAO has no light-dependent component, so as the lights move this self-shadowing component never changes. With a bit more work and record-keeping you can also get shadows that are more affected by directional lights and can also get color bleeding. Some cool effects, some artifacts.

Multiresolution Splatting for Indirect Illumination – a paper where I understood most of it while watching the presentation, but coming back to it I realize I’d have to read the article carefully to know what it all means when put together. Virtual point lights, min-max mipmaps, all sorts of stuff. If you know reflective shadows maps, this extends those by using multiple resolutions to save on bandwidth. Seems a bit tricksy, but amazing that it works.

Day 2:

Started with the fluids session; I’m not fluid-oriented, so no descriptions for you.

Fast High-Quality Line Visibility – very nice work and a good presentation, giving a number of techniques for computing visible lines. 50x faster than using an item buffer, for complex models.

Dynamic Solid Textures… – solid textures for NPR. Problem: how to balance between 3D textures that stay in place on their surfaces while having a 2D look with a constant frequency of pattern on the screen? Basic idea is to use octaves of noise and fade depending on zoom level.
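
A rough sketch of the octave-fading idea; the fade curve and the hash-based noise stand-in are illustrative rather than the paper’s exact formulation:

```cpp
#include <algorithm>
#include <cmath>

// Cheap hash-based value noise, standing in for whatever noise is used.
static float hashNoise(int x, int y) {
    unsigned n = unsigned(x) * 1619u + unsigned(y) * 31337u;
    n = (n << 13) ^ n;
    n = n * (n * n * 15731u + 789221u) + 1376312589u;
    return float(n & 0x7fffffffu) / float(0x7fffffff); // [0,1]
}

// Sum octaves of noise, fading each octave in or out based on its frequency
// *on screen*, so the pattern keeps a roughly constant screen-space scale
// as the camera zooms.
float dynamicSolidNoise(float u, float v, float pixelsPerUnit)
{
    float sum = 0.f, wsum = 0.f;
    for (int oct = 0; oct < 8; ++oct) {
        float freq = std::ldexp(1.f, oct);          // 1, 2, 4, ...
        float screenFreq = freq / pixelsPerUnit;    // cycles per pixel
        // Full weight near 1/16 cycle per pixel; fade to zero when the
        // octave is invisibly fine or much too coarse on screen.
        float t = std::log2(screenFreq * 16.f);
        float w = std::max(0.f, 1.f - std::abs(t) / 2.f);
        int xi = int(std::floor(u * freq)), yi = int(std::floor(v * freq));
        sum  += w * hashNoise(xi, yi);
        wsum += w;
    }
    return wsum > 0.f ? sum / wsum : 0.5f;
}
```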

Laplacian Lines for Real-Time Shape Illustration – A new NPR line type. I have “Suggestive contours are only in concave areas. Laplacian lines are a bit faster.” Hmmm, I didn’t write down much else about this one. As is often the case, some things looked great, some things did not.

Real-Time View-Dependent Rendering of Parametric Surfaces – the summary, “screen-space flatness”. Use CUDA to tessellate patches dependent on how much the curve diverges on the screen from a straight line segment. I didn’t understand fully how cracking was ameliorated, but it came out in questioning that cracking was not fully eliminated (though almost so).
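
The “screen-space flatness” metric itself is simple to sketch for a quadratic curve segment: measure how far the curve’s midpoint deviates, in pixels, from the chord between its endpoints, and subdivide until that deviation is below a tolerance. Here is a plain C++ version (the paper evaluates this on the GPU in CUDA):

```cpp
#include <cmath>

struct P2 { float x, y; };  // control point already projected to screen space

// Flatness of one quadratic Bezier segment: deviation of the curve midpoint
// B(0.5) = (a + 2b + c)/4 from the chord midpoint (a + c)/2.
float flatnessPixels(P2 a, P2 b, P2 c)   // b is the middle control point
{
    P2 mid  { (a.x + 2*b.x + c.x) * 0.25f, (a.y + 2*b.y + c.y) * 0.25f };
    P2 chord{ (a.x + c.x) * 0.5f,          (a.y + c.y) * 0.5f };
    float dx = mid.x - chord.x, dy = mid.y - chord.y;
    return std::sqrt(dx*dx + dy*dy);
}

// Tessellate until every segment deviates by less than the tolerance.
int subdivisionsNeeded(P2 a, P2 b, P2 c, float tolerance = 0.5f)
{
    float f = flatnessPixels(a, b, c);
    // Each uniform split of a quadratic shrinks the deviation by ~4x, so
    // the required level follows from f / 4^n < tolerance.
    int n = 0;
    while (f > tolerance) { f *= 0.25f; ++n; }
    return 1 << n;   // segments along the curve
}
```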

MLS-based Scalar Fields… – wild deformations. Magic.

Real-Time Creased Approximate Subdivision Surfaces – how to keep creases on Catmull-Clark subdivision surfaces. Valve is starting to use this in their pipeline, as they would like to have just one art asset be able to be used both in the game and for offline animation. Has some little glitches at concave corners, and there are fixes for these. Valve seemed to feel this method had some staying power for their modeling in the future. Given the limitations of Catmull-Clark surfaces (e.g. creases can’t be concave) it allows a lot.

Posters session followed. To be honest, the one that most sticks in my mind is where the guy analyzed a large number of games to see how artists simulate first-person walking. Does your head bob, does your gun bob, do both? Would something else work better? Bob patterns included up and down, U-shaped, and infinity-symbol-shaped – the latter two were slightly better. This one won best poster presentation, in fact. The other one of interest combined SSAO and toon shading, which seemed to give a bit more detail to objects.

End of day 2; I already covered the NVIRT announcement at the banquet that night.

Day 3:

Granular Visibility Queries on the GPU – about occlusion culling. First gave a summed area table way of tracking occluders, but that method seemed fussy and complex. The hierarchical item buffer presented seemed like a winner.

Parallel View-Dependent Refinement of Progressive Meshes – indeed, how to do this in parallel. Some very nice visualizations during this talk.

Efficient … Audio-Visual Rendering … – If you have a CPU budget for rendering images plus generating sounds, which pays off best? Nice to see someone do something different.

Don Greenberg gave an enjoyable capstone talk about the history of perspective and its use in architecture. He focused on historical Italian architects playing tricks with right angles in buildings to make corridors look longer, trompe l’oeil painting, etc.

… human motion papers… crowd patches… egocentric affordance fields… – not my areas, and I faded a bit as conference paper overload started to set in.

The last paper I attended to more carefully, since it was from my company and I’m working in this area (but have nothing to do with the paper):

Multiscale 3D Reference Visualization – Infinite multiscale grids and how to render them well when zooming in and out. Also, put objects on blue stalks (red if two blue stalks merge) with the stalk bases forming circles on the ground plane grid, to give clearer visual cues as to the location and size of objects. New word for the day: exocentric. Egocentric rotation is where the viewer rotates his head, exocentric is where the object stays still and the viewer orbits (or for you gamers, strafes) around it.

And then home the next day through a snowstorm. My car survived, I survived, life’s good.

Direct3D 11 Details Part V: Other Features

This grab-bag of a post summarizes the various other features of Direct3D 11 which Microsoft described at Gamefest.

Dynamic shader linkage is supported (similar to the interfaces feature of Cg). This allows for separate light and material shaders to be written and compiled. These are later linked when the shader is set. This offers a solution to the combinatorial explosion resulting from a variety of lights and materials (this explosion, and some other solutions to it, are discussed in section 7.9 of our book).
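
For reference, here is roughly how this looks through the D3D11 API as it eventually shipped (via ID3D11ClassLinkage); the pixel shader is assumed to be written against an HLSL interface, and all names here are illustrative:

```cpp
#include <d3d11.h>

// Minimal sketch of D3D11 dynamic shader linkage: the pixel shader declares
// an HLSL interface (say "iLight") with classes such as "PointLight" and
// "SpotLight" implementing it. The interface call sites stay unresolved in
// the compiled bytecode and are bound to a concrete class at set time.
HRESULT bindLightVariant(ID3D11Device* device, ID3D11DeviceContext* context,
                         const void* psBytecode, SIZE_T psSize,
                         const char* classInstanceName) // e.g. "g_pointLight"
{
    ID3D11ClassLinkage* linkage = nullptr;
    HRESULT hr = device->CreateClassLinkage(&linkage);
    if (FAILED(hr)) return hr;

    // Compile/create the shader once, against the class linkage object.
    ID3D11PixelShader* ps = nullptr;
    hr = device->CreatePixelShader(psBytecode, psSize, linkage, &ps);
    if (FAILED(hr)) { linkage->Release(); return hr; }

    // Resolve the interface to a concrete class instance at bind time.
    ID3D11ClassInstance* instance = nullptr;
    hr = linkage->GetClassInstance(classInstanceName, 0, &instance);
    if (SUCCEEDED(hr))
        context->PSSetShader(ps, &instance, 1);

    if (instance) instance->Release();  // the context keeps its own reference
    ps->Release();
    linkage->Release();
    return hr;
}
```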

Two new compressed texture formats have been added. BC6 supports high dynamic range RGB textures, using 1 byte per texel (instead of 6 bytes for an RGB 16-bit float texture). BC7 supports low dynamic range RGB or RGBA textures. It also uses one byte per texel (like DXT5/BC3), but offers significantly higher quality than texture formats available in D3D10. Both formats offer multiple block types (the compression tool selects the appropriate block type based on its content).

The block compression formats in D3D9 and D3D10 are based on the idea that each 4×4 texel block has all its values arranged along a single line, and the bits for each texel encode where on the line it is placed. For example, in DXT1/BC1, a line in RGB space is represented by two RGB endpoints, and each texel gets two bits to select one of four points along the line.
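
A minimal BC1 block decoder makes the “points along a line” idea concrete (for brevity this skips the usual replication of high bits into low bits when expanding the 5:6:5 endpoints to 8 bits per channel):

```cpp
#include <cstdint>

struct RGBA { uint8_t r, g, b, a; };

static RGBA decode565(uint16_t c) {
    return { uint8_t((c >> 11) << 3), uint8_t(((c >> 5) & 63) << 2),
             uint8_t((c & 31) << 3), 255 };
}

// Decode one 8-byte DXT1/BC1 block into 16 texels. Two RGB endpoints define
// a line in color space; each texel's 2-bit index picks one of 4 points on it.
void decodeBC1Block(const uint8_t block[8], RGBA out[16])
{
    uint16_t c0 = uint16_t(block[0] | (block[1] << 8));
    uint16_t c1 = uint16_t(block[2] | (block[3] << 8));
    RGBA e0 = decode565(c0), e1 = decode565(c1);

    RGBA palette[4] = { e0, e1, {}, {} };
    if (c0 > c1) {  // 4-color mode: interpolated points at 1/3 and 2/3
        palette[2] = { uint8_t((2*e0.r + e1.r)/3), uint8_t((2*e0.g + e1.g)/3),
                       uint8_t((2*e0.b + e1.b)/3), 255 };
        palette[3] = { uint8_t((e0.r + 2*e1.r)/3), uint8_t((e0.g + 2*e1.g)/3),
                       uint8_t((e0.b + 2*e1.b)/3), 255 };
    } else {        // 3-color mode: midpoint plus transparent black
        palette[2] = { uint8_t((e0.r + e1.r)/2), uint8_t((e0.g + e1.g)/2),
                       uint8_t((e0.b + e1.b)/2), 255 };
        palette[3] = { 0, 0, 0, 0 };
    }

    // 32 bits of indices, 2 bits per texel, row by row.
    uint32_t bits = uint32_t(block[4]) | (uint32_t(block[5]) << 8) |
                    (uint32_t(block[6]) << 16) | (uint32_t(block[7]) << 24);
    for (int i = 0; i < 16; ++i)
        out[i] = palette[(bits >> (2*i)) & 3];
}
```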

The new D3D11 formats support block types with one, two or even three (in the case of BC7) color lines. There is a tradeoff between the number of lines and the number of points along each line, since each block takes up the same amount of memory.

In principle, a 4×4 block with two color lines would need 16 additional bits per block to determine which line each texel was associated with (even more bits are needed for three color lines). To reduce storage requirements, only a subset of possible line association patterns are supported. The compression tool selects the best association out of this subset for each block.

Direct3D 11 also tightens up the texture specifications. Decompression results must be bit-accurate, and subtexel/submip filtering precision is required to be at least 8 bits.

Direct3D 11 increases the texture size limit from 8K texels to 16K texels. Note that a 16K x 16K DXT1/BC1 texture takes up 128MB – not many games will have textures this large! In general, D3D11 allows for resources as large as 2GB.

Hardware can optionally support double-precision floats. This was the only optional feature of D3D11 mentioned at Gamefest.

There was a slide listing a bunch of other features without further explanation. Most are a bit mysterious, but I list them here in case someone else is able to puzzle out what they mean:

  • Addressable Stream Out
  • Draw Indirect
  • Pull-model attribute eval
  • Improved Gather4
  • Min-LOD texture clamps
  • Conservative oDepth
  • Geometry shader instance programming model
  • Read-only depth or stencil views

This completes my report on Direct3D 11 from Gamefest. Check the XNA Presentations Page for the slides and audio – they are not up there yet, but hopefully will be there soon.

Direct3D 11 Details Part IV: Multithreaded Rendering

Direct3D 10 only allows graphics commands to be issued from a single thread (there is a multithreaded mode, but Microsoft explicitly warns against using it due to its poor performance). In an API such as Direct3D, issuing graphics commands involves a fair amount of CPU overhead. Given the trend towards increasing the number of cores on a processor rather than the performance of a single core, it is desirable to efficiently spread this work among multiple threads.

Direct3D 11 adds the ability to create display lists from multiple threads and execute them from the main rendering thread. In addition, the Device (which creates resources) has been separated from the Context (which issues graphics commands). This enables creating resources asynchronously. Deferred Contexts are used to create display lists and the Immediate Context issues graphics commands to the GPU, including the execution of display lists created on Deferred Contexts.
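
Here is a minimal sketch of the pattern using the API as it eventually shipped (where the final name for a display list is “command list”); error handling is mostly omitted, and the recordCommands callback stands in for the application’s actual draw code:

```cpp
#include <d3d11.h>

// A worker thread records commands on a deferred context into a command
// list, which the main thread later executes on the immediate context.
ID3D11CommandList* recordOnWorkerThread(ID3D11Device* device,
                                        void (*recordCommands)(ID3D11DeviceContext*))
{
    ID3D11DeviceContext* deferred = nullptr;
    if (FAILED(device->CreateDeferredContext(0, &deferred)))
        return nullptr;

    recordCommands(deferred);   // issue state changes and draw calls here

    ID3D11CommandList* cmdList = nullptr;
    deferred->FinishCommandList(FALSE /*RestoreDeferredContextState*/, &cmdList);
    deferred->Release();
    return cmdList;
}

void submitOnMainThread(ID3D11DeviceContext* immediate, ID3D11CommandList* cmdList)
{
    immediate->ExecuteCommandList(cmdList, FALSE /*RestoreContextState*/);
    cmdList->Release();
}
```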

Unlike the other features in Direct3D 11, multithreaded rendering is not a hardware feature at all. With the appropriate drivers, D3D10 (perhaps even D3D9) hardware will be able to perform multithreaded rendering efficiently (some level of multithreaded performance will be available even without new drivers, but it was unclear what the limitations would be in this case).