"Light Makes Right"
July 20, 2010
Volume 23, Number 1
Compiled by Eric Haines, erich@acm.org.
All contents are copyright (c) 2009, 2010, all rights reserved by the individual authors
Subscription: http://groups.google.com/group/ray-tracing-news
Archive location:
http://www.acm.org/tog/resources/RTNews/html/
(text version at
http://www.acm.org/tog/resources/RTNews/text/)
You may also want to check out the Ray Tracing News issue guide, the index to the Ray Tracing News, and the (ancient) ray tracing FAQ.
Forums, blogs, vendor sites, and all the rest of the internet mean I get to be lazy about the RTNews. Nowadays I put out an issue when something in the field interests me. At SIGGRAPH 2009 I had dinner with Marcos Fajardo, and followed up with some emails asking to interview him. He was too busy at the time working with Sony to reply, but got back to me just a few weeks ago. Last weekend I interviewed him, with the intent of writing an article. So, that's why there's an issue this time around.
If you don't make it through the article, at least note that he'll be part of what looks like a great course at SIGGRAPH 2010, "Global Illumination Across Industries", Thursday, 2-5:15 pm. More about this course can be found on our Real-Time Rendering blog. Other ray tracing related things at SIGGRAPH, listed chronologically:
A brief description of the GPU Ray Tracing BOF, from Austin Robison: "We won’t have a projector or desktop machines set up, but please feel free to bring your laptops to show off what you’ve been working on! Additionally, I’ve created a Google Group mailing list that I hope we can use, as a community, to share insights and ask questions about ray tracing on GPUs not tied to any specific API or vendor. Please sign up and share your news, experiences and ideas: http://groups.google.com/group/gpu-ray-tracing."
back to contents
(from Matt Pharr)
____
A ray tracer written using LINQ database queries in C#.
And another one by the same author, fully LINQified, just a single database query.
____
Here's a ray tracer written in F#, a functional programming language.
I'm starting to believe writing a ray tracer is the graphics equivalent of writing a prime number generator, the canonical first program everyone makes.
____
More raytracers for your iPhone/iPad: - Mandelbulb, Julia sets, and spheres inside environment maps.
____
GPU ray caster using CUDA, with source code.
____
How cute, a JavaScript raytracer, and I like the layout. There's a tiny bit of interactivity with the scene: dragging the mouse moves the camera a minuscule amount.
____
Daniel Pohl at Intel is well-known for adding ray tracing to the various Quake programs; Quake Wars is his latest work.
I was interested to see that someone else has been doing similar work, and has released their ray tracer as a patch inside of ioQuake 3, thread here, screen shots here. That said, in the demos I can't see anything that looks particularly ray traced.
____
MiniLight is an unbiased MC path tracer for triangles, with progressive refinement, implemented in 10 different programming languages, averaging 570 lines of code per version.
____
99 lines for an unbiased MC path tracer for spheres: smallpt.
____
100 lines for a ray tracer that both generates and raytraces the sphereflake.
____
A business card-sized ray tracer, with depth of field, that spells the author's initials.
(from Morgan McGuire's useful Twitter feed)
This ray tracer was made by Andrew Kensler, one of Pete Shirley's students at the time. Image here.
____
A great sentence (from a review with a great title): "Nvidia’s new line of unbelievably expensive cards will block out the sun, and ray-trace its own shadow in real time."
____
There was a Ray Tracing BOF at SIGGRAPH 2009. No summary from me, but here are some pictures, at least, among a bunch I took at SIGGRAPH; about a third of the way through, look for the projection screens. There are many other ray tracing researchers scattered among this photo collection. Gotta catch 'em all.
One BOF presentation was from Caustic Graphics, which I hadn't heard of before. See their demo video and website.
Another at the BOF was Arauna (mentioned before in the RTNews), which is quite impressive.
There were plenty of other demos, but these are the ones for which I had links at hand.
____
Philipp Slusallek's group is working on including ray tracing in a browser.
____
OpenRL is a proposed standard for real-time ray tracing. Beyond this, I do not know.
____
Sony Pictures Imageworks has released OSL, an open source shading language aimed at ray tracing (Imageworks has embraced ray tracing, see the next article). Main page is here, introduction here.
____
Crytek has released an updated model of "Sponza" for global illumination testing.
____
A "cloudless earth" data set is free to download, at resolutions as fine as 250 meters.
____
The perfect movie for a simple ray tracer where you really want to amortize the cost of creating your efficiency structure - nothing in the scene moves, it's all camera. Interesting feel.
(from Morgan McGuire)
____
"Compression and Persistent Estimation of Soft Shadows" is a non-existent but plausible-sounding paper title generated by this page (hit "refresh" for another title).
____
Matt Pharr pointed me at this paper on quicksort, about how to structure the input data to force the slowest possible sort.
It made me wonder if there are scenes that give an interesting pathological pattern for graphics. For ray tracing, say you're given a million spheres, all in the view frustum, say an orthographic view and just shoot eye rays, and want to arrange them to be poorly structured for various algorithms: BSP, BVH, grids, etc. The model of "put all million spheres in the same location" does it - pretty dull.
If we say "but the million spheres can't overlap", then that sort of becomes interesting - I guess it also matters how dense the spheres are compared to the viewed volume. For grids you stuff as many spheres as you can in a single grid cell to make that cell expensive, but once you've reached the limit for a cell, then what's the best way to waste rendering time, i.e. cause the most intersection tests? Hmmm, that's sort of interesting. What about BVH (and how's the BVH formed), etc.? Might be good to first solve in 2D for squares.
____
The main place to go for ray tracing efficiency papers in the past few years is HPG, i.e., HPG 2009 and HPG 2010. Naty Hoffman wrote a report about HPG 2009.
However, there are other bits of work around the periphery on all sorts of topics. For example, I ran across "How Reliable are Practical Point-in-Polygon Strategies?" by Stefan Schirra, in which my "Point in Polygon Strategies" Graphics Gems code base and others are put through the wringer. It does verify something I've always thought: avoid those triangle-fan type of algorithms. The "on edge" conditions I don't care so much about, but for those that do, this is the article for you. CGAL wins, if you don't care about speed.
____
In the past few years a few researchers have written on the idea of considering rasterization and ray tracing to not be separate methods, but rather points on a continuum. See "3D Rasterization – Unifying Rasterization and Ray Casting" and "Ray-Specialized Acceleration Structures for Ray Tracing", for example.
____
I wrote a mini-rant in the RTNews a few years back on how "free for reuse by academics, not corporations" is mostly a waste. A friend noticed that reddit had a thread on it, which was interesting to see. In the interim, I also noticed a similar (and better expressed) article in an old Linux Journal about how the strict GPL is a lose-lose. In a similar vein, this article makes a good point, "Why Publish CS Papers Without Code?".
____
For ray tracing (or any) books, Google Books gives a number of previews (20% of the book, typically).
Some books, like "Mathematics for 3D Game Programming and Computer Graphics" by Eric Lengyel, have chapters on ray tracing available online.
____
Pete Shirley has a ray tracing blog. Admittedly, he's tapered off on blogging, like ten million other semi-ex-bloggers, but his site is still worth reading.
____
There's also Twitter, e.g., Matt Pharr's feed has graphics goodness in 140 characters or less, occasionally ray tracing related. The only Twitter feed I currently follow is Morgan McGuire's, since he uses it exclusively to post cool links.
____
The emission theory of vision is like ray tracing, and it's all explained here, by dinosaurs.
Really it's just Plato who proposed this emission idea (Epicurus' earlier intromission theory is even weirder in some ways), and Aristotle then followed with a theory that became the basis of modern optics. See this article for a good summary of this era and others. I do like that Democritus believed there were four basic colors: red, green, white, and black.
Horrifying bit: the Wikipedia article on emission theory notes "Winer et al. (2002) have found recent evidence that as many as 50% of American college students believe in emission theory."
____
Random visual stuff:
____
Entirely off-topic, but my childhood hero was Martin Gardner, who passed away in May. I noticed the book "The Mathemagician and Pied Puzzler", in tribute to him, is free for download. I used a puzzle from this book in the Ray Tracing News (answer here).
back to contents
Marcos Fajardo is the chief architect of the "Arnold" renderer. As a start, here are some Arnold images from 2000. For that time, these images gave a realistic look rarely achieved, due to the hemispheric lighting and global illumination effects. These made quite a splash and brought Arnold considerable attention from studios and others. Since then, however, Marcos and Arnold have kept a pretty low-key presence, except among those in the know. There are no Wikipedia articles about either, few web pages mention them, and the company's website is an "under construction" page. Some people have thought the renderer faded away. You'd hardly guess that you've probably watched a movie recently that used Arnold for rendering.
Marcos was born in Spain, and first became interested in computer graphics when he was 14. Like many of us, he first used an 8-bit computer, using BASIC and assembly to write little graphical programs. He wanted to study mathematics but ended up in Computer Science, starting university around 1992. A friend introduced him to the Vivid and POV-Ray ray tracers, which was a moment of illumination for him. He was thrilled to find mathematics being used to generate cool images. He started working on his own ray tracer in 1994.
The internet and the web were just opening up at that time, and his second illuminating moment was running across "The Ray Tracing News". It made him realize that there was amazing research work going on by many people. The RTNews was kind of like a map for him, pointing out SIGGRAPH papers, books, etc. During the day he would go to the university library and photocopy papers; at night he'd read them, sometimes for entire nights. This took a toll: getting an hour or two of sleep and then going to the university in the morning caused his performance to degrade pretty quickly. He started to fail tests as he was having too much fun with ray tracing. His family became concerned. His three year degree turned into a five or six year degree, and he never quite completed it.
He put some images online of work he was doing with atmospheric effects, others noticed his efforts, and he started getting job offers. First he worked two months with REYES Infografica in Madrid. He then worked for LightWorks in Sheffield, England, for a year. Not seeing the sun for a whole year was depressing, so he quit and went to his first SIGGRAPH in 1997. This was another event that opened his eyes, meeting researchers and seeing all the work going on.
Instead of working for others, Marcos decided he wanted to make his own renderer. He saw the various hybrid renderers out there and compared them to elegant ideas such as Kajiya's path tracing research, concluding that pure stochastic ray tracing really was the future. The skeleton of what was to become Arnold was started in 1997 (the first file was probably sphere.c), when he was 24.
He's found it's been an advantage to be small, in that he can be more responsive to customers' needs and give them what they want. Since 1997 he's moved from project to project, and country to country, as an independent consultant. He's refined and improved the software in the process and has always managed to keep the intellectual property. This sometimes required extensive negotiations, such as with Sony, but the problems involved are nothing a couple of good lawyers can't help solve. Keeping the IP means his contracts don't make him rich, but at the end of the day he owns his software.
From 1997 on Marcos has worked with various firms. He noticed a company making a cool CAD stage lighting system in Toronto. He showed them the atmospheric scattering effects he had been adding to Arnold. They were blown away and hired him to use Arnold in their product. While he retained the IP, he wasn't paid much. Pretty much underpaid for this particular job, in fact, making it hard to survive in Toronto. However, it was a great experience, he had the freedom to work on what he wanted, his software improved, and he met a lot of people. This model - being paid by a company to add new features to Arnold - has been how he's lived the past 13 years.
During SIGGRAPH 1997 Marcos slept on the floor of some friends' house in Bel Air. These friends introduced him to many of their own friends; one of them was Justin Leach, who was working as an animator at Blue Sky Studios. Justin knew Marcos was into rendering and introduced him to Carl Ludwig, cofounder and one of the original developers of Blue Sky's renderer. Carl was great, gave him a tour, and they spent some time chatting about ray tracing. Blue Sky Studios was the first studio to use ray tracing (classical at first, later Monte Carlo to some extent) for all rendering. Marcos was blown away by their images, especially one rendered by John Kahrs, with illumination coming from the sky. He was inspired; these images convinced him that full stochastic ray tracing was the way to go. Blue Sky's "Bunny" was done with stochastic ray tracing, though the bunny itself did not have indirect lighting (too expensive). It was probably the first use of unbiased Monte Carlo radiosity in film, back in 1998.
In 2001 the animated film "50 Percent Grey", a single-person effort, was nominated for an Oscar; Arnold was used for the rendering. It was a landmark for Arnold: just a few machines were needed, showing that stochastic ray tracing could produce high quality images with a small CPU budget.
In 2004 he became employed by Sony Pictures Imageworks. Pepe Valencia, a friend working on animation and previsualization for the Sony film "Monster House", introduced him to the visual effects supervisor and co-director, Jay Redd. Jay had a vision of what he wanted "Monster House" to look like: as if it were filmed with marionettes a couple of feet high. He wanted the characters to look solid and physical, not CG. So, global illumination was very important to him. They tried other renderers, but these packages were too slow for the effect he wanted. They did some tests with Arnold, and the results were very encouraging. However, many production-oriented features were missing, such as delayed reading of geometry archives, a texture caching system, a hair primitive, etc. So they assembled a small team of talented rendering developers who worked extremely hard supporting the production, tuning the software to meet the production's needs. Speed overall, and motion blur speed in particular, was considerably improved.
Marcos was supposed to be in Los Angeles a year, but one thing led to another and he was there for three. After five years, Sony is now 100% an Arnold rendering house. Arnold was used in "Monster House", aided in "Beowulf" (such as precomputing ambient occlusion), shots in "Eagle Eye", the glass palace (made for ray tracing!) and other shots in "Watchmen", "2012", "Cloudy With a Chance of Meatballs", "Alice in Wonderland", and the upcoming "Smurfs" and other unreleased titles. "Cloudy" was the second all-Arnold movie, and first to fully use Arnold's capabilities: motion blur, sub-surface scattering, hair, etc. While this article has been focused on Marcos, the credit for many of these new capabilities goes to the entire Arnold development team at Sony, in particular Cliff Stein and Chris Kulla, "two amazing hackers of all things rendering", as well as Rene Limberger, Scot Shinderman, Eduardo Bustillo, Solomon Boulos, and lately Larry Gritz, Alex Conty and Angel Jimenez.
As of January of this year he is no longer working directly with Sony, but has a different agreement in place. Last year he founded Solid Angle SL, in Madrid, to give things a bit more structure. They started as two people, currently consist of five engineers, and by the end of the year will be eight. Dealing with a company's needs means less time for coding, his first love, so he's looking for business specialists to take over these aspects. With all the improvements to his code from working with Sony and others, he's now interested in selling Arnold as a commercial product. They're working with a small number of customers and provide plug-in support for Maya and Softimage. There are about a hundred beta users at companies of various sizes for the plug-ins, so support right now is a full-time job for the team. Customers have access to the source code for the plugins and shaders, using Subversion for source control and Trac for bug/feature tracking. Some beta example images are here. Arnold itself is a raw, multi-platform C/C++ API with Doxygen documentation, and some customers want to license just the rendering API itself.
Arnold is very much a batch renderer with an API; a typical final film frame takes a processor 3-4 hours. What is interesting about Arnold is that it's an unbiased stochastic ray tracer through and through. No rasterization tricks for eye rays, no irradiance caches or photon maps for light sources. Motion blur and depth of field effects weaken rasterization's appeal, since much sampling has to be done either way; these features are a natural part of stochastic ray tracing. Not being a hybrid system using some form of rasterization has a number of advantages. First, there's only a single version of the model stored, not one for the rasterizer and another for the ray tracer. Reducing memory footprint is critical for rendering large scenes, so this helps considerably. Arnold also uses compression and quantization of data (lower precision) in order to reduce memory costs. Not using two methods of rendering avoids other headaches: maintaining two separate renderers, fixing mis-syncs between the two (e.g., one thinks a surface is in one location, the other has a different result), dealing with a large number of effects consistently in both, etc.
Avoiding irradiance caches of any sort means that there is minimal precomputation time for Arnold - basically, just the efficiency structures needed are built. This means rendering can happen immediately, vs. waiting for precomputations to be completed. Combined with progressive rendering (where an image is roughed out and improves over time), this is an advantage in a production environment, as artists and technical directors can then iterate more rapidly. As discussed later, scene size is an important factor, due to memory constraints. No irradiance cache means no memory spent on this feature. He also finds that irradiance caches cause artifacts. More importantly, such algorithms expose to the users a fair number of control parameters, e.g. 12 to 15. Users have to master knowing how to set these in order to get high quality results in a reasonable amount of time. Arnold has one parameter, the number of samples, making it easier for the users to understand the system. Finally, Arnold is aimed at animated film production, so there's not a lot of reuse of irradiance cache computations as there would be in, say, an architectural walkthrough.
Marcos aimed to make Arnold as physically-based as possible. Sometimes customers will request modifications for usability. Surprisingly, in recent years some customers have been requesting direct physical controls, such as controlling metals' reflective characteristics by directly specifying the index of refraction (IofR). Tracking down these indices for metals can be a challenge. Marcos (and I) assumed that the IofR would not be a control artists would want, but it turns out that many of them do. The material models in systems such as 3DS Max use IofR as a control, so artists have gotten used to it.
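As an aside, one reason the index of refraction is a sensible control is that, for a metal with complex index n + i*k, the reflectance at normal incidence follows directly from it. Here is a tiny sketch of that standard relation (my own illustration, nothing Arnold-specific; the sample values are only ballpark figures, so look up real tabulated data for production use):

    #include <cstdio>

    // Normal-incidence reflectance of a conductor with complex index n + i*k.
    // This is the textbook relation, not Arnold's shader code.
    static double conductorF0(double n, double k)
    {
        return ((n - 1.0) * (n - 1.0) + k * k) /
               ((n + 1.0) * (n + 1.0) + k * k);
    }

    int main()
    {
        // Ballpark values for gold in the yellow-red range; illustrative only.
        std::printf("gold-ish F0 = %.3f\n", conductorF0(0.25, 3.1));
        return 0;
    }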
Arnold is entirely CPU-based; it doesn't use GPUs. That's bad, in that they could get more speed from GPUs, but most render farms are pure CPU right now. GPUs have serious drawbacks: a different (and less mature) programming model, smaller overall memory space available, and the overhead of maintaining two code bases that must be 100% identical in results. Any differences in handling NaNs or infinities or anything else loses the all-important feature of platform transparency, that the same image is produced (or progressively refined) regardless of the piece of silicon producing it. Some related comments by Larry Gritz of Sony can be found here.
Similarly, Marcos has limited using CPU SIMD (SSE, etc.) types of operators in his code, due to the costs in programming time, maintainability, and readability. They're used, but in just time-critical parts of the code. He's been working on Arnold 13 years, and it's a package meant to work for years in the future, so short-term gains of say 10% are not worth the difficulties they present to keeping the code clean, fast, and reliable.
But, why the name "Arnold"? In 1999 Marcos was working with Station X Studios in Los Angeles. One night he went with two friends to an Arnold Schwarzenegger film, "End of Days". His friends imitated the Arnold accent from the rear of the theater, cracking up the audience. Marcos had never realized what a distinctive voice Schwarzenegger had, since he had only seen Arnold films in Spain, where they're dubbed. For a few years his renderer's API was called "RenderAPI" - really boring - and needed a code name. Andy Lesniak, one of the friends at the theater, suggested "Arnold" as a joke, and Marcos liked it. Marcos then started showing images from Arnold on the web. The code name was picked up by people, so the name became permanent. He's thought of choosing something more professional, but "Arnold" now has a reputation so he hasn't changed it (yet).
So, is rendering solved? Marcos feels there are definitely diminishing returns to research. There are new effects that come out every now and then, but they've become rarer. For example, he notes at EGSR 2010 there was a paper on refraction through non-constant media, "Interactive Rendering of Non-Constant, Refractive Media using the Ray Equations of Gradient-Index Optics", where the index of refraction changes throughout a volume, giving some interesting effects. Marcos also mentions the field of "predictive rendering", where researchers like Alex Wilkie are constantly pushing the limits of what our rendering simulations can do, exploring such exotic features as polarization, fluorescence, and birefringence. Volumetric scattering effects are also still a fertile area of research - the basic equations have been laid out for years, but it's still quite hard to make accurate simulations that are also efficient and usable.
A lot of his research goes into efficiency, optimizing various light transport paths and sampling methods. Better sampling, in particular, is a place where he feels there are still large improvements to be had. He says there are two main camps for sampling. One is that you have an 11 or 17-dimensional (or whatever) sampling pattern, as advocated by Alex Keller's QMC work. The other camp is to have a chain of 2D and 1D patterns for every isolated integral you're evaluating, as originally advocated by Rob Cook and Pete Shirley. For example, an optimal 2D pattern for pixel sampling, a 1D pattern for motion blur, etc. You have to chain these together with decorrelation between adjacent dimensions in the overall integral. That said, people are constantly coming up with new ideas, like adaptive multidimensional sampling, for more efficient motion blur and depth of field.
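To make the second camp a bit more concrete, here's a tiny sketch of chaining a stratified 2D pixel pattern with a stratified 1D time pattern and decorrelating them by shuffling the pairing; this is just my illustration of the general idea, not Arnold's sampler:

    #include <cstdio>
    #include <random>
    #include <vector>
    #include <algorithm>

    struct Sample { float px, py, time; };

    // Build n = nx*ny samples: a jittered 2D pattern for the pixel and a
    // stratified 1D pattern for time, with the pairing shuffled so the pixel
    // and time dimensions stay decorrelated.
    std::vector<Sample> makeSamples(int nx, int ny, std::mt19937& rng)
    {
        std::uniform_real_distribution<float> u(0.f, 1.f);
        const int n = nx * ny;
        std::vector<Sample> s(n);
        for (int j = 0; j < ny; ++j)
            for (int i = 0; i < nx; ++i) {
                s[j * nx + i].px = (i + u(rng)) / nx;
                s[j * nx + i].py = (j + u(rng)) / ny;
            }
        std::vector<float> t(n);
        for (int k = 0; k < n; ++k)
            t[k] = (k + u(rng)) / n;
        std::shuffle(t.begin(), t.end(), rng);      // the decorrelation step
        for (int k = 0; k < n; ++k)
            s[k].time = t[k];
        return s;
    }

    int main()
    {
        std::mt19937 rng(7);
        for (const Sample& s : makeSamples(4, 4, rng))
            std::printf("pixel (%.3f, %.3f)  time %.3f\n", s.px, s.py, s.time);
        return 0;
    }

Without the shuffle, sample k's pixel stratum and time stratum would march in lockstep, which is exactly the correlation between dimensions you are trying to avoid.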
Arnold is currently aimed at fulfilling the needs of film animation, vs. lighting simulation, architecture, etc. Because it's a unidirectional (or classic) path tracer, effects such as true caustics are expensive and noisy in the system. Which turns out to be fine, as he has found no film-making client has ever needed to make caustics in their films. It would be great to have caustics, but really good diffuse global illumination and good area-light shadows are far more important for his clients.
As far as efficiency goes, he started off with uniform grids (ugrids). They were using a per-object ugrid (storing triangles) and then a global or "root" ugrid (storing objects), so in a sense it was a 2-level hierarchy of ugrids. However, some objects could be "procedural" objects, which when hit by a ray expand and create a list of other objects (which themselves could be other procedural objects!) which they would store in another ugrid. So in the end it could be a deep hierarchy of ugrids. But unlike in the HUG paper ("Filtering, Clustering and Hierarchy Construction: a New Solution for Ray-Tracing Complex Scenes"), their hierarchy was dictated by the scene graph (i.e., the artists building the scene) rather than computed automatically by trying to minimize some metric of render time cost.
To quote Marcos directly: "If I recall, I was choosing the number of voxels according to an IEEE CG&A paper that described a way to keep the voxels roughly cubical ["Faster Ray Tracing Using Adaptive Grids", some RTNews followup here - Eric]. This was well known for anybody using ugrids, cubical voxels worked better than, say, using cbrt(N) in each dimension (which can give inefficient, elongated voxels). On top of that, we had a multiplier for the number of voxels that was set to around 4-8 by default, I don't remember. In many cases, increasing this multiplier would decrease render time a bit (at the expense of memory).
"I remember trying hard to squeeze as much performance out of ugrids as possible with little tricks here and there. I stored in each voxel a list of quantized 30-bit bboxes, one for each triangle in the voxel [see RTNews here - Eric]. Using an integer bounding box of 6 values at 5 bits of precision for each axis cuts storage of the enclosing bounding box for a triangle from 24 bytes to just 4. This list was stored in a packed array [see RTNews here - Eric]. Another important optimization was to accurately clip the triangles against each voxel, to reduce false-positives stored in the voxel. I think there's code for this by Akenine-Moller in JGT [there is: "Fast 3D Triangle-Box Overlap Testing" - Eric] and that's what I used."
However, once deformation motion blur is needed, it turns out grids just don't work. The problem is that, within the frame's time, a triangle can move a considerable distance. This movement means the volume swept out by this triangle overlaps many grid cells, considerably lowering efficiency. One idea is to form a separate 3D grid for a series of time segments. For example, the frame's time interval could be split into say 4 segments. Now the triangle has four separate, smaller, swept volumes, one per grid structure. However, to get good performance in scenes with a lot of motion, you need to use 100+ time segments (and so, 100+ 3D grids), which quickly becomes prohibitive in terms of time and memory.
Because of this problem and others endemic to grids (teapot in a stadium, and non-uniform distribution), 4 years ago they moved to a bounding volume hierarchy, essentially. The details are proprietary information, but Marcos could say that it's not just a matter of choosing an algorithm but also working on the implementation quality and optimizations. They perform careful selection of the "accel constants", which are chosen according to the results of an extensive performance benchmark over a testsuite of many complex production scenes. They run this performance tuning separately for every primitive in the renderer (i.e. once for triangles, once for objects, once for hairs -- since they all have different intersection costs). This idea could be applied to the "number of voxels" problem as well, you could render your suite of stress tests with many values of the constant num_voxels(N), produce a table with results (build time, total render time, memory use), and report the value of the constant that gives you the absolute minimum, the average minimum etc.
Marcos feels efficiency is not a solved problem, since there are all sorts of variables in a scene and system: effects needed, type of data, available memory, etc. Memory is important: scenes of 100 to 200 million polygons were not unusual in "Alice in Wonderland", and these need to fit inside of 4 GB (8 GB if you're lucky) of machine memory. Memory for textures is not important, as caching usually keeps down the amount of texture data in memory. Arnold uses the open source OpenImageIO, started by Larry Gritz, for texture caching.
The two primitives used in Arnold are curved lines and triangle meshes. Bezier splines (directly intersected, not tessellated) are for hair, triangles are for everything else. Since curved lines use the same efficiency structures and all the rest, they ray trace quickly. This means that they can perform ray-traced motion blur on hair, something most studios won't even contemplate rendering directly (because it's so expensive) but rather use post-processing applications like Shake. An optimized, tightly integrated stochastic ray tracer makes this effect possible: the blurred hair itself, blurred shadows, blurred reflections, blurring seen through a glass sphere, etc. He's thought about level of detail for meshes, and ray differentials can give some sense of the detail needed, but it's a tricky problem. Luckily, most of the meshes he uses are made of curved patches that can be approximated to an arbitrary level of precision, so he has the luxury of generating more detail from a simple model, vs. the more difficult problem of decimation of complex meshes without losing important features.
I asked what he liked at EGSR 2010. He's focused on the few papers that are relevant to Arnold, or are very easy to implement (a rare quality for a paper these days). He mentioned the following:
"Bounding the Albedo of the Ward Reflectance Model" corrects the Ward anisotropic model.
"Two Methods for Fast Ray-Cast Ambient Occlusion" looked fast to accelerate AO, and it may also be applicable to global illumination.
"Compressive estimation for signal integration in rendering" seemed good for reducing noise from stochastic sampling for motion blur and DOF.
"Invisible Seams" he agrees is an important problem and useful work, properly hiding seams among separate texture atlas elements (separate in the atlas, touching on the model).
Of other recent work, he also particularly likes the PantaRay engine, to be presented at SIGGRAPH 2010. From NVIDIA and Weta Digital, PantaRay uses an out-of-core BVH to ray trace billion-triangle scenes. Weta used it in "Avatar" to perform all the ambient occlusion precomputations, a huge amount of work. Motivation here.
If you want to hear Marcos Fajardo talk about Arnold, he'll be speaking at SIGGRAPH (probably first) in the course "Global Illumination Across Industries", Thursday, 2-5:15 pm. I suspect Arnold will also get mentioned in the technical talk "Lighting and Rendering 'Alice In Wonderland'" by Adam Martinez and Terrance Tornberg, Monday, 9 am. I've heard this talk will focus on the use of physically based shaders.
As a last piece of eye candy, the "Assassin's Creed: Brotherhood" game trailer for E3 2010 was rendered with Arnold.
back to contents
Solomon:
Yes, the genesis of this was me telling Matt he had to do this and it's faster than Brian Smits's depth first ordering. Like you said, for "closest hit" it's probably an improvement since it probabilistically finds closest more quickly. For shadow "any hit" results, it's less obvious since it seems "any hit" is the goal. My retort at SIGGRAPH to both of you is that "it's any hit within [tmin, tmax]".
Eric:
I think Matt and I discussed the idea of local lights: if you're tracing a shadow ray from a surface towards a light, it's probably better to choose the box closer to the surface's origin, i.e. the sorted box, just on the theory that the farther box could indeed be entirely beyond the light itself. Even if the farther box is not beyond the light, odds are that the stuff in this farther box is not blocking the light. Think of a typical room, where there are lights on the walls and ceiling. The walls and ceiling geometry will be in the "far" box, but won't block, so testing the far box first is probably a waste. Anyway, that's my intuition, but I've found my intuitions are sometimes wrong when it comes to ray tracing. Were there other reasons you felt contributed to sorted-BVH being faster for shadows? Also, what about lights at infinity, where "the far box stuff is beyond the light" does not apply?
Solomon:
The local light example is mine: anything that isn't a directional light (good example, and you seem to be correct on it -- no benefit to be had) is contained within something for the most part. However, one bit of confusion is that while we just want "any hit within the interval" it would seem like hitting the parent box implies that the order we traverse the children shouldn't matter, because the ray-box test is already saying "this is inside the interval". But therein lies the logic mistake, and why I think this ordering is still beneficial even for shadow rays: the parent ray-box test tells you that the ray can reach the parent box at some point in [tmin, tmax]; it doesn't tell you that the ray passes through the full box (i.e., at tmax you might just barely intersect the parent box, so you'd prefer to still traverse the near children first).
Arguments about "is the occluder closer to tmin or tmax" would muddle this more and I like to avoid such things (little note below though).
Eric:
An amusing corollary to the "intersect the near box, then the far box, for shadow rays" result you've found is that, if you're tracing from the light to the surface (who does that? Though it's fine to do so), then you'd want to reverse the rule and intersect the far box first, then near.
Solomon:
All the packet tracing kids did this (shoot from the light). Tracing from the light to the hit points gives you a common origin optimization for packet tracing (and is the corollary of shadow mapping). Though there is an interesting realization that Ingo/Pete/I had a few years ago: putting a point light directly above a tessellated sphere means that tracing the rays from the hit points to the lights will graze all the geometry in the scene (doing lots of work), while tracing from the light to the point would need to do that as well for unoccluded rays but might find the occluder first and save lots of work.
I don't see your "reversed order" argument though... If you trace the ray in the opposite direction, it already changes sign and effectively traces the rays in the same sort of order you'd want. This seems to boil down to whether you think the occluder is close to tmin or tmax (so in this case near the light or near the surface instead of the other way around).
p.s. Can we think of other cases with tmax = positive-infinity where you want "any hit"? Direct lighting from an environment map sampling came to mind for me (which sounds less goofy than "you should care about this for the all important directional lights"). It's an interesting claim that one of us should test out (might suggest doing Smits style traversal for all shadow rays that you know have tmax = positive-infinity).
____
That's where I left our email exchange. I believe Solomon sums it up nicely: it's really a matter of whether you think the occluder is closer to the hit point or the light. I suspect the answer is indeed "it's closer to the hit point" for most scenes, as people tend to place lights "out there" and not inside complex geometric enclosures.
Thinking about it today, I believe this area deserves further study and testing. Solomon notes that when a ray hits a box, there's a logic error in thinking that the ray's interval enters and exits the box. There are four states for ray/box: ray entering (but not leaving) the box, ray exiting, ray fully enclosed, and ray entering and exiting (call it "piercing"). For fully enclosed and piercing rays, the order of testing the children for shadow rays shouldn't matter, at least in theory (it might matter due to the nature of the scene itself). For rays that enter a box, it's certainly true that testing the nearer child box will be more efficient: either the ray hits something, which is more likely to be in the closer box, or is unoccluded, in which case order doesn't matter. Conversely, if a ray starts inside a box and is exiting it, the "far" box should be tested first.
However, this "exiting" case is likely to be the norm early on for a ray leaving a hit point and traveling towards a light. It starts at an object which is deeply buried in the BVH, so it starts inside many boxes and exits them. So this sort of case gives the opposite of what Solomon was finding: it says that a "closer" box (first along the ray direction) should be tested after the "farther" box, since the ray origin itself is more likely to be beyond the "closer" box.
Anyway, this is an interesting area to think about. Typically we test a ray against a box and get a boolean back, hit or miss. Perhaps also using the three important states - entering, exiting, inside/piercing - along with the directional plane we can gain a bit more efficiency for shadow rays. Alternately, as is often the case with such optimizations, the additional complication of computing and using the state may outweigh the benefits. Another possibility is that the theory is correct ("exiting shadow rays should test the far box first"), but in actual scenes the rule doesn't work in practice, due to the nature of how such scenes are constructed.
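For reference, here is a bare-bones sketch of the "sorted" any-hit traversal being discussed (my own illustration, with made-up node and primitive types, not Solomon's or Arnold's code): intersect both child boxes, descend into the one the ray reaches first, and stop as soon as any occluder is found.

    #include <vector>
    #include <algorithm>

    struct Box { float lo[3], hi[3]; };

    struct Node {                        // hypothetical BVH node layout
        Box box;
        int left = -1, right = -1;       // child indices into the node array, -1 if none
        std::vector<int> prims;          // primitive indices, non-empty only at leaves
    };

    // Slab test: does the ray origin + t*dir (dir given as 1/dir) hit the box
    // within [tmin, tmax]?  If so, report the entry distance.
    static bool hitBox(const Box& b, const float org[3], const float invDir[3],
                       float tmin, float tmax, float* tEnter)
    {
        for (int a = 0; a < 3; ++a) {
            float tn = (b.lo[a] - org[a]) * invDir[a];
            float tf = (b.hi[a] - org[a]) * invDir[a];
            if (tn > tf) std::swap(tn, tf);
            tmin = std::max(tmin, tn);
            tmax = std::min(tmax, tf);
            if (tmin > tmax) return false;
        }
        *tEnter = tmin;
        return true;
    }

    // Stand-in for a real ray/triangle occlusion test.
    static bool occludedByPrim(int /*prim*/, const float*, const float*, float, float)
    {
        return false;
    }

    // "Any hit" traversal that descends into the nearer child first.  (The
    // root's own box test is skipped here for brevity.)
    bool shadowOccluded(const std::vector<Node>& nodes, int n,
                        const float org[3], const float invDir[3], float tmin, float tmax)
    {
        const Node& node = nodes[n];
        if (!node.prims.empty()) {                              // leaf
            for (int p : node.prims)
                if (occludedByPrim(p, org, invDir, tmin, tmax)) return true;
            return false;
        }
        float tL = 0.f, tR = 0.f;
        bool inL = node.left  >= 0 && hitBox(nodes[node.left ].box, org, invDir, tmin, tmax, &tL);
        bool inR = node.right >= 0 && hitBox(nodes[node.right].box, org, invDir, tmin, tmax, &tR);
        // Visit the child the ray enters first; if it holds an occluder we are
        // done and never touch the far child.
        bool leftFirst = inL && (!inR || tL <= tR);
        int  first     = leftFirst ? node.left  : node.right;
        int  second    = leftFirst ? node.right : node.left;
        bool inFirst   = leftFirst ? inL : inR;
        bool inSecond  = leftFirst ? inR : inL;
        if (inFirst  && shadowOccluded(nodes, first,  org, invDir, tmin, tmax)) return true;
        if (inSecond && shadowOccluded(nodes, second, org, invDir, tmin, tmax)) return true;
        return false;
    }

The entering/exiting refinement discussed above would go where leftFirst is chosen: an exiting ray could flip that decision.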
back to contents
Here's the idea for a ray tracer: a ray is traversing a bounding volume hierarchy. We open a parent box and try child box 1 and box 2, in that order. Box 2 has an object that blocks the ray (or is the closest intersected object, for eye rays). The idea of branch prediction is simply: let's try box 2 first from now on, since it was a winner last time and so looks more likely to terminate the search earlier. Ideally you'd want every parent, grandparent, etc., up the hierarchy to be tested first in future testing, since their child was the one that stopped the ray and so should get tested first. Doing all this reordering for the whole chain of parents is way too much work due to a single ray's result, I believe.
One variant that is less expensive is to make a note of the last "second box" that is tested. That is, you're going down the hierarchy, opening boxes that are hit and testing box 1 and box 2. Any time you test box 2, make note of the parent box - not in a list, just save the pointer to that parent, replacing any previous parent whose second box was tested. At the end of testing, if a ray hit an object, then go modify the parent box (if any) you noted along the way. This was the last parent box where its second child box was found to be useful, one where you wish you had tested this box first instead of second. Swap the parent's box 1 and box 2. In the future, the "useful" box 2 will now be tested first.
By modifying just the last parent box found to be out of order, you minimize work per ray - a maximum of one swap. You will also percolate branch prediction on up the hierarchy, if it indeed turns out that a particular object is occluding a long series of rays. Say a ray hits some object, and that object is in box 2 of its parent, which is in box 2 of the grandparent, which is in box 2 of the great-grandparent. After the first hit, the parent's boxes are swapped and the object is now in box 1 of the parent. The next ray also happens to hit the object. In this case, the grandparent's boxes 1 and 2 are swapped, since the grandparent's box 2 was the last "second box" found and noted. In this way, as the ray hits the object more and more, the hierarchy continues to get modified, a little at a time, so that the object hit gets tested earlier and earlier.
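Here's a minimal sketch of the scheme, for the any-hit (shadow ray) case; the types and the box/primitive tests are stand-ins, the point is the bookkeeping and the single swap.

    #include <algorithm>
    #include <vector>

    struct Box { /* bounds omitted */ };
    struct Ray { /* origin, direction, interval omitted */ };

    struct Node {
        Box   box;
        Node* child[2] = {nullptr, nullptr};   // both null at a leaf
        std::vector<int> prims;                // primitive indices at a leaf
    };

    // Stand-ins for the real ray/box and ray/primitive tests.
    static bool hitBox(const Box&, const Ray&)                { return true; }
    static bool hitPrims(const std::vector<int>&, const Ray&) { return false; }

    // "Any hit" traversal.  Whenever the second child of a node is about to be
    // tested, remember that node (overwriting any earlier one).
    static bool anyHit(Node* n, const Ray& ray, Node** lastSecond)
    {
        if (!hitBox(n->box, ray)) return false;
        if (!n->child[0]) return hitPrims(n->prims, ray);        // leaf
        if (anyHit(n->child[0], ray, lastSecond)) return true;   // first child ended it
        *lastSecond = n;                                         // about to try the second child
        return anyHit(n->child[1], ray, lastSecond);
    }

    bool shadowRayOccluded(Node* root, const Ray& ray)
    {
        Node* lastSecond = nullptr;
        bool occluded = anyHit(root, ray, &lastSecond);
        if (occluded && lastSecond)                              // at most one swap per ray
            std::swap(lastSecond->child[0], lastSecond->child[1]);
        return occluded;
    }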
I can imagine reasons this method may not work for various systems. If you flatten the hierarchy and can't modify it on the fly, then this scheme cannot be used. The scheme might be a bit tough in a multiprocessor context, since you're modifying the hierarchy on the fly (maybe not so bad, an atomic operation on a single node per ray, at most). More likely, there's little coherence among the sets of rays each different processor is working on, so they could just be fighting each other, constantly swapping boxes as each goes down a different part of the hierarchy. However, I suspect that, given enough shadowing in a scene, and given that some parts of the BVH are going to occlude more than others, this method will reduce the total number of ray/box intersections performed overall. Whether these savings outweigh the recording of parents and swapping of boxes determines whether this scheme is worthwhile.
There are variants, of course. You might keep a list of the last two or three "second box" parents and modify them all. More cost, but the idea is that you want to shift the hierarchy more quickly to get to the occluding objects in the scene.
I like the elegance and balance of the idea: a single ray doesn't wreak huge changes on the hierarchy, changes that may be entirely misguided due to one unlikely ray/object intersection. Continued intersection of an object makes it be intersected earlier and earlier by subsequent rays. If an object, a single triangle of a larger structure, occludes a ray a number of times, then is missed by subsequent rays, the larger structure's other triangles will benefit from the earlier predictions, as their shared parents will be tested earlier on.
back to contents
These are versions of scenes from the films that we have in our benchmark suite, not necessarily stuff that's in the final frames. More precisely, some of these are scenes that were in production, had some problem, and then we kept them as reminders for regression testing (particularly the ones with very low light counts). Moreover, I didn't run this with the production shaders (which might have done other things) but rather with our debugging shaders (which should still roughly end up with a similar number of shadow rays anyway).
    Beowulf, Mead Hall:               11 lights, 32 shadow rays per path, 50% hit rate
    2012, Loading Dock:                8 lights, 28 shadow rays per path, 35% hit rate
    Cloudy, exploding building:        4 lights, 10 shadow rays per path, 62% hit rate
    Alice in Wonderland, Rabbit Hole: 32 lights, 15 shadow rays per path, 93% hit rate
    Alice in Wonderland, Battle:       2 lights, 21 shadow rays per path, 18% hit rate
    Alice in Wonderland, Forest:       7 lights, 10 shadow rays per path, 56% hit rate

I'll summarize the most salient features. The ones that are fairly enclosed (mead hall, the rabbit hole, forest) have fairly high hit rates, because obviously there's lots of stuff for them to hit. As a side note, Beowulf was a prman-based show so I think these lights were set up for that purpose (mostly point and spot lights with no area lights). The ones with extremely low hit rates (2012 Loading Dock and Alice in Wonderland battle scene) have a lot of empty space (early versions of the scenes) and, importantly, have a sun-like light source as well. Before looking at the pictures, I would have guessed that directional lights commonly get occluded (any hit within 0 and infinity counts), but after looking at them it became obvious that if a lighter is trying to place a light to be the sun, most of the scene will actually be illuminated (duh).
I've also excluded some of our tests that are more like simple characters floating in space. These had low hit rates as well (10-ish percent), for similar reasons - it doesn't usually make sense to light your character with something that is fully occluded.
You'll also note from my numbers of rays per path that shadow rays make up a huge portion of our ray budget. We've commonly seen scenes where 90% or more are shadow rays. So to answer your "does optimizing shadow rays matter" question, I can definitively say it does. The conclusion may not apply to all other usages, in particular because film rendering commonly leverages light linking (where a given light is assigned to affect only a certain set of objects). It's difficult for me to determine how much this matters in the results above without a lot of instrumentation -- I'm not even sure what I'd "check". Of course, in your architectural walkthrough example, disabling light linking would only boost the hit rates higher (presumably lights in other rooms would "cast shadows" to the current room).
[... which is an interesting way to think of the world, by the way: there might be 100 light sources in the room you're in, but you're also in shadow of the quadrillions of light sources on the earth, not even counting the octillions of stars for which you're in shadow. - Eric]
back to contents
I created an SSE version of Smits' [Smits98, Smits02] and Williams' [Williams03] ray-box code that looks very similar to Solomon's. My main concern is branching, versus Eric's main concern of early outs. I was not able to see which branch has more "rejection power" in real world scenes, hence my reluctance to commit to early outs. For a GPU you most definitely do not want all those branches.
I use the IEEE trick Smits and Williams use to avoid the test for divide by zero and to capture the sign. I found I had to add one test to make it bulletproof - to check for #IND (NaN) and to clamp these. The check involves a 3-way test for #IND, which is pretty fast. However, I do have to use _mm_movemask_ps, which involves syncing the SSE and integer pipes, which is 7 cycles. After massaging Williams' logic into a more SSE-friendly form with an eye to removing branching, I managed to get it down to a single conditional - the final compare of tmin and tmax for the return code, in addition to the check for #IND. First, we need to examine Williams' improvements to Smits' original code. The following is taken from [Williams03]:
Smits:
    if (r.direction.x() >= 0) {
        tmin = (bounds[0].x() - r.origin.x()) / r.direction.x();
        tmax = (bounds[1].x() - r.origin.x()) / r.direction.x();
    } else {
        tmin = (bounds[1].x() - r.origin.x()) / r.direction.x();
        tmax = (bounds[0].x() - r.origin.x()) / r.direction.x();
    }

etc. for y and z.
____
Williams:
    divx = 1 / r.direction.x();
    if (divx >= 0) {
        tmin = (bounds[0].x() - r.origin.x()) * divx;
        tmax = (bounds[1].x() - r.origin.x()) * divx;
    } else {
        tmin = (bounds[1].x() - r.origin.x()) * divx;
        tmax = (bounds[0].x() - r.origin.x()) * divx;
    }
Two multiplies are faster than the divide, and this also handles the -0 case. 'divx' captures the sign of r.direction even when it is zero. Consider 1/0.0 = +INF, 1/-0.0 = -INF. Williams then proposes a faster version that precomputes the inverse directions as well as the result of the sign tests (such as divx >= 0) in the ray structure.
    tmin  = (bounds[r.sign[0]].x()   - r.origin.x()) * r.inv_direction.x();
    tmax  = (bounds[1-r.sign[0]].x() - r.origin.x()) * r.inv_direction.x();
    tymin = (bounds[r.sign[1]].y()   - r.origin.y()) * r.inv_direction.y();
    tymax = (bounds[1-r.sign[1]].y() - r.origin.y()) * r.inv_direction.y();
    if ( (tmin > tymax) || (tymin > tmax) )
        return false;
    if (tymin > tmin)
        tmin = tymin;
    if (tymax < tmax)
        tmax = tymax;
    tzmin = (bounds[r.sign[2]].z()   - r.origin.z()) * r.inv_direction.z();
    tzmax = (bounds[1-r.sign[2]].z() - r.origin.z()) * r.inv_direction.z();
    if ( (tmin > tzmax) || (tzmin > tmax) )
        return false;
    if (tzmin > tmin)
        tmin = tzmin;
    if (tzmax < tmax)
        tmax = tzmax;
    return ( (tmin < t1) && (tmax > t0) );
____
I like Williams' approach of storing the extra data (1/dir and signs of dir) with the ray, but all the tests still worried me. There is a lot of branch misprediction in there. So I adopted the following approach of using SSE's min/max and horizontal min/max (which is only available in SSE 4).
Manny version:
    static const SSE::Vec3 kINDVec3 = SSE::Invert4( SSE::Vec3(0.f,0.f,0.f) );   //(-1.0#IND,-1.0#IND,-1.0#IND)

    bool BBox::ClipRay(const Ray& r, float* tmin, float* tmax, float raymin, float raymax) const
    {
        const SSE::Vec3& rayorig   = (r.GetOrigin().m_v);
        const SSE::Vec3& rayinvdir = (r.GetInvDir().m_v);

        SSE::Vec3 t0v, t1v;
        t0v = ( (m_points[0] - rayorig) * rayinvdir );   //tmins - same as Williams
        t1v = ( (m_points[1] - rayorig) * rayinvdir );   //tmaxs - same as Williams

        // NOTE: the 3-way #IND test described in the text, which presumably
        // computes 'result' and the 'indmask' clamp mask via _mm_movemask_ps,
        // is not shown in this excerpt.
        if (result != 0) {   //handle #IND
            ClampTs( t0v, t1v, indmask );
        }

        SSE::Vec3 t0v_n, t1v_n;
        t0v_n = SSE::Min( t0v, t1v );   //_mm_min_ps
        t1v_n = SSE::Max( t0v, t1v );   //_mm_max_ps

        SSE::Scalar interval_min, interval_max;
        SSE::Scalar t0, t1;                        //help keep them within SSE registers
        t0 = SSE::MaxHorz3( t0v_n );               //horizontal-max
        t1 = SSE::MinHorz3( t1v_n );               //horizontal-min
        interval_min = MAX1( t0, raymin );         //SSE 1-component Max
        interval_max = MIN1( t1, raymax );         //SSE 1-component Min

        *tmin = interval_min * kBBEPSMIN;          //* 0.9999f
        *tmax = interval_max * kBBEPSMAX;          //* 1.0001f

        return (interval_min <= interval_max);
    }
This code runs very fast and hardly takes any time at all. Instead of adding an epsilon to prevent a degenerate interval, I prefer to multiply by 0.9999 and 1.0001. Adding an epsilon is not always robust due to the nature of floating point math. See [Goldberg91] or Christer Ericson's book [Ericson05] for a more detailed exposition. The horizontal min/max can be replaced by several SSE instructions on older CPUs and the code still runs well.
I also found doing:
    t0 = MaxHorz3( t0v_n );              //horizontal-max
    t1 = MinHorz3( t1v_n );              //horizontal-min
    interval_min = MAX1( t0, raymin );   //SSE 1-component Max
    interval_max = MIN1( t1, raymax );   //SSE 1-component Min
is a little bit faster than:
    t0v_n = MaxVec3( t0v_n, Vec3(raymin) );
    t1v_n = MinVec3( t1v_n, Vec3(raymax) );
    interval_min = MaxHorz3( t0v_n );    //horizontal-max
    interval_max = MinHorz3( t1v_n );    //horizontal-min
____
References
[Ericson05] Real-Time Collision Detection. Morgan Kaufmann, 2005.
[Goldberg91] "What Every Computer Scientist Should Know About Floating-Point Arithmetic",
ACM Computing Surveys, Vol 23, No 1, March 1991.
[Smits98] "Efficiency Issues for Ray Tracing", JGT, vol 3:2, 1998, pp. 1-14.
[Smits02] "Efficient Bounding Box Intersection", Ray Tracing News 15:1 (2002).
[Williams03] "An efficient and robust ray-box intersection algorithm", JGT, vol 10:1 (2005), pp. 49–54.
back to contents
Unfortunately, no one took the bait and solved it for me. Gilbert Bernstein did reply about this problem, proving that the length of the perimeter of any triangle which is randomly oriented is proportional to the average mean width of the set of axis-aligned bounding boxes put around this triangle. The mean width of a bounding box is the sum of its three axes, height + width + depth (see puzzle answer in the RTNews). From testing below, it looks like the ratio is 4/3. To summarize:
Sum of triangle's edge lengths = 4/3 * ( height + width + depth ) of the "average summed" bounding box.
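If you want to check that ratio numerically, here's a quick C++ sketch (a from-scratch illustration, not Gilbert's proof nor the Perl program mentioned below): apply uniformly random rotations to a triangle, average the sum of its bounding box dimensions, and compare to the perimeter.

    #include <cstdio>
    #include <cmath>
    #include <random>
    #include <algorithm>

    struct V3 { double x, y, z; };

    static V3 sub(const V3& a, const V3& b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
    static V3 cross(const V3& a, const V3& b)
    { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
    static double len(const V3& a) { return std::sqrt(a.x*a.x + a.y*a.y + a.z*a.z); }

    // Rotate v by the unit quaternion (w, qx, qy, qz).
    static V3 rotate(const V3& v, double w, double qx, double qy, double qz)
    {
        V3 q = {qx, qy, qz};
        V3 t = cross(q, v);
        t = {2*t.x, 2*t.y, 2*t.z};
        V3 ct = cross(q, t);
        return {v.x + w*t.x + ct.x, v.y + w*t.y + ct.y, v.z + w*t.z + ct.z};
    }

    int main()
    {
        // Any triangle will do; this is an isosceles right triangle (legs 1, 1).
        V3 tri[3] = {{0,0,0}, {1,0,0}, {0,1,0}};
        double perim = len(sub(tri[1],tri[0])) + len(sub(tri[2],tri[1])) + len(sub(tri[0],tri[2]));

        std::mt19937 rng(12345);
        std::normal_distribution<double> g(0.0, 1.0);
        const int N = 200000;
        double sum = 0.0;
        for (int i = 0; i < N; ++i) {
            // Uniform random rotation via a normalized random quaternion.
            double q[4] = {g(rng), g(rng), g(rng), g(rng)};
            double ql = std::sqrt(q[0]*q[0] + q[1]*q[1] + q[2]*q[2] + q[3]*q[3]);
            for (double& c : q) c /= ql;
            V3 r[3];
            for (int k = 0; k < 3; ++k) r[k] = rotate(tri[k], q[0], q[1], q[2], q[3]);
            double lox = std::min({r[0].x, r[1].x, r[2].x}), hix = std::max({r[0].x, r[1].x, r[2].x});
            double loy = std::min({r[0].y, r[1].y, r[2].y}), hiy = std::max({r[0].y, r[1].y, r[2].y});
            double loz = std::min({r[0].z, r[1].z, r[2].z}), hiz = std::max({r[0].z, r[1].z, r[2].z});
            sum += (hix - lox) + (hiy - loy) + (hiz - loz);   // width + height + depth
        }
        std::printf("average (w+h+d) / perimeter = %.4f  (claimed: 0.75)\n", (sum / N) / perim);
        return 0;
    }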
I like that this is true for any triangle, but it wasn't the answer I wanted - cold, hard numbers is what I was hoping to get. I decided to hack a quick perl program to give me an approximation, if not a formula. I tried it on a few common triangles:
    Equilateral triangle:                               0.267  - 1 in 3.75
    Isosceles right triangle (edges 1, 1, sqrt(2)):     0.243  - 1 in 4.12
    Golden ratio right triangle (short edges 1, 1.618): 0.226  - 1 in 4.42
    Thin right triangle (short edges 1, 0.1):           0.069  - 1 in 14
    Very thin right triangle (short edges 1, 0.01):     0.0077 - 1 in 130

The equilateral triangle seems like it would be a "best case", since it has the highest area to perimeter-squared ratio, and this indeed appears to be so, as it gives the highest hit ratio of the triangles tested. The isosceles right triangle is a worthwhile case, in that two of these form a square, a common element in a mesh primitive. I recall reading in some "Mathematical Games" column that some person went around and measured rectangle ratios, wherever he'd see them. The average ratio was around the golden ratio, phi. A golden ratio right triangle forms such rectangles, so should also be a fairly common triangle type. The thin triangles were run just to see how bad things can get; no surprise, a randomly-oriented triangle that is needle-like, where area/perimeter is low, is likely to be contained by a proportionally huge bounding box.
I think this is worth knowing, as many of us are constantly putting bounding boxes around triangles. For axis-aligned triangles, the best we can ever do is 0.5, half the time the triangle will be hit by a ray piercing the bounding box. For arbitrary triangles, the best is 0.267 for an equilateral, and more likely the triangle will be hit by a box-piercing ray less than one time in four.
This hit ratio has a bearing on the performance of various efficiency schemes. For example, it appears BIH (bounding interval hierarchy) works well on laser-scanned data, where the triangles are more equilateral and rarely thin. Synthetic model data, such as long pipes in the Factory model, or even endcaps of a cylinder, produce long, thin triangles that are less likely to be hit.
It seems to me an important characteristic of any scene is what its hit ratio would be for its triangles, weighted by their areas (their chance of getting hit). For a static scene, this is easily done: look at every triangle, compute its bounding box, and get the ratio of areas directly. This would be an interesting number to know: sum of triangle areas / sum of bounding box areas.
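A sketch of such an analyzer pass (hypothetical mesh layout; note that what to call the box's "area" is a convention - here I use half the box's surface area, which makes the flat, axis-aligned right triangle case come out to the 0.5 mentioned above):

    #include <cstdio>
    #include <cmath>
    #include <vector>
    #include <algorithm>

    struct V3 { double x, y, z; };

    static V3 sub(const V3& a, const V3& b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
    static V3 cross(const V3& a, const V3& b)
    { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
    static double len(const V3& a) { return std::sqrt(a.x*a.x + a.y*a.y + a.z*a.z); }

    // Ratio of summed triangle areas to summed bounding box "areas" for a
    // mesh, taking a box's area as half its surface area (w*h + w*d + h*d).
    double meshHitRatio(const std::vector<V3>& verts, const std::vector<int>& tris)
    {
        double triArea = 0.0, boxArea = 0.0;
        for (size_t i = 0; i + 2 < tris.size(); i += 3) {
            const V3& a = verts[tris[i]];
            const V3& b = verts[tris[i+1]];
            const V3& c = verts[tris[i+2]];
            triArea += 0.5 * len(cross(sub(b, a), sub(c, a)));

            double w = std::max({a.x,b.x,c.x}) - std::min({a.x,b.x,c.x});
            double h = std::max({a.y,b.y,c.y}) - std::min({a.y,b.y,c.y});
            double d = std::max({a.z,b.z,c.z}) - std::min({a.z,b.z,c.z});
            boxArea += w*h + w*d + h*d;          // half the box's surface area
        }
        return boxArea > 0.0 ? triArea / boxArea : 1.0;
    }

    int main()
    {
        // Two axis-aligned right triangles forming a unit square: ratio is 0.5.
        std::vector<V3> verts = {{0,0,0},{1,0,0},{1,1,0},{0,1,0}};
        std::vector<int> tris  = {0,1,2,  0,2,3};
        std::printf("sum of triangle areas / sum of bbox areas = %.3f\n",
                    meshHitRatio(verts, tris));
        return 0;
    }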
Where it gets trickier is when objects in the scene are animated by rotation, which is where a formula would be handy, to know how bad a model can become when it is rotated. A model full of arbitrarily-oriented equilateral triangles to start won't be worse when rotated; a model with long, thin, axis-aligned triangles will have tight bounds initially, but terrible bounding boxes once rotated. The paper "Edge Volume Heuristic - Robust Triangle Subdivision for Improved BVH Performance" by Holger Dammertz and Alexander Keller discusses the effects of such rotation and tackles the problem of subdividing triangles that give poor triangle/box hit ratios.
A tiny project, maybe good for an undergrad's study project: what's the optimal triangulation of a polygon with the goal of minimizing the sum of the bounding box areas? For example, a polyline forming a circle could be triangulated into a fan (choose one vertex and fan from there) or a triangle strip (a zig-zag of triangles), or some mix of these, or some entirely arbitrary pattern. Which is best?
I think there's much more work to be done here; what is of interest to me is whether we can find statistical characterizations of a scene's data and determine any correlations with the effectiveness of various efficiency schemes. It's probably impractical (and as the Arnold article in this issue shows, other factors such as motion blur can entirely change the choice of efficiency scheme), but I like the idea that a firm could feed some typical models into an analyzer to find out what sort of efficiency scheme would do their data the most good.
back to contents
The intersections page continues to be somewhat relevant to ray tracing (though, really, ray/triangle and ray/bezier are probably all you need). Our graphics books recommendation section also has some relevant bits.
For blog entries, first of all, I edited a "book" on ray tracing, called "Another Introduction to Ray Tracing". 471 pages, cool cover, just $22.84, with one small caveat: it's a bunch of Wikipedia articles and took me two hours to make. The free version is here. Don't like my book? Simply grab the contents of that page and make your own! I wrote an article about it and other (much scammier) print-on-demand schemes, read it here. Writing this today, I decided to actually order a physical copy, goofy as it is, just to see what that's like. It's sort of like an existence proof: a print-on-demand book doesn't really exist until you print at least one physical copy.
I do recommend taking the quiz, answers here.
There's an entry about a trick that everyone should know, but many don't, about looping through a list of vertices.
My article on morphological antialiasing (MLAA) gives a brief rundown of the idea and some experiments. For scenes with sharp edges in need of antialiasing, this is an interesting option. This can be quite cheap for ray tracing, compared to the cost of shooting more rays. I find it falls apart on thin objects that span a pixel or two in width and pixels at the edges of the image are problematic, but otherwise the results are pretty convincing. Naty has a few followup articles of this type of technique being used in a game, here and here.
I wrote two articles on left-handed vs. right-handed coordinate systems, on world coordinates and viewing coordinates. I'm not really satisfied with them, they're still a work in progress for me. I think this area still needs a thorough, clear treatment from world to screen; I ignore the Y-flipping problem often run into going from Cartesian coordinates to screen coordinates. Still, I got a bit sick of seeing "just flip the Z coordinate" as the flip answer for converting from one coordinate system to another. Yes, this can work, but you have to understand what in the world you're doing and what you're giving up.
Nothing to do with ray tracing, but I liked the idea of a constant luma palette.
The Octane 2 renderer is a pleasant progressive ray tracer.
The next-to-last entry here shows the dangers of ray tracing.
Our most popular post over the years has been this simple one.
Finally, some visual treats, all the stuff I didn't stick in the Roundup, here, here, and here.
back to contents