2011 Color and Imaging Conference, Part III: Courses B

This post covers the rest of the CIC 2011 courses that I attended; it will be followed by posts describing the other types of CIC content (keynotes, papers, etc.).

Lighting: Characterization & Visual Quality

This course was given by Prof. Françoise Viénot, Research Center of Collection Conservation, National Museum of Natural History, Paris, France.

During the 20th century, the major forms of light were incandescent (including tungsten-halogen) and discharge (fluorescent as well as compact fluorescent lamps – CFL). Incandescent lights glow from heat, and discharge lamps include an energetic spark or discharge which emits a lot of UV light, which is converted by a fluorescent coating to visible light. LEDs are relatively new as a lighting technology; they emit photons at a frequency based on the bandgap between semiconductor quantum energy levels. LEDs are often combined with fluorescent phosphors to change the light color.

Correlated Color Temperature (CCT) is used to describe the color of natural or “white” light sources. CCT is defined as the temperature of the blackbody (an idealized physical object which glows due to its heat) whose color is nearest the color of the tested illuminant. Blackbody colors range from reddish at around 1000K through yellow, white, and finally blue-white (for temperatures over 10,000K). CCT is only defined for illuminants whose colors are reasonably near the blackbody locus, so it is meaningless to talk about the CCT of, say, a green or purple light. “Nearest color” is defined on a CIE uv chromaticity diagram. Reciprocal CCT (one over CCT) is also sometimes used – equal steps in reciprocal CCT correspond very nearly to equal uv distances, which is a useful coincidence (for example, “color temperature” interface sliders should work proportionally to reciprocal CCT for intuitive operation).
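As a side note, this is why photographic color temperature is often expressed in mireds (micro reciprocal degrees, 10^6/CCT). Here is a minimal sketch (my own illustration, not from the course; the endpoint CCTs are arbitrary assumptions) of a slider that interpolates linearly in reciprocal CCT, so that equal slider steps correspond to roughly equal perceptual steps:

```python
def cct_slider(t, cct_min=2000.0, cct_max=10000.0):
    """Map a slider position t in [0, 1] to a CCT in kelvin by interpolating
    linearly in reciprocal CCT (mired = 1e6 / CCT) rather than in CCT itself."""
    mired_warm = 1e6 / cct_min   # the warm end has the larger mired value
    mired_cool = 1e6 / cct_max
    mired = mired_warm + t * (mired_cool - mired_warm)  # linear in reciprocal CCT
    return 1e6 / mired

if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"slider {t:.2f} -> {cct_slider(t):6.0f} K")
```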

Perhaps confusingly, in terms of psychological effect low CCT corresponds to “warm colors” or “warm ambience” and high CCT corresponds to “cool colors” or “cool ambience”. Desirable interior lighting is about 3000K CCT.

Light manufacturers do not reproduce the exact spectra of real daylight or blackbodies, they produce metamers (different spectra with the same perceived color) of the white light desired. For example, four different lights could all match daylight with CCT 4500K in color, but have highly different spectral distributions. Actual daylight has a slightly bumpy spectral power distribution (SPD), incandescent light SPDs are very smooth, discharge lamps have quite spiky SPDs, and LED SPDs tend to have two somewhat narrow peaks.

Since LEDs are a new technology they are expected to be better than, or at least equal to, existing lighting technologies. Expectations include white light, high luminous efficacy (converting a large percentage of the energy consumed into visible light rather than wasting it on UV or IR), low power consumption, long lifetime, high values of flux (emitted light quantity), innovations such as dimmability and addressability, and high visual quality (color rendition, comfort & well-being). LED light is clustered into peaks that are not quite monochromatic – they are “quasi-monochromatic” with a smooth but narrow peak (spectral width around 100nm).

Most white light LEDs are “phosphor-converted LEDs” – blue LEDs with fluorescent powder (phosphor) that captures part of the blue light and emits yellow light, creating two peaks (blue and yellow) which produce an overall white color. By balancing the two peaks (varying the amount of blue light captured by the fluorescent powder), LED lights with different CCTs can be produced. It is also possible to add a second phosphor type to create more complex spectra. New LED lights are under development that use a UV-emitting LED coupled with 3 phosphor types.

An alternative approach to producing white-light LEDs is to create “color-mixed LEDs” combining red, green, and blue LEDs. There are also hybrid mixtures with multiple LEDs as well as phosphors. This course focused on phosphor-converted LEDs. They have better color rendition and good luminous efficacy, and are simple to control. On the other hand, RGB color-mixed LEDs have the advantage of being able to vary color dynamically.

Regarding luminous efficacy, in the laboratory cool white LED lamps can achieve very high values – about 150 lumens per Watt (steady-state operation). Commercially available cool white LED lamps can reach a bit above 100 lm/Watt, commercial warm white ones are slightly lower. US Department of Energy targets are for commercial LED lights of both types to approach 250 lm/Watt by 2020.

Intensity and spectral width strongly depend on temperature (cooling the lamp makes it brighter and “spikier”, heating does the opposite). Heat also reduces LED lifetime. As LEDs age, their flux (light output) decreases, but CCT doesn’t typically change. The rate of flux reduction varies greatly with manufacturer.

One way to improve LED lifetime is to operate them in short pulses (pulse-width modulation). This is done at a frequency between 100 and 2000 Hz, and of course reduces the flux produced.

Heat dissipation is the primary problem in scaling LED lights to high-lumen applications (cost is also a concern) – they top out around 1000 lumens.

The Color Rendering Index (CRI) is the method recommended by the CIE to grade illumination quality. The official definition of color rendering is “effect of an illuminant on the colour appearance of objects by conscious or subconscious comparison with their colour appearance under a reference illuminant”. The instructor uses “color rendition”, for which she has a simpler definition: “effect of an illuminant on the colour appearance of objects”.

CIE’s procedure for measuring the general color rendering index Ra consists of comparing the color of a specific collection of eight samples when illuminated by the tested light vs. a reference light. This reference light is typically a daylight or blackbody illuminant with the same or similar CCT as the tested light (if the tested light’s color is too far from the blackbody locus to have a valid CCT a rendering index cannot be computed; in any case such an oddly-colored light is likely to have very poor color rendering). The process includes a Von Kries chromatic adaptation to account for small differences in CCT between the test and reference light sources. After both sets of colors are computed, the mean of the chromatic distances between the color pairs is used to generate the CIE color rendering index. The scaling factors were chosen so that the reference illuminant itself would get a score of 100 and a certain “known poor” light source would get a score of 50 (negative scores are also possible). For office work, a score of at least 80 is required.
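To show the structure of the index, here is a minimal sketch (not the full CIE procedure – it assumes the eight test-vs-reference color differences have already been computed in the CIE 1964 U*V*W* space the procedure specifies, after chromatic adaptation):

```python
def general_cri(delta_e_uvw):
    """Compute Ra from the eight test-vs-reference color differences (Delta E in
    CIE 1964 U*V*W*). The factor 4.6 is the CIE scaling mentioned above: zero
    difference scores 100, and the chosen "known poor" lamp scores about 50."""
    special_indices = [100.0 - 4.6 * de for de in delta_e_uvw]  # R_i per sample
    return sum(special_indices) / len(special_indices)          # Ra = mean of R_1..R_8

# Hypothetical color differences, for illustration only:
print(general_cri([1.2, 2.5, 3.1, 0.8, 4.0, 2.2, 1.9, 3.3]))
```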

Various problems with CRI have been identified over the years, and alternatives have been proposed. The Gamut Area Index (GAI) is an approach that describes the absolute separation of the chromaticities of the eight color chips, rather than their respective distances vis-à-vis a reference light. Incandescent lights tend to get low scores under this index. Another alternative metric, the Color Quality Scale (CQS) was proposed by the National Institute of Standards and Technology (NIST). It is similar to CRI in basic approach but contains various improvements in the details. Other approaches focus on whether observers find the colors under the tested light to be natural or vivid.

In general, there are two contradicting approaches to selecting lights. You can either emphasize fidelity, discrimination, and “naturalness”, or colorfulness enhancement and “beautification” – you can’t have both. Which is more desirable will depend on the application. For everyday lighting situations, full-spectrum lights are likely to provide the best combination of color fidelity and visual comfort.

There are also potential health issues – lights producing a high quantity of energy in the blue portion of the spectrum may be harmful for the vision of children as well as adults with certain eye conditions. In general, the “cooler” (bluer) the light source, the greater the risk, but there are other factors, such as how concentrated the brightness is over the emitting surface. Looking directly at a “cool white” LED light is most risky; “warm white” lights of all types, as well as “cool white” frosted lamps (which spread brightness over the lamp surface), are more likely to be OK.

The Role of Color in Human Vision

This course was taught by Prof. Kathy T. Mullen from the Vision Research Unit in the McGill University Dept. of Ophthalmology.

Prof. Mullen started by stating that primates are the only trichromats (having three types of cones in the retina) among mammals – all other mammals are dichromats (having two types of cones in the retina). One of the cone types mutated into two different ones relatively recently (in evolutionary terms). There is evidence that other species co-evolved with primate color vision (e.g. fruit colors changed to be more visible to primates).

The Role of Color Contrast in Human Vision

Color contrast refers to the ability to see color differences in the visual scene. It allows us to better distinguish boundaries, edges, and objects.

Color contrast has 4 roles.

Role 1: Detection of objects that would otherwise be invisible because they are seen against a dappled background – for example, seeing red berries among semi-shadowed green foliage.

Role 2: Segregation of the visual field into elements that belong together – if an object’s silhouette is split into several parts by closer objects, color enables us to see that these are all parts of the same object.

Role 3: Helps tell the difference between variations in surface color and variations in shading. This ability depends on whether color and achromatic contrasts coincide spatially or not. For example, a square with chrominance stripes (stripes of different color but the same luminance) at 90 degrees to luminance stripes (stripes that only change luminance) is strongly perceived as a 3D shaded object. If the chrominance and luminance stripes are aligned, then the object appears flat.

Role 4: Distinguishing between otherwise similar objects. This leads into color identification. If, after distinguishing objects by color, we can also identify the colors, then we can infer more about the various objects’ properties.

Color Identification and Recognition

Color identification and recognition is a higher, cognitive stage of color vision that involves identifying and recognizing colors as well as naming them. It requires an internalized “knowledge” of what the different colors are. There is a (very rare) condition called “color agnosia” where color recognition is missing – people suffering from this condition perform normally on (e.g.) color-blindness vision tests, but they can’t identify or name colors at all.

Color is an object property. People group, categorize and name colors using 11 basic color categories: Red, Yellow, Green, Blue, Black, Grey, White, Pink, Orange, Purple, and Brown (there is some evidence that Cyan may also be a fundamental category).

Psychophysical Investigations of Color Contrast’s Role in Encoding Shape and Form

For several decades, vision research was guided by an understanding of color’s role which Prof. Mullen calls the “coloring book model”. The model holds that achromatic contrast is used to extract contours and edges and to demarcate the regions to be filled in by color, while color vision has a subordinate role – it “fills in” the regions after the fact. In other words, color edges have no role in the initial shape processing occurring in the human brain.

To test this model, you can perform experiments that ask the following questions:

  1. Does color vision have the basic building blocks needed for form processing: spatially tuned detectors & orientation tuning?
  2. Can color vision extract contours and edges from the visual scene?
  3. Can color vision discriminate global shapes?

The coloring book model would predict that the answer to all of these questions is “no”.

Prof. Mullen then described several experiments done to determine the answers to these questions. These experiments relied heavily on “isoluminant colors” – colors with different chromaticity but the same luminance. The researchers needed extremely precise isolation of luminance, so they had to find individual isoluminant color pairs for each observer. This was done via an interesting technique called “minimum motion”, which relies on the fact that color vision is extremely poor at detecting motion. The researchers had observers stare at the center of an image of a continually rotating wheel with two alternating colors on the rim. The colors were varied until the rim appeared to stop turning – at that point the two colors were recorded as an isoluminant pair for that observer.

The experiments showed that color vision can indeed extract contours and edges from the scene, and discriminate global shapes, although slightly less well than achromatic (luminance) vision. It appears that the “coloring book” model is wrong – color contrast can be used in the brain in all the same ways luminance contrast can. However, color vision is relatively low-resolution, so very fine details cannot be seen without some luminance contrast.

The Physiological Basis of Color Vision

Color vision has three main physiological stages:

  1. Receptoral (cones) – light absorption – common to all daytime vision
  2. Post-receptoral 1 – cone opponency extracts color but not color contrast
  3. Post-receptoral 2 – double cone opponency extracts color contrast

The retina has three types of cone cells used for daytime (non-low-light) vision. Each type is sensitive to a different range of wavelengths – L cones are most sensitive to long-wavelength light, M cones are most sensitive to light in the middle of the visible spectrum, and S cones are most sensitive to short-wavelength light.

Post-receptoral 1: There are three main types of neurons in this layer, each connected to a local bundle of differently-typed cones. One forms red-green color vision from the opponent (opposite-sign) combination of L and M cones. The second forms blue-yellow color vision from the opponent combination of S with L and M cones. These two types of neurons are most strongly excited (activated) by uniform patches of color covering the entire cone bundle (some of them serve a different role by detecting luminance edges instead). The third type of neuron detects the luminance signal, and is most strongly excited by a patch of uniform luminance covering the entire cone bundle.

Post-receptoral 2: these are connected to a bundle of neurons from the “post-receptoral 1” phase of differing polarities; for example, a combination of “R-G+” neurons (that activate when the color is less red and more green) and “R+G-” neurons (that activate when the color is more red and less green). Such a cell would detect red-green edges (a similar mechanism is used by other cells to detect blue-yellow edges). These types of cells are only found in the primate cortex – other types of mammals don’t have them.

Introduction to Multispectral Color Imaging

This course was presented by Dr. Jon Y. Hardeberg from the Norwegian Color Research Laboratory at Gjøvik University College.

Metamerism (the phenomenon of different spectral distributions being perceived as the same color) is both a curse and a blessing. Metamerism is what enables our display technologies to work. However, two surfaces with the same appearance under one illuminant may very well have a different appearance under another illuminant.

Besides visual metamerism, you can also have camera metamerism – a camera can generate the same RGB triple from two different spectral distributions. Most importantly, camera metamerism is different from human metamerism. For the two to be the same, the sensor sensitivity curves of the camera would have to be linearly related to the human cone cell sensitivity curves. Unfortunately, this is not true for cameras in practice. This means that cameras can perceive two colors as being different when humans would perceive them to be the same, and vice versa.
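As an aside, this “linearly related” requirement (often called the Luther condition) can be checked numerically: fit a 3×3 matrix mapping the camera sensitivities onto the cone sensitivities and look at the residual. A minimal sketch, with made-up Gaussian curves standing in for real measured data:

```python
import numpy as np

def luther_residual(cam, cones):
    """cam, cones: (3 x N) sampled sensitivity curves. Fit cones ~= M @ cam with a
    3x3 matrix M by least squares and return the relative fitting error
    (0 would mean the camera sees exactly the same metamers as the eye)."""
    M, *_ = np.linalg.lstsq(cam.T, cones.T, rcond=None)  # solves cam.T @ M = cones.T
    fit = (cam.T @ M).T
    return np.linalg.norm(fit - cones) / np.linalg.norm(cones)

# Purely illustrative Gaussian "sensitivities" standing in for measured data:
wl = np.linspace(400, 700, 31)
gauss = lambda center, width: np.exp(-((wl - center) / width) ** 2)
cones = np.stack([gauss(565, 50), gauss(540, 45), gauss(445, 30)])
cam = np.stack([gauss(600, 40), gauss(530, 40), gauss(460, 35)])
print(f"relative fit residual: {luther_residual(cam, cones):.3f}")
```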

Multispectral color imaging is based on spectral reflectance rather than ‘only’ color; the number of channels required is greater than the three used for colorimetric imaging. Multispectral imaging can be thought of as “the ultimate RAW” – capture the physics of the scene now, make the picture later. Applications include fine arts / museum analysis and archiving, medical imaging, hi-fi printing and displays, textiles, industrial inspection and quality control, remote sensing, computer graphics, and more.

What is the dimensionality of spectral reflectance? This relates to the number of channels needed by the multispectral image acquisition system. In theory, spectral reflectance has infinite dimensionality, but objects don’t have arbitrary reflectance spectra in practice. Various studies have been done to answer this question, typically using PCA (Principal Component Analysis). However, these studies tend to produce a wide variety of answers, even when looking at the same sample set.

For the Munsell color chip set, various studies have derived dimensionalities ranging from 3 to 8. For paint/artwork from 5 to 12, for natural/general reflectances from 3 to 20. Note that these numbers do not correspond to a count of required measurement samples (regularly or irregularly spaced), but to the number of basis spectra required to span the space.

Dr. Hardeberg gave a short primer on PCA. Plotting the singular values can show when to “cut off” further dimensions. He proposed defining the effective dimensionality as the number of dimensions at which the accumulated energy reaches 99% of the total – accumulated energy sounds like a good measure for PCA.
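A minimal sketch of that criterion (my own illustration on a synthetic data set; whether to mean-center before the SVD is an assumption that varies between studies):

```python
import numpy as np

def effective_dimensionality(reflectances, energy_fraction=0.99):
    """reflectances: (n_samples x n_wavelengths) array. Returns the number of
    singular vectors needed for the accumulated energy to reach the threshold."""
    s = np.linalg.svd(reflectances, compute_uv=False)   # singular values
    energy = np.cumsum(s**2) / np.sum(s**2)             # accumulated energy
    return int(np.searchsorted(energy, energy_fraction) + 1)

# Synthetic example: spectra built from 8 random basis functions plus a little noise.
rng = np.random.default_rng(0)
basis = rng.normal(size=(8, 36))
data = np.abs(rng.normal(size=(500, 8)) @ basis) + 0.01 * rng.normal(size=(500, 36))
print(effective_dimensionality(data))
```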

Dr. Hardeberg next discussed his own work on dimensionality estimation. He analyzed several reflectance sets:

  • MUNSELL: 1269 chips with matte finish, available from the University of Joensuu in Finland.
  • NATURAL: 218 colored samples collected from nature, also available from Joensuu.
  • OBJECT: 170 natural and man-made objects, online courtesy of Michael Vrhel.
  • PIGMENTS: 64 oil pigments used in painting restoration, provided to ENST by the National Gallery under the VASARI project (not available online).
  • SUBLIMATION: 125 equally spaced patches printed on a Mitsubishi S340-10 CMY dye-sublimation printer.

Based on the 99% accumulated energy criterion, he found the following dimensionalities for the various sets: 18 for MUNSELL, 23 for NATURAL, 15 for OBJECT, 13 for PIGMENTS, 10 for SUBLIMATION. The results suggest that 20 dimensions is a reasonable general-purpose number, but the optimal number will depend on the specific application.

The finding of 10 dimensions for the SUBLIMATION dataset may be viewed as surprising since only three colorants (cyan, magenta, and yellow ink) were used. This is due to the nonlinear nature of color printing. A nonlinear model could presumably use as few as three dimensions, but a linear model needs 10 dimensions to reach 99% accumulated energy.

Multispectral color image acquisition systems are typically based on a monochrome CCD camera with several color filters. There are two variants – passive (filters in the optical path) and active (filters in the light path). Instead of multiple filters it is also possible to use a single Liquid Crystal Tunable Filter (LCTF). Dr. Hardeberg gave brief descriptions of several multispectral acquisition systems in current use, ranging from 6 to 16 channels.

Getting spectral reflectance values out of the multichannel measured values requires some work – Dr. Hardeberg detailed a model-based approach that takes a mathematical model of the acquisition device (how it measures values based on spectral input) and inverts it to recover spectral reflectance from the measured values.
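To give a flavor of what such an inversion can look like, here is a minimal sketch of a regularized pseudo-inverse under a linear-model assumption (not Dr. Hardeberg’s specific method):

```python
import numpy as np

def reconstruct_reflectance(camera_values, A, reg=1e-3):
    """Invert a linear acquisition model c = A @ r.
    camera_values: (K,) measured channel values; A: (K x N) forward model combining
    filter transmittances, sensor sensitivity, and illuminant; returns an (N,)
    estimated reflectance via Tikhonov-regularized least squares."""
    N = A.shape[1]
    return np.linalg.solve(A.T @ A + reg * np.eye(N), A.T @ camera_values)

# Tiny illustration with a made-up 8-channel forward model over 36 wavelength samples:
rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(8, 36)))
measured = A @ np.linspace(0.2, 0.8, 36)              # simulate one camera measurement
print(reconstruct_reflectance(measured, A).shape)     # (36,) estimated spectrum
```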

There is work underway to find spectral acquisition systems that are cheaper, easier to operate, and faster while still generating high-quality reflectance data. One of these projects is happening in Dr. Hardeberg’s group, based on a Color Filter Array (CFA) – similar to the Bayer mosaics found in many digital cameras, but with more channels. This allows capturing spectral information in one shot, with one sensor. Another example is a project that takes a stereo camera and puts different filters on each of the lenses, processing the resulting images to get stereoscopic spectral images with depth information.

Dr. Hardeberg ended by going over various current research areas for improving multispectral imaging, including a new EU-sponsored project by his lab which is focusing on multispectral printing.

Fundamentals of Spectral Measurements for Color Science

This course was presented by Dr. David R. Wyble, Munsell Color Science Lab at Rochester Institute of Technology.

Colorimetry isn’t so much measuring a physical value as predicting the impression that will be formed in the mind of the viewer. Spectral measurements are more well-defined in a physical sense.

Terminology: Spectrophotometry measures spectral reflectance, transmittance or absorptance of a material as a function of wavelength. The devices used are spectrophotometers, which measure the ratio of two spectral photometric quantities, to determine the properties of objects or surfaces. Spectroradiometry is more general – measurement of spectral radiometric quantities. The devices used (spectroradiometers) work by measuring spectral radiometric quantities to determine the properties of light sources and other self-luminous objects. Reflectance, transmittance, absorptance are numerical ratios; the words “reflection”, “transmission”, and “absorption” refer to the physical processes. Most spectrophotometers measure at 10nm resolution, and spectroradiometers typically at 5-10nm.

Spectrophotometers

Spectrophotometers measure a ratio with respect to a reference, so no absolute calibration is needed. For reflectance we reference a Perfect Reflecting Diffuser (PRD), and for transmittance we use air. A PRD is a theoretical device – a Lambertian diffuser with 100% reflectance. Calibration transfer techniques are applied to enable the calculation of reflectance factor from the available measured data.

Reflectance is the ratio of the reflected flux to the incident flux (problem – measuring incident flux). Reflectance Factor is the ratio of the flux reflected from the sample to the flux that would be reflected from an identically irradiated PRD (problem – where’s my PRD?).

The calibration equation (or why we don’t need a PRD): a reference sample (typically white) is provided together with Rref(λ) – the known spectral reflectance of the sample (λ stands for wavelength). This sample is measured to provide the “reference signal” iref(λ). In addition, “zero calibration” (elimination of dark current, stray light, etc.) is performed by measuring a “dark signal” idark(λ). Dark signal is measured either with a black reference sample or “open port” (no sample in the device). The calibration equation combines Rref(λ), iref(λ) and idark(λ) with the measured sample intensity isample(λ) to get the sample’s spectral reflectance Rsample(λ):

Rsample(λ) = Rref(λ) * (isample(λ) – idark(λ)) / (iref(λ) – idark(λ))
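A minimal sketch of applying this calibration equation with spectra stored as arrays (the variable names mirror the quantities above; real instrument software also handles wavelength alignment and averaging):

```python
import numpy as np

def sample_reflectance(R_ref, i_ref, i_dark, i_sample):
    """All arguments are spectra sampled at the same wavelengths:
    R_ref    - certified reflectance of the white reference tile
    i_ref    - signal measured off the reference tile
    i_dark   - dark signal ("zero calibration")
    i_sample - signal measured off the sample"""
    R_ref, i_ref, i_dark, i_sample = map(np.asarray, (R_ref, i_ref, i_dark, i_sample))
    return R_ref * (i_sample - i_dark) / (i_ref - i_dark)
```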

Note that Rref(λ) was similarly measured against some other reference, and so on. So you have a pedigree of standards, ultimately leading to some national standards body. For example, if you buy a white reference from X-Rite, it was measured by X-Rite against a white tile they have that was measured at the National Institute of Standards and Technology (NIST).

A lot of lower-cost spectrophotometers don’t come with a reflectance standard – Dr. Wyble isn’t clear on how those work. You can always buy a reflectance standard separately and do the calibration yourself, but that is more risky – if it all comes from the same manufacturer you can expect that it was done properly.

Transmittance is the ratio of transmitted flux to incident flux. At the short path lengths in these devices, air is effectively a perfect transmitter for visible light, so a “transmittance standard” is not needed – the incident flux can be measured directly by measuring “open port” (no sample). For liquids you can measure an empty container, and when measuring specific colorants dissolved in a carrier fluid you can measure a container full of clean carrier fluid.

Calibration standards must be handled, stored and cleaned with care according to manufacturer instructions, otherwise incorrect measurement will result. A good way to check is to measure the white standard and check the result just before measuring the sample.

A spectrophotometer typically includes a light source, a sample holder, a diffraction grating (for separating out spectral components) and a CCD array sensor, as well as some optics.

Measurement geometry refers to the measuring setup: variables such as the angles of the light source and sensor to the sample, the presence or absence of baffles to block certain light paths, the use (or not) of integrating hemispheres, etc. Dr. Wyble went into a few examples, all taken from the CIE 15:2004 standards document. Knowledge of which measurement geometry was used can be useful, e.g. to estimate how much specular reflectance was included in a given measurement (different geometries exclude specular to different degrees). Some special materials (“gonio-effect” pigments that change color based on angle, fluorescent, metallic, retroreflective, translucent, etc.) will break the standard measurement geometries and need specialized measuring methods.

Spectroradiometers

Similar to spectrophotometers, but have no light source or sample holder. The light from the luminous object being measured goes through some optics and a dispersing element to a detector. There are no standard measurement geometries for spectroradiometry.

Some spectroradiometers measure radiance directly emitted from the source through focused optics (typically used for measuring displays). Others measure irradiance – the light incident on a surface (typically used for measuring illuminants). Irradiance measurements can be done by measuring radiance from a diffuse white surface, such as pressed polytetrafluoroethylene (PTFE) powder.

Irradiance depends on the angle of incident light and the distance of the detector. Radiance measured off diffuse surfaces is independent of angle to the device. Radiance measured off uniform surfaces is independent of distance to the device.

Instrument Evaluation: Repeatability (Precision) and Accuracy

Repeatability – do you get similar results each time? Accuracy – is the result (on average) close to the correct one? Repeatability is more important since repeatable inaccuracies can be characterized and corrected for.

Measuring repeatability – the standard deviations of reflectance or colorimetric measurements. The time scale is important: short-term repeatability (measurements one after the other) should be good for pretty much any device. Medium-term repeatability is measured over a day or so, and represents how well the device does between calibrations. Long-term repeatability is measured over weeks or months – the device would typically be recalibrated several times over such an interval. The most common measure of repeatability is Mean Color Difference from the Mean (MCDM). It is measured by making a series of measurements of the same sample (removing and replacing it each time to simulate real measurements), calculating L*a*b* values for each, calculating the mean, calculating ΔE*ab between each measurement and the mean, and finally averaging the ΔE*ab values to get the MCDM. The MCDM will typically be about 0.01 (pretty good) to 0.4 (really bad). Small handheld devices commonly have around 0.2.
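A minimal sketch of the MCDM calculation, assuming the repeated measurements have already been converted to CIELAB and using the simple Euclidean ΔE*ab difference:

```python
import numpy as np

def mcdm(lab_measurements):
    """lab_measurements: (n x 3) array of L*a*b* values from repeated measurements
    of the same sample. Returns the Mean Color Difference from the Mean."""
    lab = np.asarray(lab_measurements, dtype=float)
    mean_lab = lab.mean(axis=0)
    delta_e = np.linalg.norm(lab - mean_lab, axis=1)  # ΔE*ab of each measurement vs. the mean
    return delta_e.mean()

# Hypothetical repeated measurements of one sample:
print(mcdm([[52.1, 10.3, -4.2], [52.0, 10.5, -4.0], [52.3, 10.2, -4.3]]))
```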

Quantifying accuracy – typically done by measuring the spectral reflectance of a set of known samples (e.g. BCRA tiles) that have been previously measured at high-accuracy laboratories: NIST, NRC, etc. The measured values are compared to the “known” values and the MCDM is calculated as above. Once the inaccuracy has been quantified, this can be used to correct further measurements with the device (using regression analysis). When applied to the test tile values, the correction attempts to match the reference tile values. When applied to measured data, the correction attempts to predict reflectance data as if the measurements were made on the reference instrument. Note that the known values of the samples have uncertainties in them. The best uncertainty you can get is the 45:0 reflectometer at NIST, which is about 0.3%-0.4% (depending on wavelength) – you can’t do better than that.

Using the same procedure, instead of aligning your instruments with NIST, you can align a corporate “fleet” of instruments (used in various locations) to a “master” instrument.

2011 Color and Imaging Conference, Part II: Courses A

CIC traditionally includes a strong course program, with a two-day course on fundamentals (a DVD of this course presented by Dr. Hunt can be purchased online) and a series of short courses on more specialized topics. Since I attended the fundamentals course last year, this year I only went to short courses. This blog post will detail three of these courses, with the others covered by a future post.

Color Pipelines for Computer Animated Features

The first part of the course was presented by Rod Bogart. Rod is the lead color science expert at Pixar, and worked on color-related issues at ILM before that.

The animated feature pipeline has many steps, some of which are color-critical and some of which aren’t: Story, Art, Layout, Animation, Shading, Lighting, Mastering, and Exhibition. The people working on the color-critical stages are the ones with color-critical monitors on their desks. Rod’s talk went through the color-critical stages of the pipeline, discussing related topics on the way.

Art

In this stage people look at reference photos, establish color palettes, and do look development. Accurate color is important. Often, general studies are done on how exteriors, characters, etc. might look. This is mostly done in Photoshop on a Mac.

Art is the first stage where people make color-critical images. In general, all images made in animated feature production exist for one of two reasons – for looking at directly, or to be used for making more images (e.g., textures). The requirements for image processing will vary depending on which group they belong to. During the Art stage the images generated are intended for viewing.

Images for viewing can be quantized as low as 8 bits per channel, and even (carefully) compressed. Pixel values tend to be encoded to the display device (output referred). In the absence of a color management system, the encoding just maps to frame buffer values, which feed into a display response curve. However, it is better to tag the image with an assumed display device (ICC tagging to a target like sRGB; other metadata attributes can be stored with the image as well). It’s important to minimize color operations done on such images, since they have already been quantized and have no latitude for processing. These images contain low dynamic range (LDR) data.

During the Art phase, images are typically displayed on RGB additive displays calibrated to specific reference targets. Display reference targets include specifications for properties such as the chromaticity coordinates of the RGB primaries and white point, the display response curve, the display peak white luminance and the contrast ratio or black level.

Shading

Shading and antialiasing operations need to occur on linear light values – values that are proportional to physical light intensity. Other operations that require linear values include resizing, alpha compositing, and filtering. Rendered buffers are written out as HDR values and later used to generate the final image.
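As a small aside (my own sketch, not from the course), the reason this matters is that display-encoded values are nonlinear, so operating on them directly skews the result. Assuming an sRGB encoding:

```python
import numpy as np

def srgb_to_linear(v):
    """Decode sRGB-encoded values in [0, 1] to linear light."""
    v = np.asarray(v, dtype=float)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(v):
    """Encode linear-light values in [0, 1] back to sRGB."""
    v = np.asarray(v, dtype=float)
    return np.where(v <= 0.0031308, v * 12.92, 1.055 * v ** (1 / 2.4) - 0.055)

# A 50/50 blend of two encoded pixel values, done correctly in linear light:
a, b = 0.9, 0.1
print(linear_to_srgb(0.5 * (srgb_to_linear(a) + srgb_to_linear(b))))  # ~0.66, not the naive 0.5
```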

Lighting

Lighting is sometimes done with special light preview software, and sometimes using other methods such as “light soloing”. “Light soloing” is a common practice where a buffer is written out for the contribution of each light in the scene (all other lights are set to black) and then the lighters can use compositing software to vary individual light colors and intensities and combine the results.

For images such as these “solo light buffers” which are used to assemble viewable images, Pixar uses the OpenEXR format. This format stores linear scene values with a logarithmic distribution of numbers – each channel is a 16-bit half-float. The range of possible values is -65504.0 to +65504.0. The positive range can be thought of as 32 stops (powers of 2) of data, with 1024 steps in each of the stops.

After images are generated, they need to be viewed. This is done in various review spaces: monitors (CRT or calibrated LCD) on people’s desks, as well as various special rooms (review rooms, screening rooms, grading suites) where images are typically shown on DLP projectors. In review rooms the projector is usually hooked up directly to a workstation, while screening rooms use special digital cinema playback systems or “dailies” software. Pixar try not to have any monitors in the screening rooms – screening rooms are dark and the monitors are intended (and calibrated) for brighter rooms.

Mastering

The mastering process includes in-house color grading. This covers two kinds of operations: shot-to-shot corrections and per-master operations. An example of a shot-to-shot correction: in “Cars” in one of the shots the grass ended up being a slightly different color than in other shots in the sequence – instead of re-rendering the shot, it was graded to make the grass look more similar to the other shots. In contrast, per-master operations are done to make the film fit a specific presentation format.

Mastering for film: film has a different gamut than digital cinema projection. Neither is strictly larger – each has colors the other can’t handle. Digital is good for bright, saturated colors, especially primary colors – red, green, and blue. Film is good for dark, saturated colors, especially secondary colors – cyan, magenta, and yellow. Pixar doesn’t generate any film gamut colors that are outside the digital projection gamut, so they just need to worry about the opposite case – mapping colors from outside the film gamut so they fit inside it, and previewing the results during grading. Mapping into the film gamut is complex. Pixar try to move colors that are already in-gamut as little as possible (the ones near the gamut border do need to move a little to “make room” for the remapped colors). For the out-of-gamut colors, first Pixar tried a simple approach – moving to the closest point in the gamut boundary. However, this method doesn’t preserve hue. An example of the resulting problems: in the “Cars” night scene where Lightning McQueen and Mater go tractor-tipping, the closest-point gamut mapping made Lightning McQueen’s eyes go from blue (due to the night-time lighting) to pink, which was unacceptable. Pixar figured out a proprietary method which involves moving along color axes. This sometimes changes the chroma or lightness quite a bit, but tends to preserve hue and is more predictable for the colorist to tweak if needed. For film mastering Pixar project the content in the P3 color space (originally designed for digital projection), but with a warmer white point more typical of analog film projection.

Mastering for digital cinema: color grading for digital cinema is done in a tweaked version of the P3 color space – instead of using the standard P3 white point (which is quite greenish) they use D65, which is the white point people have been using on their monitors while creating the content. Finally a Digital Cinema Distribution Master (DCDM) is created – this stores colors in XYZ space, encoded at 12 bits per channel with a gamma of 2.6.

Mastering for HD (Blu-ray and HDTV broadcast): color grading for HD is done in the standard Rec.709 color space. The Rec.709 green and red primaries are much less saturated than the P3 ones; the blue primary has similar saturation to the P3 blue but is darker. The HD master is stored in RGB, quantized to 10 bits. Rod talked about the method Pixar use for dithering during quantization – it’s an interesting method that might be relevant for games as well. The naïve approach would be to round to the closest quantized value. This is the same as adding 0.5 and rounding down (truncating). Instead of adding 0.5, Pixar add a random number distributed uniformly between 0 and 1. This gives the same result on average, but dithers away a lot of the banding that would otherwise result.
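A minimal sketch of that dithered quantization (my own illustration of the idea as described; the bit depth and the [0, 1] value range are assumptions):

```python
import numpy as np

def quantize_dithered(x, bits=10, rng=None):
    """Quantize float image data in [0, 1] to integer codes at the given bit depth,
    adding uniform [0, 1) noise before truncating instead of rounding. The result
    is unbiased on average and breaks up banding."""
    rng = rng or np.random.default_rng()
    scale = (1 << bits) - 1
    codes = np.floor(x * scale + rng.random(np.shape(x)))
    return np.clip(codes, 0, scale).astype(np.uint16)

print(quantize_dithered(np.linspace(0.0, 1.0, 8), bits=10, rng=np.random.default_rng(1)))
```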

Exhibition

Exhibition for digital cinema: this uses a Digital Cinema Package (DCP) in which each frame is compressed using JPEG2000. The compression is capped to 250 megabits per second – this limit was set during the early days of digital cinema, and any “extra features” such as stereo 3D, 4K resolution, etc. still have to fit under the same cap.

Exhibition for HD (Blu-ray, HDTV broadcast): the 10-bit RGB master is converted to YCbCr, chroma subsampled (4:2:2) and further quantized to 8 bits. This is all done with careful dithering, just like the initial 10-bit quantization. MPEG-4 AVC compression is used for Blu-ray, with a 28-30 megabits per second average bit rate and a 34 megabits per second peak.

Disney’s Digital Color Workflow – Featuring “Tangled”

The second part of the course was presented by Stefan Luka, a senior color science engineer at Walt Disney Animation Studios. Disney uses various display technologies, including CRT, LCD and DLP projectors. Each display has a gamut that defines the range of colors it can show. Disney previously used CRT displays, which have excellent color reproduction but are unstable over time and have a limited gamut. They now consider LCD color reproduction to finally be good enough to replace CRTs (several in the audience disputed this), and primarily use HP Dreamcolor LCD monitors. These are very stable, can support wide gamuts (due to their RGB LED backlights), and include programmable color processing.

Disney considered using Rec.709 calibration for the working displays, but the artists really wanted P3-calibrated displays, mostly to see better reds. Rec.709’s red primary is a bit orangish – P3’s red primary is very pure; it’s essentially on the spectral locus. Disney calibrate the displays with P3 primaries, a D65 white point, and a 2.2 gamma (which Stefan says matches the CRTs used at that time). The viewing environment in the artists’ rooms is not fully controlled, but the lighting is typically dim.

Disney calibrate their displays by mounting them in a box lined with black felt in front of a spectroradiometer. They measure the primaries and ramps on each channel to build lookup tables. For software Disney use a custom-tweaked version of a tool from HP called “Ookala” (the original is available on SourceForge). When calibrating they make sure to let the monitor warm up first, since LEDs are temperature dependent. The HP DreamColor has a temperature sensor which can be queried electronically, so this is easy to verify before starting calibration. Disney uses a spectroradiometer for calibration – Stefan said that colorimeters are generally not good enough to calibrate a display like this, though perhaps the latest one from X-Rite (the i1Display Pro) could work. Only people doing color-critical work have DreamColor monitors – Disney couldn’t afford to give them to everyone. People with non-color-critical jobs use cheaper displays.

During “Tangled” production, the texture artists painted display-encoded RGB, saved as 16-bit (per channel) TIFF or PSD. They used sRGB encoding (managed via ICC or external metadata/LUT) since it makes the bottom bits go through better than a pure power curve. Textures were converted to linear RGB for rendering. Rendering occurred in linear light space; the resulting images had a soft roll-off applied to the highlights and were written to 16-bit TIFF (if they had been saving to OpenEXR – which they plan to do for future movies – they wouldn’t have needed to roll off the highlights). Compositing inputs and final images were all 16-bit TIFFs.

During post production final frames are conformed and prepared for grading. The basic grade is done for digital cinema, with trim passes for film, stereoscopic, and HD.

The digital cinema grade is done in a reference room with a DLP projector using P3 primaries, D65 white point, 2.2 gamma, and 14 foot-Lamberts reference white. The colorist uses “video” style RGB grading controls, and the result is encoded in 12-bit XYZ space with 2.6 gamma, dithered, and compressed using JPEG2000.

For the film deliverable, Disney adjust the projector white point and view the content through the same film gamut mapping that Pixar uses. They then do a trim pass. White point compensation is also needed; the content was previously viewed at D65 but needs to be adjusted for the native D55 film white point to avoid excessive brightness loss. A careful process needs to be done to bridge the gap between the two white points. At the output, film gamut mapping as well as an inverse film LUT is applied to go from the projector-previewed colors to values suitable for writing to film negative. Finally, Disney review the content at the film lab and call printer lights.

Stereo digital cinema – luminance is reduced to 4.5 foot-Lamberts (in the field there will be a range of stereo luminances; Disney make the assumption that 4.5 is a reasonable target). They do a trim pass, boosting brightness, contrast, and saturation to compensate for the greatly reduced luminance. The colorist works with one stereo eye at a time (working with stereo glasses constantly would cause horrible headaches). Afterwards the result is reviewed with glasses, then output and encoded similarly to the mono digital cinema deliverable.

HD mastering – Disney also use a DLP projector for HD, but view it through a Rec.709 color-space conversion and with reference white set to 100 nits. They do a trim pass (mostly global adjustments needed due to the increase in luminance), then output and bake the values into Rec.709 color space. Then Disney compress and review final deliverables on an HD monitor in a correctly set up room with proper backlight etc.

After finishing “Tangled”, Disney wanted to determine whether it was really necessary for production to work in P3; could they instead work in Rec.709 and have the colorist tweak the digital cinema master to the wider P3 gamut? Stefan said that this question depends on the distribution of colors in a given movie, which in turn depends a lot on the art direction. Colors can go out of gamut due to saturation, or due to brightness, or both. Stefan analyzed the pixels that went out of Rec.709 gamut throughout “Tangled”. Most of the out-of-gamut colors were due to brightness – most importantly flesh tones. A few other colors went out of gamut due to saturation: skies, forests, dark burgundy velvet clothing on some of the characters, etc.

Stefan showed four example frames on a DreamColor monitor, comparing images in full P3 with the same images gamut-mapped to Rec.709. Two of the four barely changed. Of the remaining two, one was a forest scene with a cyan fog in the background which shifted to green when gamut-mapped. Another shot, with glowing hair, had colors out of Rec.709 gamut due to both saturation & brightness.

At the end of the day, the artists weren’t doing anything in P3 that couldn’t have been produced at the grading stage, so Stefan doesn’t think doing production in P3 had much of a benefit. P3 was mostly used to boost brightness, so working in 709 space with additional headroom (e.g. OpenEXR) would be good enough.

After “Tangled”, Disney moved from 16-bit TIFFs to OpenEXR, helped by their recent adoption of Nuke (which has fast floating-point compositing – “Tangled” was composited on Shake). They also eliminated the sRGB encoding curve, and now just use a 2.2 gamma without any LUTs. Disney no longer need to do a soft roll off of highlights when rendering since OpenEXR can contain the full highlight detail. They are doing some experiments with HDR tone mapping, especially tweaking the saturation. Disney have also moved to working in Rec.709 instead of P3 for production (for increased compatibility between formats) and are using non-wide-gamut monitors (still HP, but not DreamColor).

In the future, Disney plan to do more color management throughout the pipeline, probably using the open-source OpenColorIO library. They also plan to investigate improvements in gamut mapping, including local contrast preservation (taking account of which colors are placed next to each other spatially, and not collapsing them to the same color when gamut mapping).

Color in High-Dynamic Range Imaging

This course was presented by Greg Ward. Greg is a major figure in the HDR field, having developed various HDR image formats (LogLuv TIFF and JPEG-HDR, as well as the first HDR format, RGBE), the first widely-used HDR rendering system (RADIANCE), and the first commercially available HDR display, as well as various pieces of software relating to HDR (including the Photosphere HDR image builder and browsing program). He’s also done important work on reflectance models, but that’s outside the scope of this course.

HDR Color Space and Representations

Images can be scene-referred (data encodes scene intensities) or output-referred (data encodes display intensities). Since human visual abilities are (pretty much) known, and future display technologies are mostly unknown, scene-referred images are more useful for long-term archival. Output-referred images are useful in the short term, for a specific class of display technology. Human perceptual abilities can be used to guide color space encoding of scene-referred images.

The human visual system is sensitive to luminance values over a range of about 1:10^14, but not in a single image. The human simultaneous range is about 1:10,000. The range of sRGB displays is about 1:100.

The HDR imaging approach is to render or capture floating-point data in a color space that can store the entire perceivable gamut. Post-processing is done in the extended color space, and tone mapping is applied for each specific display. This is the method adopted in the Academy Color Encoding Specification (ACES) used for digital cinema. Manipulation of HDR data is much preferred because then you can adjust exposure and do other types of image manipulation with good results.

HDR imaging isn’t new – black & white negative film can hold at least 4 orders of magnitude, while the final print holds much less. Much of the talent of photographers like Ansel Adams lay in darkroom technique – “dodging” and “burning” to bring out the dynamic range of the scene on paper. The digital darkroom provides new challenges and opportunities.

Camera RAW is not HDR; the number of bits available is insufficient to encode HDR data. A comparison of several formats which are capable of encoding HDR follows (using various metrics, including error on an “acid test” image covering the entire visible gamut over a 1:10^8 dynamic range).

  • Radiance RGBE & XYZE: a simple format (three 8-bit mantissas and one 8-bit shared exponent) with open source libraries. Supports lossless (RLE) compression (20% average compression ratio). However, it does not cover the visible gamut, the large dynamic range comes at the expense of accuracy, and the color quantization is not perceptually uniform. RGBE had visible error on the “acid test” image; XYZE performed much better but still had some barely perceptible error.
  • IEEE 96-bit TIFF (IEEE 32-bit float for each channel) is the most accurate representation, but the files are enormous (even with compression – 32-bit IEEE floats don’t compress very well).
  • 16-bit per channel TIFF (RGB48) is supported by Photoshop and TIFF libraries including libTIFF. 16 bits each of gamma-compressed R, G, and B; LZW lossless compression is available. However, it does not cover the visible gamut, and most applications interpret the maximum value as “white”, turning it into a high-precision LDR format rather than an HDR format.
  • SGI 24-bit LogLuv TIFF Codec: implemented in libTIFF. 10-bit log luminance, and a 14-bit lookup into a “rasterized human gamut” in CIE (u’,v’) space. It just covers the visible gamut and range, but the dynamic range doesn’t leave headroom for processing and there is no compression support. Within its dynamic range limitations, it had barely perceptible errors on the “acid test” image (but failed completely outside those limits).
  • SGI 32-bit LogLuv TIFF Codec: also in libTIFF. A sign bit, 16-bit log luminance, and 8 bits each for CIE (u’,v’). Supports lossless (RLE) compression (30% average compression). It had barely perceptible errors on the “acid test” image.
  • ILM OpenEXR Format: 16-bit float per primary (sign bit, 5-bit exponent, 10-bit mantissa). Supports alpha and multichannel images, as well as several lossless compression options (2:1 typical compression – compressed sizes are competitive with other HDR formats). Has a full-featured open-source library as well as massive support by tools and GPU hardware. The only reasonably-sized format (i.e. excluding 96-bit TIFF) which could represent the entire “acid test” image with no visible error. However, it is relatively slow to read and write. Combined with CTL (Color Transformation Language – a similar concept to ICC, but designed for HDR images), OpenEXR is the foundation of the Academy of Motion Picture Arts & Sciences’ IIF (Image Interchange Framework).
  • Dolby’s JPEG-HDR (one of Greg’s projects): backwards-compatible JPEG extension for HDR. A tone-mapped sRGB image is stored for use by naïve (non-HDR-aware) applications; the (monochrome) ratio between the tone-mapped luminance and the original HDR scene luminance is stored in a subband. JPEG-HDR is very compact: about 1/10 the size of the other formats. However, it only supports lossy encoding (so repeated I/O will degrade the image) and has an expensive three-pass writing process. Dolby will soon release an improved version of JPEG-HDR on a trial basis; the current version is supported by a few applications, including Photoshop (through a plugin – not natively) and Photosphere (which will be detailed later in the course).

HDR Capture and Photosphere

Standard digital cameras capture about 2 orders of magnitude in sRGB space. Using multiple exposures enables building up HDR images, as long as the scene and camera are static. In the future, HDR imaging will be built directly into camera hardware, allowing for HDR capture with some amount of motion.

Multi-exposure merge works by using a spatially varying weighting function – the weight of each pixel depends on where its value sits within the usable range of each exposure. The camera’s response function needs to be recovered as well.
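A minimal sketch of such a merge (my own illustration under simplifying assumptions: a hat-shaped weight and an already-recovered inverse camera response):

```python
import numpy as np

def merge_exposures(images, exposure_times, inverse_response=lambda v: v):
    """images: list of (H x W) arrays with values in [0, 1]; exposure_times: matching
    list of exposure times in seconds. Returns relative HDR radiance per pixel."""
    num = np.zeros_like(images[0], dtype=float)
    den = np.zeros_like(images[0], dtype=float)
    for img, t in zip(images, exposure_times):
        weight = 1.0 - np.abs(2.0 * img - 1.0)  # down-weight near-black / near-saturated pixels
        radiance = inverse_response(img) / t    # per-exposure radiance estimate
        num += weight * radiance
        den += weight
    return num / np.maximum(den, 1e-6)
```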

The Photosphere application (available online) implements the various algorithms discussed in this section. Exposures need to be aligned – Photosphere does this by generating median threshold bitmaps (MTBs) which are constant across exposures (unlike edge maps). MTBs are generated based on a grayscale image pyramid version of the original image, alignments are propagated up the pyramid. Rotational as well as translational alignments are supported. This technique was published by Greg in a 2003 paper in the Journal of Graphics Tools.
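A minimal sketch of the MTB idea (just the thresholding and an error measure; the full technique also uses an exclusion zone around the median and the image pyramid mentioned above):

```python
import numpy as np

def median_threshold_bitmap(gray):
    """gray: (H x W) grayscale image. Threshold at the median, so the bitmap stays
    nearly the same as exposure changes."""
    return gray > np.median(gray)

def alignment_error(bitmap_a, bitmap_b, dx, dy):
    """Count disagreeing pixels after shifting bitmap_b by (dx, dy) – an alignment
    search picks the offset that minimizes this count."""
    shifted = np.roll(np.roll(bitmap_b, dy, axis=0), dx, axis=1)
    return int(np.count_nonzero(bitmap_a ^ shifted))
```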

Photosphere also automatically removes “ghosts” (caused by objects which moved between exposures) and reconstructs an estimate of the point-spread function (PSF) for glare removal.

Greg then gave a demo of the new Windows version of Photosphere, including its HDR image browsing and cataloging abilities. Its merging capabilities also include the unique option of outputting absolute HDR values for all pixels, if the user inputs an absolute value for a single patch (this would typically be a grey card measured by a separate device). This only needs to be done once per camera.

Image-Based Lighting

Take an HDR (bracketed exposure) image of a mirrored ball and use it for lighting. Use a background plate to fill in the “pinched” region at the back of the ball. Render synthetic objects with the captured lighting and composite them into the real scene, with optional addition of shadows. Greg’s description of HDR lighting capture is a bit out of date – most VFX houses no longer use mirrored balls for this (they still use them for reference); instead panoramic cameras or DSLRs with a nodal mount are typically used.

Tone-Mapping and Display

A renderer is like an ideal camera. Tone mapping is medium-specific and goal-specific. The user needs to consider display gamut, dynamic range, and surround. What do we wish to simulate – cinematic camera and film, or human visual abilities and disabilities? Possible goals include colorimetric reproduction, matching visibility, or optimizing contrast & color sensitivity.

Histogram tone-mapping is a technique that generates a histogram of log luminance for the scene, and creates a curve that redistributes luminance to fit the output range.
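A minimal sketch of the basic idea (a simplified variant – the actual operator also limits how much contrast the histogram is allowed to claim, so that visibility isn’t exaggerated):

```python
import numpy as np

def histogram_tone_map(luminance, display_min=0.01, display_max=100.0, bins=256):
    """luminance: array of scene luminances (> 0). Builds a cumulative histogram of
    log luminance and uses it to redistribute values into the display's range."""
    log_l = np.log(luminance)
    hist, edges = np.histogram(log_l, bins=bins)
    cdf = np.cumsum(hist) / hist.sum()        # cumulative distribution in [0, 1]
    centers = 0.5 * (edges[:-1] + edges[1:])
    frac = np.interp(log_l, centers, cdf)     # where each pixel falls in the CDF
    log_out = np.log(display_min) + frac * (np.log(display_max) - np.log(display_min))
    return np.exp(log_out)
```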

Greg discussed various other tone mapping methods. He mentioned a SIGGRAPH 2005 paper that used an HDR display to compare many different tone-mapping operators.

HDR Display Technologies

  • Silicon Light Machines Grating Light Valve (GLV) – amazing dynamic range, widest gamut, still in development. Promising for digital cinema.
  • Dolby Professional Reference Monitor PRM-4200: an LED-based 42″ production unit based on technology that Greg worked on. He says this is extended dynamic range, but not true HDR (it goes up to 600 cd/m2).
  • SIM2 Solar Series HDR display: this is also based on the (licensed) Dolby tech – Greg says this is closer to what Dolby originally had in mind. It’s a 47″ display with a 2,206-LED backlight that goes up to 4,000 cd/m2.

As an interesting example, Greg also discussed an HDR transparency (slide) viewer that he developed back in 1995 to evaluate tone mapping operators. It looks similar to a ViewMaster but uses much brighter lamps (50 Watts for each eye, necessitating a cooling fan and heat-absorbing glass) and two transparency layers – a black-and-white (blurry) “scaling” layer as well as a color (sharp) “detail” layer. Together these layers yield 1:10,000 contrast. The principles used are similar to other dual-modulator displays; the different resolution of the two layers avoids alignment problems. Sharp high-contrast edges work well despite the blurry scaling layer – scattering in the eye masks the artifacts that would otherwise result.

New displays based on RGB LED backlights have the potential to achieve not just high dynamic range but greatly expanded gamut – the new LEDs are spectrally pure and the LCD filters can select between them easily, resulting in very saturated primaries.

HDR Imaging in Cameras, Displays and Human Vision

The course was presented by Prof. Alessandro Rizzi from the Department of Information Science and Communication at the University of Milan. With John McCann, he co-authored the book “The Art and Science of HDR Imaging” on which this course is based.

HDR Issues

The imaging pipeline starts with scene radiances generated from the illumination and objects. These radiances go through a lens, a sensor in the image plane, and sensor image processing to generate a captured image. This image goes through media processing before being shown on a print or display, to generate display radiances. These go through the eye’s lens and intraocular medium, form an image on the retina, which is then processed by the vision system’s image processing to form the final reproduction appearance. Prof. Rizzi went over HDR issues relating to various stages in the pipeline.

The dynamic range issue relates to the scene radiances. Is it useful to define HDR based on a specific threshold number for the captured scene dynamic range? No. Prof. Rizzi defines HDR as “a rendition of a scene with greater dynamic range than the reproduction media”. In the case of prints this is almost always the case, since print media has an extremely low dynamic range. Renaissance painters were the first to successfully do HDR renditions – example paintings were shown and compared to similar photographs. The paintings were able to capture a much higher dynamic range while still appearing natural.

A table was shown of example light levels, each listed with luminance in cd/m2. Note that these values are all for the case of direct observation, e.g. “sun” refers to the brightness of the sun when looking at it directly (not recommended!) as opposed to looking at a surface illuminated by the sun (that is a separate entry).

  • Xenon short arc: 200,000 – 5,000,000,000
  • Sun: 1,600,000,000
  • Metal halide lamp: 10,000,000 – 60,000,000
  • Incandescent lamp: 20,000,000 – 26,000,000
  • Compact fluorescent lamp: 20,000 – 70,000
  • Fluorescent lamp: 5,000 – 30,000
  • Sunlit clouds: 10,000
  • Candle: 7,500
  • Blue sky: 5,000
  • Preferred values for indoor lighting: 50 – 500
  • White paper in sun: 10,000
  • White paper in 500 lux illumination (typical office lighting): 100
  • White paper in 5 lux illumination (very dim lighting, similar to candle-light): 1

The next issue, range limits and quantization, refers to the “captured image” stage of the imaging pipeline. A common misconception is that the problem involves squeezing the entire range of intensities which the human visual system can handle, from starlight at 10^-6 cd/m2 to a flashbulb at 10^8 cd/m2, into the 1-100 cd/m2 range of a typical display. The fact is that the 10^-6 to 10^8 cd/m2 range is only obtainable with isolated stimuli – humans can’t perceive a range like that in a single image. Another common misconception is to think of precision and range as being linked, e.g. assuming that an 8-bit framebuffer implies a 255:1 contrast. Prof. Rizzi used a “salami” metaphor – the size of the salami represents the dynamic range, and the number of slices represents the quantization. Range and precision are orthogonal.
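
A concrete way to see this is a shared-exponent pixel encoding such as Greg Ward’s RGBE format (documented at http://www.graphics.cornell.edu/online/formats/rgbe/): eight bits of mantissa per channel set the number of “slices”, while a shared eight-bit exponent sets the length of the salami. The sketch below follows the usual +128 exponent bias; it is a rough illustration, not production code:

    import math

    def float_to_rgbe(r, g, b):
        # Shared exponent: take the largest component, split it into mantissa
        # and exponent, and scale all three channels by the same factor.
        v = max(r, g, b)
        if v < 1e-32:
            return (0, 0, 0, 0)
        m, e = math.frexp(v)              # v = m * 2**e, with 0.5 <= m < 1
        scale = m * 256.0 / v
        return (int(r * scale), int(g * scale), int(b * scale), e + 128)

    def rgbe_to_float(rm, gm, bm, e):
        if e == 0:
            return (0.0, 0.0, 0.0)
        f = math.ldexp(1.0, e - (128 + 8))  # undo the bias and the 8-bit mantissa scale
        return (rm * f, gm * f, bm * f)

    # A channel much dimmer than the brightest one gets only a few mantissa
    # slices - huge range, limited precision, which is exactly the salami point.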

In most cases, the scene has a larger dynamic range than the sensor does. So with non-HDR image acquisition you have to give up some dynamic range in the highlights, the shadows, or both. The “HDR idea” is to bracket multiple acquisitions with different exposures to obtain an HDR image, and then “shrink” during tone mapping. But how? Tone mapping can be general, or can take account of a specific rendering intent. Naively “squeezing” all the detail into the final image leads to the kind of unnatural “black velvet painting”-looking “HDR” images commonly found on the web.
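
The bracket-and-merge step itself is not the hard part; the “shrink” is. A minimal sketch of the merge plus a placeholder global curve, assuming already-linear exposures with known exposure times (a real pipeline would also recover the camera response curve):

    import numpy as np

    def merge_exposures(images, exposure_times):
        # images: same-sized arrays with linear values in [0, 1]; each is divided
        # by its exposure time to estimate relative radiance, with a hat-shaped
        # weight that distrusts clipped and noisy pixels.
        acc = np.zeros_like(images[0], dtype=np.float64)
        wsum = np.zeros_like(acc)
        for img, t in zip(images, exposure_times):
            w = 1.0 - np.abs(2.0 * img - 1.0)
            acc += w * img / t
            wsum += w
        return acc / np.maximum(wsum, 1e-6)

    def tone_map_global(radiance):
        # Placeholder "shrink": normalize to a mid-grey of 0.18, then L / (1 + L).
        L = radiance * (0.18 / max(radiance.mean(), 1e-9))
        return L / (1.0 + L)

The hat-shaped weight simply trusts each exposure most where its pixels are neither clipped nor buried in noise.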

As an example, the response of film emulsions to light can be mapped via a density-exposure curve, commonly called a Hurter-Driffield or “H&D” curve. These curves map negative density vs. log exposure. They typically show an s-shape with a straight-line section in the middle where density is proportional to log exposure, with a “toe” on the underexposed part and a “shoulder” on the overexposed part. In photography, exposure time should be adjusted so densities lie on the straight-line portion of the curve. With a single exposure, this is not possible for the entire scene – you can’t get both shadow detail and highlight detail, so in practice only midtones are captured with full detail.
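
A toy version of such a curve is easy to write down – a logistic in log exposure gives a toe, a straight-line section, and a shoulder. This is a sketch only, not a model of any particular emulsion, and the default constants are made up:

    import numpy as np

    def hd_curve(log_exposure, d_min=0.1, d_max=3.0, gamma=0.7, log_e_mid=0.0):
        # Logistic in log exposure: flat "toe" near d_min, flat "shoulder" near
        # d_max, and a roughly straight middle section whose slope is ~gamma.
        k = 4.0 * gamma / (d_max - d_min)
        return d_min + (d_max - d_min) / (1.0 + np.exp(-k * (log_exposure - log_e_mid)))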

History of HDR Imaging

Before the Chiaroscuro technique was introduced, it was hard to convey brightness in painting. Chiaroscuro (the use of strong contrasts between bright and dark regions) allowed artists to convey the impression of very high scene dynamic ranges despite the very low dynamic range of the actual paintings.

HDR photography dates back to the 1850s; a notable example is the photograph “Fading Away” by H. P. Robinson, which combined five exposures. In the early 20th century, C. E. K. Mees (director of research at Kodak) worked on implementing a desirable tone reproduction curve in film. Mees showed a two-negative photograph in his 1920 book as an example of desirable scene reproduction, and worked to achieve similar results with single-negative prints. Under Mees’ direction, the Kodak Research Laboratory found that an s-shaped curve produced pleasing image reproductions, and implemented it photochemically.

Ansel Adams developed the zone system around 1940 to codify a method for photographers to expose their images in such a way as to take maximum advantage of the negative and print film tone reproduction curves. Soon after, in 1941, L. A. Jones and H. R. Condit published an important study measuring the dynamic range of various real-world scenes. The range was between 27:1 and 750:1, with 160:1 being average. They also found that flare is a more important limit on camera dynamic range than the film response.

The Retinex theory of vision developed around 1967 from the observation that luminance ratios between adjacent patches are the same in the sun and the shade. While absolute luminances don’t always correspond to lightness appearance (due to spatial factors), the ratio of luminances at an edge do correspond strongly to the ratio in lightness appearance. Retinex processing starts with ratios of apparent lightness at all edges in the image and propagates these to find a global solution for the apparent lightness of all the pixels in the image. In the 1980s this research led to a prototype “Retinex camera” which was actually a slide developing device. Full-resolution digital electronics was not feasible, so a low-resolution (64×64) CCD was used to generate a “correction mask” which modulated a low-contrast photographic negative during development. This produced a final rendering of the image which was consistent with visual appearance. The intent was to incorporate this research in a Polaroid instant camera but this product never saw the light of day.
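
Retinex comes in many variants, but the ratio/threshold/reset mechanics along a single path can be sketched briefly. This assumes a log-luminance image and a precomputed list of pixel coordinates for the path; a full implementation would average many such paths per pixel:

    import numpy as np

    def path_retinex_estimate(log_lum, path, threshold=0.02):
        # log_lum: 2D array of log luminances; path: list of (y, x) coordinates.
        # Accumulate edge ratios (log differences) along the path, ignore small
        # ratios (slow illumination gradients), and reset whenever the running
        # estimate exceeds the assumed "white" (0 in log units).
        estimates = np.zeros(len(path))
        acc = 0.0
        for i in range(1, len(path)):
            (y0, x0), (y1, x1) = path[i - 1], path[i]
            ratio = log_lum[y1, x1] - log_lum[y0, x0]
            if abs(ratio) < threshold:
                ratio = 0.0
            acc += ratio
            if acc > 0.0:
                acc = 0.0
            estimates[i] = acc
        return estimates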

Measuring the Dynamic Range

The sensor’s dynamic range is limited but slowly getting better – Prof. Rizzi briefly went over some recent research into HDR sensor architectures.

Given limited digital sensor dynamic range, multiple exposures are needed to capture an HDR image. This can be done via sequential exposure change, or by using multiple image detectors at once.

There have been various methods developed for composing the exposures. Before Paul Debevec’s 1997 paper “Recovering High Dynamic Range Radiance Maps from Photographs”, the emphasis was on generating pleasing pictures. From 1997 on, research focused primarily on accurately measuring scene radiance values. Combined with recent work on HDR displays, this holds the potential of accurate scene reproduction.

However, veiling glare is a physical limit on both HDR image acquisition and display. At acquisition time, glare consists of light scattered inside the camera – air-glass reflections at the various lens elements, reflections off the camera walls, reflections off the sensor surface, etc. The effect of glare on the lighter regions of the image is small, but darker regions are affected much more strongly, which limits the overall contrast (dynamic range) that can be captured.

Prof. Rizzi described an experiment which measured the degree to which glare limits HDR acquisition, for both digital and film cameras. A test target was assembled out of Kodak Print Scale step-wedges (circles divided into 10 wedges which transmit different amounts of light, ranging from 4% to 82%) and neutral density filters to create a test target with almost 19,000:1 dynamic range. This target was photographed against different surrounds to vary the amount of glare.

In moderate-glare scenes, glare reduced the dynamic range at the sensor or film image plane to less than 1,000:1; in high-glare scenes, to less than 100:1. This limited the range that could be measured via multiple digital exposures (negative film has more dynamic range – about 10,000:1 – than the camera glare limit, so in the case of film multiple exposures were pointless).

While camera glare limits the amount of scene dynamic range that can be captured, glare in the eye limits the amount of display dynamic range which is useful to have.

Experiments were also done with observers estimating the brightness of the various sectors on the test target. There was a high degree of agreement between the observers. The perceived brightness was strongly affected by spatial factors; the brightness differences between the segments of each circle were perceived to be very large, and the differences between the individual circles were perceived to be very small. Prof. Rizzi claimed that a global tone scale cannot correctly render appearance, since spatial factors predominate.

These spatial factors also required designing a new target, so that glare could be separated from neural contrast effects. For this target, both single-layer and double-layer projected transparencies were used, allowing the experimenters to vary the dynamic range from about 500:1 to about 250,000:1 while keeping glare and surround constant.

For low-glare images (average luminance = 8% of maximum luminance), the observers could detect appearance changes over a dynamic range of a little under 1000:1. For high-glare images (average luminance = 50% max luminance), this decreased to about 200:1. Two extreme cases were also tested: with a white surround (extreme glare) the usable dynamic range was about 100:1 and with black surround (almost no glare at all) it increased to 100,000:1. The black surround case (which is not representative of the vast majority of real images) was the only one in which the high-dynamic range image had a significant advantage, and even there the visible difference only affected the shadow region – the bottom 30% of perceived brightnesses. These results indicate that dramatically increasing display dynamic range has minor effects on the perceived image; glare inside the eye limits the effect.

Separating Glare and Contrast

Glare inside the eye reduces the contrast of the image on the retina, but neural contrast increases the contrast of the visual signal going to the brain. These two effects tend to act in opposition (for example, brightening the surround of an image will increase both effects), but they vary differently with distance and do not cancel out exactly.

It is possible to estimate the retinal image based on the CIE Glare Spread Function (GSF). When doing so for the images in the experiment above, the high-glare target (where observers could identify changes over a dynamic range of 200:1) formed an image on the retina with a dynamic range of about 100:1. With white surround (usable dynamic range of 100:1) the retinal image had a dynamic range of about 25:1 and with black surround (usable dynamic range of 100,000:1) the retinal image had a dynamic range of about 3000:1. It seems that neural contrast partially compensates for the intra-ocular glare; both effects are scene dependent.
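
As a rough illustration of this kind of estimate, here is a brute-force veiling-glare computation that uses the simpler Stiles-Holladay approximation (veiling luminance ≈ 10·E/θ², θ in degrees) in place of the full CIE GSF. The pixels-per-degree and per-pixel solid angle values are assumptions the caller must supply, and the whole thing is only meant for small test targets:

    import numpy as np

    def veiling_luminance(luminance, pixels_per_degree, pixel_solid_angle):
        # Brute-force O(N^2) estimate: every pixel scatters onto every other
        # pixel with weight 10 / theta^2 (theta in degrees, Stiles-Holladay).
        h, w = luminance.shape
        ys, xs = np.mgrid[0:h, 0:w]
        glare_illum = luminance * pixel_solid_angle   # per-pixel glare illuminance
        veil = np.zeros((h, w))
        for y in range(h):
            for x in range(w):
                theta = np.hypot(ys - y, xs - x) / pixels_per_degree
                theta = np.maximum(theta, 1.0)        # formula valid beyond ~1 degree
                contrib = 10.0 * glare_illum / theta ** 2
                veil[y, x] = contrib.sum() - contrib[y, x]   # exclude self-scatter
        return veil

    # retinal image ~ luminance + veil; its min/max ratio is the retinal dynamic range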

Scene Content Controls Appearance

The appearance of a pixel cannot be predicted from its intensity values – no global tone mapping operator can mimic human vision. An image-dependent, local operator is needed. The human visual system performs local range compression. It is important to choose a rendering intent – reproduce the original scene radiances, scene reflectances, scene appearance, a pleasing image, etc. If the desire is to predict appearance, then Retinex processing does a pretty good job in many cases.
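
The distinction is easy to state in code: a global operator is a function of the pixel value alone, while a local operator also consults the neighborhood. In the sketch below the local version divides out a blurred log-luminance surround – an illustration of the idea, not any particular published operator:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def tone_global(L):
        # Global: each output pixel depends only on that pixel's luminance.
        return L / (1.0 + L)

    def tone_local(L, sigma=30.0, compression=0.6):
        # Local: estimate the surround with a blurred log luminance, compress
        # only the surround, and add the local detail back unchanged.
        logL = np.log(np.maximum(L, 1e-6))
        surround = gaussian_filter(logL, sigma)
        detail = logL - surround
        return np.exp(compression * surround + detail)

The classic price of simple local operators like this is haloing around strong edges, which is where the more sophisticated spatial algorithms earn their keep.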

Color in HDR

Two different data sets can be used to describe color: CMF (color matching functions – low-level sensor data) or UCS (uniform color space – high-level perceptual information).

CMFs are used for color matching and metamerism preservation. They are linear transforms of cone sensitivities modified by pre-retinal absorptions. They have no spatial information, and cannot predict appearance.

UCS – for example, CIE L*a*b*. Lightness (L*) is a cube root of luminance, which compresses the visible range. 99% of possible perceived lightness values fall in a 1000:1 region of scene dynamic range. This fits well with visual limitations caused by glare.
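
For reference, the standard CIE 1976 lightness formula, with a quick check of the 1000:1 claim:

    def cie_lightness(Y, Yn=1.0):
        # CIE 1976 L*: cube root above the low-luminance break point, linear below.
        t = Y / Yn
        if t > (6.0 / 29.0) ** 3:                 # ~0.008856
            return 116.0 * t ** (1.0 / 3.0) - 16.0
        return (29.0 / 3.0) ** 3 * t              # ~903.3 * t

    # A 1000:1 luminance range relative to white spans nearly all of L*:
    # cie_lightness(1.0) == 100.0, cie_lightness(0.001) is about 0.9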

There are some discrepancies between data from appearance experiments with observers and measurements of retinal cone response.

First discrepancy: the peaks of the color-matching functions do not line up with the peaks of the cone sensitivity functions. This is addressed by including pre-retinal absorptions, which shift peak sensitivities to longer wavelengths.

Second discrepancy: retinal cones have a logarithmic response to light, but observers report a cube-root response. This is addressed by taking account of intra-ocular glare; it turns out that due to glare, a cube-root variation in light entering the eye turns into a logarithmic variation in light at the retina.

HDR Image Processing

Around 2002-2006, Robert Sobol developed a variant of Retinex which was implemented in a (discontinued) line of Hewlett-Packard cameras; the feature was marketed as “Digital Flash”. This produced very good results and could even predict certain features of well-known perceptual illusions such as “Adelson’s Checkerboard and Tower”, which were commonly thought to be evidence of cognitive effects in lightness perception.

ACE (Automatic Color Equalization) (which Prof. Rizzi worked on) and STRESS (Spatio-Temporal Retinex-inspired Envelope with Stochastic Sampling) are other examples of spatially-aware HDR image processing algorithms. Several examples were shown to demonstrate that spatially-aware (local) algorithms produce superior results to global tone mapping operators.

Prof. Rizzi described an experiment made with a “3D Mondrian” model – a physical scene of differently colored blocks under different illumination conditions. Various HDR processing algorithms were run on captured images of the scene, and the results were compared with observers’ estimations of the colors as well as with a painter’s rendition (which attempted to reproduce the perceptual appearance as closely as possible). The results were interesting – appearance does not correlate specifically with reflectance vs. illumination, but rather with edges vs. gradients. The results appeared to support the goals of Retinex and similar algorithms.

Prof. Rizzi finished the course with some “take home” points:

  • HDR works well, because it preserves image information, not because it is more accurate (accurate reproduction of scene luminances is not possible in the general case).
  • Dynamic range acquisition is limited by glare, which cannot be removed.
  • Our vision system is also limited by glare, which is counteracted to some degree by neural contrast.
  • Accurate reproduction of scene radiance is not needed; reproduction of appearance is important and possible without reproducing the original stimulus.
  • Appearances are scene-dependent, not pixel-based.
  • Edges and gradients generate HDR appearance and color constancy.

2011 Color and Imaging Conference, Part I: Introduction

A few weeks ago, I attended the 2011 Color and Imaging Conference (CIC). CIC is a small conference (a little under 200 attendees) that nevertheless commands an important role in the fields of color science and digital imaging, similar to SIGGRAPH’s importance to computer graphics. CIC is co-sponsored by the Society for Imaging Science and Technology (IS&T) and the Society for Information Display (SID); it has been held annually in various US locations since 1993.

I attended this conference for the first time last year. In both years I attended, most of the conference attendees were academic color science researchers (the field appears to be dominated by a handful of institutions, most notably the color labs at the Rochester Institute of Technology and the University of Leeds), with the remainder primarily representing the R&D divisions of various camera, printer, display, and mobile phone manufacturers. There are typically also a few color experts from film companies such as Technicolor, ILM, Pixar, and Disney. I didn’t see any other game developers – I hope this will change in future years, as our industry starts paying more attention to this critical area.

Despite its modest attendance numbers, CIC boasted an impressive array of sessions, including courses, papers, short papers, and several keynotes. The content was of very high quality. The conference organizers are currently in the process of posting video of most of the conference content for free streaming and download in a variety of formats – a step which organizers of other conferences (such as SIGGRAPH) would do well to emulate.

I’ll be putting up several other posts with details of the conference content. They will be coming in rapid succession since I’m editing them down from an existing document (a report I did for work).

Do you spell these two words correctly?

We all have dumb little blind spots. As a kid, I thought “Achilles” was pronounced “a-chi-elz” and, heaven knows how, “etiquette” was somehow “eh-teak”. When you say goofy things to other people, someone eventually corrects you. However, if most of the people around you are making the same mistake (I’m sorry, “nuclear” is not pronounced “new-cue-lar”, it just ain’t so), the error never gets corrected. I’ve already mentioned the faux pas of pronouncing SIGGRAPH as “see-graph”, which seems to be popular among non-researchers (well, admittedly there’s no “correct” pronunciation on that one, it’s just that when the conference was small and mostly researchers that “sih-graph” was the way to say it. If the majority now say “see-graph”, so be it – you then identify yourself as a general attendee or a sales person and I can feel superior to you for no valid reason, thanks).

Certain spelling errors persist in computer graphics, perhaps because it’s more work to give feedback on writing mistakes. We also see others make the same mistakes and assume they’re correct. So, here are the two I believe are the most popular goofs in computer graphics (and I can attest that I used to make them myself, once upon a time):

Tesselation – that’s incorrect, it’s “tessellation”. By all rules of English, this word truly should have just one “l”: relation, violation, adulation, ululation, emulation, and on and on, they have just one “l”. The only exceptions I could find with two “l”s were “collation”, “illation” (what the heck is that?), and a word starting with “fe” (I don’t want this post to get filtered).

The word “tessellation” is derived from “tessella” (plural “tessellae”), which is a small piece of stone or glass used in a mosaic. It’s the diminutive of “tessera”, which can also mean a small tablet or block used as a ticket or token (but “tessella” is never a small ticket). Whatever. In Ionic Greek “tesseres” means “four”, so “tessella” makes sense as being a small four-sided thing. For me, knowing that “tessella” is from the ancient Greek word for a piece in a mosaic somehow helps me to catch my spelling of it – maybe it will work for you. I know that in typing “tessella” in this post I still first put a single “l” numerous times, that’s what English tells me to do.

Google test: searching on “tessellation” on Google gives 2,580,000 pages. Searching on “tesselation -tessellation”, which gives only pages with the misspelled version, gives 1,800,000 pages. It’s nice to see that the correct spelling still outnumbers the incorrect, but the race is on. That said, this sort of test is accurate to within, say, plus or minus 350%. If you search on “tessellation -tesselation”, which should give a smaller number of pages (subtracting out those that I assume say “‘tesselation’ is a misspelling of ‘tessellation'” or that reference a paper with “tesselation” in the title), you get 8,450,000! How you can get more than 3 times as many pages as just searching on “tessellation” is a mystery. Finally, searching on “tessellation tesselation”, both words on the same page, gives 3,150,000 results. Makes me want to go count those pages by hand. No it doesn’t.

One other place to search is the ACM Digital Library. There are 2,973 entries with “tessellation” in them, 375 with “tesselation”. To search just computer graphics publications, GRAPHBIB is a bit clunky but will do: 89 hits for “tessellation”, 18 hits for the wrong one. Not terrible, but that’s still a solid 20% incorrect.

Frustrum – that’s incorrect, it’s “frustum” (plural “frusta”, which even looks wrong to me – I want to say “frustra”). The word means a (finite) cone or pyramid with the tip chopped off, and we use it (always) to mean the pyramidal volume in graphics. I don’t know why the extra “r” got into this word for some people (myself included). Maybe it’s because the word then sort-of rhymes with itself, the “ru” from the first part mirrored in the second. But “frustra” looks even more correct to me, no idea why. Maybe it’s that it rolls off the tongue better.

Morgan McGuire pointed this one out to me as the most common misspelling he sees. As a professor, he no doubt spends more time teaching about frusta than tessellations. Using the wildly-inaccurate Google test, there are 673,000 frustum pages and 363,000 “frustrum -frustum” pages. And, confusingly, again, 2,100,000 “frustum -frustrum” pages, more than three times as many pages as just “frustum”. Please explain, someone. For the digital library, 1,114 vs. 53. For GRAPHBIB I was happy to see 42 hits vs. just 1 hit (“General Clipping on an Oblique Viewing Frustrum”).

So the frustum misspell looks like one that is less likely at the start and is almost gone by the time practitioners are publishing articles, vs. the tessellation misspell, which appears to have more staying power.

Addenda: Aaron Hertzmann notes that the US and Britain double their letters differently (“calliper”? That’s just unnatural, Brits). He also notes the Oxford English Dictionary says about tessellate: “(US also tesselate)”. Which actually is fine with me, except for the fact that Microsoft Word, Google’s spellchecker, and even this blog’s software flags “tesselate” as a misspelling. If only we had the equivalent of the Académie française to decide how we all should spell (on second thought, no).

Spike Hughes notes: “I think the answer for ‘frustrum’ is that it starts out like ‘frustrate’ (and indeed, seems logically related: the pyramid WANTS to go all the way to the eye point, but is frustrated by the near-plane).” This makes a lot of sense to me, and would explain why “frustra” feels even more correct. Maybe that’s the mnemonic aid, like how with “it’s” vs. “its” there’s “It’s a wise dog that knows its own fleas”. You don’t have to remember the spelling of each “its”, just remember that they differ; then knowing “it’s” is “it is” means you can derive that the possessive “its” doesn’t have an apostrophe. Or something. So maybe, “Don’t get frustrated when drawing a frustum”, remembering that they differ. Andrew Glassner offers: “There’s no rum in a frustum,” because the poor thing has the top chopped off, so all the rum we poured inside has evaporated.

Seven Things for 10/13/2011

  • Fairly new book: Practical Rendering and Computation with Direct3D 11, by Jason Zink, Matt Pettineo, and Jack Hoxley, A.K.Peters/CRC Press, July 2011 (more info). It’s meant for people who already know DirectX 10 and want to learn just the new stuff. I found the first half pretty abstract; the second half was more useful, as it gives in-depth explanation of practical examples that show how the new functionality can be used.
  • Two nice little Moore’s Law-related articles appeared recently in The Economist. This one is about how the law looks to have legs for a number of years yet, and presents a graph showing how various breakthroughs have kept the law going over the past decades. Moore himself thought the law might hold for ten years. This one talks about how computational energy efficiency is doubling every 18 months, which is great news for mobile devices.
  • I used to use MWSnap for screen captures, but it doesn’t work well with two monitors and it hangs at times. I finally found a replacement that does all the things I want, with a mostly-good UI: FastStone Capture. The downside is that it actually costs money ($19.95), but I’m happy to have purchased it.
  • Ray tracing vs. rasterization, part XIV: Gavan Woolery thinks RT is the future, DEADC0DE argues both will always have a place, and gives a deeper analysis of the strengths and weaknesses of each (though the PITA that transparency causes rasterization is not called out) – I mostly agree with his stance. Both posts have lots of followup comments.
  • This shows exactly how far behind we are in blogging about SIGGRAPH: find the Beyond Programmable Shading course notes here – that’s a mere two months overdue.
  • Tantalizing SIGGRAPH Talk demo: KinectFusion from Microsoft Research and many others. Watch around 3:11 on for the great reconstruction, and the last minute for fun stuff. Newer demo here.
  • OnLive – you should check it out, it’ll take ten minutes. Sign up for a free account and visit the Arena, if nothing else: it’s like being in a sci-fi movie, with a bunch of games being played by others before your eyes that you can scroll through and click on to watch the player. I admit to being skeptical of the whole cloud-gaming idea originally, but in trying it out, it’s surprisingly fast and the video quality is not bad. Not good enough to satisfy hardcore FPS players – I’ve seen my teenage boys pick out targets that cover like two pixels, which would be invisible with OnLive – but otherwise quite usable. The “no download, no GPU upgrade, just play immediately” aspect is brilliant and lends itself extremely well to game trials.

OnLive Arena

Seven things for 10/10/11

  • If you can get WebGL running properly on your browser, check out Shader Toy. Coolest thing is that you can edit any shader and immediately try it out.
  • Another odd little WebGL application is a random spaceship maker, with a direct tie-in to Shapeways to buy a 3D version of any model you make.
  • Speaking of Shapeways, I liked their “one coffee cup a day project”. The low-resolution cup is particularly good for computer graphics people, though I’m told that in real life it’s a fair bit more rounded off, due to the way the ceramic sets. Ironic. Also, note that these cups are actually quite small in real life (smaller than even espresso cups), which is too bad. Still, clever.
  • Source code for iOS versions of Castle Wolfenstein and the original DOOM is now available.
  • Patrick Cozzi has a nice rundown of his days at SIGGRAPH this August, with a particular emphasis on OpenGL and mobile. The links for each day are at the bottom of the entry.
  • Nice fractal video generated in near-real time (300 ms/frame) running a GLSL shader using this code. Reddit thread here, about an earlier video now pulled back online.
  • This site gives a darn long list of educational institutions offering videogame design degrees. It’s at least a place to start, if you’re looking for such things. That said, I’ve heard counterarguments from game company professionals to such specialized degrees, “just learn to program well and we’ll teach you the videogames business”.

Bonus thing: Draw a curve of your data for a number of years and see what it most closely correlates. Peculiar.

Predicting the Past

Inspired by Bing (a person, not a search engine) and by the acrobatics I saw tonight in Shanghai, time for a blog post.

So what’s up with graphics APIs? I’ve been working on a project for a fast 3D graphics system for Autodesk for about 4 years now; the base level (which hides the various flavors of DirectX and OpenGL) is used by Maya, Max, AutoCAD, Inventor, and other products. There are various higher-level optimizations we’ve added (and why Microsoft’s fxc effect compiler suddenly got a lot slower is a mystery), with some particularly nice work by one person here in the area of multithreading. Beyond these techniques, minimizing the raw number of calls to the API is the primary way to increase performance. Our rule of thumb is that you get about 1000-1500 calls a frame (CAD isn’t held to a 60 FPS rule, but we still need to be interactive). The usual tricks are to sort by state, and to shove as much geometry and processing as possible into a single draw call and so avoid the small batch problem. So, how silly is that? The best way to make your GPU run fast is to call it as little as possible? That’s an API with a problem.
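
To make “sort by state” concrete: draw requests get bucketed by a state key so each state is bound only once, and geometry sharing a state can be merged into a single draw call. A toy sketch of the bookkeeping – the field names are illustrative, nothing here is Autodesk- or API-specific:

    from collections import defaultdict

    def build_batches(draw_requests):
        # Bucket draw requests by render-state key so each state is bound once
        # and geometry that shares a state can be merged into one draw call.
        buckets = defaultdict(list)
        for req in draw_requests:
            key = (req["shader"], req["material"], tuple(req["textures"]))
            buckets[key].append(req["geometry"])
        return buckets

    requests = [
        {"shader": "lit", "material": "steel", "textures": ["albedo"], "geometry": "gear_a"},
        {"shader": "lit", "material": "steel", "textures": ["albedo"], "geometry": "gear_b"},
        {"shader": "lit", "material": "glass", "textures": [], "geometry": "window"},
    ]
    for state, geometries in build_batches(requests).items():
        print(state, "->", geometries)   # 2 state binds / draw calls instead of 3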

This is old news, Tim Sweeney railed against API limitations 3 years ago (sadly, the article’s gone poof). I wrote about his ideas here and added my own two cents. So where are we since then? DirectX 11 has been out awhile, adding three more stages to the pipeline for efficient tessellation of higher-order surfaces. The pipeline’s feeling a bit unwieldy at this point, with a lot of (admittedly optional) stages. There are still some serious headaches for developers, like having to somehow manage to put lighting and material shading in the same pixel shader (one good argument for deferred lighting and similar techniques). Forget about optimization; the arcane API knowledge needed to get even a simple rendering on the screen is considerable.

I haven’t heard anything of a DirectX 12 in the works (except maybe this breathless posting, which I feel obligated to link to since I’m in China this month), nor can I imagine what they’d add of any significance. I expect there will be some minor Xbox 720 (or whatever it will be called)-related tweaks specific to that architecture, if and when it exists. With the various CPU+GPU-on-a-chip products coming out – AMD’s Fusion family, NVIDIA’s Tegra 2, and similar from other companies (I think I counted 5, all totaled) – some access costs between the two processors become much cheaper and so change the rules. However, the API still looks to be the bottleneck.

Marketwise, and this is based entirely upon my work in scapulimancy, I see things shifting to mobile. If that isn’t at least the 247th time you’ve heard that, you haven’t been wasting enough time on the internet. But, it has some implications: first, DirectX 12 becomes mostly irrelevant. The GPU pipeline is creaky and overburdened enough right now, PC games are an important niche but not the focus, and mobile (specifically, iPad and other tablets) is fine with the functionality defined thus far by existing APIs. OpenGL ES will continue to evolve, but I doubt we’ll see for a good long while any algorithmically (vs. data-slinging) new elements added to the API that the current OpenGL 4.x and DX11 APIs don’t offer.

Basically, API development feels stalled to me, and that’s how it should be: mobile’s more important, PCs are a (large but slowly evolving) niche, and the current API system feels warped from a programming standpoint, with peculiar constructs like feeding text strings to the API to specify GPU shader effects, and strange contortions performed to avoid calling the API in order to coax the GPU to run fast.

Is there a way out? I felt a glimmer while attending HPG 2011 this year. The paper “High-Performance Software Rasterization on GPUs” by Samuli Laine and Tero Karras was one of my (and many attendees’) favorites, talking about how to efficiently implement a basic rasterizer using CUDA (code’s open sourced). It’s not as fast as dedicated hardware (no surprise there), but it’s at least in the same ball-park, with hardware being anywhere from 1.5x to 8.1x faster for their test cases, median being 3.6x. What I find exciting is the idea that you could actually program the pipeline, vs. it being locked away. They discuss ideas for optimization such as loosening the “first in, first out” rule for triangles currently enforced by all APIs. With its “yet another language” dependency, I can’t say I hope GPGPU is the future (and certainly CUDA isn’t, since it cuts out non-NVIDIA hardware vendors, but from all reports it’s currently the best way to experiment with GPGPU). Still, it’s nice to see that the fixed-function bits of the GPU, while important, are not an insurmountable limit in considering more flexible and general interactive rasterization programming models. Or, ray tracing – always have to stick that in there.
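
For anyone who has only ever seen the rasterizer as a black box, the core operation Laine and Karras parallelize is tiny. Here is a scalar sketch of edge-function triangle rasterization – no clipping, no depth, and none of the tile/bin machinery that makes their GPU version fast:

    def edge(ax, ay, bx, by, px, py):
        # Twice the signed area of triangle (a, b, p); positive when p is to
        # the left of the directed edge a->b (counter-clockwise winding).
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    def rasterize(v0, v1, v2, width, height):
        # Yield (x, y, w0, w1, w2) for pixels covered by a CCW screen-space
        # triangle; the w's are barycentric weights for attribute interpolation.
        xs = [v[0] for v in (v0, v1, v2)]
        ys = [v[1] for v in (v0, v1, v2)]
        area = edge(*v0, *v1, *v2)
        if area <= 0:                      # back-facing or degenerate
            return
        for y in range(max(int(min(ys)), 0), min(int(max(ys)) + 1, height)):
            for x in range(max(int(min(xs)), 0), min(int(max(xs)) + 1, width)):
                px, py = x + 0.5, y + 0.5  # sample at pixel centers
                w0 = edge(*v1, *v2, px, py)
                w1 = edge(*v2, *v0, px, py)
                w2 = edge(*v0, *v1, px, py)
                if w0 >= 0 and w1 >= 0 and w2 >= 0:
                    yield x, y, w0 / area, w1 / area, w2 / area

    # e.g. len(list(rasterize((2, 2), (60, 10), (20, 50), 64, 64))) counts covered pixels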

So it’s “forward to the past”, looking at traditional algorithms like rasterization and ray tracing and how to gain efficiency (both in raw speed and in development time) on various modern architectures. That’s ultimately what it’s about for me, at least: spending lots of time fighting the API, gluing together strings to make shaders, and all the other craziness is a distraction and a time-waster. That said, there’s a cost/benefit calculation implicit in all of this. For example, using C# or Java is way more productive than C++, I’d say about 2x, mostly because you’re not tracking down memory problems like leaks and accesses of uninitialized or non-existent values. But, there’s so much legacy C++ code around that it’s still the language of graphics, as previously discussed here. Which means I expect none of the API weirdness to change for a solid decade, at the minimum. Please do go ahead and prove me wrong – I’d be thrilled!

Oh, and acrobatics? Hover your cursor over the image. BTW, the ERA show in Shanghai is wonderful, unlike current APIs.

AMD CubeMapGen is now Open Source

UPDATE 9/1/2011: ignotion has put the source up on Google Code.

For a long time, I’ve found ATI’s (now AMD’s) CubeMapGen library to be an indispensable tool for creating prefiltered environment maps (important for physically based shading). Many older GPUs (all the ones in current consoles) do not filter across cube faces. CubeMapGen solves this problem and others – details can be found in a GDC presentation and a SIGGRAPH sketch, both from 2005.

Support for CubeMapGen has been spotty for the last few years, and a while ago AMD officially declared its end of life. Since then I’ve been wondering when AMD would open-source this important tool – there is a good precedent in NVIDIA texture tools, which has been open source for several years now.

Speaking of NVIDIA texture tools, a comment on its Google Code website just let me know that AMD has released source to CubeMapGen. A link to the source for version 1.4 can be found on the bottom of the CubeMapGen page. Note that this does not include the DXT compression part of the edge fixup (which was a pretty nifty feature – hopefully someone will reimplement it now that the library is open source).

Looking at the license doc in the zip file, the license appears to be a modified BSD license. This is excellent news – tools like this are far more useful when source is available. Perhaps someone should host the code on Google Code or github, to make it easier to add future improvements – or maybe it could be folded into the nvidia_texture_tools code base (if the license allows).

Advances in RTR Course Notes up

I’m finally back from a nice post-SIGGRAPH vacation in the Vancouver area. Both our computers broke early on in the trip, so it was a true vacation.

I hope to post on a bunch of stuff soon, but wanted to first mention something now available: the slides and videos presented in the popular SIGGRAPH course “Advances in Real-Time Rendering in 3D Graphics”. Find them here, and the page for previous years (well, currently just 2010) here. Hats off to Natalya Tatarchuk and all the speakers for quickly making this year’s presentations available.