2011 Color and Imaging Conference, Part III: Courses B

This post covers the rest of the CIC 2011 courses that I attended; it will be followed by posts describing the other types of CIC content (keynotes, papers, etc.).

Lighting: Characterization & Visual Quality

This course was given by Prof. Françoise Viénot, Research Center of Collection Conservation, National Museum of Natural History, Paris, France.

During the 20th century, the major forms of light were incandescent (including tungsten-halogen) and discharge (fluorescent as well as compact fluorescent lamps – CFL). Incandescent lights glow from heat, and discharge lamps include an energetic spark or discharge which emits a lot of UV light, which is converted by a fluorescent coating to visible light. LEDs are relatively new as a lighting technology; they emit photons at a frequency based on the bandgap between semiconductor quantum energy levels. LEDs are often combined with fluorescent phosphors to change the light color.

Correlated Color Temperature (CCT) is used to describe the color of natural or “white” light sources. CCT is defined as the temperature of the blackbody (an idealized physical object which glows due to its heat) whose color is nearest the color of the tested illuminant. Blackbody colors range from reddish at around 1000K through yellow, white, and finally blue-white (for temperatures over 10,000K). CCT is only defined for illuminants whose colors are reasonably near the blackbody locus, so it is meaningless to talk about the CCT of, say, a green or purple light. “Nearest color” is defined on a CIE uv chromaticity diagram. Reciprocal CCT (one over CCT) is also sometimes used – equal steps in reciprocal CCT correspond very nearly to equal uv distances, which is a useful coincidence (for example, “color temperature” interface sliders should work proportionally to reciprocal CCT for intuitive operation).
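As a side note, this is why photographic color temperature is often expressed in mireds (micro reciprocal degrees, 10^6/CCT). Here is a minimal sketch (my own illustration, not from the course; the endpoint CCTs are arbitrary assumptions) of a slider that interpolates linearly in reciprocal CCT, so that equal slider steps correspond to roughly equal perceptual steps:

```python
def cct_slider(t, cct_min=2000.0, cct_max=10000.0):
    """Map a slider position t in [0, 1] to a CCT in kelvin by interpolating
    linearly in reciprocal CCT (mired = 1e6 / CCT) rather than in CCT itself."""
    mired_warm = 1e6 / cct_min   # the warm end has the larger mired value
    mired_cool = 1e6 / cct_max
    mired = mired_warm + t * (mired_cool - mired_warm)  # linear in reciprocal CCT
    return 1e6 / mired

if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"slider {t:.2f} -> {cct_slider(t):6.0f} K")
```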

Perhaps confusingly, in terms of psychological effect low CCT corresponds to “warm colors” or “warm ambience” and high CCT corresponds to “cool colors” or “cool ambience”. Desirable interior lighting is about 3000K CCT.

Light manufacturers do not reproduce the exact spectra of real daylight or blackbodies, they produce metamers (different spectra with the same perceived color) of the white light desired. For example, four different lights could all match daylight with CCT 4500K in color, but have highly different spectral distributions. Actual daylight has a slightly bumpy spectral power distribution (SPD), incandescent light SPDs are very smooth, discharge lamps have quite spiky SPDs, and LED SPDs tend to have two somewhat narrow peaks.

Since LEDs are a new technology they are expected to be better than, or at least equal to, existing lighting technologies. Expectations include white light, high luminous efficacy (converting a large percentage of the energy consumed into visible light rather than wasting it on UV or IR), low power consumption, long lifetime, high values of flux (emitted light quantity), innovations such as dimmability and addressability, and high visual quality (color rendition, comfort & well-being). LED light is clustered into peaks that are not quite monochromatic – they are “quasi-monochromatic” with a smooth but narrow peak (spectral width around 100nm).

Most white light LEDs are “phosphor-converted LEDs” – blue LEDs with fluorescent powder (phosphor) that captures part of the blue light and emits yellow light, creating two peaks (blue and yellow) which produce an overall white color. By balancing the two peaks (varying the amount of blue light captured by the fluorescent powder), LED lights with different CCTs can be produced. It is also possible to add a second phosphor type to create more complex spectra. New LED lights are under development that use a UV-emitting LED coupled with 3 phosphor types.

An alternative approach to producing white-light LEDs is to create “color-mixed LEDs” combining red, green, and blue LEDs. There are also hybrid mixtures with multiple LEDs as well as phosphors. This course focused on phosphor-converted LEDs. They have better color rendition and good luminous efficacy, and are simple to control. On the other hand, RGB color-mixed LEDs have the advantage of being able to vary color dynamically.

Regarding luminous efficacy, in the laboratory cool white LED lamps can achieve very high values – about 150 lumens per Watt (steady-state operation). Commercially available cool white LED lamps can reach a bit above 100 lm/Watt, commercial warm white ones are slightly lower. US Department of Energy targets are for commercial LED lights of both types to approach 250 lm/Watt by 2020.

Intensity and spectral width strongly depend on temperature (cooling the lamp makes it brighter and “spikier”, heating does the opposite). Heat also reduces LED lifetime. As LEDs age, their flux (light output) decreases, but CCT doesn’t typically change. The rate of flux reduction varies greatly with manufacturer.

One way to improve LED lifetime is to operate them in short pulses (pulse-width modulation). This is done at a frequency between 100 and 2000 Hz, and of course reduces the flux produced.

Heat dissipation is the primary problem in scaling LED lights to high-lumen applications (cost is also a concern) – they top out around 1000 lumens.

The Color Rendering Index (CRI) is the method recommended by the CIE to grade illumination quality. The official definition of color rendering is “effect of an illuminant on the colour appearance of objects by conscious or subconscious comparison with their colour appearance under a reference illuminant”. The instructor uses “color rendition”, for which she has a simpler definition: “effect of an illuminant on the colour appearance of objects”.

CIE’s procedure for measuring the general color rendering index Ra consists of comparing the color of a specific collection of eight samples when illuminated by the tested light vs. a reference light. This reference light is typically a daylight or blackbody illuminant with the same or similar CCT as the tested light (if the tested light’s color is too far from the blackbody locus to have a valid CCT a rendering index cannot be computed; in any case such an oddly-colored light is likely to have very poor color rendering). The process includes a Von Kries chromatic adaptation to account for small differences in CCT between the test and reference light sources. After both sets of colors are computed, the mean of the chromatic distances between the color pairs is used to generate the CIE color rendering index. The scaling factors were chosen so that the reference illuminant itself would get a score of 100 and a certain “known poor” light source would get a score of 50 (negative scores are also possible). For office work, a score of at least 80 is required.
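To show the structure of the index, here is a minimal sketch (not the full CIE procedure – it assumes the eight test-vs-reference color differences have already been computed in the CIE 1964 U*V*W* space the procedure specifies, after chromatic adaptation):

```python
def general_cri(delta_e_uvw):
    """Compute Ra from the eight test-vs-reference color differences (Delta E in
    CIE 1964 U*V*W*). The factor 4.6 is the CIE scaling mentioned above: zero
    difference scores 100, and the chosen "known poor" lamp scores about 50."""
    special_indices = [100.0 - 4.6 * de for de in delta_e_uvw]  # R_i per sample
    return sum(special_indices) / len(special_indices)          # Ra = mean of R_1..R_8

# Hypothetical color differences, for illustration only:
print(general_cri([1.2, 2.5, 3.1, 0.8, 4.0, 2.2, 1.9, 3.3]))
```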

Various problems with CRI have been identified over the years, and alternatives have been proposed. The Gamut Area Index (GAI) is an approach that describes the absolute separation of the chromaticities of the eight color chips, rather than their respective distances vis-à-vis a reference light. Incandescent lights tend to get low scores under this index. Another alternative metric, the Color Quality Scale (CQS) was proposed by the National Institute of Standards and Technology (NIST). It is similar to CRI in basic approach but contains various improvements in the details. Other approaches focus on whether observers find the colors under the tested light to be natural or vivid.

In general, there are two contradicting approaches to selecting lights. You can either emphasize fidelity, discrimination, and “naturalness”, or colorfulness enhancement and “beautification” – you can’t have both. Which is more desirable will depend on the application. For everyday lighting situations, full-spectrum lights are likely to provide the best combination of color fidelity and visual comfort.

There are also potential health issues – lights producing a high quantity of energy in the blue portion of the spectrum may be harmful for the vision of children as well as adults with certain eye conditions. In general, the “cooler” (bluer) the light source, the greater the risk, but there are other factors, such as how concentrated the brightness is over the emitting surface. Looking directly at a “cool white” LED light is most risky; “warm white” lights of all types, as well as “cool white” frosted lamps (which spread brightness over the lamp surface), are more likely to be OK.

The Role of Color in Human Vision

This course was taught by Prof. Kathy T. Mullen from the Vision Research Unit in the McGill University Dept. of Ophthalmology.

Prof. Mullen started by stating that primates are the only trichromats (having three types of cones in the retina) among mammals – all other mammals are dichromats (having two types of cones in the retina). One of the cone types mutated into two different ones relatively recently (in evolutionary terms). There is evidence that other species co-evolved with primate color vision (e.g. fruit colors changed to be more visible to primates).

The Role of Color Contrast in Human Vision

Color contrast refers to the ability to see color differences in the visual scene. It allows us to better distinguish boundaries, edges, and objects.

Color contrast has 4 roles.

Role 1: Detection of objects that would otherwise be invisible because they are seen against a dappled background – for example, seeing red berries among semi-shadowed green foliage.

Role 2: Segregation of the visual field into elements that belong together – if an object’s silhouette is split into several parts by closer objects, color enables us to see that these are all parts of the same object.

Role 3: Helps tell the difference between variations in surface color and variations in shading. This ability depends on whether color and achromatic contrasts coincide spatially or not. For example, a square with chrominance stripes (stripes of different color but the same luminance) at 90 degrees to luminance stripes (stripes that only change luminance) is strongly perceived as a 3D shaded object. If the chrominance and luminance stripes are aligned, then the object appears flat.

Role 4: Distinguishing between otherwise similar objects. This leads into color identification. If, after distinguishing objects by color, we can also identify the colors, then we can infer more about the various objects’ properties.

Color Identification and Recognition

Color identification and recognition is a higher, cognitive stage of color vision that involves identifying and recognizing colors as well as naming them. It requires an internalized “knowledge” of what the different colors are. There is a (very rare) condition called “color agnosia” where color recognition is missing – people suffering from this condition perform normally on (e.g.) color-blindness vision tests, but they can’t identify or name colors at all.

Color is an object property. People group, categorize and name colors using 11 basic color categories: Red, Yellow, Green, Blue, Black, Grey, White, Pink, Orange, Purple, and Brown (there is some evidence that Cyan may also be a fundamental category).

Psychophysical Investigations of Color Contrast’s Role in Encoding Shape and Form

For several decades, vision research was guided by an understanding of color’s role which Prof. Mullen calls the “coloring book model”. The model holds that achromatic contrast is used to extract contours and edges and to demarcate the regions to be filled in by color, while color vision has a subordinate role – it “fills in” the regions after the fact. In other words, color edges have no role in the initial shape processing occurring in the human brain.

To test this model, you can perform experiments that ask the following questions:

  1. Does color vision have the basic building blocks needed for form processing: spatially tuned detectors & orientation tuning?
  2. Can color vision extract contours and edges from the visual scene?
  3. Can color vision discriminate global shapes?

The coloring book model would predict that the answer to all of these questions is “no”.

Prof. Mullen then described several experiments done to determine the answers to these questions. These experiments relied heavily on “isoluminant colors” – colors with different chromaticity but the same luminance. The researchers needed extremely precise isolation of luminance, so they had to find individual isoluminant color pairs for each observer. This was done via an interesting technique called “minimum motion”, which relies on the fact that color vision is extremely poor at detecting motion. The researchers had observers stare at the center of an image of a continually rotating wheel with two alternating colors on the rim. The colors were varied until the rim appeared to stop turning – at that point the two colors were recorded as an isoluminant pair for that observer.

The experiments showed that color vision can indeed extract contours and edges from the scene, and discriminate global shapes, although slightly less well than achromatic (luminance) vision. It appears that the “coloring book” model is wrong – color contrast can be used in the brain in all the same ways luminance contrast can. However, color vision is relatively low-resolution, so very fine details cannot be seen without some luminance contrast.

The Physiological Basis of Color Vision

Color vision has three main physiological stages:

  1. Receptoral (cones) – light absorption – common to all daytime vision
  2. Post-receptoral 1 – cone opponency extracts color but not color contrast
  3. Post-receptoral 2 – double cone opponency extracts color contrast

The retina has three types of cone cells used for daytime (non-low-light) vision. Each type is sensitive to a different range of wavelengths – L cones are most sensitive to long-wavelength light, M cones are most sensitive to light in the middle of the visible spectrum, and S cones are most sensitive to short-wavelength light.

Post-receptoral 1: There are three main types of neurons in this layer, each connected to a local bundle of differently-typed cones. One forms red-green color vision from the opponent (opposite-sign) combination of L and M cones. The second forms blue-yellow color vision from the opponent combination of S with L and M cones. These two types of neurons are most strongly excited (activated) by uniform patches of color covering the entire cone bundle (some of them serve a different role by detecting luminance edges instead). The third type of neuron detects the luminance signal, and is most strongly excited by a patch of uniform luminance covering the entire cone bundle.

Post-receptoral 2: these are connected to a bundle of neurons from the “post-receptoral 1” phase of differing polarities; for example, a combination of “R-G+” neurons (that activate when the color is less red and more green) and “R+G-” neurons (that activate when the color is more red and less green). Such a cell would detect red-green edges (a similar mechanism is used by other cells to detect blue-yellow edges). These types of cells are only found in the primate cortex – other types of mammals don’t have them.

Introduction to Multispectral Color Imaging

This course was presented by Dr. Jon Y. Hardeberg from the Norwegian Color Research Laboratory at Gjøvik University College.

Metamerism (the phenomenon of different spectral distributions being perceived as the same color) is both a curse and a blessing. Metamerism is what enables our display technologies to work. However, two surfaces with the same appearance under one illuminant may very well have a different appearance under another illuminant.

Besides visual metamerism, you can also have camera metamerism – a camera can generate the same RGB triple from two different spectral distributions. Most importantly, camera metamerism is different from human metamerism. For the two to be the same, the sensor sensitivity curves of the camera would have to be linearly related to the human cone cell sensitivity curves. Unfortunately, this is not true for cameras in practice. This means that cameras can perceive two colors as being different when humans would perceive them to be the same, and vice versa.
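As an aside, this “linearly related” requirement (often called the Luther condition) can be checked numerically: fit a 3×3 matrix mapping the camera sensitivities onto the cone sensitivities and look at the residual. A minimal sketch, with made-up Gaussian curves standing in for real measured data:

```python
import numpy as np

def luther_residual(cam, cones):
    """cam, cones: (3 x N) sampled sensitivity curves. Fit cones ~= M @ cam with a
    3x3 matrix M by least squares and return the relative fitting error
    (0 would mean the camera sees exactly the same metamers as the eye)."""
    M, *_ = np.linalg.lstsq(cam.T, cones.T, rcond=None)  # solves cam.T @ M = cones.T
    fit = (cam.T @ M).T
    return np.linalg.norm(fit - cones) / np.linalg.norm(cones)

# Purely illustrative Gaussian "sensitivities" standing in for measured data:
wl = np.linspace(400, 700, 31)
gauss = lambda center, width: np.exp(-((wl - center) / width) ** 2)
cones = np.stack([gauss(565, 50), gauss(540, 45), gauss(445, 30)])
cam = np.stack([gauss(600, 40), gauss(530, 40), gauss(460, 35)])
print(f"relative fit residual: {luther_residual(cam, cones):.3f}")
```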

Multispectral color imaging is based on spectral reflectance rather than ‘only’ color; the number of channels required is greater than the three used for colorimetric imaging. Multispectral imaging can be thought of as “the ultimate RAW” – capture the physics of the scene now, make the picture later. Applications include fine arts / museum analysis and archiving, medical imaging, hi-fi printing and displays, textiles, industrial inspection and quality control, remote sensing, computer graphics, and more.

What is the dimensionality of spectral reflectance? This relates to the number of channels needed by the multispectral image acquisition system. In theory, spectral reflectance has infinite dimensionality, but objects don’t have arbitrary reflectance spectra in practice. Various studies have been done to answer this question, typically using PCA (Principal Component Analysis). However, these studies tend to produce a wide variety of answers, even when looking at the same sample set.

For the Munsell color chip set, various studies have derived dimensionalities ranging from 3 to 8. For paint/artwork from 5 to 12, for natural/general reflectances from 3 to 20. Note that these numbers do not correspond to a count of required measurement samples (regularly or irregularly spaced), but to the number of basis spectra required to span the space.

Dr. Hardeberg gave a short primer on PCA. Plotting the singular values can show when to “cut off” further dimensions. He proposed defining the effective dimensionality as the number of dimensions at which the accumulated energy reaches 99% of the total – accumulated energy sounds like a good measure for PCA.
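A minimal sketch of that criterion (my own illustration on a synthetic data set; whether to mean-center before the SVD is an assumption that varies between studies):

```python
import numpy as np

def effective_dimensionality(reflectances, energy_fraction=0.99):
    """reflectances: (n_samples x n_wavelengths) array. Returns the number of
    singular vectors needed for the accumulated energy to reach the threshold."""
    s = np.linalg.svd(reflectances, compute_uv=False)   # singular values
    energy = np.cumsum(s**2) / np.sum(s**2)             # accumulated energy
    return int(np.searchsorted(energy, energy_fraction) + 1)

# Synthetic example: spectra built from 8 random basis functions plus a little noise.
rng = np.random.default_rng(0)
basis = rng.normal(size=(8, 36))
data = np.abs(rng.normal(size=(500, 8)) @ basis) + 0.01 * rng.normal(size=(500, 36))
print(effective_dimensionality(data))
```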

Dr. Hardeberg next discussed his own work on dimensionality estimation. He analyzed several reflectance sets:

  • MUNSELL: 1269 chips with matte finish, available from the University of Joensuu in Finland.
  • NATURAL: 218 colored samples collected from nature, also available from Joensuu.
  • OBJECT: 170 natural and man-made objects, online courtesy of Michael Vrhel.
  • PIGMENTS: 64 oil pigments used in painting restoration, provided to ENST by the National Gallery under the VASARI project (not available online).
  • SUBLIMATION: 125 equally spaced patches printed on a Mitsubishi S340-10 CMY dye-sublimation printer.

Based on the 99% accumulated energy criterion, he found the following dimensionalities for the various sets: 18 for MUNSELL, 23 for NATURAL, 15 for OBJECT, 13 for PIGMENTS, 10 for SUBLIMATION. The results suggest that 20 dimensions is a reasonable general-purpose number, but the optimal number will depend on the specific application.

The finding of 10 dimensions for the SUBLIMATION dataset may be viewed as surprising since only three colorants (cyan, magenta, and yellow ink) were used. This is due to the nonlinear nature of color printing. A nonlinear model could presumably use as few as three dimensions, but a linear model needs 10 dimensions to reach 99% accumulated energy.

Multispectral color image acquisition systems are typically based on a monochrome CCD camera with several color filters. There are two variants – passive (filters in the optical path) and active (filters in the light path). Instead of multiple filters it is also possible to use a single Liquid Crystal Tunable Filter (LCTF). Dr. Hardeberg gave brief descriptions of several multispectral acquisition systems in current use, ranging from 6 to 16 channels.

Getting spectral reflectance values out of the multichannel measured values requires some work – Dr. Hardeberg detailed a model-based approach that takes a mathematical model of the acquisition device (how it measures values based on spectral input) and inverts it to recover spectral reflectance from the measured values.
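To give a flavor of what such an inversion can look like, here is a minimal sketch of a regularized pseudo-inverse under a linear-model assumption (not Dr. Hardeberg’s specific method):

```python
import numpy as np

def reconstruct_reflectance(camera_values, A, reg=1e-3):
    """Invert a linear acquisition model c = A @ r.
    camera_values: (K,) measured channel values; A: (K x N) forward model combining
    filter transmittances, sensor sensitivity, and illuminant; returns an (N,)
    estimated reflectance via Tikhonov-regularized least squares."""
    N = A.shape[1]
    return np.linalg.solve(A.T @ A + reg * np.eye(N), A.T @ camera_values)

# Tiny illustration with a made-up 8-channel forward model over 36 wavelength samples:
rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(8, 36)))
measured = A @ np.linspace(0.2, 0.8, 36)              # simulate one camera measurement
print(reconstruct_reflectance(measured, A).shape)     # (36,) estimated spectrum
```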

There is work underway to find spectral acquisition systems that are cheaper, easier to operate, and faster while still generating high-quality reflectance data. One of these projects is happening in Dr. Hardeberg’s group, based on a Color Filter Array (CFA) – similar to the Bayer mosaics found in many digital cameras, but with more channels. This allows capturing spectral information in one shot, with one sensor. Another example is a project that takes a stereo camera and puts different filters on each of the lenses, processing the resulting images to get stereoscopic spectral images with depth information.

Dr. Hardeberg ended by going over various current research areas for improving multispectral imaging, including a new EU-sponsored project by his lab which is focusing on multispectral printing.

Fundamentals of Spectral Measurements for Color Science

This course was presented by Dr. David R. Wyble, Munsell Color Science Lab at Rochester Institute of Technology.

Colorimetry isn’t so much measuring a physical value as predicting the impression that will be formed in the mind of the viewer. Spectral measurements are more well-defined in a physical sense.

Terminology: Spectrophotometry measures spectral reflectance, transmittance or absorptance of a material as a function of wavelength. The devices used are spectrophotometers, which measure the ratio of two spectral photometric quantities, to determine the properties of objects or surfaces. Spectroradiometry is more general – measurement of spectral radiometric quantities. The devices used (spectroradiometers) work by measuring spectral radiometric quantities to determine the properties of light sources and other self-luminous objects. Reflectance, transmittance, absorptance are numerical ratios; the words “reflection”, “transmission”, and “absorption” refer to the physical processes. Most spectrophotometers measure at 10nm resolution, and spectroradiometers typically at 5-10nm.

Spectrophotometers

Spectrophotometers measure a ratio with respect to a reference, so no absolute calibration is needed. For reflectance we reference a Perfect Reflecting Diffuser (PRD), and for transmittance we use air. A PRD is a theoretical device – a Lambertian diffuser with 100% reflectance. Calibration transfer techniques are applied to enable the calculation of reflectance factor from the available measured data.

Reflectance is the ratio of the reflected flux to the incident flux (problem – measuring incident flux). Reflectance Factor is the ratio of the flux reflected from the sample to the flux that would be reflected from an identically irradiated PRD (problem – where’s my PRD?).

The calibration equation (or why we don’t need a PRD): a reference sample (typically white) is provided together with Rref(λ) – the known spectral reflectance of the sample (λ stands for wavelength). This sample is measured to provide the “reference signal” iref(λ). In addition, “zero calibration” (elimination of dark current, stray light, etc.) is performed by measuring a “dark signal” idark(λ). Dark signal is measured either with a black reference sample or “open port” (no sample in the device). The calibration equation combines Rref(λ), iref(λ) and idark(λ) with the measured sample intensity isample(λ) to get the sample’s spectral reflectance Rsample(λ):

Rsample(λ) = Rref(λ) * (isample(λ) – idark(λ)) / (iref(λ) – idark(λ))
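A minimal sketch of applying this calibration equation with spectra stored as arrays (the variable names mirror the quantities above; real instrument software also handles wavelength alignment and averaging):

```python
import numpy as np

def sample_reflectance(R_ref, i_ref, i_dark, i_sample):
    """All arguments are spectra sampled at the same wavelengths:
    R_ref    - certified reflectance of the white reference tile
    i_ref    - signal measured off the reference tile
    i_dark   - dark signal ("zero calibration")
    i_sample - signal measured off the sample"""
    R_ref, i_ref, i_dark, i_sample = map(np.asarray, (R_ref, i_ref, i_dark, i_sample))
    return R_ref * (i_sample - i_dark) / (i_ref - i_dark)
```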

Note that Rref(λ) was similarly measured against some other reference, and so on. So you have a pedigree of standards, ultimately leading to some national standards body. For example, if you buy a white reference from X-Rite, it was measured by X-Rite against a white tile they have that was measured at the National Institute of Standards and Technology (NIST).

A lot of lower-cost spectrophotometers don’t come with a reflectance standard – Dr. Wyble isn’t clear on how those work. You can always buy a reflectance standard separately and do the calibration yourself, but that is more risky – if it all comes from the same manufacturer you can expect that it was done properly.

Transmittance is the ratio of transmitted flux to incident flux. At the short path lengths in these devices, air is effectively a perfect transmitter for visible light, so a “transmittance standard” is not needed – the incident flux can be measured directly by measuring “open port” (no sample). For liquids you can measure an empty container, and when measuring specific colorants dissolved in a carrier fluid you can measure a container full of clean carrier fluid.

Calibration standards must be handled, stored and cleaned with care according to manufacturer instructions, otherwise incorrect measurement will result. A good way to check is to measure the white standard and check the result just before measuring the sample.

A spectrophotometer typically includes a light source, a sample holder, a diffraction grating (for separating out spectral components) and a CCD array sensor, as well as some optics.

Measurement geometry refers to the measuring setup: variables such as the angles of the light source and sensor to the sample, the presence or absence of baffles to block certain light paths, the use (or not) of integrating hemispheres, etc. Dr. Wyble went into a few examples, all taken from the CIE 15:2004 standards document. Knowledge of which measurement geometry was used can be useful, e.g. to estimate how much specular reflectance was included in a given measurement (different geometries exclude specular to different degrees). Some special materials (“gonio-effect” pigments that change color based on angle, fluorescent, metallic, retroreflective, translucent, etc.) will break the standard measurement geometries and need specialized measuring methods.

Spectroradiometers

Similar to spectrophotometers, but have no light source or sample holder. The light from the luminous object being measured goes through some optics and a dispersing element to a detector. There are no standard measurement geometries for spectroradiometry.

Some spectroradiometers measure radiance directly emitted from the source through focused optics (typically used for measuring displays). Others measure irradiance – the light incident on a surface (typically used for measuring illuminants). Irradiance measurements can be done by measuring radiance from a diffuse white surface, such as pressed polytetrafluoroethylene (PTFE) powder.

Irradiance depends on the angle of incident light and the distance of the detector. Radiance measured off diffuse surfaces is independent of angle to the device. Radiance measured off uniform surfaces is independent of distance to the device.

Instrument Evaluation: Repeatability (Precision) and Accuracy

Repeatability – do you get similar results each time? Accuracy – is the result (on average) close to the correct one? Repeatability is more important since repeatable inaccuracies can be characterized and corrected for.

Measuring repeatability – the standard deviations of reflectance or colorimetric measurements. The time scale is important: short-term repeatability (measurements one after the other) should be good for pretty much any device. Medium-term repeatability is measured over a day or so, and represents how well the device does between calibrations. Long-term repeatability is measured over weeks or months – the device would typically be recalibrated several times over such an interval. The most common measure of repeatability is Mean Color Difference from the Mean (MCDM). It is measured by making a series of measurements of the same sample (removing and replacing it each time to simulate real measurements), calculating L*a*b* values for each, calculating the mean, calculating ΔE*ab between each measurement and the mean, and finally averaging the ΔE*ab values to get the MCDM. The MCDM will typically be about 0.01 (pretty good) to 0.4 (really bad). Small handheld devices commonly have around 0.2.
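A minimal sketch of the MCDM calculation, assuming the repeated measurements have already been converted to CIELAB and using the simple Euclidean ΔE*ab difference:

```python
import numpy as np

def mcdm(lab_measurements):
    """lab_measurements: (n x 3) array of L*a*b* values from repeated measurements
    of the same sample. Returns the Mean Color Difference from the Mean."""
    lab = np.asarray(lab_measurements, dtype=float)
    mean_lab = lab.mean(axis=0)
    delta_e = np.linalg.norm(lab - mean_lab, axis=1)  # ΔE*ab of each measurement vs. the mean
    return delta_e.mean()

# Hypothetical repeated measurements of one sample:
print(mcdm([[52.1, 10.3, -4.2], [52.0, 10.5, -4.0], [52.3, 10.2, -4.3]]))
```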

Quantifying accuracy – typically done by measuring the spectral reflectance of a set of known samples (e.g. BCRA tiles) that have been previously measured at high-accuracy laboratories: NIST, NRC, etc. The measured values are compared to the “known” values and the MCDM is calculated as above. Once the inaccuracy has been quantified, this can be used to correct further measurements with the device (using regression analysis). When applied to the test tile values, the correction attempts to match the reference tile values. When applied to measured data, the correction attempts to predict reflectance data as if the measurements were made on the reference instrument. Note that the known values of the samples have uncertainties in them. The best uncertainty you can get is the 45:0 reflectometer at NIST, which is about 0.3%-0.4% (depending on wavelength) – you can’t do better than that.

Using the same procedure, instead of aligning your instruments with NIST, you can align a corporate “fleet” of instruments (used in various locations) to a “master” instrument.

2011 Color and Imaging Conference, Part II: Courses A

CIC traditionally includes a strong course program, with a two-day course on fundamentals (a DVD of this course presented by Dr. Hunt can be purchased online) and a series of short courses on more specialized topics. Since I attended the fundamentals course last year, this year I only went to short courses. This blog post will detail three of these courses, with the others covered by a future post.

Color Pipelines for Computer Animated Features

The first part of the course was presented by Rod Bogart. Rod is the lead color science expert at Pixar, and worked on color-related issues at ILM before that.

The animated feature pipeline has many steps, some of which are color-critical and some of which aren’t: Story, Art, Layout, Animation, Shading, Lighting, Mastering, and Exhibition. The people working on the color-critical stages are the ones with color-critical monitors on their desks. Rod’s talk went through the color-critical stages of the pipeline, discussing related topics on the way.

Art

In this stage people look at reference photos, establish color palettes, and do look development. Accurate color is important. Often, general studies are done on how exteriors, characters, etc. might look. This is mostly done in Photoshop on a Mac.

Art is the first stage where people make color-critical images. In general, all images made in animated feature production exist for one of two reasons – for looking at directly, or to be used for making more images (e.g., textures). The requirements for image processing will vary depending on which group they belong to. During the Art stage the images generated are intended for viewing.

Images for viewing can be quantized as low as 8 bits per channel, and even (carefully) compressed. Pixel values tend to be encoded to the display device (output referred). In the absence of a color management system, the encoding just maps to frame buffer values, which feed into a display response curve. However, it is better to tag the image with an assumed display device (ICC tagging to a target like sRGB; other metadata attributes can be stored with the image as well). It’s important to minimize color operations done on such images, since they have already been quantized and have no latitude for processing. These images contain low dynamic range (LDR) data.

During the Art phase, images are typically displayed on RGB additive displays calibrated to specific reference targets. Display reference targets include specifications for properties such as the chromaticity coordinates of the RGB primaries and white point, the display response curve, the display peak white luminance and the contrast ratio or black level.

Shading

Shading and antialiasing operations need to occur on linear light values – values that are proportional to physical light intensity. Other operations that require linear values include resizing, alpha compositing, and filtering. Rendered buffers are written out as HDR values and later used to generate the final image.
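As a small aside (my own sketch, not from the course), the reason this matters is that display-encoded values are nonlinear, so operating on them directly skews the result. Assuming an sRGB encoding:

```python
import numpy as np

def srgb_to_linear(v):
    """Decode sRGB-encoded values in [0, 1] to linear light."""
    v = np.asarray(v, dtype=float)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(v):
    """Encode linear-light values in [0, 1] back to sRGB."""
    v = np.asarray(v, dtype=float)
    return np.where(v <= 0.0031308, v * 12.92, 1.055 * v ** (1 / 2.4) - 0.055)

# A 50/50 blend of two encoded pixel values, done correctly in linear light:
a, b = 0.9, 0.1
print(linear_to_srgb(0.5 * (srgb_to_linear(a) + srgb_to_linear(b))))  # ~0.66, not the naive 0.5
```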

Lighting

Lighting is sometimes done with special light preview software, and sometimes using other methods such as “light soloing”. “Light soloing” is a common practice where a buffer is written out for the contribution of each light in the scene (all other lights are set to black) and then the lighters can use compositing software to vary individual light colors and intensities and combine the results.

For images such as these “solo light buffers” which are used to assemble viewable images, Pixar uses the OpenEXR format. This format stores linear scene values with a logarithmic distribution of numbers – each channel is a 16-bit half-float. The range of possible values is -65504.0 to +65504.0. The positive range can be thought of as 32 stops (powers of 2) of data, with 1024 steps in each of the stops.

After images are generated, they need to be viewed. This is done in various review spaces: monitors (CRT or calibrated LCD) on people’s desks, as well as various special rooms (review rooms, screening rooms, grading suites) where images are typically shown on DLP projectors. In review rooms the projector is usually hooked up directly to a workstation, while screening rooms use special digital cinema playback systems or “dailies” software. Pixar try not to have any monitors in the screening rooms – screening rooms are dark and the monitors are intended (and calibrated) for brighter rooms.

Mastering

The mastering process includes in-house color grading. This covers two kinds of operations: shot-to-shot corrections and per-master operations. An example of a shot-to-shot correction: in “Cars” in one of the shots the grass ended up being a slightly different color than in other shots in the sequence – instead of re-rendering the shot, it was graded to make the grass look more similar to the other shots. In contrast, per-master operations are done to make the film fit a specific presentation format.

Mastering for film: film has a different gamut than digital cinema projection. Neither is strictly larger – each has colors the other can’t handle. Digital is good for bright, saturated colors, especially primary colors – red, green, and blue. Film is good for dark, saturated colors, especially secondary colors – cyan, magenta, and yellow. Pixar doesn’t generate any film gamut colors that are outside the digital projection gamut, so they just need to worry about the opposite case – mapping colors from outside the film gamut so they fit inside it, and previewing the results during grading. Mapping into the film gamut is complex. Pixar try to move colors that are already in-gamut as little as possible (the ones near the gamut border do need to move a little to “make room” for the remapped colors). For the out-of-gamut colors, first Pixar tried a simple approach – moving to the closest point in the gamut boundary. However, this method doesn’t preserve hue. An example of the resulting problems: in the “Cars” night scene where Lightning McQueen and Mater go tractor-tipping, the closest-point gamut mapping made Lightning McQueen’s eyes go from blue (due to the night-time lighting) to pink, which was unacceptable. Pixar figured out a proprietary method which involves moving along color axes. This sometimes changes the chroma or lightness quite a bit, but tends to preserve hue and is more predictable for the colorist to tweak if needed. For film mastering Pixar project the content in the P3 color space (originally designed for digital projection), but with a warmer white point more typical of analog film projection.

Mastering for digital cinema: color grading for digital cinema is done in a tweaked version of the P3 color space – instead of using the standard P3 white point (which is quite greenish) they use D65, which is the white point people have been using on their monitors while creating the content. Finally a Digital Cinema Distribution Master (DCDM) is created – this stores colors in XYZ space, encoded at 12 bits per channel with a gamma of 2.6.

Mastering for HD (Blu-ray and HDTV broadcast): color grading for HD is done in the standard Rec.709 color space. The Rec.709 green and red primaries are much less saturated than the P3 ones; the blue primary has similar saturation to the P3 blue but is darker. The HD master is stored in RGB, quantized to 10 bits. Rod talked about the method Pixar use for dithering during quantization – it’s an interesting method that might be relevant for games as well. The naïve approach would be to round to the closest quantized value. This is the same as adding 0.5 and rounding down (truncating). Instead of adding 0.5, Pixar add a random number distributed uniformly between 0 and 1. This gives the same result on average, but dithers away a lot of the banding that would otherwise result.
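A minimal sketch of that dithered quantization (my own illustration of the idea as described; the bit depth and the [0, 1] value range are assumptions):

```python
import numpy as np

def quantize_dithered(x, bits=10, rng=None):
    """Quantize float image data in [0, 1] to integer codes at the given bit depth,
    adding uniform [0, 1) noise before truncating instead of rounding. The result
    is unbiased on average and breaks up banding."""
    rng = rng or np.random.default_rng()
    scale = (1 << bits) - 1
    codes = np.floor(x * scale + rng.random(np.shape(x)))
    return np.clip(codes, 0, scale).astype(np.uint16)

print(quantize_dithered(np.linspace(0.0, 1.0, 8), bits=10, rng=np.random.default_rng(1)))
```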

Exhibition

Exhibition for digital cinema: this uses a Digital Cinema Package (DCP) in which each frame is compressed using JPEG2000. The compression is capped to 250 megabits per second – this limit was set during the early days of digital cinema, and any “extra features” such as stereo 3D, 4K resolution, etc. still have to fit under the same cap.

Exhibition for HD (Blu-ray, HDTV broadcast): the 10-bit RGB master is converted to YCbCr, chroma subsampled (4:2:2) and further quantized to 8 bits. This is all done with careful dithering, just like the initial 10-bit quantization. MPEG-4 AVC compression is used for Blu-ray, with a 28-30 megabits per second average bit rate and a 34 megabits per second peak.

Disney’s Digital Color Workflow – Featuring “Tangled”

The second part of the course was presented by Stefan Luka, a senior color science engineer at Walt Disney Animation Studios. Disney uses various display technologies, including CRT, LCD and DLP projectors. Each display has a gamut that defines the range of colors it can show. Disney previously used CRT displays, which have excellent color reproduction but are unstable over time and have a limited gamut. They now consider LCD color reproduction to finally be good enough to replace CRTs (several in the audience disputed this), and primarily use HP Dreamcolor LCD monitors. These are very stable, can support wide gamuts (due to their RGB LED backlights), and include programmable color processing.

Disney considered using Rec.709 calibration for the working displays, but the artists really wanted P3-calibrated displays, mostly to see better reds. Rec.709’s red primary is a bit orangish – P3’s red primary is very pure; it’s essentially on the spectral locus. Disney calibrate the displays with P3 primaries, a D65 white point, and a 2.2 gamma (which Stefan says matches the CRTs used at that time). The viewing environment in the artists’ rooms is not fully controlled, but the lighting is typically dim.

Disney calibrate their displays by mounting them in a box lined with black felt in front of a spectroradiometer. They measure the primaries and ramps on each channel to build lookup tables. For software Disney use a custom-tweaked version of a tool from HP called “Ookala” (the original is available on SourceForge). When calibrating they make sure to let the monitor warm up first, since LEDs are temperature dependent. The HP DreamColor has a temperature sensor which can be queried electronically, so this is easy to verify before starting calibration. Disney uses a spectroradiometer for calibration – Stefan said that colorimeters are generally not good enough to calibrate a display like this, though perhaps the latest one from X-Rite (the i1Display Pro) could work. Only people doing color-critical work have DreamColor monitors – Disney couldn’t afford to give them to everyone. People with non-color-critical jobs use cheaper displays.

During “Tangled” production, the texture artists painted display-encoded RGB, saved as 16-bit (per channel) TIFF or PSD. They used sRGB encoding (managed via ICC or external metadata/LUT) since it makes the bottom bits go through better than a pure power curve. Textures were converted to linear RGB for rendering. Rendering occurred in linear light space; the resulting images had a soft roll-off applied to the highlights and were written to 16-bit TIFF (if they had been saving to OpenEXR – which they plan to do for future movies – they wouldn’t have needed to roll off the highlights). Compositing inputs and final images were all 16-bit TIFFs.

During post production final frames are conformed and prepared for grading. The basic grade is done for digital cinema, with trim passes for film, stereoscopic, and HD.

The digital cinema grade is done in a reference room with a DLP projector using P3 primaries, D65 white point, 2.2 gamma, and 14 foot-Lamberts reference white. The colorist uses “video” style RGB grading controls, and the result is encoded in 12-bit XYZ space with 2.6 gamma, dithered, and compressed using JPEG2000.

For the film deliverable, Disney adjust the projector white point and view the content through the same film gamut mapping that Pixar uses. They then do a trim pass. White point compensation is also needed; the content was previously viewed at D65 but needs to be adjusted for the native D55 film white point to avoid excessive brightness loss. A careful process needs to be done to bridge the gap between the two white points. At the output, film gamut mapping as well as an inverse film LUT is applied to go from the projector-previewed colors to values suitable for writing to film negative. Finally, Disney review the content at the film lab and call printer lights.

Stereo digital cinema – luminance is reduced to 4.5 foot-Lamberts (in the field there will be a range of stereo luminances; Disney make the assumption that 4.5 is a reasonable target). They do a trim pass, boosting brightness, contrast, and saturation to compensate for the greatly reduced luminance. The colorist works with one stereo eye at a time (working with stereo glasses constantly would cause horrible headaches). Afterwards the result is reviewed with glasses, then output and encoded similarly to the mono digital cinema deliverable.

HD mastering – Disney also use a DLP projector for HD, but view it through a Rec.709 color-space conversion and with reference white set to 100 nits. They do a trim pass (mostly global adjustments needed due to the increase in luminance), then output and bake the values into Rec.709 color space. Then Disney compress and review final deliverables on an HD monitor in a correctly set up room with proper backlight etc.

After finishing “Tangled”, Disney wanted to determine whether it was really necessary for production to work in P3; could they instead work in Rec.709 and have the colorist tweak the digital cinema master to the wider P3 gamut? Stefan said that this question depends on the distribution of colors in a given movie, which in turn depends a lot on the art direction. Colors can go out of gamut due to saturation, or due to brightness, or both. Stefan analyzed the pixels that went out of Rec.709 gamut throughout “Tangled”. Most of the out-of-gamut colors were due to brightness – most importantly flesh tones. A few other colors went out of gamut due to saturation: skies, forests, dark burgundy velvet clothing on some of the characters, etc.

Stefan showed four example frames on a DreamColor monitor, comparing images in full P3 with the same images gamut-mapped to Rec.709. Two of the four barely changed. Of the remaining two, one was a forest scene with a cyan fog in the background which shifted to green when gamut-mapped. Another shot, with glowing hair, had colors out of Rec.709 gamut due to both saturation & brightness.

At the end of the day, the artists weren’t doing anything in P3 that couldn’t have been produced at the grading stage, so Stefan doesn’t think doing production in P3 had much of a benefit. P3 was mostly used to boost brightness, so working in 709 space with additional headroom (e.g. OpenEXR) would be good enough.

After “Tangled”, Disney moved from 16-bit TIFFs to OpenEXR, helped by their recent adoption of Nuke (which has fast floating-point compositing – “Tangled” was composited on Shake). They also eliminated the sRGB encoding curve, and now just use a 2.2 gamma without any LUTs. Disney no longer need to do a soft roll off of highlights when rendering since OpenEXR can contain the full highlight detail. They are doing some experiments with HDR tone mapping, especially tweaking the saturation. Disney have also moved to working in Rec.709 instead of P3 for production (for increased compatibility between formats) and are using non-wide-gamut monitors (still HP, but not DreamColor).

In the future, Disney plan to do more color management throughout the pipeline, probably using the open-source OpenColorIO library. They also plan to investigate improvements in gamut mapping, including local contrast preservation (taking account of which colors are placed next to each other spatially, and not collapsing them to the same color when gamut mapping).

Color in High-Dynamic Range Imaging

This course was presented by Greg Ward. Greg is a major figure in the HDR field, having developed various HDR image formats (LogLuv TIFF and JPEG-HDR, as well as the first HDR format, RGBE), the first widely-used HDR rendering system (RADIANCE), and the first commercially available HDR display, as well as various pieces of software relating to HDR (including the Photosphere HDR image builder and browsing program). He’s also done important work on reflectance models, but that’s outside the scope of this course.

HDR Color Space and Representations

Images can be scene-referred (data encodes scene intensities) or output-referred (data encodes display intensities). Since human visual abilities are (pretty much) known, and future display technologies are mostly unknown, scene-referred images are more useful for long-term archival. Output-referred images are useful in the short term, for a specific class of display technology. Human perceptual abilities can be used to guide color space encoding of scene-referred images.

The human visual system is sensitive to luminance values over a range of about 1:10^14, but not in a single image. The human simultaneous range is about 1:10,000. The range of sRGB displays is about 1:100.

The HDR imaging approach is to render or capture floating-point data in a color space that can store the entire perceivable gamut. Post-processing is done in the extended color space, and tone mapping is applied for each specific display. This is the method adopted in the Academy Color Encoding Specification (ACES) used for digital cinema. Manipulation of HDR data is much preferred because then you can adjust exposure and do other types of image manipulation with good results.

HDR imaging isn’t new – black & white negative film can hold at least 4 orders of magnitude, while the final print holds much less. Much of the talent of photographers like Ansel Adams lay in darkroom technique – “dodging” and “burning” to bring out the dynamic range of the scene on paper. The digital darkroom provides new challenges and opportunities.

Camera RAW is not HDR; the number of bits available is insufficient to encode HDR data. A comparison of several formats which are capable of encoding HDR follows (using various metrics, including error on an “acid test” image covering the entire visible gamut over a 1:10^8 dynamic range).

  • Radiance RGBE & XYZE: a simple format (three 8-bit mantissas and one 8-bit shared exponent) with open source libraries. Supports lossless (RLE) compression (20% average compression ratio). However, it does not cover the visible gamut, the large dynamic range comes at the expense of accuracy, and the color quantization is not perceptually uniform. RGBE had visible error on the “acid test” image; XYZE performed much better but still had some barely perceptible error.
  • IEEE 96-bit TIFF (IEEE 32-bit float for each channel) is the most accurate representation, but the files are enormous (even with compression – 32-bit IEEE floats don’t compress very well).
  • 16-bit per channel TIFF (RGB48) is supported by Photoshop and TIFF libraries including libTIFF. 16 bits each of gamma-compressed R, G, and B; LZW lossless compression is available. However, it does not cover the visible gamut, and most applications interpret the maximum value as “white”, turning it into a high-precision LDR format rather than an HDR format.
  • SGI 24-bit LogLuv TIFF Codec: implemented in libTIFF. 10-bit log luminance, and a 14-bit lookup into a “rasterized human gamut” in CIE (u’,v’) space. It just covers the visible gamut and range, but the dynamic range doesn’t leave headroom for processing and there is no compression support. Within its dynamic range limitations, it had barely perceptible errors on the “acid test” image (but failed completely outside those limits).
  • SGI 32-bit LogLuv TIFF Codec: also in libTIFF. A sign bit, 16-bit log luminance, and 8 bits each for CIE (u’,v’). Supports lossless (RLE) compression (30% average compression). It had barely perceptible errors on the “acid test” image.
  • ILM OpenEXR Format: 16-bit float per primary (sign bit, 5-bit exponent, 10-bit mantissa). Supports alpha and multichannel images, as well as several lossless compression options (2:1 typical compression – compressed sizes are competitive with other HDR formats). Has a full-featured open-source library as well as massive support by tools and GPU hardware. The only reasonably-sized format (i.e. excluding 96-bit TIFF) which could represent the entire “acid test” image with no visible error. However, it is relatively slow to read and write. Combined with CTL (Color Transformation Language – a similar concept to ICC, but designed for HDR images), OpenEXR is the foundation of the Academy of Motion Picture Arts & Sciences’ IIF (Image Interchange Framework).
  • Dolby’s JPEG-HDR (one of Greg’s projects): backwards-compatible JPEG extension for HDR. A tone-mapped sRGB image is stored for use by naïve (non-HDR-aware) applications; the (monochrome) ratio between the tone-mapped luminance and the original HDR scene luminance is stored in a subband. JPEG-HDR is very compact: about 1/10 the size of the other formats. However, it only supports lossy encoding (so repeated I/O will degrade the image) and has an expensive three-pass writing process. Dolby will soon release an improved version of JPEG-HDR on a trial basis; the current version is supported by a few applications, including Photoshop (through a plugin – not natively) and Photosphere (which will be detailed later in the course).

HDR Capture and Photosphere

Standard digital cameras capture about 2 orders of magnitude in sRGB space. Using multiple exposures enables building up HDR images, as long as the scene and camera are static. In the future, HDR imaging will be built directly into camera hardware, allowing for HDR capture with some amount of motion.

Multi-exposure merge works by using a spatially varying weighting function – the weight of each pixel depends on where its value sits within the usable range of each exposure. The camera’s response function needs to be recovered as well.
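A minimal sketch of such a merge (my own illustration under simplifying assumptions: a hat-shaped weight and an already-recovered inverse camera response):

```python
import numpy as np

def merge_exposures(images, exposure_times, inverse_response=lambda v: v):
    """images: list of (H x W) arrays with values in [0, 1]; exposure_times: matching
    list of exposure times in seconds. Returns relative HDR radiance per pixel."""
    num = np.zeros_like(images[0], dtype=float)
    den = np.zeros_like(images[0], dtype=float)
    for img, t in zip(images, exposure_times):
        weight = 1.0 - np.abs(2.0 * img - 1.0)  # down-weight near-black / near-saturated pixels
        radiance = inverse_response(img) / t    # per-exposure radiance estimate
        num += weight * radiance
        den += weight
    return num / np.maximum(den, 1e-6)
```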

The Photosphere application (available online) implements the various algorithms discussed in this section. Exposures need to be aligned – Photosphere does this by generating median threshold bitmaps (MTBs) which are constant across exposures (unlike edge maps). MTBs are generated based on a grayscale image pyramid version of the original image, alignments are propagated up the pyramid. Rotational as well as translational alignments are supported. This technique was published by Greg in a 2003 paper in the Journal of Graphics Tools.
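A minimal sketch of the MTB idea (just the thresholding and an error measure; the full technique also uses an exclusion zone around the median and the image pyramid mentioned above):

```python
import numpy as np

def median_threshold_bitmap(gray):
    """gray: (H x W) grayscale image. Threshold at the median, so the bitmap stays
    nearly the same as exposure changes."""
    return gray > np.median(gray)

def alignment_error(bitmap_a, bitmap_b, dx, dy):
    """Count disagreeing pixels after shifting bitmap_b by (dx, dy) – an alignment
    search picks the offset that minimizes this count."""
    shifted = np.roll(np.roll(bitmap_b, dy, axis=0), dx, axis=1)
    return int(np.count_nonzero(bitmap_a ^ shifted))
```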

Photosphere also automatically removes “ghosts” (caused by objects which moved between exposures) and reconstructs an estimate of the point-spread function (PSF) for glare removal.

Greg then gave a demo of the new Windows version of Photosphere, including its HDR image browsing and cataloging abilities. Its merging capabilities also include the unique option of outputting absolute HDR values for all pixels, if the user inputs an absolute value for a single patch (this would typically be a grey card measured by a separate device). This only needs to be done once per camera.

Image-Based Lighting

Take an HDR (bracketed exposure) image of a mirrored ball and use it for lighting. Use a background plate to fill in the “pinched” region at the back of the ball. Render synthetic objects with the captured lighting and composite them into the real scene, with optional addition of shadows. Greg’s description of HDR lighting capture is a bit out of date – most VFX houses no longer use mirrored balls for this (they still use them for reference); instead panoramic cameras or DSLRs with a nodal mount are typically used.

Tone-Mapping and Display

A renderer is like an ideal camera. Tone mapping is medium-specific and goal-specific. The user needs to consider display gamut, dynamic range, and surround. What do we wish to simulate – cinematic camera and film, or human visual abilities and disabilities? Possible goals include colorimetric reproduction, matching visibility, or optimizing contrast & color sensitivity.

Histogram tone-mapping is a technique that generates a histogram of log luminance for the scene, and creates a curve that redistributes luminance to fit the output range.
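A minimal sketch of the basic idea (a simplified variant – the actual operator also limits how much contrast the histogram is allowed to claim, so that visibility isn’t exaggerated):

```python
import numpy as np

def histogram_tone_map(luminance, display_min=0.01, display_max=100.0, bins=256):
    """luminance: array of scene luminances (> 0). Builds a cumulative histogram of
    log luminance and uses it to redistribute values into the display's range."""
    log_l = np.log(luminance)
    hist, edges = np.histogram(log_l, bins=bins)
    cdf = np.cumsum(hist) / hist.sum()        # cumulative distribution in [0, 1]
    centers = 0.5 * (edges[:-1] + edges[1:])
    frac = np.interp(log_l, centers, cdf)     # where each pixel falls in the CDF
    log_out = np.log(display_min) + frac * (np.log(display_max) - np.log(display_min))
    return np.exp(log_out)
```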

Greg discussed various other tone mapping methods. He mentioned a SIGGRAPH 2005 paper that used an HDR display to compare many different tone-mapping operators.

HDR Display Technologies

  • Silicon Light Machines Grating Light Valve (GLV) – amazing dynamic range, widest gamut, still in development. Promising for digital cinema.
  • Dolby Professional Reference Monitor PRM-4200: an LED-based 42″ production unit based on technology that Greg worked on. He says this is extended dynamic range, but not true HDR (it goes up to 600 cd/m2).
  • SIM2 Solar Series HDR display: this is also based on the (licensed) Dolby tech – Greg says this is closer to what Dolby originally had in mind. It’s a 47″ display with a 2,206-LED backlight that goes up to 4,000 cd/m2.

As an interesting example, Greg also discussed an HDR transparency (slide) viewer that he developed back in 1995 to evaluate tone mapping operators. It looks similar to a ViewMaster but uses much brighter lamps (50 Watts for each eye, necessitating a cooling fan and heat-absorbing glass) and two transparency layers – a black-and-white (blurry) “scaling” layer as well as a color (sharp) “detail” layer. Together these layers yield 1:10,000 contrast. The principles used are similar to other dual-modulator displays; the different resolution of the two layers avoids alignment problems. Sharp high-contrast edges work well despite the blurry scaling layer – scattering in the eye masks the artifacts that would otherwise result.

New displays based on RGB LED backlights have the potential to achieve not just high dynamic range but greatly expanded gamut – the new LEDs are spectrally pure and the LCD filters can select between them easily, resulting in very saturated primaries.

HDR Imaging in Cameras, Displays and Human Vision

The course was presented by Prof. Alessandro Rizzi from the Department of Information Science and Communication at the University of Milan. With John McCann, he co-authored the book “The Art and Science of HDR Imaging” on which this course is based.

HDR Issues

The imaging pipeline starts with scene radiances generated from the illumination and objects. These radiances go through a lens, a sensor in the image plane, and sensor image processing to generate a captured image. This image goes through media processing before being shown on a print or display, to generate display radiances. These go through the eye’s lens and intraocular medium, form an image on the retina, which is then processed by the vision system’s image processing to form the final reproduction appearance. Prof. Rizzi went over HDR issues relating to various stages in the pipeline.

The dynamic range issue relates to the scene radiances. Is it useful to define HDR based on a specific threshold number for the captured scene dynamic range? No. Prof. Rizzi defines HDR as “a rendition of a scene with greater dynamic range than the reproduction media”. In the case of prints this is almost always the case, since print media has an extremely low dynamic range. Renaissance painters were the first to successfully do HDR renditions – example paintings were shown and compared to similar photographs. The paintings were able to capture a much higher dynamic range while still appearing natural.

A table was shown of example light levels, each listed with luminance in cd/m2. Note that these values are all for the case of direct observation, e.g. “sun” refers to the brightness of the sun when looking at it directly (not recommended!) as opposed to looking at a surface illuminated by the sun (that is a separate entry).

  • Xenon short arc: 200,000 – 5,000,000,000
  • Sun: 1,600,000,000
  • Metal halide lamp: 10,000,000 – 60,000,000
  • Incandescent lamp: 20,000,000 – 26,000,000
  • Compact fluorescent lamp: 20,000 – 70,000
  • Fluorescent lamp: 5,000 – 30,000
  • Sunlit clouds: 10,000
  • Candle: 7,500
  • Blue sky: 5,000
  • Preferred values for indoor lighting: 50 – 500
  • White paper in sun: 10,000
  • White paper in 500 lux illumination (typical office lighting): 100
  • White paper in 5 lux illumination (very dim lighting, similar to candle-light): 1

The next issue, range limits and quantization, refers to the “captured image” stage of the imaging pipeline. A common misconception is that the problem involves squeezing the entire range of intensities which the human visual system can handle, from starlight at 10^-6 cd/m2 to a flashbulb at 10^8 cd/m2, into the 1-100 cd/m2 range of a typical display. The fact is that the 10^-6 to 10^8 cd/m2 range is only obtainable with isolated stimuli – humans can’t perceive a range like that in a single image. Another common misconception is to think of precision and range as being linked, e.g. assuming that an 8-bit framebuffer implies a 255:1 contrast. Prof. Rizzi used a “salami” metaphor – the size of the salami represents the dynamic range, and the number of slices represents the quantization. Range and precision are orthogonal.
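
A concrete way to see this is a shared-exponent pixel encoding such as Greg Ward’s RGBE format (documented at http://www.graphics.cornell.edu/online/formats/rgbe/): eight bits of mantissa per channel set the number of “slices”, while a shared eight-bit exponent sets the length of the salami. The sketch below follows the usual +128 exponent bias; it is a rough illustration, not production code:

    import math

    def float_to_rgbe(r, g, b):
        # Shared exponent: take the largest component, split it into mantissa
        # and exponent, and scale all three channels by the same factor.
        v = max(r, g, b)
        if v < 1e-32:
            return (0, 0, 0, 0)
        m, e = math.frexp(v)              # v = m * 2**e, with 0.5 <= m < 1
        scale = m * 256.0 / v
        return (int(r * scale), int(g * scale), int(b * scale), e + 128)

    def rgbe_to_float(rm, gm, bm, e):
        if e == 0:
            return (0.0, 0.0, 0.0)
        f = math.ldexp(1.0, e - (128 + 8))  # undo the bias and the 8-bit mantissa scale
        return (rm * f, gm * f, bm * f)

    # A channel much dimmer than the brightest one gets only a few mantissa
    # slices - huge range, limited precision, which is exactly the salami point.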

In most cases, the scene has a larger dynamic range than the sensor does. So with non-HDR image acquisition you have to give up some dynamic range in the highlights, the shadows, or both. The “HDR idea” is to bracket multiple acquisitions with different exposures to obtain an HDR image, and then “shrink” during tone mapping. But how? Tone mapping can be general, or can take account of a specific rendering intent. Naively “squeezing” all the detail into the final image leads to the kind of unnatural “black velvet painting”-looking “HDR” images commonly found on the web.
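
The bracket-and-merge step itself is not the hard part; the “shrink” is. A minimal sketch of the merge plus a placeholder global curve, assuming already-linear exposures with known exposure times (a real pipeline would also recover the camera response curve):

    import numpy as np

    def merge_exposures(images, exposure_times):
        # images: same-sized arrays with linear values in [0, 1]; each is divided
        # by its exposure time to estimate relative radiance, with a hat-shaped
        # weight that distrusts clipped and noisy pixels.
        acc = np.zeros_like(images[0], dtype=np.float64)
        wsum = np.zeros_like(acc)
        for img, t in zip(images, exposure_times):
            w = 1.0 - np.abs(2.0 * img - 1.0)
            acc += w * img / t
            wsum += w
        return acc / np.maximum(wsum, 1e-6)

    def tone_map_global(radiance):
        # Placeholder "shrink": normalize to a mid-grey of 0.18, then L / (1 + L).
        L = radiance * (0.18 / max(radiance.mean(), 1e-9))
        return L / (1.0 + L)

The hat-shaped weight simply trusts each exposure most where its pixels are neither clipped nor buried in noise.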

As an example, the response of film emulsions to light can be mapped via a density-exposure curve, commonly called a Hurter-Driffield or “H&D” curve. These curves map negative density vs. log exposure. They typically show an s-shape with a straight-line section in the middle where density is proportional to log exposure, with a “toe” on the underexposed part and a “shoulder” on the overexposed part. In photography, exposure time should be adjusted so densities lie on the straight-line portion of the curve. With a single exposure, this is not possible for the entire scene – you can’t get both shadow detail and highlight detail, so in practice only midtones are captured with full detail.
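
A toy version of such a curve is easy to write down – a logistic in log exposure gives a toe, a straight-line section, and a shoulder. This is a sketch only, not a model of any particular emulsion, and the default constants are made up:

    import numpy as np

    def hd_curve(log_exposure, d_min=0.1, d_max=3.0, gamma=0.7, log_e_mid=0.0):
        # Logistic in log exposure: flat "toe" near d_min, flat "shoulder" near
        # d_max, and a roughly straight middle section whose slope is ~gamma.
        k = 4.0 * gamma / (d_max - d_min)
        return d_min + (d_max - d_min) / (1.0 + np.exp(-k * (log_exposure - log_e_mid)))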

History of HDR Imaging

Before the Chiaroscuro technique was introduced, it was hard to convey brightness in painting. Chiaroscuro (the use of strong contrasts between bright and dark regions) allowed artists to convey the impression of very high scene dynamic ranges despite the very low dynamic range of the actual paintings.

HDR photography dates back to the 1850s; a notable example is the photograph “Fading Away” by H. P. Robinson, which combined five exposures. In the early 20th century, C. E. K. Mees (director of research at Kodak) worked on implementing a desirable tone reproduction curve in film. Mees showed a two-negative photograph in his 1920 book as an example of desirable scene reproduction, and worked to achieve similar results with single-negative prints. Under Mees’ direction, the Kodak Research Laboratory found that an s-shaped curve produced pleasing image reproductions, and implemented it photochemically.

Ansel Adams developed the zone system around 1940 to codify a method for photographers to expose their images in such a way as to take maximum advantage of the negative and print film tone reproduction curves. Soon after, in 1941, L. A. Jones and H. R. Condit published an important study measuring the dynamic range of various real-world scenes. The range was between 27:1 and 750:1, with 160:1 being average. They also found that flare is a more important limit on camera dynamic range than the film response.

The Retinex theory of vision developed around 1967 from the observation that luminance ratios between adjacent patches are the same in the sun and the shade. While absolute luminances don’t always correspond to lightness appearance (due to spatial factors), the ratio of luminances at an edge do correspond strongly to the ratio in lightness appearance. Retinex processing starts with ratios of apparent lightness at all edges in the image and propagates these to find a global solution for the apparent lightness of all the pixels in the image. In the 1980s this research led to a prototype “Retinex camera” which was actually a slide developing device. Full-resolution digital electronics was not feasible, so a low-resolution (64×64) CCD was used to generate a “correction mask” which modulated a low-contrast photographic negative during development. This produced a final rendering of the image which was consistent with visual appearance. The intent was to incorporate this research in a Polaroid instant camera but this product never saw the light of day.
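
Retinex comes in many variants, but the ratio/threshold/reset mechanics along a single path can be sketched briefly. This assumes a log-luminance image and a precomputed list of pixel coordinates for the path; a full implementation would average many such paths per pixel:

    import numpy as np

    def path_retinex_estimate(log_lum, path, threshold=0.02):
        # log_lum: 2D array of log luminances; path: list of (y, x) coordinates.
        # Accumulate edge ratios (log differences) along the path, ignore small
        # ratios (slow illumination gradients), and reset whenever the running
        # estimate exceeds the assumed "white" (0 in log units).
        estimates = np.zeros(len(path))
        acc = 0.0
        for i in range(1, len(path)):
            (y0, x0), (y1, x1) = path[i - 1], path[i]
            ratio = log_lum[y1, x1] - log_lum[y0, x0]
            if abs(ratio) < threshold:
                ratio = 0.0
            acc += ratio
            if acc > 0.0:
                acc = 0.0
            estimates[i] = acc
        return estimates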

Measuring the Dynamic Range

The sensor’s dynamic range is limited but slowly getting better – Prof. Rizzi briefly went over some recent research into HDR sensor architectures.

Given limited digital sensor dynamic range, multiple exposures are needed to capture an HDR image. This can be done via sequential exposure change, or by using multiple image detectors at once.

There have been various methods developed for composing the exposures. Before Paul Debevec’s 1997 paper “Recovering High Dynamic Range Radiance Maps from Photographs”, the emphasis was on generating pleasing pictures. From 1997 on, research focused primarily on accurately measuring scene radiance values. Combined with recent work on HDR displays, this holds the potential of accurate scene reproduction.

However, veiling glare is a physical limit on both HDR image acquisition and display. At acquisition time, glare consists of light scattered inside the camera – air-glass reflections at the various lens elements, reflections off the camera walls, reflections off the sensor surface, etc. The effect of glare on the lighter regions of the image is small, but darker regions are affected much more strongly, which limits the overall contrast (dynamic range) that can be captured.

Prof. Rizzi described an experiment which measured the degree to which glare limits HDR acquisition, for both digital and film cameras. A test target was assembled out of Kodak Print Scale step-wedges (circles divided into 10 wedges which transmit different amounts of light, ranging from 4% to 82%) and neutral density filters to create a test target with almost 19,000:1 dynamic range. This target was photographed against different surrounds to vary the amount of glare.

In moderate-glare scenes, glare reduced the dynamic range at the sensor or film image plane to less than 1,000:1; in high-glare scenes, to less than 100:1. This limited the range that could be measured via multiple digital exposures (negative film has more dynamic range – about 10,000:1 – than the camera glare limit, so in the case of film multiple exposures were pointless).

While camera glare limits the amount of scene dynamic range that can be captured, glare in the eye limits the amount of display dynamic range which is useful to have.

Experiments were also done with observers estimating the brightness of the various sectors on the test target. There was a high degree of agreement between the observers. The perceived brightness was strongly affected by spatial factors; the brightness differences between the segments of each circle were perceived to be very large, and the differences between the individual circles were perceived to be very small. Prof. Rizzi claimed that a global tone scale cannot correctly render appearance, since spatial factors predominate.

These spatial factors also required designing a new target, so that glare could be separated from neural contrast effects. For this target, both single-layer and double-layer projected transparencies were used, allowing the experimenters to vary the dynamic range from about 500:1 to about 250,000:1 while keeping glare and surround constant.

For low-glare images (average luminance = 8% of maximum luminance), the observers could detect appearance changes over a dynamic range of a little under 1000:1. For high-glare images (average luminance = 50% max luminance), this decreased to about 200:1. Two extreme cases were also tested: with a white surround (extreme glare) the usable dynamic range was about 100:1 and with black surround (almost no glare at all) it increased to 100,000:1. The black surround case (which is not representative of the vast majority of real images) was the only one in which the high-dynamic range image had a significant advantage, and even there the visible difference only affected the shadow region – the bottom 30% of perceived brightnesses. These results indicate that dramatically increasing display dynamic range has minor effects on the perceived image; glare inside the eye limits the effect.

Separating Glare and Contrast

Glare inside the eye reduces the contrast of the image on the retina, but neural contrast increases the contrast of the visual signal going to the brain. These two effects tend to act in opposition (for example, brightening the surround of an image will increase both effects), but they vary differently with distance and do not cancel out exactly.

It is possible to estimate the retinal image based on the CIE Glare Spread Function (GSF). When doing so for the images in the experiment above, the high-glare target (where observers could identify changes over a dynamic range of 200:1) formed an image on the retina with a dynamic range of about 100:1. With white surround (usable dynamic range of 100:1) the retinal image had a dynamic range of about 25:1 and with black surround (usable dynamic range of 100,000:1) the retinal image had a dynamic range of about 3000:1. It seems that neural contrast partially compensates for the intra-ocular glare; both effects are scene dependent.
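
As a rough illustration of this kind of estimate, here is a brute-force veiling-glare computation that uses the simpler Stiles-Holladay approximation (veiling luminance ≈ 10·E/θ², θ in degrees) in place of the full CIE GSF. The pixels-per-degree and per-pixel solid angle values are assumptions the caller must supply, and the whole thing is only meant for small test targets:

    import numpy as np

    def veiling_luminance(luminance, pixels_per_degree, pixel_solid_angle):
        # Brute-force O(N^2) estimate: every pixel scatters onto every other
        # pixel with weight 10 / theta^2 (theta in degrees, Stiles-Holladay).
        h, w = luminance.shape
        ys, xs = np.mgrid[0:h, 0:w]
        glare_illum = luminance * pixel_solid_angle   # per-pixel glare illuminance
        veil = np.zeros((h, w))
        for y in range(h):
            for x in range(w):
                theta = np.hypot(ys - y, xs - x) / pixels_per_degree
                theta = np.maximum(theta, 1.0)        # formula valid beyond ~1 degree
                contrib = 10.0 * glare_illum / theta ** 2
                veil[y, x] = contrib.sum() - contrib[y, x]   # exclude self-scatter
        return veil

    # retinal image ~ luminance + veil; its min/max ratio is the retinal dynamic range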

Scene Content Controls Appearance

The appearance of a pixel cannot be predicted from its intensity values – no global tone mapping operator can mimic human vision. An image-dependent, local operator is needed. The human visual system performs local range compression. It is important to choose a rendering intent – reproduce the original scene radiances, scene reflectances, scene appearance, a pleasing image, etc. If the desire is to predict appearance, then Retinex processing does a pretty good job in many cases.
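
The distinction is easy to state in code: a global operator is a function of the pixel value alone, while a local operator also consults the neighborhood. In the sketch below the local version divides out a blurred log-luminance surround – an illustration of the idea, not any particular published operator:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def tone_global(L):
        # Global: each output pixel depends only on that pixel's luminance.
        return L / (1.0 + L)

    def tone_local(L, sigma=30.0, compression=0.6):
        # Local: estimate the surround with a blurred log luminance, compress
        # only the surround, and add the local detail back unchanged.
        logL = np.log(np.maximum(L, 1e-6))
        surround = gaussian_filter(logL, sigma)
        detail = logL - surround
        return np.exp(compression * surround + detail)

The classic price of simple local operators like this is haloing around strong edges, which is where the more sophisticated spatial algorithms earn their keep.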

Color in HDR

Two different data sets can be used to describe color: CMF (color matching functions – low-level sensor data) or UCS (uniform color space – high-level perceptual information).

CMFs are used for color matching and metamerism preservation. They are linear transforms of cone sensitivities modified by pre-retinal absorptions. They have no spatial information, and cannot predict appearance.

UCS – for example, CIE L*a*b*. Lightness (L*) is a cube root of luminance, which compresses the visible range. 99% of possible perceived lightness values fall in a 1000:1 region of scene dynamic range. This fits well with visual limitations caused by glare.
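
For reference, the standard CIE 1976 lightness formula, with a quick check of the 1000:1 claim:

    def cie_lightness(Y, Yn=1.0):
        # CIE 1976 L*: cube root above the low-luminance break point, linear below.
        t = Y / Yn
        if t > (6.0 / 29.0) ** 3:                 # ~0.008856
            return 116.0 * t ** (1.0 / 3.0) - 16.0
        return (29.0 / 3.0) ** 3 * t              # ~903.3 * t

    # A 1000:1 luminance range relative to white spans nearly all of L*:
    # cie_lightness(1.0) == 100.0, cie_lightness(0.001) is about 0.9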

There are some discrepancies between data from appearance experiments with observers and measurements of retinal cone response.

First discrepancy: the peaks of the color-matching functions do not line up with the peaks of the cone sensitivity functions. This is addressed by including pre-retinal absorptions, which shift peak sensitivities to longer wavelengths.

Second discrepancy: retinal cones have a logarithmic response to light, but observers report a cube-root response. This is addressed by taking account of intra-ocular glare; it turns out that due to glare, a cube-root variation in light entering the eye turns into a logarithmic variation in light at the retina.

HDR Image Processing

Around 2002-2006, Robert Sobol developed a variant of Retinex which was implemented in a (discontinued) line of Hewlett-Packard cameras; the feature was marketed as “Digital Flash”. This produced very good results and could even predict certain features of well-known perceptual illusions such as “Adelson’s Checkerboard and Tower”, which were commonly thought to be evidence of cognitive effects in lightness perception.

ACE (Automatic Color Equalization) (which Prof. Rizzi worked on) and STRESS (Spatio-Temporal Retinex-inspired Envelope with Stochastic Sampling) are other examples of spatially-aware HDR image processing algorithms. Several examples were shown to demonstrate that spatially-aware (local) algorithms produce superior results to global tone mapping operators.

Prof. Rizzi described an experiment made with a “3D Mondrian” model – a physical scene of differently colored blocks under different illumination conditions. Various HDR processing algorithms were run on captured images of the scene, and the results were compared with observers’ estimations of the colors as well as with a painter’s rendition (which attempted to reproduce the perceptual appearance as closely as possible). The results were interesting – appearance does not correlate specifically with reflectance vs. illumination, but rather with edges vs. gradients. The results appeared to support the goals of Retinex and similar algorithms.

Prof. Rizzi finished the course with some “take home” points:

  • HDR works well, because it preserves image information, not because it is more accurate (accurate reproduction of scene luminances is not possible in the general case).
  • Dynamic range acquisition is limited by glare, which cannot be removed.
  • Our vision system is also limited by glare, which is counteracted to some degree by neural contrast.
  • Accurate reproduction of scene radiance is not needed; reproduction of appearance is important and possible without reproducing the original stimulus.
  • Appearances are scene-dependent, not pixel-based.
  • Edges and gradients generate HDR appearance and color constancy.

2011 Color and Imaging Conference, Part I: Introduction

A few weeks ago, I attended the 2011 Color and Imaging Conference (CIC). CIC is a small conference (a little under 200 attendees) that nevertheless commands an important role in the fields of color science and digital imaging, similar to SIGGRAPH’s importance to computer graphics. CIC is co-sponsored by the Society for Imaging Science and Technology (IS&T) and the Society for Information Display (SID); it has been held annually in various US locations since 1993.

I attended this conference for the first time last year. In both years I attended, most of the conference attendees were academic color science researchers (the field appears to be dominated by a handful of institutions, most notably the color labs at the Rochester Institute of Technology and the University of Leeds), with the remainder primarily representing the R&D divisions of various camera, printer, display, and mobile phone manufacturers. There are typically also a few color experts from film companies such as Technicolor, ILM, Pixar, and Disney. I didn’t see any other game developers – I hope this will change in future years, as our industry starts paying more attention to this critical area.

Despite its modest attendance numbers, CIC boasted an impressive array of sessions, including courses, papers, short papers, and several keynotes. The content was of very high quality. The conference organizers are currently in the process of posting video of most of the conference content for free streaming and download in a variety of formats – a step which organizers of other conferences (such as SIGGRAPH) would do well to emulate.

I’ll be putting up several other posts with details of the conference content. They will be coming in rapid succession since I’m editing them down from an existing document (a report I did for work).

Do you spell these two words correctly?

We all have dumb little blind spots. As a kid, I thought “Achilles” was pronounced “a-chi-elz” and, heaven knows how, “etiquette” was somehow “eh-teak”. When you say goofy things to other people, someone eventually corrects you. However, if most of the people around you are making the same mistake (I’m sorry, “nuclear” is not pronounced “new-cue-lar”, it just ain’t so), the error never gets corrected. I’ve already mentioned the faux pas of pronouncing SIGGRAPH as “see-graph”, which seems to be popular among non-researchers (well, admittedly there’s no “correct” pronunciation on that one, it’s just that when the conference was small and mostly researchers that “sih-graph” was the way to say it. If the majority now say “see-graph”, so be it – you then identify yourself as a general attendee or a sales person and I can feel superior to you for no valid reason, thanks).

Certain spelling errors persist in computer graphics, perhaps because it’s more work to give feedback on writing mistakes. We also see others make the same mistakes and assume they’re correct. So, here are the two I believe are the most popular goofs in computer graphics (and I can attest that I used to make them myself, once upon a time):

Tesselation – that’s incorrect, it’s “tessellation”. By all rules of English, this word truly should have just one “l”: relation, violation, adulation, ululation, emulation, and on and on, they have just one “l”. The only exceptions I could find with two “l”s were “collation”, “illation” (what the heck is that?), and a word starting with “fe” (I don’t want this post to get filtered).

The word “tessellation” is derived from “tessella” (plural “tessellae”), which is a small piece of stone or glass used in a mosaic. It’s the diminutive of “tessera”, which can also mean a small tablet or block used as a ticket or token (but “tessella” is never a small ticket). Whatever. In Ionic Greek “tesseres” means “four”, so “tessella” makes sense as being a small four-sided thing. For me, knowing that “tessella” is from the ancient Greek word for a piece in a mosaic somehow helps me to catch my spelling of it – maybe it will work for you. I know that in typing “tessella” in this post I still first put a single “l” numerous times, that’s what English tells me to do.

Google test: searching on “tessellation” on Google gives 2,580,000 pages. Searching on “tesselation -tessellation”, which gives only pages with the misspelled version, gives 1,800,000 pages. It’s nice to see that the correct spelling still outnumbers the incorrect, but the race is on. That said, this sort of test is accurate to within, say, plus or minus 350%. If you search on “tessellation -tesselation”, which should give a smaller number of pages (subtracting out those that I assume say “‘tesselation’ is a misspelling of ‘tessellation'” or that reference a paper with “tesselation” in the title), you get 8,450,000! How you can get more than 3 times as many pages as just searching on “tessellation” is a mystery. Finally, searching on “tessellation tesselation”, both words on the same page, gives 3,150,000 results. Makes me want to go count those pages by hand. No it doesn’t.

One other place to search is the ACM Digital Library. There are 2,973 entries with “tessellation” in them, 375 with “tesselation”. To search just computer graphics publications, GRAPHBIB is a bit clunky but will do: 89 hits for “tessellation”, 18 hits for the wrong one. Not terrible, but that’s still a solid 20% incorrect.

Frustrum – that’s incorrect, it’s “frustum” (plural “frusta”, which even looks wrong to me – I want to say “frustra”). The word means a (finite) cone or pyramid with the tip chopped off, and we use it (always) to mean the pyramidal volume in graphics. I don’t know why the extra “r” got into this word for some people (myself included). Maybe it’s because the word then sort-of rhymes with itself, the “ru” from the first part mirrored in the second. But “frustra” looks even more correct to me, no idea why. Maybe it’s that it rolls off the tongue better.

Morgan McGuire pointed this one out to me as the most common misspelling he sees. As a professor, he no doubt spends more time teaching about frusta than tessellations. Using the wildly-inaccurate Google test, there are 673,000 frustum pages and 363,000 “frustrum -frustum” pages. And, confusingly, again, 2,100,000 “frustum -frustrum” pages, more than three times as many pages as just “frustum”. Please explain, someone. For the digital library, 1,114 vs. 53. For GRAPHBIB I was happy to see 42 hits vs. just 1 hit (“General Clipping on an Oblique Viewing Frustrum”).

So the frustum misspell looks like one that is less likely at the start and is almost gone by the time practitioners are publishing articles, vs. the tessellation misspell, which appears to have more staying power.

Addenda: Aaron Hertzmann notes that the US and Britain double their letters differently (“calliper”? That’s just unnatural, Brits). He also notes the Oxford English Dictionary says about tessellate: “(US also tesselate)”. Which actually is fine with me, except for the fact that Microsoft Word, Google’s spellchecker, and even this blog’s software flags “tesselate” as a misspelling. If only we had the equivalent of the Académie française to decide how we all should spell (on second thought, no).

Spike Hughes notes: “I think the answer for ‘frustrum’ is that it starts out like ‘frustrate’ (and indeed, seems logically related: the pyramid WANTS to go all the way to the eye point, but is frustrated by the near-plane).” This makes a lot of sense to me, and would explain why “frustra” feels even more correct. Maybe that’s the mnemonic aid, like how with “it’s” vs. “its” there’s “It’s a wise dog that knows its own fleas”. You don’t have to remember the spelling of each “its”, just remember that they differ; then knowing “it’s” is “it is” means you can derive that the possessive “its” doesn’t have an apostrophe. Or something. So maybe, “Don’t get frustrated when drawing a frustum”, remembering that they differ. Andrew Glassner offers: “There’s no rum in a frustum,” because the poor thing has the top chopped off, so all the rum we poured inside has evaporated.

Seven Things for 10/13/2011

  • Fairly new book: Practical Rendering and Computation with Direct3D 11, by Jason Zink, Matt Pettineo, and Jack Hoxley, A.K.Peters/CRC Press, July 2011 (more info). It’s meant for people who already know DirectX 10 and want to learn just the new stuff. I found the first half pretty abstract; the second half was more useful, as it gives in-depth explanation of practical examples that show how the new functionality can be used.
  • Two nice little Moore’s Law-related articles appeared recently in The Economist. This one is about how the law looks to have legs for a number of years yet, and presents a graph showing how various breakthroughs have kept the law going over the past decades. Moore himself thought the law might hold for ten years. This one talks about how computational energy efficiency is doubling every 18 months, which is great news for mobile devices.
  • I used to use MWSnap for screen captures, but it doesn’t work well with two monitors and it hangs at times. I finally found a replacement that does all the things I want, with a mostly-good UI: FastStone Capture. The downside is that it actually costs money ($19.95), but I’m happy to have purchased it.
  • Ray tracing vs. rasterization, part XIV: Gavan Woolery thinks RT is the future, DEADC0DE argues both will always have a place, and gives a deeper analysis of the strengths and weaknesses of each (though the PITA that transparency causes rasterization is not called out) – I mostly agree with his stance. Both posts have lots of followup comments.
  • This shows exactly how far behind we are in blogging about SIGGRAPH: find the Beyond Programmable Shading course notes here – that’s a mere two months overdue.
  • Tantalizing SIGGRAPH Talk demo: KinectFusion from Microsoft Research and many others. Watch around 3:11 on for the great reconstruction, and the last minute for fun stuff. Newer demo here.
  • OnLive – you should check it out, it’ll take ten minutes. Sign up for a free account and visit the Arena, if nothing else: it’s like being in a sci-fi movie, with a bunch of games being played by others before your eyes that you can scroll through and click on to watch the player. I admit to being skeptical of the whole cloud-gaming idea originally, but in trying it out, it’s surprisingly fast and the video quality is not bad. Not good enough to satisfy hardcore FPS players – I’ve seen my teenage boys pick out targets that cover like two pixels, which would be invisible with OnLive – but otherwise quite usable. The “no download, no GPU upgrade, just play immediately” aspect is brilliant and lends itself extremely well to game trials.

OnLive Arena

Seven things for 10/10/11

  • If you can get WebGL running properly on your browser, check out Shader Toy. Coolest thing is that you can edit any shader and immediately try it out.
  • Another odd little WebGL application is a random spaceship maker, with a direct tie-in to Shapeways to buy a 3D version of any model you make.
  • Speaking of Shapeways, I liked their “one coffee cup a day project”. The low-resolution cup is particularly good for computer graphics people, though I’m told that in real life it’s a fair bit more rounded off, due to the way the ceramic sets. Ironic. Also, note that these cups are actually quite small in real life (smaller than even espresso cups), which is too bad. Still, clever.
  • Source code for iOS versions of Castle Wolfenstein and the original DOOM is now available.
  • Patrick Cozzi has a nice rundown of his days at SIGGRAPH this August, with a particular emphasis on OpenGL and mobile. The links for each day are at the bottom of the entry.
  • Nice fractal video generated in near-real time (300 ms/frame) running a GLSL shader using this code. Reddit thread here, about an earlier video now pulled back online.
  • This site gives a darn long list of educational institutions offering videogame design degrees. It’s at least a place to start, if you’re looking for such things. That said, I’ve heard counterarguments from game company professionals to such specialized degrees, “just learn to program well and we’ll teach you the videogames business”.

Bonus thing: Draw a curve of your data for a number of years and see what it most closely correlates. Peculiar.

Predicting the Past

Inspired by Bing (a person, not a search engine) and by the acrobatics I saw tonight in Shanghai, time for a blog post.

So what’s up with graphics APIs? I’ve been working on a project for a fast 3D graphics system for Autodesk for about 4 years now; the base level (which hides the various flavors of DirectX and OpenGL) is used by Maya, Max, AutoCAD, Inventor, and other products. There are various higher-level optimizations we’ve added (and why Microsoft’s fxc effect compiler suddenly got a lot slower is a mystery), with some particularly nice work by one person here in the area of multithreading. Beyond these techniques, minimizing the raw number of calls to the API is the primary way to increase performance. Our rule of thumb is that you get about 1000-1500 calls a frame (CAD isn’t held to a 60 FPS rule, but we still need to be interactive). The usual tricks are to sort by state, and to shove as much geometry and processing as possible into a single draw call and so avoid the small batch problem. So, how silly is that? The best way to make your GPU run fast is to call it as little as possible? That’s an API with a problem.
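
To make “sort by state” concrete: draw requests get bucketed by a state key so each state is bound only once, and geometry sharing a state can be merged into a single draw call. A toy sketch of the bookkeeping – the field names are illustrative, nothing here is Autodesk- or API-specific:

    from collections import defaultdict

    def build_batches(draw_requests):
        # Bucket draw requests by render-state key so each state is bound once
        # and geometry that shares a state can be merged into one draw call.
        buckets = defaultdict(list)
        for req in draw_requests:
            key = (req["shader"], req["material"], tuple(req["textures"]))
            buckets[key].append(req["geometry"])
        return buckets

    requests = [
        {"shader": "lit", "material": "steel", "textures": ["albedo"], "geometry": "gear_a"},
        {"shader": "lit", "material": "steel", "textures": ["albedo"], "geometry": "gear_b"},
        {"shader": "lit", "material": "glass", "textures": [], "geometry": "window"},
    ]
    for state, geometries in build_batches(requests).items():
        print(state, "->", geometries)   # 2 state binds / draw calls instead of 3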

This is old news, Tim Sweeney railed against API limitations 3 years ago (sadly, the article’s gone poof). I wrote about his ideas here and added my own two cents. So where are we since then? DirectX 11 has been out awhile, adding three more stages to the pipeline for efficient tessellation of higher-order surfaces. The pipeline’s feeling a bit unwieldy at this point, with a lot of (admittedly optional) stages. There are still some serious headaches for developers, like having to somehow manage to put lighting and material shading in the same pixel shader (one good argument for deferred lighting and similar techniques). Forget about optimization; the arcane API knowledge needed to get even a simple rendering on the screen is considerable.

I haven’t heard anything of a DirectX 12 in the works (except maybe this breathless posting, which I feel obligated to link to since I’m in China this month), nor can I imagine what they’d add of any significance. I expect there will be some minor Xbox 720 (or whatever it will be called)-related tweaks specific to that architecture, if and when it exists. With the various CPU+GPU-on-a-chip products coming out – AMD’s Fusion family, NVIDIA’s Tegra 2, and similar from other companies (I think I counted 5, all totaled) – some access costs between the two processors become much cheaper and so change the rules. However, the API still looks to be the bottleneck.

Marketwise, and this is based entirely upon my work in scapulimancy, I see things shifting to mobile. If that isn’t at least the 247th time you’ve heard that, you haven’t been wasting enough time on the internet. But, it has some implications: first, DirectX 12 becomes mostly irrelevant. The GPU pipeline is creaky and overburdened enough right now, PC games are an important niche but not the focus, and mobile (specifically, iPad and other tablets) is fine with the functionality defined thus far by existing APIs. OpenGL ES will continue to evolve, but I doubt we’ll see for a good long while any algorithmically (vs. data-slinging) new elements added to the API that the current OpenGL 4.x and DX11 APIs don’t offer.

Basically, API development feels stalled to me, and that’s how it should be: mobile’s more important, PCs are a (large but slowly evolving) niche, and the current API system feels warped from a programming standpoint, with peculiar constructs like feeding text strings to the API to specify GPU shader effects, and strange contortions performed to avoid calling the API in order to coax the GPU to run fast.

Is there a way out? I felt a glimmer while attending HPG 2011 this year. The paper “High-Performance Software Rasterization on GPUs” by Samuli Laine and Tero Karras was one of my (and many attendees’) favorites, talking about how to efficiently implement a basic rasterizer using CUDA (code’s open sourced). It’s not as fast as dedicated hardware (no surprise there), but it’s at least in the same ball-park, with hardware being anywhere from 1.5x to 8.1x faster for their test cases, median being 3.6x. What I find exciting is the idea that you could actually program the pipeline, vs. it being locked away. They discuss ideas for optimization such as loosening the “first in, first out” rule for triangles currently enforced by all APIs. With its “yet another language” dependency, I can’t say I hope GPGPU is the future (and certainly CUDA isn’t, since it cuts out non-NVIDIA hardware vendors, but from all reports it’s currently the best way to experiment with GPGPU). Still, it’s nice to see that the fixed-function bits of the GPU, while important, are not an insurmountable limit in considering more flexible and general interactive rasterization programming models. Or, ray tracing – always have to stick that in there.
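
For anyone who has only ever seen the rasterizer as a black box, the core operation Laine and Karras parallelize is tiny. Here is a scalar sketch of edge-function triangle rasterization – no clipping, no depth, and none of the tile/bin machinery that makes their GPU version fast:

    def edge(ax, ay, bx, by, px, py):
        # Twice the signed area of triangle (a, b, p); positive when p is to
        # the left of the directed edge a->b (counter-clockwise winding).
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    def rasterize(v0, v1, v2, width, height):
        # Yield (x, y, w0, w1, w2) for pixels covered by a CCW screen-space
        # triangle; the w's are barycentric weights for attribute interpolation.
        xs = [v[0] for v in (v0, v1, v2)]
        ys = [v[1] for v in (v0, v1, v2)]
        area = edge(*v0, *v1, *v2)
        if area <= 0:                      # back-facing or degenerate
            return
        for y in range(max(int(min(ys)), 0), min(int(max(ys)) + 1, height)):
            for x in range(max(int(min(xs)), 0), min(int(max(xs)) + 1, width)):
                px, py = x + 0.5, y + 0.5  # sample at pixel centers
                w0 = edge(*v1, *v2, px, py)
                w1 = edge(*v2, *v0, px, py)
                w2 = edge(*v0, *v1, px, py)
                if w0 >= 0 and w1 >= 0 and w2 >= 0:
                    yield x, y, w0 / area, w1 / area, w2 / area

    # e.g. len(list(rasterize((2, 2), (60, 10), (20, 50), 64, 64))) counts covered pixels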

So it’s “forward to the past”, looking at traditional algorithms like rasterization and ray tracing and how to gain efficiency (both in raw speed and in development time) on various modern architectures. That’s ultimately what it’s about for me, at least: spending lots of time fighting the API, gluing together strings to make shaders, and all the other craziness is a distraction and a time-waster. That said, there’s a cost/benefit calculation implicit in all of this. For example, using C# or Java is way more productive than C++, I’d say about 2x, mostly because you’re not tracking down memory problems like leaks and accesses of uninitialized or non-existent values. But, there’s so much legacy C++ code around that it’s still the language of graphics, as previously discussed here. Which means I expect none of the API weirdness to change for a solid decade, at the minimum. Please do go ahead and prove me wrong – I’d be thrilled!

Oh, and acrobatics? Hover your cursor over the image. BTW, the ERA show in Shanghai is wonderful, unlike current APIs.

AMD CubeMapGen is now Open Source

UPDATE 9/1/2011: ignotion has put the source up on Google Code.

For a long time, I’ve found ATI’s (now AMD’s) CubeMapGen library to be an indispensable tool for creating prefiltered environment maps (important for physically based shading). Many older GPUs (all the ones in current consoles) do not filter across cube faces. CubeMapGen solves this problem and others – details can be found in a GDC presentation and a SIGGRAPH sketch, both from 2005.

Support for CubeMapGen has been spotty for the last few years, and a while ago AMD officially declared its end of life. Since then I’ve been wondering when AMD would open-source this important tool – there is a good precedent in NVIDIA texture tools, which has been open source for several years now.

Speaking of NVIDIA texture tools, a comment on its Google Code website just let me know that AMD has released source to CubeMapGen. A link to the source for version 1.4 can be found on the bottom of the CubeMapGen page. Note that this does not include the DXT compression part of the edge fixup (which was a pretty nifty feature – hopefully someone will reimplement it now that the library is open source).

Looking at the license doc in the zip file, the license appears to be a modified BSD license. This is excellent news – tools like this are far more useful when source is available. Perhaps someone should host the code on Google Code or github, to make it easier to add future improvements – or maybe it could be folded into the nvidia_texture_tools code base (if the license allows).

Advances in RTR Course Notes up

I’m finally back from a nice post-SIGGRAPH vacation in the Vancouver area. Both our computers broke early on in the trip, so it was a true vacation.

I hope to post on a bunch of stuff soon, but wanted to first mention something now available: the slides and videos presented in the popular SIGGRAPH course “Advances in Real-Time Rendering in 3D Graphics”. Find them here, and the page for previous years (well, currently just 2010) here. Hats off to Natalya Tatarchuk and all the speakers for quickly making this year’s presentations available.