High Performance Graphics 2017 Call for Participation

The High-Performance Graphics 2017 conference call for participation is here.

Summary: deadline for papers is Friday April 21st. Conference itself is Friday-Sunday, July 28-30, colocated with SIGGRAPH in Los Angeles.

For me, this is one of the two great conferences each year for interactive rendering related papers (SIGGRAPH’s papers selection, for whatever reasons, seems to have mostly moved on to other things).

Moving Graphics Research into Development

guest post by Patrick Cozzi, @pjcozzi

[This is a eat your veggies/floss your teeth type of article. Nothing revolutionary; rather, good advice if you don’t already know it, and better advice if you do and need a reminder or may have missed a trick. My only “eat with your mouth closed” addition would be, “add comments as you go,” mainly because I’m wading through someone’s poorly-commented code this week. Your bonus payoff for reading this article through is seeing some nice visualizations at the end. – Eric]

The Penn graphics students I work with on MS thesis, senior design, and GPU course projects and my colleagues working on Cesium are all implementing fairly recent graphics research.

This article presents tips for implementing research that I have learned through hands-on development and through mentoring students and practitioners. There seems to be a huge difference in productivity depending how how we navigate papers and how we approach implementing them.

Implementation is a great way to generate new ideas, but this article is not specifically about generating new research; it is about utilizing existing research to solve a particular problem.

Finding Papers

A quick Google search usually provides prominent papers. I also check Ke-Sen Huang’s website, which has papers from SIGGRAPH, I3D, Eurographics, and several other conferences.

Once you have a good paper, finding more is easy:

Follow the references backwards to the seminal work.
Go to each author’s website and institute’s website and check their publications. For example, for point clouds, I like the work by Enrico Gobbetti at CRS4.
Search for the paper on Google Scholar and trace the most prominent papers that cite it. Google Scholar is also useful for searching papers published in the past n years, which is great for culling old papers, e.g., CLOD terrain algorithms that are no longer appropriate for today’s GPUs.
Ask for recommended papers on twitter, seriously.

Quickly identifying and avoiding irrelevant papers is key to staying focused in the right direction.

How to Read a Paper

Skim it first

Assuming I have some understanding of the topic, it takes me about three hours to review an eight-page paper submission for a conference or journal.

When I’m not reviewing for a committee, and instead looking for papers on a particular topic, I don’t read a paper that carefully on the first pass. When I first started reading papers, I spent too much time reading papers that were tangential. This lead to a lot of wasted time going down dead-end paths.

Instead, I suggest reviewing the figures and reading the Abstract, Introduction, and Results sections before dedicating time to a complete read. Also check out the video, demo, and source code if available. You may quickly find that the approach won’t work for you because, for example, it is not fast enough for real-time, relies on features not supported by your target graphics API, relies on an expensive preprocess step, etc. With that said, reading related though tangential papers, if you have the time, still generates potentially useful ideas.

Understand the previous work

If the paper appears relevant, but I don’t have the background to fully understand it, I try to find the seminal work reference in the Previous Work section and read it. Google Scholar can help here since it will report how many times a paper was cited, a useful measure but not ground truth. If you follow the preview work far enough, you may end up with a paper written in the 1970s or 80s, which are fun to read for their simplicity (by today’s standards) and influence. For example, enjoy Particle Systems – A Technique for Modeling a Class of Fuzzy Objects, 1983, by William Reeves.

Survey papers and the Previous Work chapters in PhD/MS theses are also great places to look for background. They distill down each relevant paper to its essence and give a framework for the subject. For example, A Developer’s Survey of Polygonal Simplification Algorithms (2001, David Luebke) and Technical Strategies for Massive Model Visualization (2008, Enrico Gobbetti, Dave Kasik, and Sung eui Yoon) lead to the bulk of the work I read for my MS thesis.

Iterate

Once I’ve found a paper that I think I want to implement, I often need to read it – or at least parts of it – multiple times to gain a solid understanding.

I interleave reading with implementation. Reading deepens my understanding to help me code, and coding deepens my understanding to help me read.

If you have the luxury of no other outside work, you might start the morning coding without even checking email, then check email after lunch, and then spend the afternoon reading so you have fresh ideas for coding the next morning. You’ll quickly have more ideas than time. Choose carefully and keep a record of those not yet examined. Often, when I go back at look at my notes, I am happy that I didn’t spend time on many of the ideas that, in retrospect, would not have been as impactful.

Reach out

Paper authors are often easily accessible via email or twitter. Ask them a specific question that shows you’ve done your research, and they are likely to reply. After all, they are interested in the same topic as you. They know their work very well; one time, an author found a bug in our translucency implementation just by looking at a screenshot!

How to implement research

The following advice applies to coding in general, but I think it is particularly relevant to implementing graphics research with non-trivial data structures and algorithms.

Start small and iterate

Don’t implement the whole paper at once. Implement the smallest useful – or even not so useful – feature, verify that it works, and build on it, verifying the results each step of the way. Get something working first, then make it fast and robust.

Verify, verify, verify. Double check the code flow in the debugger, measure the performance early, and test with simple scenarios before complex ones. When the students in our GPU course implement a rasterizer, they start with a triangle model, then a box, and then the COLLADA duck.

Implementing an out-of-core spatial data structure? Start with an in-core one. Implementing a complex GPU algorithm? Perhaps starting with a CPU implementation first is useful and gives us something to benchmark against.

As the code starts to stabilize, add unit tests. For this type of work, I don’t add unit tests too soon since they would break often.

Report statistics

At the start, take the time to add code to report key statistics about the algorithm.

For example, in the out-of-core spatial data structures we use for streaming massive 3D models, we track the number of nodes in memory, nodes visited, nodes rendered, number of pending network requests, number of received requests that are processing, etc.

Watching these stats gives us a very quick indicator if things are working properly. When I wrote the cache replacement algorithm to unload nodes from memory, I first added the relevant stats reporting so I could monitor them during development. I also started with a super-simple test case with a cache size of 1 or 2 tiles.

Test parameters

Also at the start, make it simple to tune key parameters. If your using JavaScript, dat.GUI makes it really easy to map a UI to variables.

Tuning parameters is great for understanding an algorithm, testing our implementation’s robustness, and performance testing, e.g., quickly seeing how changing the number of dynamic lights impacts a deferred shading engine.

[Note: the full-sized images can be downloaded from the article’s repo. – Eric]

Renderer with debug options to turn on/off different parts of the pipeline and debug views.

Visualize everything

As graphics developers, we love to see the results of our code. Debugging aids that visualize results are just as enjoyable, and can yield deeper insights and intuition. Some examples:

bounding volumes
wireframe
g-buffers in a deferred shader
tiles in a tile-based deferred shader
freeze frame to review culling results
shadow maps, including cascades

A couple of examples:

Left: Grand Canyon. Right: Wireframe showing skirts used to avoid cracks between tiles, how high frequency areas are more finely triangulated, and some sense of overdraw.

Left: View of Crater Lake (186 draw calls). Right: Freeze frame viewing tiles with their tile coordinates from a different perspective. Images from Graphics Tech in Cesium – The Graphics Stack.

Sometimes a graphics API debugging tool such as Renderdoc or WebGL Inspector is enough to review buffers, textures, shaders, etc. I also find engine-specific tools useful since they are higher-level, e.g., they may color objects based on a shadow-map cascade, whereas a graphics API tool may just show the shadow-map textures. Time spent on and using tools always pays for itself in fewer bugs, deeper performance insights, and creating screenshots for documentation and even twitter.

Debug visualizations are useful because when we can visualize something, an insight often becomes obvious. For example, look at how bad bounding spheres for Cesium’s terrain tiles are compared to oriented bounding boxes.

A visualization gives us an immediate sense, then our stats reporting gives precise numbers. For example, note how the visualization below complements the statistics for using fog to optimize terrain rendering by culling tiles in the far distant and increasing the geometric error for tiles in the mid-distance.

Write

I thought I knew a lot about virtual globe rendering until I tried to coauthor a book about it; 520 pages later, I knew the topic much better and had lots of new ideas. Whether it is a blog post, paper, or entire book, writing deepens our understanding and helps us generate new ideas. It also helps the field move forward as we build on each other’s work.

Real-World Sampling “Artifact”

[A repost, due to WordPress weirdness – sorry about that. Note to self: don’t paste images into WordPress, always upload and insert them.]

I’m seeing this more and more in my neighborhood in the evening:

shadows
It’s the shadow of a tree on pavement, superimposed 3 times. It’s because they’ve been installing new LED streetlights with 3 bulbs.

Hard for me to photograph the light source well, but a reflection of some sort in the camera shows the three bulbs in the upper right:

light

It’s like the artifacts you see when anyone tries to approximate an area light with point lights. So with the advent of LEDs, I guess we won’t need light area sampling algorithms as much?

Maybe one for the Real Artifacts gallery.

WebGL 2: New Features

by Shuai Shao
github repo for article here

Last time we showed how to deal with issues porting a WebGL 1 engine to WebGL 2. In this article, we will talk about what new features come with WebGL 2 and what cool things can we do with them.

New features

Multisampled Renderbuffers

Previously, if we wanted antialiasing we would either have to render it to the default backbuffer or perform our own post-process AA (such as FXAA or SMAA) on content rendered to a texture.

Now, with Multisampled Renderbuffers, we can use the general rendering pipeline in WebGL to provide multisampled antialiasing (MSAA):

pre-z pass –> rendering pass to FBO –> postprocessing pass –> render to window

renderbufferStorageMultisample is the relevant function here.

var colorRenderbuffer = gl.createRenderbuffer();
gl.bindRenderbuffer(gl.RENDERBUFFER, colorRenderbuffer);
gl.renderbufferStorageMultisample(gl.RENDERBUFFER, 4, gl.RGBA8, FRAMEBUFFER_SIZE.x, FRAMEBUFFER_SIZE.y);

gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffers[FRAMEBUFFER.RENDERBUFFER]);
gl.framebufferRenderbuffer(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.RENDERBUFFER, colorRenderbuffer);
gl.bindFramebuffer(gl.FRAMEBUFFER, null);

Pay attention to the fact that the multisample renderbuffers cannot be directly bound to textures, but they can be resolved to single-sample textures using the blitFramebuffer call. This is a new feature in WebGL 2 as well, and is used like this:

var texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, FRAMEBUFFER_SIZE.x, FRAMEBUFFER_SIZE.y, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);
gl.bindTexture(gl.TEXTURE_2D, null);

gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffers[FRAMEBUFFER.COLORBUFFER]);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);
gl.bindFramebuffer(gl.FRAMEBUFFER, null);

// ...

// After drawing to the multisampled renderbuffers
gl.bindFramebuffer(gl.READ_FRAMEBUFFER, framebuffers[FRAMEBUFFER.RENDERBUFFER]);
gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, framebuffers[FRAMEBUFFER.COLORBUFFER]);
gl.clearBufferfv(gl.COLOR, 0, [0.0, 0.0, 0.0, 1.0]);
gl.blitFramebuffer(
    0, 0, FRAMEBUFFER_SIZE.x, FRAMEBUFFER_SIZE.y,
    0, 0, FRAMEBUFFER_SIZE.x, FRAMEBUFFER_SIZE.y,
    gl.COLOR_BUFFER_BIT, gl.NEAREST
);

3D Texture

The first thing that comes to mind with 3D textures is volumetric effects, such as fire, smoke, light rays, realistic fog, etc. Now we can bring these features into our WebGL engine. In addition, 3D textures can be used to store medical data such as MRI and CT scans, and are useful when implementing cross-sectioning. 3D textures can also improve performance by using them to cache light for real-time global illumination.

WebGL 2 support for 3D textures is as good as that for 2D textures. We have fast access speed and native tri-linear interpolation.

The code for setting up a 3D texture usually has a 2D texture counterpart.

Texture 2D	Texture 3D
`texImage2D`	`texImage3D`
`texSubImage2D`	`texSubImage3D`
`copyTexImage2D`	`copyTexImage3D`
`compressedTexImage2D`	`compressedTexImage3D`
`compressedTexSubImage2D`	`compressedTexSubImage3D`
`texStorage2D`	`texStorage3D`

There are certain elements that do not match exactly. For example, since we have one more dimension, we will have depth, zoffset, and TEXTURE_WRAP_T for 3D textures. Also, the internal format and type combinations are not 100% matched.

The sampler used in shaders is sampler3D instead of sampler2D.

Here’s an example setup:

var texture = gl.createTexture();
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_3D, texture);
gl.texParameteri(gl.TEXTURE_3D, gl.TEXTURE_BASE_LEVEL, 0);
gl.texParameteri(gl.TEXTURE_3D, gl.TEXTURE_MAX_LEVEL, Math.log2(SIZE));
gl.texParameteri(gl.TEXTURE_3D, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_LINEAR);
gl.texParameteri(gl.TEXTURE_3D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);

gl.texImage3D(
    gl.TEXTURE_3D,  // target
    0,              // level
    gl.R8,        // internalformat
    SIZE,           // width
    SIZE,           // height
    SIZE,           // depth
    0,              // border
    gl.RED,         // format
    gl.UNSIGNED_BYTE,       // type
    data            // pixel
    );

gl.generateMipmap(gl.TEXTURE_3D);

Last but not the least, the 2D Texture Array concept is available with the 3D Texture feature. That is, multiple 2D textures can be stored in an array that can be accessed. It has its own sampler: sampler2DArray, but it shares the texImage3D GL functions. Here’s an example call:

gl.texImage3D(
    gl.TEXTURE_2D_ARRAY,
    0,
    gl.RGBA,
    IMAGE_SIZE.width,
    IMAGE_SIZE.height,
    NUM_IMAGES,
    0,
    gl.RGBA,
    gl.UNSIGNED_BYTE,
    pixels
);

Uniform Buffer

Setting uniforms for shaders is often a considerable amount of the time spent by an engine. Take the Cesium Globe as an example. For regular draw calls, uniform4fv is within the top 5 GL functions taking the most execution time. Also, the sum of all uniform[i]fv and uniformMatrix[i]fv calls is nearly 2.5% of all execution time. That’s quite a large percentage. We always have to call them to update uniform values each frame. What’s more, it can be annoying that we have to make duplicated uniform calls for one same uniform object shared by several shaders.

Now the Uniform buffer object may bring us a boost in performance by allowing us to store blocks of uniforms in buffers stored on the GPU, just like vertex/index buffers. This can make switching between sets of uniforms faster. Additionally, uniform buffers can be shared by multiple programs at the same time.

Those are quite a few benefits. But, with so many improvements, the setup routine is about to change a lot. We will have a basic setup example first, and then look at something that might need your attention.

var uniformPerSceneLocation = gl.getUniformBlockIndex(program, 'PerScene');
gl.uniformBlockBinding(program, uniformPerSceneLocation, 2);
//...
var material = new Float32Array([
    0.1, 0.0, 0.0,  0.0,
    0.0, 0.5, 0.0,  0.0,
    0.0, 0.0, 0.5,  0.0,
    128.0, 0.0, 0.0, 0.0
]);
var uniformPerSceneBuffer = gl.createBuffer();
gl.bindBuffer(gl.UNIFORM_BUFFER, uniformPerSceneBuffer);
gl.bufferData(gl.UNIFORM_BUFFER, material, gl.STATIC_DRAW);
gl.bindBuffer(gl.UNIFORM_BUFFER, null);
//...
// Render
gl.bindBufferBase(gl.UNIFORM_BUFFER, 2, uniformPerSceneBuffer);

The first thing that may confuse you is the layout standard (we will focus on std140 here). You can always find the details inOpenGL ES 3.00 Spec Page 68.

One thing that I really want you to notice is:

when the data member is a three-component vector with components consuming N basic machine units, the base alignment is 4N

And:

If the member is a structure, the base alignment of the structure is N, where N is the largest base alignment value of any of its members, and rounded up to the base alignment of a vec4.

Here’s an example:

struct Material
{
    vec3 ambient;
    vec3 diffuse;
    vec3 specular;
    float shininess;
};

uniform PerScene
{
    Material material;
} u_perScene;

var material = new Float32Array([
    0.1, 0.0, 0.0,  0.0,
    0.0, 0.5, 0.0,  0.0,
    0.0, 0.0, 0.5,  0.0,
    128.0, 0.0, 0.0, 0.0
]);

Here we have vec3, so our data should add a 0 for each vec3 for alignment. Also, since we use a struct to wrap our data, it should be rounded to a multiple of vec4. That’s why we have 3 extra zeroes after the float shininess.

Another concern is about updating the uniform block. There are several different approaches that can get us there. However, their performance can vary. It’s pretty tricky to make the most of our uniform block.

Here are some detailed discussions on Stack Overflow and gamedev.net.

But, basically, we can use gl.bufferSubData to copy the updated typedArray into the uniform buffers.

Sync Objects

Sync objects can be used to synchronize execution between the GL server and the client, which gives you more control over GPU by letting you set a fence to inform the GPU to wait until a set of GL operations have finished. Sync objects are more efficient than gl.finish.

We can get more accurate benchmarks with sync objects. In addition, applications such as image manipulation, where data of each frame comes from the CPU, will benefit from this degree of control.

Query Objects

This operation is very useful when we want to do occlusion testing. We can know how many geometries are actually drawn by performing a gl.ANY_SAMPLES_PASSED query around a set of draw calls. We can use these queries and so get rid of specialized picking method code.

Keep in mind that these queries are asynchronous. A query’s result is never available in the same frame that the query is issued. This is different from OpenGL ES 3 where query result may be available in the same frame. It’s an application portability concern.

gl.beginQuery(gl.ANY_SAMPLES_PASSED, query);
gl.drawArraysInstanced(gl.TRIANGLES, 0, 3, 2);
gl.endQuery(gl.ANY_SAMPLES_PASSED);
//...
(function tick() {
    if (!gl.getQueryParameter(query, gl.QUERY_RESULT_AVAILABLE)) {
        // A query's result is never available in the same frame
        // the query was issued.  Try in the next frame.
        requestAnimationFrame(tick);
        return;
    }

    var samplesPassed = gl.getQueryParameter(query, gl.QUERY_RESULT);
    gl.deleteQuery(query);
})();

Sampler Objects

In WebGL 1 texture image data and sampling information (which tells GPU how to read the image data) are both stored in texture objects. It can be annoying when we want to read from the same texture twice but with a different method (say, linear filtering vs nearest filtering) because we need to have two texture objects. With sampler objects, we can separate these two concepts. We can have one texture object and two different sampler objects. This will result in a change in how our engine organize textures.

Here’s an example:

var samplerA = gl.createSampler();
gl.samplerParameteri(samplerA, gl.TEXTURE_MIN_FILTER, gl.NEAREST_MIPMAP_NEAREST);
gl.samplerParameteri(samplerA, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.samplerParameteri(samplerA, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.samplerParameteri(samplerA, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);

var samplerB = gl.createSampler();
gl.samplerParameteri(samplerB, gl.TEXTURE_MIN_FILTER, gl.LINEAR_MIPMAP_LINEAR);
gl.samplerParameteri(samplerB, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
gl.samplerParameteri(samplerB, gl.TEXTURE_WRAP_S, gl.MIRRORED_REPEAT);
gl.samplerParameteri(samplerB, gl.TEXTURE_WRAP_T, gl.MIRRORED_REPEAT);

// ...

gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.bindSampler(0, samplerA);

gl.activeTexture(gl.TEXTURE1);
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.bindSampler(1, samplerB);

Transform Feedback

Transform feedback allows the output of the vertex shader to be captured in a buffer object. This is useful for particle systems and simulation that perform on the GPU without any CPU intervention.

In WebGL 1, when we want to implement such feature, usually a texture storing the states of particles is inevitable. Two textures, to be precise, storing states from previous frame and current frame, and ping-pong between them.

Here’s an example of the WebGL 1 approach (from toji’s WebGL Particles take 2).

In the first-pass fragment shader, do the simulation, and store the position results in a texture.

// First pass - Fragment Shader
uniform sampler2D tPositions;
varying vec2 vUv;
// ...
vec4 runSimulation(vec4 pos) {
    // simulation
    // ...
    return pos;
}

void main() {
    vec4 pos = texture2D( tPositions, vUv );
    //...
    pos = runSimulation(pos);

    // Write new position out
    gl_FragColor = pos;
}

And then we use this position texture as an input for our second pass vertex shader.

// Second pass - Vertex Shader
attribute vec3 position;
uniform float pointSize;
uniform sampler2D map;
varying vec2 vUv;

//...

void main() {
    vUv = position.xy + vec2( 0.5 / width, 0.5 / height );
    vec3 color = texture2D( map, vUv ).rgb;
    gl_PointSize = pointSize;
    gl_Position = projectionMatrix * modelViewMatrix * vec4( color, 1.0);
}

What follows is how we do it in WebGL 2. With Transform feedback, we can discard the fragment shader in step 1, as well as the texture. We write the output (position) of the vertex shader in step 1 to the vertex attribute array input of step 2. (In practice, you still need a placeholder trivial fragment shader for the first step to correctly compile the program.)

// First pass - Vertex Shader
// ...
out vec3 v_position;
void main() {
    // ...
    v_position = u_projMatrix * u_modelViewMatrix * vec4(a_position, 1.0);
}

// Second pass - Vertex Shader
in vec3 a_position;
void main() {
    gl_Position = projectionMatrix * modelViewMatrix * vec4( a_position, 1.0 );
    // ...
}

And here’s how we bind the buffers (from WebGL2SamplesPack):

var transformFeedback = gl.createTransformFeedback();
var varyings = ['v_position', /*...*/];
gl.transformFeedbackVaryings(programTransform, varyings, gl.SEPARATE_ATTRIBS);
// ...
gl.bindBuffer(gl.ARRAY_BUFFER, particleVBOs[i][Particle.POSITION]);
gl.bufferData(gl.ARRAY_BUFFER, particlePositions, gl.STREAM_COPY);
// ...
gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, transformFeedback);
gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, particleVBOs[(currentSourceIdx + 1) % 2][Particle.POSITION]);
gl.enable(gl.RASTERIZER_DISCARD);   // we are not drawing
gl.beginTransformFeedback(gl.POINTS);

gl.drawArrays(gl.POINTS, 0, NUM_PARTICLES);

gl.endTransformFeedback();
gl.disable(gl.RASTERIZER_DISCARD);

gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, null);

A set of new texture features:

Here is a list of the new texture features in WebGL 2.

sRGB textures
Allow the application to perform gamma-correct rendering.

gl.texImage2D(
    gl.TEXTURE_2D,
    0, // Level of details
    gl.SRGB8, // Format
    gl.RGB,
    gl.UNSIGNED_BYTE, // Size of each channel
    image
);

The sRGB texture will be automatically converted to linear space when being fetched in the shader. For physically-based rendering and other operations we normally want to deal with colors in a linear space, not in display space.

Vertex texture for:
- terrain
- water
- skeleton animation
Texture LOD

The texture LOD parameter is used to determine which mipmap to fetch from; it can now be clamped. The base and maximum mipmap level can both be set as clamps. This allows mipmap streaming, i.e., loading only the mipmap levels currently needed. This is very useful for a WebGL environment, where textures are downloaded via a network.

gl.texParameterf(gl.TEXTURE_2D, gl.TEXTURE_MIN_LOD, 0.0);
gl.texParameterf(gl.TEXTURE_2D, gl.TEXTURE_MAX_LOD, 10.0);

ETC2/EAC texture compression

A mandatory supported feature, compressed textures have obvious transmission time savings.

gl.compressedTexImage2D(
    gl.TEXTURE_2D, 
    0, 
    gl.COMPRESSED_RGBA8_ETC2_EAC, 
    IMAGE_SIZE.width, 
    IMAGE_SIZE.height, 
    0, 
    pixels
);

Integer textures
Non-Power-of-Two Texture
- texturing video
- 2D Sprite
Floating point textures
- half-float: High dynamic range imaging
- full-float: Variance shadow maps soft shadow
- a feature coming together with floating point textures is floating point renderbuffer (also with multisample support).
Seamless cube map

Cube map is already available in WebGL 1. What’s new in WebGL 2 is that the cube map is seamless (and is always seamless, unlike in OpenGL where you can set it). With this feature we are free from using hacks to get rid of the artifacts near the borders.

A set of additional texture formats

textureFormats[TextureTypes.RGB] = {
    internalFormat: gl.RGB,
    format: gl.RGB,
    type: gl.UNSIGNED_BYTE
};

textureFormats[TextureTypes.RGB8] = {
    internalFormat: gl.RGB8,
    format: gl.RGB,
    type: gl.UNSIGNED_BYTE
};

textureFormats[TextureTypes.RGB16F] = {
    internalFormat: gl.RGB16F,
    format: gl.RGB,
    type: gl.HALF_FLOAT
};

textureFormats[TextureTypes.RGBA32F] = {
    internalFormat: gl.RGBA32F,
    format: gl.RGBA,
    type: gl.FLOAT
};

textureFormats[TextureTypes.R16F] = {
    internalFormat: gl.R16F,
    format: gl.RED,
    type: gl.HALF_FLOAT
};

textureFormats[TextureTypes.RG16F] = {
    internalFormat: gl.RG16F,
    format: gl.RG,
    type: gl.HALF_FLOAT
};

textureFormats[TextureTypes.RGBA] = {
    internalFormat: gl.RGBA,
    format: gl.RGBA,
    type: gl.UNSIGNED_BYTE
};

textureFormats[TextureTypes.RGB8UI] = {
    internalFormat: gl.RGB8UI,
    format: gl.RGB_INTEGER,
    type: gl.UNSIGNED_BYTE
};

textureFormats[TextureTypes.RGBA8UI] = {
    internalFormat: gl.RGBA8UI,
    format: gl.RGBA_INTEGER,
    type: gl.UNSIGNED_BYTE
};

New GLSL 3.00 ES Shader

And here comes our new shader: GLSL 3.00 ES! This new version brings in a bunch of new features that are not in GLSL 1.00. But the grammar changed at some point, so it can be quite painful converting over at the start.

Note that a shader in GLSL 1.00 is still fully supported in a WebGL 2 context. It’s only the GLSL 3.00 ES grammar that doesn’t have backwards compatibility with GLSL 1.00. Only when a #version 300 es tag is added at the top of the shaders will the GLSL 3.00 ES version be turned on.

We will quickly list here a bunch of new features and new built-in functions in GLSL 3.00 ES.

Layout qualifiers

Vertex shader inputs can now be declared with layout qualifiers to explicitly bind the location in the shader source without requiring making gl.getAttribLocation calls. These are declared like this:

#version 300 es
#define POSITION_LOCATION 0
#define TEXCOORD_LOCATION 4
// ...
layout(location = POSITION_LOCATION) in vec2 position;
layout(location = TEXCOORD_LOCATION) in vec2 texcoord;

var vertexPosLocation = 0; // set with GLSL layout qualifier
gl.enableVertexAttribArray(vertexPosLocation);
gl.bindBuffer(gl.ARRAY_BUFFER, vertexPosBuffer);
gl.vertexAttribPointer(vertexPosLocation, 2, gl.FLOAT, false, 0, 0);
gl.bindBuffer(gl.ARRAY_BUFFER, null);

var vertexTexLocation = 4; // set with GLSL layout qualifier
gl.enableVertexAttribArray(vertexTexLocation);
gl.bindBuffer(gl.ARRAY_BUFFER, vertexTexBuffer);
gl.vertexAttribPointer(vertexTexLocation, 2, gl.FLOAT, false, 0, 0);
gl.bindBuffer(gl.ARRAY_BUFFER, null);

The same applies to fragment shader outputs. Layout qualifiers can also be used to control the memory layout for uniform blocks.

Non-square matrix

Quite straightforward. One use case is replacing a 4×4 affine matrix where the last row is (0, 0, 0, 1) with a 4×3 matrix.

Full integer support

Built-in functions can now take an integer as an input variable.

Flat/smooth interpolators

We can now explicitly declare flat interpolators to have flat shading.

Centroid sampling

This is used to avoid rendering artifacts when multisampling. Read this article for more details. Here is a WebGL 2 Sample of centroid sampling.

New built-in functions

Some very handy functions such as textureOffset, texelFetch, dFdx, textureGrad, textureLOD, etc. You can always find the complete lists in the GLSL 3.00 ES Spec

gl_InstanceID and gl_VertexID

These allow identification of instances and vertices within the shader.

Other function name changes

For example, texture2D(sampler2D sampler, vec2 coord), textureCube(samplerCube sampler, vec3 coord), (and texture3D if there is any in our old version) are now replaced with a set of overrided functions of texture

gvec4 texture (gsampler2D sampler, vec2 P [, float bias] )
gvec4 texture (gsampler3D sampler, vec3 P [, float bias] )
gvec4 texture (gsamplerCube sampler, vec3 P [, float bias] )

Credits

Brandon Jones http://blog.tojicode.com/2013/09/whats-coming-in-webgl-20.html
Hongwei Li https://zhuanlan.zhihu.com/p/19957067?refer=webgl
WebGL 2 Spec https://www.khronos.org/registry/webgl/specs/latest/2.0/
OpenGL ES 3 Spec https://www.khronos.org/registry/gles/specs/3.0/es_spec_3.0.0.pdf
OpenGL ES 3 Programming Guide http://1.droppdf.com/files/v4voM/addison-wesley-opengl-es-3-0-programming-guide-2nd-2014.pdf
http://gamedev.stackexchange.com/questions/9668/what-are-3d-textures
http://www.gamedev.net/topic/655969-speed-gluniform-vs-uniform-buffer-objects/
Sijie Tian https://hacks.mozilla.org/2014/01/webgl-deferred-shading/
Dong Dong https://www.zhihu.com/question/49327688/answer/115691345?from=profile_answer_card
Cesium https://github.com/AnalyticalGraphicsInc/cesium
WebGL Stats http://webglstats.com/
WebGL 2 for Siggraph Asia 2015 https://docs.google.com/presentation/d/1Orx0GB0cQcYhHkYsaEcoo5js3c5-pv7ahPniIRIzzfg/edit#slide=id.gd1fc5cab2_0_8

Minecon 2016 Report

Say what? Minecon is a convention for Minecraft, so why in this blog? Well, I was invited to be on a panel about 3D printing Minecraft models, since I wrote Mineways (which gets a crazy 600 downloads a day; beats me who all these people are. I think it’s a case of 600 download it, 60 try it, 6 try it more than once, 0.6 become real users).

David Ng in the Mineways hat

It was a bit odd going to this convention, especially since it was at the Anaheim Convention Center, where I was just two months ago for SIGGRAPH. This convention is the same size as SIGGRAPH, 12,000 attendees plus panelists, staff, exhibitors, etc. One organizer said a total of 14K attended. Of course, the tickets for Minecon sold out in 6 minutes – there are over 100 million copies of Minecraft sold and a lot of fanatical users out there. It’s only two days long, it uses somewhat less (and sometimes more) of the convention center’s space, and the median age of an attendee is probably around 12. My photo album is here, note in particular this one, where just about everyone is in one very long room – that’s something I don’t see at SIGGRAPH.

Not SIGGRAPH

It’s not all just kids and pixelated blocks. There were a few good general technical talks about VR, video techniques, voxel modeling methods, etc. For example, John Carmack and others from Oculus spoke about the challenges of porting Minecraft to the Rift. Scattered throughout this presentation are some interesting bits about the user experience. Some clever ideas such as the person jumping a short distance actually gets a view that travels in a straight line, though his friends see him jumping in a parabola (parabolas upset the stomach more). Playing the ambient music actually helps stave off motion sickness for awhile, or so they think. They have a “chill out” feature that lets you leave the action and hang out in a quiet virtual room, looking at the game through a 2D window. Various other things. They spent 6 months of polishing the VR version before releasing it (despite a fair bit of pressure to get it out the door in a minimal-port form). Best/ickiest Carmack quote, “Don’t push it. We don’t need to be cleaning up sick in the demo room.” Honestly, an interesting session, with hints of the political pressure on the team. I’m impressed that they were able to take the time to polish the experience, given how an early release of the game’s port would undoubtedly help drive sales.

BlockWorks had a session on how they did their voxel modeling, with some slides of their incredible constructions. Seriously, click that link. It was interesting in that each artist tends to have a specialty – architecture, organics, mechanicals, etc. They use the free Chunky path tracer (great tool! I’ve played with it.) along with traditional renderers such as Cinema 4D. I would have liked to hear more about their custom voxel-based modeling tools, but alackaday, not much mention beyond WorldEdit and VoxelSniper. Also, they avoid using mesh voxelizers such as binvox and Qubicle. (Qubicle, by the way, looks like a nice package for All Things Voxel, including a mesh voxelizer than retains the color of the mesh.) Related, though not at Minecon, this article about RenderMan and Minecraft – a good detailed “how to” read.

You remember the Visible Human Project? I laughed when I saw this, and talked at length with its creator, Wizard Keen, aka Adam Clarke.

1:40 into The Torso

I also met some people working on Spigot, the unofficial mod platform for Minecraft. One explained to me in depth how Minecraft’s impressive terrain generation worked, as he had carefully decompiled it all in order to make a mod. Basically, there’s pass after pass of applying Perlin Noise in various ways. First the overall terrain, in a single block type. Then put biomes on. Then for each biome add the “topsoil layer” (e.g. grass and three dirt atop everything for the grasslands). Carve a bit more. Add tunnels separately. Add minerals. On and on. What was interesting to me was that there was this layered approach at all, that it wasn’t some giant single-pass function but each had its own function, e.g. add topsoil.

Other than the lead developer of Minecraft, YouTubers were the stars of the con. By some estimates, 15% of Youtube’s content is videogame related, and Minecraft dominates (GTA’s second). (Hey, before I did 12 steps for Minecraft, I made well over a hundred videos, mostly not worth watching. Here’s a reason why I love Minecraft.) The two main video editing packages used for Minecraft videos are Premier Pro & Vegas Pro. People uses FRAPS, OBS, and Action for video capture.

Tidbit: this Minecraft-related Hour of Code lesson from code.org is evidently extremely popular. Point kids at it, it uses a visual code editor (Blockly) to introduce “if” statements, loops, etc.

Entirely non-Minecraft, but I learned about it at Minecon: I’m probably the last person on Twitter to know about tweetdeck, which makes Twitter a bit more friendly.

My little talk at the panel went well, slides here. I particularly like that David Ng is using Minecraft with his students to build physical worlds.

Oh, and videos. I’d say the most amazing videos I saw were in the Rube Goldberg session; short session preview here, worth your while. The coaster rides by Nuropsych1 and others were astounding. If you want to veg out (and wait through ads), check the Dr. Who, GhostBusters, and Beetle Juice coasters, among others. Impressive builds, great lighting effects and optical illusions, lovely redstone (electronics) work, camera tricks, and it’s astounding it can all be done within Minecraft.

4:18 in to the Dr. Who coaster https://youtu.be/AKyt12Ezh3s?t=258

4:18 in with the Dr. Who coaster – click that link right now!

WebGL 2 Basics

This guest blog post is by Shuai Shao, a Masters student at UPenn under Patrick Cozzi. After hearing the announcement at SIGGRAPH, I was asking around for someone to write a “basics of WebGL 2” article and Patrick got Shuai involved. If you’re reading this any time after October 2016, see his Github repo for the latest version of this article, with any corrections folded in since then (we encourage you to contribute to it).

WebGL 2 is coming! Google Chrome just announced at SIGGRAPH 2016 that 100% of the WebGL 2 conformance suite is passing (on the first configurations).

If I have an engine that works well in WebGL 1, how do I move to WebGL 2? Things to consider:

What has to be changed?
What can be done in a better way?
What new features and functionalities can I add to my engine?

In this article we are focused on the first question. We discuss the main promoted features, which are supported by extensions in WebGL 1 that are part of the core of WebGL 2 and thus cannot be accessed in the old manner, along with some other compatibility issues.

You can find answers to the other two questions in our next article, which focuses on introducing new features.

In the future you may want some complete working sample code for reference, instead of just code snippets. WebGL 2 Samples pack is a resource you’ll find useful.

That’s enough for an intro. First of all, let’s get WebGL 2 working on your machine.

How do I start using WebGL 2?

Get a WebGL 2 Implementation (Browser)

You may have seen this before, let’s just hit the main points:

Just do it: Getting a WebGL Implementation
Check if your current browser supports it; see the WebGL 2 tab of the WebGL Report

Get a WebGL 2 Context

Programmers always try to support as many browsers as possible. So do I. On top the WebGL 1 version of getContext, we will first try to access WebGL 2. If this fails, then drop back to WebGL 1. Here’s an example dervived from the Cesium WebGL engine:

var defaultToWebgl2 = false;

var webgl2Supported = (typeof WebGL2RenderingContext !== 'undefined');
var webgl2 = false;
var gl;

if (defaultToWebgl2 && webgl2Supported) {
    gl = canvas.getContext('webgl2', webglOptions);
    if (gl) {
        webgl2 = true;
    }
}
if (!gl) {
    gl = canvas.getContext('webgl', webglOptions);
}
if (!gl) {
    throw new Error('The browser supports WebGL, but initialization failed.');
}

Promoted Features

Some of the new WebGL 2 features are already available in WebGL 1 as extensions. However, these features will be part of the core spec in WebGL 2, which means support is guaranteed. In this first blog entry we are going to focus on these promoted features, together with potential compatibility issues they may cause.

First let’s find if there’s a way to change fewest existing WebGL 1 code using the extension to make it work correctly with a WebGL 2 context.

We may find that in some cases (instancing and VAO), it’s only the function we are calling that changes from the extension version to core version, while the parameters and pipeline don’t change. We used to call fooEXT, now we simply switch to foo.

Thanks to Javascript’s neat support of function objects, one solution is that we can create a function handler at startup, assigned with either the extension version from WebGL 1 or the core version from WebGL 2. Within the rest of the code we call this function handler.

if (!webgl2) {
    vaoExt = gl.getExtension("OES_vertex_array_object");
    //...
    gl.createVertexArray = vaoExt.createVertexArrayOES;
    //...
}

Yet this method can fail when changes are made in the shader (e.g., MRT). We still need to take a close look at each of these promoted features. So now let’s take a look at how the code changes for each of them.

Multiple Render Targets

MRT is a commonly used extension for deferred rendering, OIT, single-pass picking, etc.

WebGL 1

For MRT we used the WEBGL_draw_buffers extension as a work-around to write g-buffers in a single pass. Though it is widely supported (currently 57%+ browsers, according to WebGL stats), the extension-style code isn’t as clean as WebGL 2:

var ext = gl.getExtension('WEBGL_draw_buffers');
if (!ext) {
  // ...
}

We then bind multiple textures, tx[] in the example below, to different framebuffer color attachments.

var fb = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
gl.framebufferTexture2D(gl.FRAMEBUFFER, ext.COLOR_ATTACHMENT0_WEBGL, gl.TEXTURE_2D, tx[0], 0);
gl.framebufferTexture2D(gl.FRAMEBUFFER, ext.COLOR_ATTACHMENT1_WEBGL, gl.TEXTURE_2D, tx[1], 0);
gl.framebufferTexture2D(gl.FRAMEBUFFER, ext.COLOR_ATTACHMENT2_WEBGL, gl.TEXTURE_2D, tx[2], 0);
gl.framebufferTexture2D(gl.FRAMEBUFFER, ext.COLOR_ATTACHMENT3_WEBGL, gl.TEXTURE_2D, tx[3], 0);

Next we map the color attachments to draw buffer slots that the fragment shader will write to using gl_FragData.

ext.drawBuffersWEBGL([
  ext.COLOR_ATTACHMENT0_WEBGL, // gl_FragData[0]
  ext.COLOR_ATTACHMENT1_WEBGL, // gl_FragData[1]
  ext.COLOR_ATTACHMENT2_WEBGL, // gl_FragData[2]
  ext.COLOR_ATTACHMENT3_WEBGL  // gl_FragData[3]
]);

Also, an extra flag is needed in the shader:

#extension GL_EXT_draw_buffers : require
precision highp float;
// ...
void main() {
    gl_FragData[0] = vec4( v_position.xyz, 1.0 );
    gl_FragData[1] = vec4( v_normal.xyz, 1.0 );
    gl_FragData[2] = texture2D( u_colmap, v_uv );
    gl_FragData[3] = texture2D( u_normap, v_uv );
}

WebGL 2

For MRT our code becomes neat and clean in WebGL 2.

gl.framebufferTexture2D(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex[0], 0);
gl.framebufferTexture2D(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT1, gl.TEXTURE_2D, tex[1], 0);
gl.framebufferTexture2D(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT2, gl.TEXTURE_2D, tex[2], 0);

defines an array of buffers into which outputs will be written. Draw by:

gl.drawBuffers( [gl.COLOR_ATTACHMENT0, gl.COLOR_ATTACHMENT1, gl.COLOR_ATTACHMENT2] );

Instead of mapping color attachments to the draw buffer, we directly use multiple out variables in the fragment shader. This code actually benefits from the new GLSL 3.0 ES, which we will discuss later in another blog post. However, using out itself is straightforward.

#version 300 es
precision highp float;
layout(location = 0) out vec4 gbuf_position;
layout(location = 1) out vec4 gbuf_normal;
layout(location = 2) out vec4 gbuf_colmap;
layout(location = 3) out vec4 gbuf_normap;
//...
void main()
{
    gbuf_position = vec4( v_position.xyz, 1.0 );
    gbuf_normal = vec4( v_normal.xyz, 1.0 );
    gbuf_colmap = texture2D( u_colmap, v_uv );
    gbuf_normap = texture2D( u_normap, v_uv );
}

Additionally, since Texture 2D Array is now available, we can choose to render to different layers of an array of texture 2d’s instead of separate 2d textures.

gl.framebufferTextureLayer(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT0, texture, 0, 0);
gl.framebufferTextureLayer(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT1, texture, 0, 1);
gl.framebufferTextureLayer(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT2, texture, 0, 2);

Instancing

Instancing is a great performance booster for certain types of geometry, especially objects with many instances but without many vertices. Good examples are grass and fur. Instancing avoids the overhead of an individual API call per object, while minimizing memory costs by avoiding storing geometric data for each separate instance.

Instancing is exposed through the ANGLE_instanced_arrays extension in WebGL 1 (92%+ support). Now with WebGL 2 we can simply use drawArraysInstanced or drawArraysInstanced for the draw calls.

gl.drawArraysInstanced(gl.TRIANGLES, 0, 3, 2);

There is a new built-in variable (GLSL 3.0 ES) in the vertex shader called gl_InstanceID that can help with the draw instance call. For example, we can use this to assign each instance with a separate color.

// Vertex Shader
flat out int in instance
// ...
void main() {
    instance = gl_InstanceID;
}

// Fragment Shader
uniform Material {
    vec4 diffuse[NUM_MATERIALS];
} material;
flat in int instance;   // `flat` is a must for a int varying, plus we don't want the instance id to be interpolated
// ...
void main() {
    color = material.diffuse[instance % NUM_MATERIALS];
}

Vertex Array Object

VAO is very useful in terms of engine design. It allows us to store vertex array states for a set of buffers in a single, easy to manage object. It is exposed through the OES_vertex_array_object extension in WebGL 1 (89%+).

WebGL 1 with extension	WebGL 2
`createVertexArrayOES`	`createVertexArray`
`deleteVertexArrayOES`	`deleteVertexArray`
`isVertexArrayOES`	`isVertexArray`
`bindVertexArrayOES`	`bindVertexArray`

An example:

var vertexArray = gl.createVertexArray();
gl.bindVertexArray(vertexArray);

// set vertex array states
var vertexPosLocation = 0; // set with GLSL layout qualifier
gl.enableVertexAttribArray(vertexPosLocation);
gl.bindBuffer(gl.ARRAY_BUFFER, vertexPosBuffer);
gl.vertexAttribPointer(vertexPosLocation, 2, gl.FLOAT, false, 0, 0);
gl.bindBuffer(gl.ARRAY_BUFFER, null);
// ...

gl.bindVertexArray(null);

// ...

// render
gl.bindVertexArray(vertexArray);
gl.drawArrays(gl.TRIANGLES, 0, 6);

Shader Texture LOD

The Shader Texture LOD Bias control makes mipmap level control simpler for glossy environment effects in physically based rendering. This functionality is exposed through the EXT_shader_texture_lod extension in WebGL 1 (71%+).

vec4 texture2DLodEXT(sampler2D sampler, vec2 coord, float lod)

Now as part of core, the lodBias can be passed as an optional parameter to texture

gvec4 texture (gsampler2D sampler, vec2 P [, float bias] )

Fragment Depth

The fragment shader can explicitly set the depth value for the current fragment. This operation can be expensive because it can cause the early-z optimization to be disabled. However, it is needed in cases where the z-depth is modified on the fly.

This functionality is exposed through the EXT_frag_depth extension in WebGL 1 (66%+).

out float gl_FragDepth;

More details can be found in the GLSL 3.0 ES Spec.

Other compatibility issues

Look here for more information: WebGL 2 Spec Ch4.1

Credits

Brandon Jones http://blog.tojicode.com/2013/09/whats-coming-in-webgl-20.html
Hongwei Li https://zhuanlan.zhihu.com/p/19957067?refer=webgl
WebGL 2 Spec https://www.khronos.org/registry/webgl/specs/latest/2.0/
OpenGL ES 3 Spec https://www.khronos.org/registry/gles/specs/3.0/es_spec_3.0.0.pdf
OpenGL ES 3 Programming Guide http://1.droppdf.com/files/v4voM/addison-wesley-opengl-es-3-0-programming-guide-2nd-2014.pdf
http://gamedev.stackexchange.com/questions/9668/what-are-3d-textures
http://www.gamedev.net/topic/655969-speed-gluniform-vs-uniform-buffer-objects/
Sijie Tian https://hacks.mozilla.org/2014/01/webgl-deferred-shading/
Dong Dong https://www.zhihu.com/question/49327688/answer/115691345?from=profile_answer_card
Cesium https://github.com/AnalyticalGraphicsInc/cesium
WebGL Stats http://webglstats.com/
WebGL 2 for Siggraph Asia 2015 https://docs.google.com/presentation/d/1Orx0GB0cQcYhHkYsaEcoo5js3c5-pv7ahPniIRIzzfg/edit#slide=id.gd1fc5cab2_0_8
https://software.intel.com/en-us/articles/early-z-rejection-sample

Humblebrag on the third anniversary of the MOOC

I just realized today that it’s been three years since the 3D Interactive Graphics MOOC came out. It’s still chugging along, surprisingly enough, getting about 35 signups a day. Of course, completion rates are a small fraction of that – I’d like to know myself what it is. Me, I’m still answering questions on the discussion board.

It’s been a good week for positive comments from people taking the course. Who wouldn’t like reading posts such as this, “It is incredibly easy to learn and the examples are vivid and awesome. I wish the professors at my university were like you!” Which I take as more a reflection of the level of teaching at that (unnamed) university – I know there are more engaging and dynamic teachers than me. The takeaway is that videos and demonstrations that are just a link away can offer a fair bit, just as films have some advantages compared to live performances. Integrating these newer technologies into the classroom is the exciting challenge.

Another person gave praise to my short dot product explanation videos, even adding links to them on Wikipedia’s dot product page (which I just edited, removing my name). Looking at those videos now, hey, they’re pretty good! Here’s one showing how the dot product and cosine are related. Find the others here.

Remember how three years ago MOOCs were going to destroy the university system, and that everyone would get a cheap college education? The reality is that MOOCs are inexpensive (usually free) distance-learning systems for relatively well-off, educated people out of school who want to study a specific topic. You also have to be quite self-motivated to plow through a course, since the usual external motivators of a college education – getting a degree, keeping the parents happy, getting your money’s worth, and staying in school for the parties – are all missing.

I’d like to see are more graphics MOOCs beyond Ed Angel’s and mine. But the reality is that MOOCs are expensive and whether there’s a viable business plan blah blah blah.

Whatever the case, knowing that I’ve been able to help a number of people get some understanding of this great field of ours has been an unalloyed joy. Honestly, working on this course has been a lovely and lucky opportunity for me, and one of the best things I’ve done with my life.

Benchmarking tweets

I asked what others did for benchmarking in my last post. Here are the replies on Twitter in a semi-coherent edited form. If I missed any replies, I blame Twitter, whose interface is a magical maze.

First there were some FPS vs. SPF comments:

Richard Mitton: If you’re not measuring in milliseconds then you’re doing it wrong.

Christer Ericson: Yes, ms, not FPS. FPS is not a linear unit for the artists (or anyone).

Marc Olano: FPS isn’t linear. Usual definition of median averages middle 2 for even samples = also wrong. Use ms.

Morgan McGuire notes: FPS *is* a good measure if what you care about is interaction or visual smoothness. SPF is good for computational efficiency.

I replied to Richard & Christer: I’m interested in your reaction to the use of median vs. mean. FPS vs. SPF irrelevant for relative performance.

I also changed the original post to talk about milliseconds instead of frames, to avoid this facet of the discussion.

Christer Ericson: It’s important to catch the spikes, so in the context you’re talking about I would do max. Or mean+variance. Also, don’t think I’ve ever, for profiling reasons, looked at any average. You always look at a specific frame.

Timothy Lottes: I’m personally only interested in worst case ms/frame.

Cass Everitt: Agree with those that concentrate on worst times.

Eric Haines: Right, it depends what you’re looking for, e.g. don’t drop below 60 FPS. I’m mostly warning against using mean.

I added a note to the original post about tracking the max, which makes sense if you’re trying to guarantee a frame rate.

Tobias Berghoff, who benchmarks consoles:

I use min/max/med the most. Averages really only come into play when I need more digits. I spend significant amount of time below the 0.5% mark when wearing my platform tuning hat. I don’t miss trying to get sensible numbers out of PC h/w. But this also comes into play when measuring very short processes. When something only takes a couple of microseconds, you often end up oscillating between states that make the distribution multi-modal. Median won’t catch small shifts.

cupe: Stacked color-coded graph of nested timings (or a subtree of it). Usually unfiltered for analysis, avg for comparisons. Hierarchy is on the left, tooltip displays e.g. “scene/fluid/poisson”, click to restrict. Horizontal lines are milliseconds, orange line is 16.6 ms.

cupe1

E.g. click the big violet bar to see only post (and zoom in to stretch 4ms to screen):

cupe2

Javdev: We use a profiler, Adobe Scout, select multiple frames & see which code is most expensive & iterate it to prevent frame drops.

Björn Blissing: One option is to plot a histogram over the captured data. Reveals if your max/min are outliers or more common occurrences.

Michael Marcin: Try always running circular etwtrace and when frame time dips save and examine the trace.

Mikkel Gjoel: We filter in viewer. Options for all mentioned, and vsync (as that is what we are shipping).

Gjoel

Fabian Giesen: General order statistics (percentiles etc.) are good. Just a plot of frame durations over frame # is helpful, too! And simply recording all frame durations over a few seconds, sorting them and plotting that is quite handy, too. That gives you all the percentiles (and median etc.) and gives you a feel for the shape of the distribution, which matters. (I’m not very happy with single-value summaries; they lose too much information.)

Jaume Sanchez Elias: I like Chrome FPS meter: current, min, max; over time; frequency graph for each framerate

Elias

Krzysztof Narkowicz: Min, max, avg and std dev. Percentiles and med would make a nice addition, but it’s a hassle to compute them.

Anton the Mighty: I always use the standard deviation or standard error and indicated what value n sample size is. Most gfx benchs=bad. It’s usually worth also eyeballing actual data in detail because repeating patterns show either cycles or error in timers. Most recently there was something a friend had with the power manager in windows causing a cycling load on the cpu. I also visually check out timing for cpu+gpu functions across frames with apitrace etc. pretty neat.

All for now – feel free to email or tweet me with anything you want to add.

Don’t be mean

[Some on Twitter noted that I should be using milliseconds instead of FPS. This kind of misses the point, but let’s avoid distractions, here’s the article with that change. The sad part is that you then miss my hilarious joke about how I use FPS in the article, because if I used SPF you’d think I was talking about tanning. Which makes me think of another joke about rendering cows and the time it then takes to tan their hides. I’m full of great dad jokes.]

I think I’m reading “The Economist” too much, as I keep trying to come up with punny article titles. Sorry.

So, how do you measure a representative value for milliseconds per frame?

I don’t care about the mechanics, which timer call you use, etc. Just assume you successfully start timer/end timer and get some length of time in milliseconds for the frame. What do you do with these timings?

I usually see things such as an average, or a running average (average of last 20 or 50 or 100 or whatever frame times). I think this is mostly bad. As someone pointed out, almost everyone has more than the average number of legs. I find the same: in a given run there can sometimes be some frames where things noticeably slow down for whatever reason, some load on the computer. What you’re often trying to measure (as a graphics developer) is the performance of the rendering system itself, not the computer’s overall performance.

So, I currently use one of these two, or both: shortest time, or median time, over whatever set of frame times I have. Both have their uses. Shortest time is justifiable (to me, at least) because, assuming you have a very fine-grained timer, your best time is in some sense the “purest” measurement of the time a frame takes. Whatever other processes in your system are slowing down the other frames isn’t your concern. The timer doesn’t lie, you really did go that fast for one frame.

The other measure I’m OK with is the median. If your benchmarking system is going through a series of different frames (an animation or simulation is running, or the camera is orbiting, etc.), then grabbing the median frame is good. Choosing it instead of the average then doesn’t give so much weight to outliers. Better yet, graph the results and see whether the outliers are consistent.

Update: A number of game and VR developers pointed out that their major interest is maximum frame time. Makes sense: for a good experience (especially with VR) you don’t want to drop below your target of 30 FPS, 60 FPS, or 90 FPS.

My point is that the average, the mean, is not so good: often external slowdowns throw off the average enough and at random enough intervals that the average is very noisy and so, pretty useless. Taking the median, the central time of the sorted set, cuts out much of this variance, making each sample have an equal effect on the result.

Anyway, that’s where I’m at with benchmarking. What do you do? Comment here, tweet-reply, or email me at erich@acm.org and I’ll summarize.

p.s. pro tip: walk through your rendering pipeline every once in awhile, watching each step. It’s hard to really know where the time goes without doing so. I did this last week while looking at another bug and found a little logic error was causing a certain path to always do an additional post-process when it usually wasn’t needed. Free performance boost with a two-line fix! But, not something discoverable by benchmarking, because the variance is too much to notice “just” a few frames of difference.

This happens every few years. My favorite lucky find was around 15 years ago, walking through code in an established project and seeing that it was rendering twice for each time it displayed. A one-line change gave us 2x performance.

Three.js example thumbnails page

I made a page of thumbnail images of the 297 three.js examples. Here it is:

http://www.realtimerendering.com/threejs/

The three.js site used to have a page like this. I’m not sure why it disappeared, but now I don’t care, as I can more easily find demos I’ve looked at before but then forgot the names.

Bonus links: Stemkoski and Yomotsu also have useful demo pages, which used to be prominently linked from the three.js site but now are not.