WebGL 2 Basics

This guest blog post is by Shuai Shao, a Masters student at UPenn under Patrick Cozzi. After hearing the announcement at SIGGRAPH, I was asking around for someone to write a “basics of WebGL 2” article and Patrick got Shuai involved. If you’re reading this any time after October 2016, see his GitHub repo for the latest version of this article, with any corrections folded in since then (we encourage you to contribute to it).

WebGL 2 is coming! Google Chrome just announced at SIGGRAPH 2016 that 100% of the WebGL 2 conformance suite is passing (on the first configurations).

If I have an engine that works well in WebGL 1, how do I move to WebGL 2? Things to consider:

  • What has to be changed?
  • What can be done in a better way?
  • What new features and functionalities can I add to my engine?

In this article we focus on the first question. We discuss the main promoted features: those that were available through extensions in WebGL 1 but are part of the core of WebGL 2, and so can no longer be accessed in the old manner. We also cover some other compatibility issues.

You can find answers to the other two questions in our next article, which focuses on introducing new features.

In the future you may want some complete working sample code for reference, instead of just code snippets. The WebGL 2 Samples pack is a resource you’ll find useful.

That’s enough for an intro. First of all, let’s get WebGL 2 working on your machine.

How do I start using WebGL 2?

Get a WebGL 2 Implementation (Browser)

You may have seen this before, let’s just hit the main points:

Get a WebGL 2 Context

Programmers always try to support as many browsers as possible. So do I. On top of the WebGL 1 version of getContext, we will first try to access WebGL 2. If this fails, we drop back to WebGL 1. Here’s an example derived from the Cesium WebGL engine:

var defaultToWebgl2 = false;

var webgl2Supported = (typeof WebGL2RenderingContext !== 'undefined');
var webgl2 = false;
var gl;

if (defaultToWebgl2 && webgl2Supported) {
    gl = canvas.getContext('webgl2', webglOptions);
    if (gl) {
        webgl2 = true;
    }
}
if (!gl) {
    gl = canvas.getContext('webgl', webglOptions);
}
if (!gl) {
    throw new Error('The browser supports WebGL, but initialization failed.');
}

Promoted Features

Some of the new WebGL 2 features are already available in WebGL 1 as extensions. However, these features will be part of the core spec in WebGL 2, which means support is guaranteed. In this first blog entry we are going to focus on these promoted features, together with potential compatibility issues they may cause.

First, let’s see whether there’s a way to make existing WebGL 1 code that uses extensions work correctly with a WebGL 2 context while changing as little of it as possible.

We may find that in some cases (instancing and VAO), it’s only the function we are calling that changes from the extension version to core version, while the parameters and pipeline don’t change. We used to call fooEXT, now we simply switch to foo.

Thanks to JavaScript’s neat support of function objects, one solution is to create a function handler at startup, assigned either the extension version from WebGL 1 or the core version from WebGL 2. The rest of the code then calls this function handler.

if (!webgl2) {
    var vaoExt = gl.getExtension("OES_vertex_array_object");
    //...
    // bind, so the extension method is still invoked on the extension object
    gl.createVertexArray = vaoExt.createVertexArrayOES.bind(vaoExt);
    //...
}
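A slightly fuller sketch of the same idea, covering all four vertex array object functions, might look like the following. The helper name installVertexArrayShim is hypothetical, and gl and webgl2 are assumed to come from the context-creation code earlier. The extension methods are bound to the extension object because they have to be invoked with it as `this`.

```javascript
// Hypothetical helper: makes the core WebGL 2 VAO function names usable on a
// WebGL 1 context by forwarding them to OES_vertex_array_object.
// Returns true if the core names can be called afterwards.
function installVertexArrayShim(gl, webgl2) {
    if (webgl2) {
        return true; // already part of the core API
    }
    var vaoExt = gl.getExtension('OES_vertex_array_object');
    if (!vaoExt) {
        return false; // caller has to fall back to plain attribute setup
    }
    // Extension methods must run with the extension object as `this`,
    // so bind them rather than copying bare function references.
    gl.createVertexArray = vaoExt.createVertexArrayOES.bind(vaoExt);
    gl.deleteVertexArray = vaoExt.deleteVertexArrayOES.bind(vaoExt);
    gl.isVertexArray = vaoExt.isVertexArrayOES.bind(vaoExt);
    gl.bindVertexArray = vaoExt.bindVertexArrayOES.bind(vaoExt);
    return true;
}
```

After a successful call, the rest of the engine can use gl.createVertexArray and friends without checking which context version it is running on.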

Yet this method can fail when changes are made in the shader (e.g., MRT). We still need to take a close look at each of these promoted features. So now let’s take a look at how the code changes for each of them.

Multiple Render Targets

MRT is a commonly used extension for deferred rendering, OIT, single-pass picking, etc.

WebGL 1

For MRT we used the WEBGL_draw_buffers extension as a work-around to write g-buffers in a single pass. Though it is widely supported (currently 57%+ browsers, according to WebGL stats), the extension-style code isn’t as clean as WebGL 2:

var ext = gl.getExtension('WEBGL_draw_buffers');
if (!ext) {
  // ...
}

We then bind multiple textures, tx[] in the example below, to different framebuffer color attachments.

var fb = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
gl.framebufferTexture2D(gl.FRAMEBUFFER, ext.COLOR_ATTACHMENT0_WEBGL, gl.TEXTURE_2D, tx[0], 0);
gl.framebufferTexture2D(gl.FRAMEBUFFER, ext.COLOR_ATTACHMENT1_WEBGL, gl.TEXTURE_2D, tx[1], 0);
gl.framebufferTexture2D(gl.FRAMEBUFFER, ext.COLOR_ATTACHMENT2_WEBGL, gl.TEXTURE_2D, tx[2], 0);
gl.framebufferTexture2D(gl.FRAMEBUFFER, ext.COLOR_ATTACHMENT3_WEBGL, gl.TEXTURE_2D, tx[3], 0);

Next we map the color attachments to draw buffer slots that the fragment shader will write to using gl_FragData.

ext.drawBuffersWEBGL([
  ext.COLOR_ATTACHMENT0_WEBGL, // gl_FragData[0]
  ext.COLOR_ATTACHMENT1_WEBGL, // gl_FragData[1]
  ext.COLOR_ATTACHMENT2_WEBGL, // gl_FragData[2]
  ext.COLOR_ATTACHMENT3_WEBGL  // gl_FragData[3]
]);

Also, an extra flag is needed in the shader:

#extension GL_EXT_draw_buffers : require
precision highp float;
// ...
void main() {
    gl_FragData[0] = vec4( v_position.xyz, 1.0 );
    gl_FragData[1] = vec4( v_normal.xyz, 1.0 );
    gl_FragData[2] = texture2D( u_colmap, v_uv );
    gl_FragData[3] = texture2D( u_normap, v_uv );
}

WebGL 2

For MRT our code becomes neat and clean in WebGL 2.

gl.framebufferTexture2D(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex[0], 0);
gl.framebufferTexture2D(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT1, gl.TEXTURE_2D, tex[1], 0);
gl.framebufferTexture2D(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT2, gl.TEXTURE_2D, tex[2], 0);

gl.drawBuffers then defines the array of buffers into which the fragment shader outputs will be written:

gl.drawBuffers( [gl.COLOR_ATTACHMENT0, gl.COLOR_ATTACHMENT1, gl.COLOR_ATTACHMENT2] );
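As a rough sketch, the attach-and-enable steps can be wrapped in one function that also verifies the result. The name createMrtFramebuffer is hypothetical; it assumes a WebGL 2 context and an array of already allocated, equally sized 2D textures.

```javascript
// Hypothetical helper for a WebGL 2 context: attaches each texture to a
// consecutive color attachment and enables all of them as draw buffers.
function createMrtFramebuffer(gl, textures) {
    var maxDrawBuffers = gl.getParameter(gl.MAX_DRAW_BUFFERS);
    if (textures.length > maxDrawBuffers) {
        throw new Error('Requested ' + textures.length + ' render targets, but only ' +
            maxDrawBuffers + ' are supported.');
    }
    var fb = gl.createFramebuffer();
    gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, fb);
    var attachments = [];
    for (var i = 0; i < textures.length; ++i) {
        // The COLOR_ATTACHMENTi enums have consecutive values.
        var attachment = gl.COLOR_ATTACHMENT0 + i;
        gl.framebufferTexture2D(gl.DRAW_FRAMEBUFFER, attachment, gl.TEXTURE_2D, textures[i], 0);
        attachments.push(attachment);
    }
    gl.drawBuffers(attachments); // stored as part of the framebuffer's state
    var status = gl.checkFramebufferStatus(gl.DRAW_FRAMEBUFFER);
    gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, null);
    if (status !== gl.FRAMEBUFFER_COMPLETE) {
        throw new Error('Framebuffer incomplete: 0x' + status.toString(16));
    }
    return fb;
}
```

Checking the completeness status up front tends to catch mismatched texture sizes or formats before the first draw call silently renders nothing.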

Instead of mapping color attachments to the draw buffer, we directly use multiple out variables in the fragment shader. This code actually benefits from the new GLSL 3.0 ES, which we will discuss later in another blog post. However, using out itself is straightforward.

#version 300 es
precision highp float;
layout(location = 0) out vec4 gbuf_position;
layout(location = 1) out vec4 gbuf_normal;
layout(location = 2) out vec4 gbuf_colmap;
layout(location = 3) out vec4 gbuf_normap;
//...
void main()
{
    gbuf_position = vec4( v_position.xyz, 1.0 );
    gbuf_normal = vec4( v_normal.xyz, 1.0 );
    gbuf_colmap = texture( u_colmap, v_uv );   // texture2D() no longer exists in GLSL 3.0 ES
    gbuf_normap = texture( u_normap, v_uv );
}

Additionally, since 2D texture arrays are now available, we can choose to render to different layers of a single 2D texture array instead of to separate 2D textures.

gl.framebufferTextureLayer(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT0, texture, 0, 0);
gl.framebufferTextureLayer(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT1, texture, 0, 1);
gl.framebufferTextureLayer(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT2, texture, 0, 2);

Instancing

Instancing is a great performance booster for certain types of geometry, especially objects with many instances but without many vertices. Good examples are grass and fur. Instancing avoids the overhead of an individual API call per object, while minimizing memory costs by avoiding storing geometric data for each separate instance.

Instancing is exposed through the ANGLE_instanced_arrays extension in WebGL 1 (92%+ support). Now with WebGL 2 we can simply use drawArraysInstanced or drawElementsInstanced for the draw calls.

gl.drawArraysInstanced(gl.TRIANGLES, 0, 3, 2);
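For an engine that has to run on both versions, one possible approach is a small wrapper that exposes the instancing entry points under their core names. This is only a sketch: getInstancing is a hypothetical helper, and gl and webgl2 are assumed to come from the context-creation code earlier.

```javascript
// Hypothetical helper: returns the instancing functions under their core
// names, or null when neither WebGL 2 nor ANGLE_instanced_arrays is available.
function getInstancing(gl, webgl2) {
    if (webgl2) {
        return {
            drawArraysInstanced: gl.drawArraysInstanced.bind(gl),
            drawElementsInstanced: gl.drawElementsInstanced.bind(gl),
            vertexAttribDivisor: gl.vertexAttribDivisor.bind(gl)
        };
    }
    var ext = gl.getExtension('ANGLE_instanced_arrays');
    if (!ext) {
        return null;
    }
    return {
        drawArraysInstanced: ext.drawArraysInstancedANGLE.bind(ext),
        drawElementsInstanced: ext.drawElementsInstancedANGLE.bind(ext),
        vertexAttribDivisor: ext.vertexAttribDivisorANGLE.bind(ext)
    };
}
```

With `var instancing = getInstancing(gl, webgl2);`, a per-instance attribute at some location `offsetLocation` (a made-up name) would be set up with `instancing.vertexAttribDivisor(offsetLocation, 1)` and drawn with `instancing.drawArraysInstanced(gl.TRIANGLES, 0, 3, 2)`.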

There is a new built-in variable (GLSL 3.0 ES) in the vertex shader called gl_InstanceID that can help with the draw instance call. For example, we can use this to assign each instance with a separate color.

// Vertex Shader
flat out int instance;
// ...
void main() {
    instance = gl_InstanceID;
}
// Fragment Shader
uniform Material {
    vec4 diffuse[NUM_MATERIALS];
} material;
flat in int instance;   // `flat` is a must for an int varying, plus we don't want the instance id to be interpolated
// ...
void main() {
    color = material.diffuse[instance % NUM_MATERIALS];
}

Vertex Array Object

VAO is very useful in terms of engine design. It allows us to store vertex array states for a set of buffers in a single, easy to manage object. It is exposed through the OES_vertex_array_object extension in WebGL 1 (89%+).

WebGL 1 with extension    WebGL 2
createVertexArrayOES      createVertexArray
deleteVertexArrayOES      deleteVertexArray
isVertexArrayOES          isVertexArray
bindVertexArrayOES        bindVertexArray

An example:

var vertexArray = gl.createVertexArray();
gl.bindVertexArray(vertexArray);

// set vertex array states
var vertexPosLocation = 0; // set with GLSL layout qualifier
gl.enableVertexAttribArray(vertexPosLocation);
gl.bindBuffer(gl.ARRAY_BUFFER, vertexPosBuffer);
gl.vertexAttribPointer(vertexPosLocation, 2, gl.FLOAT, false, 0, 0);
gl.bindBuffer(gl.ARRAY_BUFFER, null);
// ...

gl.bindVertexArray(null);

// ...

// render
gl.bindVertexArray(vertexArray);
gl.drawArrays(gl.TRIANGLES, 0, 6);

Shader Texture LOD

The Shader Texture LOD Bias control makes mipmap level control simpler for glossy environment effects in physically based rendering. This functionality is exposed through the EXT_shader_texture_lod extension in WebGL 1 (71%+).

vec4 texture2DLodEXT(sampler2D sampler, vec2 coord, float lod)

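Now as part of core, an LOD bias can be passed as an optional parameter to texture, and an explicit LOD can be given with textureLod, the direct counterpart of texture2DLodEXT:

gvec4 texture (gsampler2D sampler, vec2 P [, float bias] )
gvec4 textureLod (gsampler2D sampler, vec2 P, float lod)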

Fragment Depth

The fragment shader can explicitly set the depth value for the current fragment. This operation can be expensive because it can cause the early-z optimization to be disabled. However, it is needed in cases where the z-depth is modified on the fly.

This functionality is exposed through the EXT_frag_depth extension in WebGL 1 (66%+).

out float gl_FragDepth;

More details can be found in the GLSL 3.0 ES Spec.

Other compatibility issues

Look here for more information: WebGL 2 Spec Ch4.1

Credits

Humblebrag on the third anniversary of the MOOC

I just realized today that it’s been three years since the 3D Interactive Graphics MOOC came out. It’s still chugging along, surprisingly enough, getting about 35 signups a day. Of course, completion rates are a small fraction of that – I’d like to know myself what it is. Me, I’m still answering questions on the discussion board.

It’s been a good week for positive comments from people taking the course. Who wouldn’t like reading posts such as this, “It is incredibly easy to learn and the examples are vivid and awesome. I wish the professors at my university were like you!” Which I take as more a reflection of the level of teaching at that (unnamed) university – I know there are more engaging and dynamic teachers than me. The takeaway is that videos and demonstrations that are just a link away can offer a fair bit, just as films have some advantages compared to live performances. Integrating these newer technologies into the classroom is the exciting challenge.

Another person gave praise to my short dot product explanation videos, even adding links to them on Wikipedia’s dot product page (which I just edited, removing my name). Looking at those videos now, hey, they’re pretty good! Here’s one showing how the dot product and cosine are related. Find the others here.

Remember how three years ago MOOCs were going to destroy the university system, and that everyone would get a cheap college education? The reality is that MOOCs are inexpensive (usually free) distance-learning systems for relatively well-off, educated people out of school who want to study a specific topic. You also have to be quite self-motivated to plow through a course, since the usual external motivators of a college education – getting a degree, keeping the parents happy, getting your money’s worth, and staying in school for the parties – are all missing.

I’d like to see more graphics MOOCs beyond Ed Angel’s and mine. But the reality is that MOOCs are expensive to make, and whether there’s a viable business plan blah blah blah.

Whatever the case, knowing that I’ve been able to help a number of people get some understanding of this great field of ours has been an unalloyed joy. Honestly, working on this course has been a lovely and lucky opportunity for me, and one of the best things I’ve done with my life.

Benchmarking tweets

I asked what others did for benchmarking in my last post. Here are the replies on Twitter in a semi-coherent edited form. If I missed any replies, I blame Twitter, whose interface is a magical maze.

First there were some FPS vs. SPF comments:

Richard Mitton: If you’re not measuring in milliseconds then you’re doing it wrong.

Christer Ericson: Yes, ms, not FPS. FPS is not a linear unit for the artists (or anyone).

Marc Olano: FPS isn’t linear. Usual definition of median averages middle 2 for even samples = also wrong. Use ms.

Morgan McGuire notes: FPS *is* a good measure if what you care about is interaction or visual smoothness. SPF is good for computational efficiency.

I replied to Richard & Christer: I’m interested in your reaction to the use of median vs. mean. FPS vs. SPF irrelevant for relative performance.

I also changed the original post to talk about milliseconds instead of frames, to avoid this facet of the discussion.

Christer Ericson: It’s important to catch the spikes, so in the context you’re talking about I would do max. Or mean+variance. Also, don’t think I’ve ever, for profiling reasons, looked at any average. You always look at a specific frame.

Timothy Lottes: I’m personally only interested in worst case ms/frame.

Cass Everitt: Agree with those that concentrate on worst times.

Eric Haines: Right, it depends what you’re looking for, e.g. don’t drop below 60 FPS. I’m mostly warning against using mean.

I added a note to the original post about tracking the max, which makes sense if you’re trying to guarantee a frame rate.

Tobias Berghoff, who benchmarks consoles:

I use min/max/med the most. Averages really only come into play when I need more digits. I spend significant amount of time below the 0.5% mark when wearing my platform tuning hat. I don’t miss trying to get sensible numbers out of PC h/w. But this also comes into play when measuring very short processes. When something only takes a couple of microseconds, you often end up oscillating between states that make the distribution multi-modal. Median won’t catch small shifts.

cupe: Stacked color-coded graph of nested timings (or a subtree of it). Usually unfiltered for analysis, avg for comparisons. Hierarchy is on the left, tooltip displays e.g. “scene/fluid/poisson”, click to restrict. Horizontal lines are milliseconds, orange line is 16.6 ms.

[Image: cupe’s stacked, color-coded timing graph]

E.g. click the big violet bar to see only post (and zoom in to stretch 4ms to screen):

[Image: cupe’s graph restricted to the post timings]

Javdev: We use a profiler, Adobe Scout, select multiple frames & see which code is most expensive & iterate it to prevent frame drops.

Björn Blissing: One option is to plot a histogram over the captured data. Reveals if your max/min are outliers or more common occurrences.

Michael Marcin: Try always running circular etwtrace and when frame time dips save and examine the trace.

Mikkel Gjoel: We filter in viewer. Options for all mentioned, and vsync (as that is what we are shipping).

[Image: Mikkel Gjoel’s frame-time viewer]

Fabian Giesen: General order statistics (percentiles etc.) are good. Just a plot of frame durations over frame # is helpful, too! And simply recording all frame durations over a few seconds, sorting them and plotting that is quite handy, too. That gives you all the percentiles (and median etc.) and gives you a feel for the shape of the distribution, which matters. (I’m not very happy with single-value summaries; they lose too much information.)

Jaume Sanchez Elias: I like Chrome FPS meter: current, min, max; over time; frequency graph for each framerate

[Image: Chrome FPS meter]

Krzysztof Narkowicz: Min, max, avg and std dev. Percentiles and med would make a nice addition, but it’s a hassle to compute them.

Anton the Mighty: I always use the standard deviation or standard error and indicated what value n sample size is. Most gfx benchs=bad. It’s usually worth also eyeballing actual data in detail because repeating patterns show either cycles or error in timers. Most recently there was something a friend had with the power manager in windows causing a cycling load on the cpu. I also visually check out timing for cpu+gpu functions across frames with apitrace etc. pretty neat.
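Several of the replies above suggest sorting the recorded frame durations and reading percentiles off the sorted list. A minimal sketch of that idea follows, using the nearest-rank definition of a percentile (one of several common definitions). The function names are made up for illustration.

```javascript
// Nearest-rank percentile of an ascending-sorted array; p is in [0, 100].
function percentile(sorted, p) {
    if (sorted.length === 0) {
        throw new Error('No samples recorded.');
    }
    var rank = Math.ceil((p * sorted.length) / 100);
    var index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
    return sorted[index];
}

// Sorts a copy of the recorded frame durations (in ms) and returns the
// requested percentiles, keyed as 'p50', 'p99', and so on.
function frameTimePercentiles(frameTimesMs, percents) {
    var sorted = frameTimesMs.slice().sort(function (a, b) { return a - b; });
    var result = {};
    percents.forEach(function (p) {
        result['p' + p] = percentile(sorted, p);
    });
    return result;
}

var recorded = [16, 17, 16, 33, 16, 18, 16, 17, 50, 16];
console.log(frameTimePercentiles(recorded, [50, 90, 99]));
// p50 is 16, p90 is 33, p99 is 50
```

Plotting the sorted array itself, as suggested, shows the whole distribution rather than a handful of summary values.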

All for now – feel free to email or tweet me with anything you want to add.

Don’t be mean

[Some on Twitter noted that I should be using milliseconds instead of FPS. This kind of misses the point, but let’s avoid distractions, here’s the article with that change. The sad part is that you then miss my hilarious joke about how I use FPS in the article, because if I used SPF you’d think I was talking about tanning. Which makes me think of another joke about rendering cows and the time it then takes to tan their hides. I’m full of great dad jokes.]

I think I’m reading “The Economist” too much, as I keep trying to come up with punny article titles. Sorry.

So, how do you measure a representative value for milliseconds per frame?

I don’t care about the mechanics, which timer call you use, etc. Just assume you successfully start timer/end timer and get some length of time in milliseconds for the frame. What do you do with these timings?

I usually see things such as an average, or a running average (average of last 20 or 50 or 100 or whatever frame times). I think this is mostly bad. As someone pointed out, almost everyone has more than the average number of legs. I find the same: in a given run there can sometimes be some frames where things noticeably slow down for whatever reason, some load on the computer. What you’re often trying to measure (as a graphics developer) is the performance of the rendering system itself, not the computer’s overall performance.

So, I currently use one of these two, or both: shortest time, or median time, over whatever set of frame times I have. Both have their uses. Shortest time is justifiable (to me, at least) because, assuming you have a very fine-grained timer, your best time is in some sense the “purest” measurement of the time a frame takes. Whatever other processes in your system are slowing down the other frames isn’t your concern. The timer doesn’t lie, you really did go that fast for one frame.

The other measure I’m OK with is the median. If your benchmarking system is going through a series of different frames (an animation or simulation is running, or the camera is orbiting, etc.), then grabbing the median frame is good. Choosing it instead of the average then doesn’t give so much weight to outliers. Better yet, graph the results and see whether the outliers are consistent.

Update: A number of game and VR developers pointed out that their major interest is maximum frame time. Makes sense: for a good experience (especially with VR) you don’t want to drop below your target of 30 FPS, 60 FPS, or 90 FPS.

My point is that the average, the mean, is not so good: often external slowdowns throw off the average enough and at random enough intervals that the average is very noisy and so, pretty useless. Taking the median, the central time of the sorted set, cuts out much of this variance, making each sample have an equal effect on the result.
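To make the difference concrete, here is a small sketch that computes the measures discussed here from a list of frame times. The function name summarizeFrameTimes and the sample numbers are invented for illustration.

```javascript
// Returns the shortest, median, mean and longest frame time (all in ms).
function summarizeFrameTimes(frameTimesMs) {
    if (frameTimesMs.length === 0) {
        throw new Error('No frame times recorded.');
    }
    var sorted = frameTimesMs.slice().sort(function (a, b) { return a - b; });
    var n = sorted.length;
    var sum = 0;
    for (var i = 0; i < n; ++i) {
        sum += sorted[i];
    }
    var mid = Math.floor(n / 2);
    // For an even count, take the average of the two central samples.
    var median = (n % 2 === 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    return { min: sorted[0], median: median, mean: sum / n, max: sorted[n - 1] };
}

// Four steady 10 ms frames and one 100 ms hiccup caused by something else.
console.log(summarizeFrameTimes([10, 10, 10, 10, 100]));
// min 10, median 10, mean 28, max 100
```

A single hiccup nearly triples the mean, while the minimum and median still report 10 ms; the maximum is the value to watch if the goal is a guaranteed frame rate.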

Anyway, that’s where I’m at with benchmarking. What do you do? Comment here, tweet-reply, or email me at erich@acm.org and I’ll summarize.

p.s. pro tip: walk through your rendering pipeline every once in a while, watching each step. It’s hard to really know where the time goes without doing so. I did this last week while looking at another bug and found a little logic error was causing a certain path to always do an additional post-process when it usually wasn’t needed. Free performance boost with a two-line fix! But, not something discoverable by benchmarking, because the variance is too much to notice “just” a few frames of difference.

This happens every few years. My favorite lucky find was around 15 years ago, walking through code in an established project and seeing that it was rendering twice for each time it displayed. A one-line change gave us 2x performance.

Three.js example thumbnails page

I made a page of thumbnail images of the 297 three.js examples. Here it is:

http://www.realtimerendering.com/threejs/

The three.js site used to have a page like this. I’m not sure why it disappeared, but now I don’t care, as I can more easily find demos I’ve looked at before but then forgot the names.

Bonus links: Stemkoski and Yomotsu also have useful demo pages, which used to be prominently linked from the three.js site but now are not.

Two ways to think about transforms

I was just answering a question for the Udacity Interactive Graphics MOOC. I had made a rather confusing lecture, much more involved and less informative than I would have liked, so today I wrote a re-do (sadly, it’s not easy to make a new video, since step 1 is “fly from Boston to San Francisco”). I’m still not thrilled with my description – what do you think? Is there a better way to talk about this subject? Anything I could improve? Surprisingly, this course still gets about 35 sign-ups a day (though I’m guessing maybe one of those actually finishes), so it’d be nice to make this lesson better.

Background: up to this point in the course I’d been showing how you typically write down transforms from right to left (OpenGL-style column-major matrices), e.g. “TR” means “rotate the object, then translate it (in world space) to some location.” In this lesson I wanted to point out that you can also read the transform order from left to right.

=================

You’re at 41 Avenue George V in Paris. Someone comes up and asks “How can I see the Arc de Triomphe?” You tell them, “Go up two blocks and then turn to the left – you can’t miss it.” Indeed, at 101 Avenue des Champs-Élysées he can see L’Arc de Triomphe.

So if you wanted to take this person and apply these two transforms, translation T (walk two blocks) and rotation R (turn about 60 degrees to the left), how would you write that out? Think about it for a minute, then scroll down for the answer. (And I like the disembodied arm to the right from Google’s street view).

[Image: Google Street View screenshot]

The order is (right-to-left “application order”): TR. That is, you want to apply the rotation first, so that it doesn’t affect the translation. So you rotate the person 60 degrees to the left, then you translate him two blocks north, which is then not affected by the rotation.

If you incorrectly used order RT, you would first translate him north two blocks, so far so good. But, as you saw in the snowman lesson, rotating after translation means the object is rotated around the origin from his present location; in this case, the person’s starting location is the origin. So performing a translation, then a rotation, would move him up two blocks north, then rotate him in a circle with a 2 block radius by 60 degrees, putting him somewhere else in the city (Rue Euler, I guess, which is a great coincidence that it’s named for a famous mathematician).

I hope you accept TR is the right order, then. But, to describe directions we definitely first said “perform T” – walk two blocks north – “then perform R” – rotate to the left 60 degrees. So we talk about directions in a left-to-right fashion. This may seem odd, since we describe first the transform that is applied last, T, if we actually want to position the man in his environment.

The key thing here, and the point of the lesson, is that by specifying T first, we’re saying to the man, change your frame of reference to be 2 blocks north. From this new frame of reference, then rotate 60 degrees around where you’re standing, your new origin. It’s how we talk about directions. We don’t say “when you get to your final position, rotate 60 degrees left. Then, to get to your final position, walk two blocks north.”

The person walking has his own frame of reference, where he’s always the origin, and rotations are done relative to whichever way he’s facing at the time. To specify transforms when talking in these terms, an object-centric way of describing things, we describe “from left to right.” When we’re looking at the world and want to think how to make some other object take on a particular orientation and position, we tend to work from right to left, getting it oriented and then moving it into position.

However, it all depends. Moving a couch up a flight of stairs, down a hall, and next to a wall in a room is a series of transforms, and again we specify them from left to right. We could also shortcut the process if we don’t care about the intermediate steps along the way. Say the couch is facing north, and we know it’ll end up facing east. We could specify the one 90 degree rotation to get it to face east, then the one XYZ translation to move it directly to its desired location – right to left order, so that the rotation doesn’t interfere with the translation.

Both approaches (a series of moves, or the direct rotation and translation) have the same final effect. The point is, each way of thinking has its uses.
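For anyone who wants to check the TR versus RT claim numerically, here is a small 2D sketch. It assumes the column-vector convention used in the course, so multiply(T, R) applies R first; north is taken as +y, one unit is one block, and all function names are made up for this example.

```javascript
// 3x3 homogeneous 2D matrices stored as arrays of rows. Points are treated as
// column vectors, so multiply(T, R) applies R first and then T.
function translation(tx, ty) {
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]];
}

function rotation(degrees) {
    var a = degrees * Math.PI / 180; // positive = counterclockwise, a left turn
    var c = Math.cos(a);
    var s = Math.sin(a);
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]];
}

function multiply(a, b) {
    var m = [[0, 0, 0], [0, 0, 0], [0, 0, 0]];
    for (var i = 0; i < 3; ++i) {
        for (var j = 0; j < 3; ++j) {
            for (var k = 0; k < 3; ++k) {
                m[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    return m;
}

function transformPoint(m, p) {
    return [
        m[0][0] * p[0] + m[0][1] * p[1] + m[0][2],
        m[1][0] * p[0] + m[1][1] * p[1] + m[1][2]
    ];
}

var T = translation(0, 2); // walk two blocks north
var R = rotation(60);      // turn 60 degrees to the left

var TR = multiply(T, R);   // rotate first, then translate
var RT = multiply(R, T);   // translate first, then rotate about the start point

var feet = [0, 0];         // where the person stands
var nose = [0, 1];         // one unit ahead of him; he starts out facing north

console.log(transformPoint(TR, feet)); // two blocks north of the start
console.log(transformPoint(TR, nose)); // ahead of him now lies up and to the left
console.log(transformPoint(RT, feet)); // about (-1.73, 1): swung off to the west
```

With TR the person ends up two blocks north and facing 60 degrees left of north; with RT the same two matrices put him on a circle of radius two around his starting point.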

Seven Things for April 6, 2016

Let’s get visual. Last in the series, for now.

Seven Things for April 5, 2016

All linked out yet? Here’s more worthwhile stuff I’ve run across since last SIGGRAPH.

Seven Things for April 4, 2016

Next in the continuing series. In this episode Jaimie finds that the world is an illusion and she’s a butterfly’s dream, while Wilson works out his plumbing problems.

Seven Things for April 3, 2016

The things: