Inspired by Bing (a person, not a search engine) and by the acrobatics I saw tonight in Shanghai, time for a blog post.
So what’s up with graphics APIs? I’ve been working on a project for a fast 3D graphics system for Autodesk for about 4 years now; the base level (which hides the various flavors of DirectX and OpenGL) is used by Maya, Max, AutoCAD, Inventor, and other products. There are various higher-level optimizations we’ve added (and why Microsoft’s fxc effect compiler suddenly got a lot slower is a mystery), with some particularly nice work by one person here in the area of multithreading. Beyond these techniques, minimizing the raw number of calls to the API is the primary way to increase performance. Our rule of thumb is that you get about 1000-1500 calls a frame (CAD isn’t held to a 60 FPS rule, but we still need to be interactive). The usual tricks are to sort by state, and to shove as much geometry and processing as possible into a single draw call and so avoid the small batch problem. So, how silly is that? The best way to make your GPU run fast is to call it as little as possible? That’s an API with a problem.
This is old news, Tim Sweeney railed against API limitations 3 years ago (sadly, the article’s gone poof). I wrote about his ideas here and added my own two cents. So where are we since then? DirectX 11 has been out awhile, adding three more stages to the pipeline for efficient tessellation of higher-order surfaces. The pipeline’s feeling a bit unwieldy at this point, with a lot of (admittedly optional) stages. There are still some serious headaches for developers, like having to somehow manage to put lighting and material shading in the same pixel shader (one good argument for deferred lighting and similar techniques). Forget about optimization; the arcane API knowledge needed to get even a simple rendering on the screen is considerable.
I haven’t heard anything of a DirectX 12 in the works (except maybe this breathless posting, which I feel obligated to link to since I’m in China this month), nor can I imagine what they’d add of any significance. I expect there will be some minor XBox 72o (or whatever it will be called) -related tweaks specific to that architecture, if and when it exists. With the various CPU+GPU-on-a-chip products coming out – AMD’s Fusion family, NVIDIA’s Tegra 2, and similar from other companies (I think I counted 5, all totaled) – some access costs between the two processors become much cheaper and so change the rules. However, the API still looks to be the bottleneck.
Marketwise, and this is based entirely upon my work in scapulimancy, I see things shifting to mobile. If that isn’t at least the 247th time you’ve heard that, you haven’t been wasting enough time on the internet. But, it has some implications: first, DirectX 12 becomes mostly irrelevant. The GPU pipeline is creaky and overburdened enough right now, PC games are an important niche but not the focus, and mobile (specifically, iPad and other tablets) is fine with the functionality defined thus far by existing APIs. OpenGL ES will continue to evolve, but I doubt we’ll see for a good long while any algorithmically (vs. data-slinging) new elements added to the API that the current OpenGL 4.x and DX11 APIs don’t offer.
Basically, API development feels stalled to me, and that’s how it should be: mobile’s more important, PCs are a (large but slowly evolving) niche, and the current API system feels warped from a programming standpoint, with peculiar constructs like feeding text strings to the API to specify GPU shader effects, and strange contortions performed to avoid calling the API in order to coax the GPU to run fast.
Is there a way out? I felt a glimmer while attending HPG 2011 this year. The paper “High-Performance Software Rasterization on GPUs” by Samuli Laine and Tero Karras was one of my (and many attendees’) favorites, talking about how to efficiently implement a basic rasterizer using CUDA (code’s open sourced). It’s not as fast as dedicated hardware (no surprise there), but it’s at least in the same ball-park, with hardware being anywhere from 1.5x to 8.1x faster for their test cases, median being 3.6x. What I find exciting is the idea that you could actually program the pipeline, vs. it being locked away. They discuss ideas for optimization such as loosening the “first in, first out” rule for triangles currently enforced by all APIs. With its “yet another language” dependency, I can’t say I hope GPGPU is the future (and certainly CUDA isn’t, since it cuts out non-NVIDIA hardware vendors, but from all reports it’s currently the best way to experiment with GPGPU). Still, it’s nice to see that the fixed-function bits of the GPU, while important, are not an insurmountable limit in considering more flexible and general interactive rasterization programming models. Or, ray tracing – always have to stick that in there.
So it’s “forward to the past”, looking at traditional algorithms like rasterization and ray tracing and how to gain efficiency (both in raw speed and in development time) on various modern architectures. That’s ultimately what it’s about for me, at least: spending lots of time fighting the API, gluing together strings to make shaders, and all the other craziness is a distraction and a time-waster. That said, there’s a cost/benefit calculation implicit in all of this. For example, using C# or Java is way more productive than C++, I’d say about 2x, mostly because you’re not tracking down memory problems like leaks and access uninitialized or non-existent values. But, there’s so much legacy C++ code around that it’s still the language of graphics, as previously discussed here. Which means I expect none of the API weirdness to change for a solid decade, at the minimum. Please do go ahead and prove me wrong – I’d be thrilled!
Oh, and acrobatics? Hover your cursor over the image. BTW, the ERA show in Shanghai is wonderful, unlike current APIs.