GDC has put my I3D posts on a bit of a hiatus; I'd better finish them soon so I can move on to the GDC posts.
This post describes a talk that David Luebke (Director of Research at NVIDIA) gave during the I3D banquet titled GPU Computing: Past, Present, and Future. Although the slides for the I3D talk are not available, parts of this talk appear to be similar to one David gave a few months ago, which does have video and slides available.
I’ll summarize the talk here; anyone interested in more detail can view the materials linked above.
The first part of the talk (which isn’t in the earlier version) covered the “New Moore’s Law”: computers no longer get faster, just wider, so algorithms must be rethought to be parallel. David showed examples of several scientists who got profound speedups – from days to minutes. He covered several different applications; I’ll summarize the four most notable:
- A “photonic fence” that zaps mosquitoes with lasers, to reduce the incidence of malaria in third world countries. This application needs fast computer vision combined with low power consumption, which was achieved by using GPUs.
- A military vehicle which detects Improvised Explosive Devices (IEDs) using computer vision techniques. The speedup afforded by using GPUs enables the vehicle to drive much faster (an obvious advantage when surrounded by hostile insurgents) while still reliably detecting IEDs.
- A method for processing CT scans that enables much reduced radiation exposure for the patient. When running on CPUs, the algorithm was impractically slow; GPUs enabled it to run fast enough to be used in practice.
- A motion compensation technique that enables surgery on a beating heart. The video of the heart is motion-compensated to appear static to the surgeon, who operates through a surgical robot that translates the surgeon’s motions into the moving frame of the heart.
David started the next part of the talk (which is very similar to the earlier version linked above) by going over the heritage of GPU computing, tracing three separate historical threads: graphics hardware, supercomputing, and finally GPU computing itself.
The “history of graphics hardware” section started with a brief mention of a different kind of hardware: Dürer’s perspective machine. The history of electronic graphics hardware started with Ivan Sutherland’s Sketchpad and continued through the development of the graphics pipeline by SGI: Geometry Engine (1982), RealityEngine (1993), and InfiniteReality (1997). In the early days, the graphics pipeline was an actual description of the physical hardware structure: each stage was a separate chip or board, with the data flow fixed by the routing of wires between them. Currently, the graphics pipeline is an abstraction; the stages are different threads running on a shared pool of cores, as seen in modern GPU designs such as the GeForce 8, GT200, and Fermi.
The second historical thread was the development of supercomputers. David covered the early development of three ways to build a parallel machine: SIMD (Goddard MPP, MasPar MP-1, Thinking Machines CM-1 and CM-2), hardware multithreading (Tera MTA), and symmetric multiprocessing (SGI Challenge, Sun Enterprise), before returning to Fermi as an example of a design that combines all three.
“GPU computing 1.0” was the use (or abuse) of graphics pipelines and APIs to do general-purpose computing, culminating with BrookGPU. CUDA ushered in “GPU computing 2.0” with an API designed for that purpose. The hardware supported branching and looping, and hid thread divergence from the programmer. David claimed that now GPU computing is in a “3.0” stage, supported by a full ecosystem (multiple APIs, languages, algorithms, tools, IDEs, production lines, etc.). David estimated that there are about 100,000 active GPU compute developers in the world. Currently CUDA includes features such as “GPU Direct” (direct GPU-to-GPU transfer via a unified address space), full C++ support, and a template library.
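To give a flavor of what that “2.0” programming model looks like, here is a minimal CUDA sketch (my own illustration, not from the talk): an ordinary C-style kernel launched over a grid of threads, where the bounds-checking branch is fine to write because the hardware handles any resulting divergence.

```cpp
// Minimal sketch of the "GPU computing 2.0" model: a plain C kernel
// launched over many threads; the hardware masks off threads on the
// divergent branch, so the programmer need not manage it explicitly.
__global__ void saxpy(int n, float a, const float* x, float* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // divergent branch, handled by the hardware
        y[i] = a * x[i] + y[i];
}

// Host-side launch, one thread per element (d_x and d_y are device pointers):
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```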
The “future” part of the talk discussed the workloads that will drive future GPUs. Besides current graphics and high performance computing workloads, David believes a new type of workload, which he calls computational graphics, will be important. In some cases this will be the use of GPU compute to improve (via better performance or flexibility) algorithms typically performed using the graphics pipeline (image histogram analysis for HDR tone mapping, depth of field, bloom, texture-space diffusion for subsurface scattering, tessellation), and in others it will be to perform algorithms for which the graphics pipeline is not well-suited: ray tracing, stochastic rasterization, or dynamic object-space ambient occlusion.
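As a concrete (and purely illustrative) example of the “computational graphics” idea, here is a rough sketch of how the image histogram analysis used for HDR tone mapping might be written as a CUDA kernel rather than shoehorned into the graphics pipeline. The per-block shared-memory histogram and the atomic merge are standard compute techniques, not anything specific from the talk; it assumes 8-bit luminance values have already been extracted from the frame.

```cpp
// Sketch of a compute-style luminance histogram, the kind of per-frame
// analysis used to drive HDR tone mapping. Assumes 'histogram' holds
// 256 bins and has been zeroed before the launch.
__global__ void luminanceHistogram(const unsigned char* lum, int numPixels,
                                   unsigned int* histogram)
{
    // Per-block histogram in shared memory to keep most atomics on-chip.
    __shared__ unsigned int localHist[256];
    for (int bin = threadIdx.x; bin < 256; bin += blockDim.x)
        localHist[bin] = 0;
    __syncthreads();

    // Grid-stride loop: each thread accumulates many pixels.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < numPixels;
         i += blockDim.x * gridDim.x)
    {
        atomicAdd(&localHist[lum[i]], 1u);
    }
    __syncthreads();

    // Merge the block's partial histogram into the global one.
    for (int bin = threadIdx.x; bin < 256; bin += blockDim.x)
        atomicAdd(&histogram[bin], localHist[bin]);
}
```

Only the final 256-bin merge touches global memory, which is the sort of flexibility (scattered atomic writes, shared on-chip storage) that the traditional graphics pipeline doesn’t expose.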
David believes that the next stage of GPU computing (“4.0”) poses challenges to APIs (such as CUDA), to researchers, and to the education community. CUDA needs to be able to elegantly express programming models beyond simple parallelism and to better express locality, and its development environment needs to improve and mature. Researchers need to foster new high-level libraries, languages, and platforms, and to rethink their algorithms. Finally, computer science curricula need to start teaching parallel computing in the first year.