Khronos and the OpenGL ARB have made an unprecedented move by simultaneously releasing two new versions of OpenGL along with two new versions of the OpenGL Shading Language. The specs are available for OpenGL 3.3, GLSL 3.3, OpenGL 4.0 and GLSL 4.0. This release, announced at GDC, breathes new life into existing graphics hardware and also paves the way for the next generations of GPUs.
Why two core specs and two language specs at once? OpenGL 4.0 and GLSL 4.0 enable access to new hardware such as the AMD HD5000 series cards, which have been shipping for five months. OpenGL 3.3 and GLSL 3.3 provide new features that will be accessible on a much larger installed base of current hardware. You may have noticed that the GLSL version jumped from 1.5 to 3.3 and 4.0. To make things easier on developers, the Shading Language versions now match the core OpenGL versions.
OpenGL 3.3
OpenGL 3.3 adds numerous updates to OpenGL functionality to make it more usable. Occlusion queries get a new boolean mode that tells you whether any samples passed. Texture lookups can be swizzled before the results reach shaders. Instanced arrays let a vertex attribute advance per instance rather than per vertex, so instanced rendering can reuse the same attribute data across many vertices; a divisor controls how many instances share each element. Also new are timer queries, which let applications find out how long geometry takes to render.
Applications that use many textures or switch texture state frequently will be able to take advantage of the new sampler objects in OpenGL 3.3. These new objects encapsulate traditional texture sampling state, allowing an application to use the same sampling state with multiple texture images, or multiple sampling states with the same texture image. This makes texture setup much faster and easier for applications to track.
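The divisor rule is easiest to see as arithmetic. The sketch below is a CPU-side model, not GL code: it computes which element of an attribute array a vertex reads once the divisor has been set with `glVertexAttribDivisor(index, divisor)`. It assumes a draw such as `glDrawArraysInstanced` with no base-instance offset, and the function name `attrib_element` is hypothetical.

```c
#include <stddef.h>

/* CPU-side model (not GL code) of which element of a vertex attribute
 * array is read for a given vertex and instance, depending on the
 * divisor set with glVertexAttribDivisor(). Base-instance offsets are
 * ignored (an assumption for simplicity). */
size_t attrib_element(unsigned divisor, size_t vertex_id, size_t instance_id)
{
    if (divisor == 0)
        return vertex_id;          /* ordinary per-vertex attribute */
    return instance_id / divisor;  /* advances once every `divisor` instances */
}
```

With a divisor of 1, a per-instance color or transform array needs only one element per instance, no matter how many vertices each instance has.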
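A minimal model of the rule that makes sampler objects useful: when a sampler object is bound to a texture unit (with `glBindSampler(unit, sampler)`), its parameters override the sampling state stored in the texture; binding sampler 0 falls back to the texture's own state. The types below are hypothetical and keep only two parameters; real sampler objects are configured with `glSamplerParameteri` and carry many more.

```c
#include <stddef.h>

/* Hypothetical, simplified sampling state: only filter and wrap modes. */
typedef enum { FILTER_NEAREST, FILTER_LINEAR } filter_mode;
typedef enum { WRAP_REPEAT, WRAP_CLAMP_TO_EDGE } wrap_mode;

typedef struct {
    filter_mode min_filter;
    wrap_mode   wrap_s;
} sampler_state;

typedef struct {
    sampler_state builtin;  /* state stored in the texture object itself */
    /* image data omitted */
} texture;

typedef struct {
    const texture       *tex;
    const sampler_state *sampler;  /* NULL models "sampler object 0" */
} texture_unit;

/* Effective state for a unit: a bound sampler object overrides the
 * texture's own sampling state, so one texture can be sampled several
 * ways and one sampler can be shared by many textures. */
sampler_state effective_state(const texture_unit *unit)
{
    return unit->sampler ? *unit->sampler : unit->tex->builtin;
}
```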
OpenGL 4.0
OpenGL 4.0 includes all of OpenGL 3.3 plus a slew of new stuff including enhanced blending, indexed drawing from buffer objects and enhanced transform feedback functionality. It also provides access to double precision floating point data types in shaders, key for compute, design, and digital content creation where precision is critical. New texture functionality allows for advanced texture gather fetches, new texture buffer formats and cube map array textures.
OpenGL 4.0 tessellation for workstation applications
One of the biggest additions to OpenGL 4.0 is tessellation. This new feature allows an application to amplify geometry, generating tessellated geometry based on incoming vertices. Tessellation can help applications take a rough object defined by only a few vertices and generate new vertices to smooth out the object and provide more detail. Check the Stumbling Ahead blog for more info. Using tessellation can be a huge win for many workstation applications, which tend to be vertex and bandwidth limited.
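To get a feel for the amplification, here is a sketch of a simple uniform subdivision of one triangle patch: level `n` splits each edge into `n` segments, producing (n+1)(n+2)/2 vertices and n² triangles from the original three vertices. This is an assumption for illustration; the actual OpenGL 4.0 tessellator, driven by `gl_TessLevelOuter`/`gl_TessLevelInner`, uses its own pattern, and an evaluation shader would then displace the generated vertices to smooth the surface. All names here are hypothetical.

```c
#include <stddef.h>

typedef struct { float u, v, w; } bary;  /* barycentric coordinates */

/* Counts for a simple uniform subdivision at level n (n >= 1). */
size_t tess_vertex_count(unsigned n)   { return (size_t)(n + 1) * (n + 2) / 2; }
size_t tess_triangle_count(unsigned n) { return (size_t)n * n; }

/* Writes the barycentric coordinates of every generated vertex into out,
 * which must hold tess_vertex_count(n) entries; n must be at least 1.
 * Returns the number of vertices written. */
size_t tess_generate(unsigned n, bary *out)
{
    size_t k = 0;
    for (unsigned i = 0; i <= n; ++i)
        for (unsigned j = 0; j <= n - i; ++j) {
            out[k].u = (float)i / (float)n;
            out[k].v = (float)j / (float)n;
            out[k].w = 1.0f - out[k].u - out[k].v;
            ++k;
        }
    return k;
}
```

At level 4, a single input triangle becomes 16 triangles over 15 vertices, all generated on the GPU instead of being stored and uploaded.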
GL Shading Language
The GL Shading Language has also been updated with the ability to dynamically assign subroutine usage at runtime. This means you can create a GLSL program that has many different subroutines and then pick which ones are used to alter lighting, material, or other effects as each piece of geometry is rendered. This makes program management much easier and introduces previously unattainable runtime flexibility.
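GLSL subroutines behave much like a table of C function pointers that share one signature: the program is compiled once, and the application selects the active implementation before each draw (in GL 4.0, via `glGetSubroutineIndex` and `glUniformSubroutinesuiv`). The sketch below is only that C analogy; the shading functions and `run_program` are hypothetical.

```c
#include <stddef.h>

/* C analogy of a GLSL subroutine uniform: several implementations with
 * the same signature, one of which is selected per draw. */
typedef float (*shade_fn)(float n_dot_l);

static float shade_diffuse(float n_dot_l) { return n_dot_l > 0.0f ? n_dot_l : 0.0f; }
static float shade_toon(float n_dot_l)    { return n_dot_l > 0.5f ? 1.0f : 0.25f; }
static float shade_unlit(float n_dot_l)   { (void)n_dot_l; return 1.0f; }

/* Table order plays the role of the indices glGetSubroutineIndex returns. */
static const shade_fn subroutines[] = { shade_diffuse, shade_toon, shade_unlit };

/* The "program" never changes; only the selected index does. */
float run_program(size_t selected, float n_dot_l)
{
    return subroutines[selected](n_dot_l);
}
```

Switching lighting models per object then costs one index update instead of a program switch.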
OpenGL 3.3 and 4.0 continue to progress 3D API standards, increasing flexibility and usability for applications. Just as important, Khronos and the OpenGL ARB continue to work on bringing you the latest and greatest access to 3D hardware.
I typically try to avoid propagating rumors of unannounced/unconfirmed products but since I have an iMac here on my desk, in need of a replacement, I was excited to read a rumor on BSN that Apple will be incorporating ATI Radeon HD 5750’s into an upcoming iMac refresh (the current Core i5/i7 iMacs use ATI Radeon HD 4850s).
What makes this particularly interesting: screaming OpenGL 3.2 support, screaming DirectX 11 support when running as a PC (or under virtualization?), and a great engine for OpenCL which is an integral part of the Mac OS.
AMD announced a fresh lineup of DirectX 11/OpenGL 3.2 mobile GPUs at CES. The ATI Mobility Radeon HD 5870 notebook graphics processor is the top of the line, and hands down the top performing mobile chip out today. At 40nm, it should also squeeze out more battery life and dissipate less heat. Why haven't these been announced for MacBook systems yet?
Recently at SIGGRAPH 2009, Khronos and the ARB announced OpenGL 3.2 and GLSL 1.50. We have continued to increase 3D graphics capability on a six-month schedule. OpenGL 3.2 adds a few larger pieces of functionality along with many smaller tweaks, while remaining compatible with most modern installed GPUs. If you have a one- or two-year-old GPU, chances are a driver update will bring you OpenGL 3.2.
The first major landmark in OpenGL 3.2 is geometry shader support. This long-awaited shader pipeline stage allows geometry primitives to be modified on the GPU. This includes generating new primitives from existing ones, modifying in-flight primitives, or removing primitives. With this feature, an app can amplify geometry without changing the stored vertices, implement tessellation schemes, or turn lines/points into volumes. One side effect of geometry shaders is that the amount of data handled by the CPU and passed to the GPU for the generated geometry is significantly reduced. This means precious bandwidth, CPU cycles, and memory are conserved.
OpenGL 3.2 has also added an important feature called sync objects. This feature creates a mechanism that allows the GPU and CPU to stay in sync. Previously, the only way to be sure a GPU was finished with a surface or object was to flush the whole pipeline, stalling the GPU and killing performance. With sync objects, applications can be signaled when events on the GPU complete, even while the GPU is still fully saturated. This new functionality will work particularly well for syncing the CPU and GPU, keeping multiple graphics contexts in multiple threads in sync, and synchronizing multiple GPUs when using extensions like WGL_AMD_gpu_association.
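A classic amplification case is expanding points into quads. The sketch below models on the CPU what such a geometry shader does: for each input point it emits four vertices forming a triangle strip (the GLSL 1.50 shader would call `EmitVertex()` four times and then `EndPrimitive()`). Only `n` points need to be stored and uploaded, while `4n` vertices come out. The names and the fixed half-size are assumptions for illustration.

```c
#include <stddef.h>

typedef struct { float x, y; } vec2;

/* CPU model of a geometry shader that turns each point into a quad,
 * emitted as a 4-vertex triangle strip. out must hold 4*n entries.
 * Returns the number of vertices written. */
size_t expand_points_to_quads(const vec2 *points, size_t n, float half, vec2 *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; ++i) {
        vec2 p = points[i];
        out[k++] = (vec2){ p.x - half, p.y - half };  /* bottom-left  */
        out[k++] = (vec2){ p.x + half, p.y - half };  /* bottom-right */
        out[k++] = (vec2){ p.x - half, p.y + half };  /* top-left     */
        out[k++] = (vec2){ p.x + half, p.y + half };  /* top-right    */
        /* a real geometry shader would call EndPrimitive() here */
    }
    return k;
}
```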
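Conceptually, a fence marks a position in the GPU's command stream, and it becomes signaled once the GPU has finished everything before that position. The toy model below captures that idea with sequence numbers; it is not the GL API. In real code the fence would come from `glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)` and be tested with `glClientWaitSync`, and the types and functions below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy command queue: each submitted command gets a sequence number and
 * the "GPU" retires commands in order. */
typedef struct {
    uint64_t submitted;  /* last sequence number queued by the CPU */
    uint64_t completed;  /* last sequence number the GPU has finished */
} queue;

typedef struct { uint64_t seq; } fence;

void submit(queue *q) { q->submitted++; }

/* Analogous to glFenceSync: remember the current stream position. */
fence insert_fence(const queue *q) { return (fence){ q->submitted }; }

/* Analogous to glClientWaitSync with a zero timeout: a non-blocking poll,
 * so the CPU never has to drain the whole pipeline. */
bool fence_signaled(const queue *q, fence f) { return q->completed >= f.seq; }

/* Simulate the GPU retiring up to `count` more commands. */
void gpu_progress(queue *q, uint64_t count)
{
    uint64_t target = q->completed + count;
    q->completed = target < q->submitted ? target : q->submitted;
}
```

Work submitted after the fence (the third `submit` in a typical test) does not delay the signal, which is why the GPU can stay saturated.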
Multisample textures and samplers are now in OpenGL 3.2, giving applications the option of applying multisample rendering hardware to textures and render buffer objects, instead of only screen space windows. Now the use of off-screen real time rendering can also benefit from multisample rendering. Additionally, shaders can read from each sample of a multisampled texture and apply custom blend schemes.
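A custom resolve is just a function over the samples of one pixel. In GLSL 1.50 a shader would read each sample with `texelFetch(sampler2DMS, coord, i)`; the CPU sketch below takes those samples as a plain array (one channel for brevity) and shows two resolve schemes, the usual average and a max filter. The enum and function names are hypothetical.

```c
#include <stddef.h>

/* CPU model of a custom multisample resolve for one pixel.
 * samples must contain at least one value (count > 0). */
typedef enum { RESOLVE_AVERAGE, RESOLVE_MAX } resolve_mode;

float resolve_pixel(const float *samples, size_t count, resolve_mode mode)
{
    float acc = (mode == RESOLVE_AVERAGE) ? 0.0f : samples[0];
    for (size_t i = 0; i < count; ++i) {
        if (mode == RESOLVE_AVERAGE)
            acc += samples[i];
        else if (samples[i] > acc)
            acc = samples[i];
    }
    return mode == RESOLVE_AVERAGE ? acc / (float)count : acc;
}
```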
With OpenGL 3.2, we have also added the idea of profiles. Two profiles exist in OpenGL 3.2, the core profile and the compatibility profile. Core profiles are ideal for modern applications that want the full performance benefits of a slimmed down API and reduced validation. Compatibility profiles are maintained for larger, older code-bases that need access to new features. AMD plans to support the Compatibility Profile, although other vendors may not. OpenGL 3.2 also adds a significant number of modifications that allow applications to be more easily ported from other 3D APIs. This is particularly important for developers bringing applications to different hardware such as mobile devices or Open Source platforms.
OpenGL 3.2 is proof of the relevance and continued evolution of open standards for 3D graphics. The OpenGL ARB continues to make forward progress, iterating through OpenGL releases that bring new and useful features to the graphics community. You can share suggestions and comments about OpenGL 3.2 with the OpenGL ARB through the official OpenGL 3.2 feedback thread on the OpenGL forum or by leaving comments for me here.
As GPUs become more powerful, we see many new ways they can be used as general compute devices, often rivaling and surpassing the CPU. At the same time, alongside the tools and features that assist general computation, modern GPUs are also gaining high-performance paths that enhance graphics rendering. One such addition is GPU tessellation.
In its purest definition, tessellation is the tiling of a plane or surface by smaller sub-surfaces. On the GPU this translates into breaking geometry into smaller, more detailed pieces. ATI has previously done this through TrueForm® with mixed success. A tessellation mechanism can also be implemented using the geometry shader. But the new tessellation engine in ATI Radeon HD Series and FirePro/FireGL V Series graphics hardware automates this process (currently not available for OpenGL on any NVIDIA hardware). Very little work is needed to get this running in an OpenGL app: just enable tessellation state in OpenGL and pick a tessellation factor based on how detailed you would like the geometry to be. The application's vertex shaders can also be updated to correct texture coordinates based on the generated geometry.
This powerful rendering mechanism can both enhance geometry and increase performance. By using tessellation, the same level of detail can be rendered at six times the speed while saving more than 50% of video memory, not to mention the bandwidth saved by uploading significantly less geometry (840-triangle original model, rendered at an LOD of 1,008038 triangles with and without the tessellation engine). Such a performance boost, in addition to the visual enhancement, can provide a significant advantage for any application that adopts tessellation.
The result of tessellation is deterministic, and therefore well adapted to many CAD situations. But the biggest gains can be seen in digital content creation. Digital content models are often large and can be difficult to render in real-time. With tessellation, significantly smaller model sizes can be used for similar levels of detail. Pre-visualization paths can also make use of tessellation to provide better looking images faster than was previously possible. The example below is a fly-by done with tessellation enabled, showing how tessellation can enhance a landscape in real-time.
AMD has also created a white paper detailing how to implement Catmull-Clark subdivision using the tessellation engine. The demo and white paper can be found here. Or explore many of the other possibilities for using tessellation in OpenGL or DirectX.
I was planning to post another side-by-side video comparing CATIA performance with and without VBO, but as I was watching one of the demos captured live at the COE 2009 conference I was struck by how impressive the FirePro VBO driver-accelerated demo really was. It stands on its own without the need for a comparison.
If you are a CATIA user and not using recent 3D acceleration hardware, then watch the video below to see what you are missing. This is a 5,000,000 polygon model (i.e. big) and it is being manipulated in real-time using a FirePro v5700. At less than $400 street price, this kind of performance enhancement makes the v5700 a great investment for 3D CAD users.
(If you are keen on the comparison statistics, this same model rotated programmatically 100 times in CATIA on a FirePro v5700 was 2.7 times faster with VBO than without.)
The video pretty much says it all in a short and sweet fashion: Over three times the visualization performance in CATIA by using the optimized VBO (Vertex Buffer Object) OpenGL driver support on the FirePro V5700. This comparison was captured live at the COE 2009 Technifair.
Yesterday ATI released Catalyst Display Driver v8.583 for the FirePro, which enables full support for OpenGL 3.0. This driver also supports two AMD extensions: AMD_vertex_shader_tessellator, for increased geometry detail and enhanced realism, and WGL_AMD_gpu_association, designed to provide improved performance scaling and parallel processing across multiple GPUs. WGL_AMD_gpu_association lets a workstation application, especially one that does off-screen rendering, process multiple images or datasets simultaneously and combine the final image for display. AMD also announced support for OpenGL 3.1 in the very near future.
Khronos and the OpenGL ARB have done it! OpenGL 3.1 and GLSL 1.40 have been released on the 6 month schedule promised at SIGGRAPH 2008. As promised, most of the legacy features marked as deprecated have been removed. No more display lists. No more immediate mode rendering. No more fixed function pipeline. The cruft accumulated over the last 17 years has been cleaned up to create a simplified and performant 3D graphics API. OpenGL 3.1 really does match the current generation of programmable graphics devices.
In addition to removing deprecated functionality, OpenGL 3.1 adds a bunch of handy new features.
Uniform buffer objects
The first and biggest is support for uniform buffer objects. This new object allows a shader to group uniforms together into a block of uniform memory. New interfaces make updating groups of uniforms easier and much more efficient. These new buffers can also be shared between programs, reducing wasted memory usage and shader uniform-reload time.
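Declaring a block with `layout(std140)` in GLSL 1.40 fixes its memory layout, so the application can compute member offsets itself and fill the buffer directly, and any program declaring the same block can share that buffer. The sketch below applies the std140 base-alignment rules for a few common types (float: 4, vec2: 8, vec3/vec4: 16, mat4: four vec4 columns). It covers only these types, not arrays or structs, and the names are hypothetical.

```c
#include <stddef.h>

/* A few GLSL types and their std140 base alignment / size in bytes. */
typedef enum { T_FLOAT, T_VEC2, T_VEC3, T_VEC4, T_MAT4 } glsl_type;

static size_t base_align(glsl_type t)
{
    switch (t) {
    case T_FLOAT: return 4;
    case T_VEC2:  return 8;
    default:      return 16;  /* vec3, vec4 and mat4 columns align to 16 */
    }
}

static size_t type_size(glsl_type t)
{
    switch (t) {
    case T_FLOAT: return 4;
    case T_VEC2:  return 8;
    case T_VEC3:  return 12;
    case T_VEC4:  return 16;
    default:      return 64;  /* mat4 = four vec4 columns */
    }
}

static size_t align_up(size_t v, size_t a) { return (v + a - 1) / a * a; }

/* Fills offsets[] for n members declared in order and returns the byte
 * offset just past the last member. */
size_t std140_layout(const glsl_type *members, size_t n, size_t *offsets)
{
    size_t off = 0;
    for (size_t i = 0; i < n; ++i) {
        off = align_up(off, base_align(members[i]));
        offsets[i] = off;
        off += type_size(members[i]);
    }
    return off;
}
```

For a block `{ mat4 mvp; vec3 light_dir; float intensity; vec2 uv_scale; }` this gives offsets 0, 64, 76 and 80: the float packs into the four bytes left after the vec3.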
Texture buffer objects and Copy buffers
Texture buffer objects were also introduced into core OpenGL 3.1. This new texture type allows generic buffers to be attached to a texture as a 1D array, so general buffer data is accessible to shaders through new fetch functions. Additionally, a copy mechanism (GL_ARB_copy_buffer) that allows for direct accelerated buffer-to-buffer copies has been added. This extends the benefits of generic buffer objects and creates interesting opportunities with multithreaded load/execute algorithms.
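The buffer-to-buffer copy is exposed as `glCopyBufferSubData(readTarget, writeTarget, readOffset, writeOffset, size)`. The sketch below models its semantics on plain byte arrays: the ranges must lie inside their buffers, and a copy within a single buffer must not overlap, otherwise GL raises `GL_INVALID_VALUE`. The `buffer` type and function name are hypothetical stand-ins, and the real call copies on the GPU without touching client memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a buffer object: a byte array and its size. */
typedef struct { unsigned char *data; size_t size; } buffer;

/* Models glCopyBufferSubData. Returns false where GL would raise
 * GL_INVALID_VALUE: a range outside its buffer, or overlapping ranges
 * within the same buffer. */
bool copy_buffer_sub_data(const buffer *src, buffer *dst,
                          size_t src_off, size_t dst_off, size_t size)
{
    if (src_off > src->size || size > src->size - src_off) return false;
    if (dst_off > dst->size || size > dst->size - dst_off) return false;
    if (src == dst && src_off < dst_off + size && dst_off < src_off + size)
        return false;  /* overlapping copy inside one buffer */
    memcpy(dst->data + dst_off, src->data + src_off, size);
    return true;
}
```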
Instanced rendering
Instanced rendering has been added to core, allowing apps to draw multiple copies of similar objects without incurring system bandwidth costs (I mentioned this inadvertently in an earlier post).
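With `glDrawArraysInstanced` (core in 3.1), the mesh is uploaded once and the vertex shader uses the built-in `gl_InstanceID` to make each copy different. The CPU sketch below models one common pattern, placing copies on a grid; the grid layout, spacing and names are assumptions for illustration.

```c
/* CPU model of a vertex shader that offsets each instance of one mesh
 * onto a grid using gl_InstanceID (the grid scheme is an assumption). */
typedef struct { float x, z; } offset2;

offset2 instance_offset(int instance_id, int columns, float spacing)
{
    offset2 o;
    o.x = (float)(instance_id % columns) * spacing;  /* column in the grid */
    o.z = (float)(instance_id / columns) * spacing;  /* row in the grid */
    return o;
}
```

Drawing 1,000 trees then means one draw call with an instance count of 1,000, not 1,000 copies of the vertex data.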
Other features
Primitive restart, SNORM textures and several other new features were also added.
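Primitive restart lets several strips share one index buffer and one draw call: with `GL_PRIMITIVE_RESTART` enabled, the index set by `glPrimitiveRestartIndex()` ends the current strip and starts a new one. The sketch below models how the index stream is split; the function name is hypothetical.

```c
#include <stddef.h>

/* Splits an index stream at the restart index, as primitive restart
 * does. Writes the start position and length of each strip into
 * starts[]/lengths[] and returns the number of strips. */
size_t split_strips(const unsigned *indices, size_t n, unsigned restart,
                    size_t *starts, size_t *lengths)
{
    size_t strips = 0, begin = 0;
    for (size_t i = 0; i <= n; ++i) {
        if (i == n || indices[i] == restart) {
            if (i > begin) {  /* ignore empty strips */
                starts[strips] = begin;
                lengths[strips] = i - begin;
                ++strips;
            }
            begin = i + 1;
        }
    }
    return strips;
}
```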
OpenGL is continuing to march forward with progressive revisions bringing new functionality to 3D developers. AMD will follow with full driver support for OpenGL 3.1 shortly.
Now that OpenGL 3.0 is well on its way to a desktop nearby, you may be curious about what types of changes to expect from your favorite 3D applications. There are two main categories of improvements in OpenGL 3.0: changes that introduce new tools and changes that allow for performance enhancements. Well, let’s take a look!
FBOs
First, a new buffer binding mechanism called FBOs (Frame Buffer Objects) allows an OpenGL app to do comprehensive, fast and efficient off-screen rendering without creating a new context. These FBOs can also have floating point buffers attached as render targets. By using floating point buffers, applications can maintain more precision in the final image as effects are applied to a scene. This enables some really cool lighting effects such as lens aberration and blooming, similar to a feature film shot that catches a direct glimpse of the sun. Object highlights and specular reflections can also appear much more realistic.
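Why float targets matter for bloom: a bright pass keeps only the light above a threshold and blurs it into a glow. In a normalized 8-bit target every value is clamped to [0, 1] before that step, so the energy above 1.0 that drives the glow is simply gone. The sketch below models only that clamping (not 8-bit quantization), with one channel per pixel; the function names and threshold are hypothetical.

```c
#include <stddef.h>

static float clamp01(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

/* Bright pass: keep only the part of a value above the threshold. */
float bright_pass(float v, float threshold)
{
    return v > threshold ? v - threshold : 0.0f;
}

/* Total glow extracted from a row of pixels. clamp_first != 0 simulates
 * rendering into a fixed-point [0,1] target instead of a float one. */
float total_glow(const float *row, size_t n, float threshold, int clamp_first)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float v = clamp_first ? clamp01(row[i]) : row[i];
        sum += bright_pass(v, threshold);
    }
    return sum;
}
```

A sun pixel with intensity 4.0 contributes 3.0 units of glow from a float target and nothing from a clamped one.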
Transform feedback
Transform feedback, also called stream-out, is another new addition that will revolutionize what is possible on a GPU. Applications can use this feature to assist in physics computations directly on the GPU, preprocess or multi-process vertices, and perform complex math operations. Apps can also make use of transform feedback to efficiently tessellate geometry, adding significantly more detail to objects and scenes without increasing data file sizes on your hard drive. AMD supports a custom extension that offers applications even more control over tessellation.
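A typical GPU physics loop with transform feedback ping-pongs between two buffers: a vertex shader reads particle state from one buffer, the updated outputs are captured into the other (`glBeginTransformFeedback`, usually with `GL_RASTERIZER_DISCARD` enabled), and the roles swap each frame, so the data never travels back to the CPU. The CPU sketch below models that loop with plain arrays; the particle struct, gravity update and names are assumptions for illustration.

```c
#include <stddef.h>

typedef struct { float y, vy; } particle;

/* One "vertex shader" pass: read state from src, write updated state to
 * dst, as transform feedback would capture it into a buffer. */
static void update_pass(const particle *src, particle *dst, size_t n,
                        float dt, float g)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i].vy = src[i].vy - g * dt;
        dst[i].y  = src[i].y + dst[i].vy * dt;
    }
}

/* Ping-pong between buffers a and b for `steps` frames and return the
 * buffer holding the latest state. */
particle *simulate(particle *a, particle *b, size_t n, int steps,
                   float dt, float g)
{
    particle *src = a, *dst = b;
    for (int s = 0; s < steps; ++s) {
        update_pass(src, dst, n, dt, g);
        particle *tmp = src;  /* swap roles for the next frame */
        src = dst;
        dst = tmp;
    }
    return src;
}
```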
Vertex array objects
OpenGL 3.0 also offers new ways of storing and referencing geometry, allowing for quicker access. Vertex array objects, or VAOs, make setting up rendering much quicker. New data formats also allow more efficient storage of geometry and texture information. All of these performance enhancements will allow applications to increase model sizes, use more sophisticated shading techniques, and increase overall visual fidelity.
Some applications have shorter development cycles than others. Typical CAD and digital content creation suites are large and complex; it may be a year or more before we see widespread adoption. Game engines may begin to look at the newest version of OpenGL sooner. But the good news is that changes in OpenGL 3.0 have made the API much lighter, allowing developers to achieve faster turnaround. There are many new tools in OpenGL 3.0 that bring exciting new power and flexibility to the 3D graphics arena. AMD is working closely with developers to bring OpenGL 3.0 to the next generation of professional 3D applications.
Instanced rendering (update 2/20/09 - available in the GL_ARB_draw_instanced extension - my mistake for first referencing it as core!)
There are numerous enhancements to OpenGL 3.0 that allow applications to process and render geometry much faster. One available for now as an extension is instanced rendering. This feature allows repeated rendering of some objects, sometimes at little or no additional cost. Imagine rendering hundreds of trees or blades of grass, all essentially the same geometry. This can also be applied for geometry stippling, other repeated patterns or even assist in bone-skinning for objects and characters that have moving joints.
Cut-to-the-chase summary of what to expect from OpenGL 3-enabled CAD and DCC apps:
- More realistic and interesting lighting effects
- Faster rendering of objects that repeat
- Improved visual fidelity and faster rendering
- Greater detail in objects and scenes without increasing file sizes
FireUser.com is a community resource for visualization, 3D, video and engineering professionals to learn about the latest acceleration and display technologies, discuss support issues, as well as influence the features and direction of the FireGL and FirePro accelerator line.