As part of the growing move to be green and energy efficient, AMD has released the FirePro 2450, a single low-profile video card that can drive as many as four DVI or VGA displays at once, even at 1920x1200 resolution. It is intended for anyone who needs as many screens as possible without the space or cost requirements of multiple video cards. It peaks at 32W under heavy load and drops to 17W in typical 2D-only conditions. It supports DirectX 10.1 and OpenGL 2.1, and comes with 512MB of memory for $499.
Tom's Hardware has published a comparison of the Radeon HD 4870 vs the FirePro V8700 for professional applications. From the article: "the ATI driver programmers have done an amazing job. The two models' hardware is 99% identical, and yet the FirePro adapter completely outclasses the cheaper Radeon gaming card. The most extreme case in point is Maya, where the FirePro V8700 is six times faster than the Radeon HD 4870."
More: "We also decided to investigate if there were visible differences in picture quality between the two models. On a basic Windows desktop we discovered no discrepancies, but as soon as you load a professional graphics application such as Maya or 3ds Max and import a complex 3D model, things change completely. When using the Radeon, you simply have to accept that wire frames will peek out of shaded surfaces all over the place, and that significant clipping occurs as numerous objects are viewed or animated. These phenomena simply don't occur when using the FirePro."
Bottom line: those who seek to be frugal with expensive workstation applications should not fall prey to false economies.
GPUs are highly parallel processors, generating results for millions of pixels, possibly hundreds of times per second. But at the same time, a GPU is a single processor that can only work on one task at a time. A single application, or even multiple applications, can rarely utilize the resources of multi-GPU systems efficiently.
Some modern GPU features help to address this issue. MultiView allows each OpenGL application on Windows XP to run locally on the GPU its monitor is attached to. CrossFire accelerates applications by tasking each GPU with a portion of the rendering workload, a whole frame for instance. These options make a considerable impact by distributing work among GPUs. AMD is currently releasing an extension that allows applications to further enhance and optimize their use of multiple GPUs.
WGL_AMD_gpu_association lets applications decide what work to assign to each GPU. A workstation application can use this extension to determine what types of GPUs are in a system and pick which contexts to allocate on each GPU. This allows workstation applications, especially those that do off-screen rendering, to process multiple images or datasets simultaneously and combine the results into the final image for display. The decision of how to divide up the work is left to the application.
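Since the extension leaves the division of work to the application, here is a minimal sketch of one way an application might split off-screen tiles between GPUs. It does not call the extension itself: `assign_tiles`, the tile costs, and the GPU indices are all hypothetical names and values for illustration, and a real application would create one associated context per GPU and render each group of tiles there.

```python
from typing import Dict, List


def assign_tiles(tile_costs: List[float], gpu_count: int) -> Dict[int, List[int]]:
    """Greedy split: hand each tile (most expensive first) to the least-loaded GPU."""
    if gpu_count < 1:
        raise ValueError("need at least one GPU")
    assignments: Dict[int, List[int]] = {gpu: [] for gpu in range(gpu_count)}
    loads = [0.0] * gpu_count
    # Visit tiles from most to least expensive so the big ones are balanced first.
    for tile in sorted(range(len(tile_costs)), key=lambda i: -tile_costs[i]):
        gpu = min(range(gpu_count), key=lambda g: loads[g])
        assignments[gpu].append(tile)
        loads[gpu] += tile_costs[tile]
    return assignments


# Hypothetical relative render costs for six off-screen tiles.
costs = [4.0, 1.0, 3.0, 2.0, 2.0, 4.0]
plan = assign_tiles(costs, gpu_count=2)
for gpu, tiles in plan.items():
    print(f"GPU {gpu}: tiles {tiles}, total cost {sum(costs[t] for t in tiles)}")
```

A greedy split is only one option. An application that knows its GPUs differ in speed could weight the loads, which is exactly the kind of decision the extension hands to the developer.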
GPU Association is currently available in Catalyst 9.1. The extension specification can be found in the OpenGL extension registry. A Linux-specific version will follow soon. With this new tool, applications can fully exploit all of the graphics processing power of next-generation multi-GPU systems.
Note: Catalyst 9.1 does support GPU association, but the extension name in this release is WGL_AMDX_gpu_association.
Last month I posted about the Catalyst 9.1 drivers, which brought full support for OpenGL 3.0 on Radeon HD 2400 and above. ATI has now released the 9.2 driver, which introduces three new OpenGL extensions:
GL_EXT_bindable_uniform
GL_EXT_texture_shared_exponent
WGL_AMDX_gpu_association
I wrote to AMD to find out whether the new WGL_AMDX_gpu_association is the equivalent of NVIDIA’s WGL_NV_gpu_affinity extension for binding an OpenGL render context to a specific GPU (when several GPUs are present). AMD responded with the post from Nick Haemel.
Pic from the SolidWorks World 2009 conference of Allen Bourgoyne (FirePro team), speaking about basic techniques to measure and analyze workstation performance with respect to SolidWorks: the tools and resources available to users to reduce bottlenecks and optimize workstation hardware and software. The talk also looked at the performance differences between consumer and professional cards (as in the video), explaining why professional graphics cards are the right choice for SolidWorks users.
Now that OpenGL 3.0 is well on its way to a desktop near you, you may be curious about what types of changes to expect from your favorite 3D applications. There are two main categories of improvements in OpenGL 3.0: changes that introduce new tools, and changes that allow for performance enhancements. Well, let’s take a look!
FBOs
First, a new buffer binding mechanism called FBOs (Frame Buffer Objects) allows an OpenGL app to do comprehensive, fast and efficient off-screen rendering without creating a new context. Additionally, these FBOs can have floating point buffers attached as render targets. By using floating point buffers, applications can maintain more precision in the final image as effects are applied to a scene. This enables some really cool lighting effects such as lens aberration and blooming, similar to a feature film shot that catches a direct glimpse of the sun. Additionally, object highlights and specular reflections can appear much more realistic.
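To see why floating point render targets preserve precision as effects stack up, here is a small CPU-side sketch. It is not OpenGL code. It assumes a hypothetical two-pass effect (darken by a factor of 16, then brighten by 16) and compares storing the intermediate result in an 8-bit buffer with storing it as a float. The helper names `quantize8` and `darken_then_brighten` are made up for this illustration.

```python
def quantize8(value: float) -> float:
    """Clamp to 0..1 and round to the nearest of 256 levels, like an 8-bit buffer."""
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * 255) / 255


def darken_then_brighten(value: float, store) -> float:
    """Two passes; `store` models writing the result of each pass to a buffer."""
    dim = store(value / 16.0)   # pass 1: darken and write to the render target
    return store(dim * 16.0)    # pass 2: read back, brighten and write again


# A full 0..1 gradient with 256 distinct input intensities.
ramp = [i / 255 for i in range(256)]

as_float = {darken_then_brighten(v, lambda x: x) for v in ramp}
as_8bit = {darken_then_brighten(v, quantize8) for v in ramp}

print(len(as_float), "distinct levels survive with a floating point buffer")
print(len(as_8bit), "distinct levels survive with an 8-bit buffer")
```

The 8-bit path collapses the gradient into a handful of bands, which is the kind of artifact that shows up when bloom or exposure effects are chained through low-precision buffers.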
Transform feedback
Transform feedback, also called stream-out, is another new addition that will revolutionize what is possible on a GPU. Applications can use this feature to assist in physics computations directly on the GPU, preprocess or multi-process vertices, and perform complex math operations. Apps can also make use of transform feedback to efficiently tessellate geometry, adding significantly more detail to objects and scenes without increasing data file sizes on your hard drive. AMD supports a custom extension that offers applications even more control over tessellation.
Vertex array objects
OpenGL 3.0 also offers new ways of storing and referencing geometry, allowing for quicker access. Vertex array objects, or VAOs, make setting up rendering much quicker. New data formats also allow more efficient storage of geometry and texture information. All of these performance enhancements will allow applications to increase model sizes, use more sophisticated shading techniques, and increase overall visual fidelity.
Some applications have shorter development cycles than others. Typical CAD and digital content creation suites are large and complex; it may be a year or more before we see widespread adoption. Game engines may begin to look at the newest version of OpenGL sooner. But the good news is that changes in OpenGL 3.0 have made the API much lighter, allowing developers to achieve faster turnaround. There are many new tools in OpenGL 3.0 that bring exciting new power and flexibility to the 3D graphics arena. AMD is working closely with developers to bring OpenGL 3.0 to the next generation of professional 3D applications.
Instanced rendering (update 2/20/09 - available in the GL_ARB_draw_instanced extension - my mistake for first referencing it as core!)
There are numerous enhancements in OpenGL 3.0 that allow applications to process and render geometry much faster. One that is available for now as an extension is instanced rendering. This feature allows repeated rendering of the same object, sometimes at little or no additional cost. Imagine rendering hundreds of trees or blades of grass, all essentially the same geometry. Instancing can also be applied to geometry stippling and other repeated patterns, or even assist in bone-skinning for objects and characters that have moving joints.
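The idea behind instancing can be sketched on the CPU: keep one copy of the mesh and a small amount of per-instance data, rather than one full mesh per copy. The example below is an illustration only and makes no OpenGL calls. The function names, the 2000-vertex tree, and the use of a plain xyz offset as the per-instance data are all assumptions chosen for simplicity.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def duplicated_floats(vertex_count: int, instance_count: int) -> int:
    """Floats stored when every copy carries its own xyz vertices."""
    return vertex_count * 3 * instance_count


def instanced_floats(vertex_count: int, instance_count: int) -> int:
    """Floats stored with one shared mesh plus one xyz offset per instance."""
    return vertex_count * 3 + instance_count * 3


def place_instance(mesh: List[Vec3], offset: Vec3) -> List[Vec3]:
    """Roughly what a vertex shader does per instance: shift the shared mesh."""
    return [(x + offset[0], y + offset[1], z + offset[2]) for x, y, z in mesh]


# Hypothetical forest: 500 copies of a 2000-vertex tree.
print("duplicated:", duplicated_floats(2000, 500), "floats")
print("instanced: ", instanced_floats(2000, 500), "floats")

# Toy 3-vertex "tree" placed at the second instance position.
tree = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 2.0, 0.0)]
print(place_instance(tree, (10.0, 0.0, 0.0)))
```

On the GPU the per-instance shift would typically be driven by the instance index inside the vertex shader, so the application issues one draw call instead of hundreds.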
Cut-to-the-chase summary of what to expect from OpenGL 3-enabled CAD and DCC apps:
- More realistic and interesting lighting effects
- Faster rendering of objects that repeat
- Improved visual fidelity and faster rendering
- Greater detail in objects and scenes without increasing file sizes
I have a video capture from the SolidWorks World 2009 conference last week that compares a top-of-the-line Radeon consumer card (on the left) with the low-end FirePro 3750 professional card (on the right) on SolidWorks 2009 performance. As you can see in the video, the FirePro leaves the Radeon in the dust. Obviously, it's the drivers' support for OpenGL (specifically VBOs) and their SolidWorks 2009-specific optimizations that give the FirePro card the impressive edge.
With both at the same $200 price, and since I am not a gamer, the FirePro seems the obvious choice.
GPU Cafe notes that the FireStream 9270 will be released in Q1 2009. What makes this interesting to me, besides the performance, is the listed support for OpenCL. AMD is embracing this parallel processing standard in a big way. I wonder how long it will be before mainstream CAD and 3D Viz software vendors work out how to use OpenCL to accelerate their computations.
If you are into the specs and numbers race, here's some of the relevant info:
It is currently the highest-performing HPC processor, outperforming Nvidia's Tesla C1060 at peak compute rates by 28% in single precision (1.2 TFLOPS) and by 207% in double precision (240 GFLOPS). In real life, though, the API will make a big difference here.
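The percentages can be sanity-checked with a few lines of arithmetic. The FireStream 9270 numbers come from this post. The Tesla C1060 peak rates used below (933 GFLOPS single precision, 78 GFLOPS double precision) are not stated in the post and are an assumption based on Nvidia's commonly quoted figures, so treat the result as a rough cross-check only.

```python
def percent_faster(ours_gflops: float, theirs_gflops: float) -> float:
    """Return how much higher `ours` is than `theirs`, as a percentage."""
    return (ours_gflops / theirs_gflops - 1.0) * 100.0


# Peak rates in GFLOPS.
FIRESTREAM_9270 = {"single": 1200.0, "double": 240.0}  # from the post
TESLA_C1060 = {"single": 933.0, "double": 78.0}        # assumed, not from the post

for precision in ("single", "double"):
    gain = percent_faster(FIRESTREAM_9270[precision], TESLA_C1060[precision])
    print(f"{precision} precision: {gain:.1f}% higher peak rate")
```

Under those assumed Tesla figures, the results line up with the quoted 28% and 207% once the fractional part is dropped.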
The first public demonstration of OpenCL functionality was given by AMD at Siggraph Asia 2008. OpenCL is the new vendor-independent standard designed to extract high performance parallel computing out of GPUs, DSPs and multicore CPUs. Basically the idea is that you can write your core computational code in OpenCL and voila! - your code scales to whatever processors are available. OpenCL will greatly improve speed and responsiveness for a wide spectrum of applications from entertainment to scientific and 3D visualization.
The FirePro / FireStream teams created a screen capture of this particle & fluid simulation demo showing OpenCL functionality, embedded below. As you can see when you run it, the demo initially uses only one core of a Dragon-based system (quad-core Phenom II). As additional cores are enabled, the simulation compute time is cut in half!
Note: Set the embedded video to display at hi-quality to see more detail - it takes a while to load - but it is worth it (once you start the video, select HQ from the bottom right up-arrow of the video)
More details
CPU-optimized runtime based on the publicly available OpenCL specification from the Khronos Group designed to optimally run compute kernels on multi-core systems with linear scaling
Powdertoy is a combined particle & fluid simulator written by Stanislaw Skowronek
- Particles can change state (Snow melting into water)
- Particles affect the state of the fluid (Heat increases pressure)
- Fluid state affects the particles (Particle movement)
Computationally dense portions of the original C code ported to compute kernels in just a couple of hours
Yellow bar on right represents the time spent in all the compute kernels
- grey material used to write "powder toy" at the beginning of the video is "metal" which later melts
- AMD logo is drawn in a material that "clones" whatever touches it first, which in the video is a flammable gas (the yellow particles)
- after igniting the gas, the air near the logo rapidly heats up and expands, causing the outflow of fire particles.
The last part of the video shows the effect of enabling multiple cores. The video capture software prevents the fourth core from being used by the runtime.
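The reason a compute kernel can spread across cores is that each particle is updated independently of the others, so the work can be cut into chunks and handed to separate workers. The sketch below shows only that partitioning idea in plain Python. It is not OpenCL and is not taken from Powdertoy: the falling-particle rule and the names `step_particle`, `step_serial`, and `step_chunked` are invented for illustration, and Python threads here demonstrate correctness of the split rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def step_particle(height: float, velocity: float = -1.0, dt: float = 0.5) -> float:
    """Toy per-particle kernel: move by velocity*dt, never below the floor at 0."""
    return max(0.0, height + velocity * dt)


def step_serial(heights: List[float]) -> List[float]:
    """Update every particle on a single worker."""
    return [step_particle(h) for h in heights]


def step_chunked(heights: List[float], workers: int) -> List[float]:
    """Split particles into contiguous chunks, one per worker, then rejoin in order."""
    size = max(1, -(-len(heights) // workers))  # ceiling division, at least 1
    chunks = [heights[i:i + size] for i in range(0, len(heights), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input chunks.
        results = list(pool.map(step_serial, chunks))
    return [h for chunk in results for h in chunk]


heights = [i * 0.25 for i in range(10)]
print(step_chunked(heights, workers=3) == step_serial(heights))
```

Because no particle reads another particle's result within a step, the chunked and serial versions agree, and an ideal runtime could divide the kernel time by the number of cores in use.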
Interested in more technical info about OpenCL and Stream Computing? Check out these papers:
Welcome to the future of professional graphics! That future begins with OpenGL 3.0. As most of you know, the OpenGL 3.0 specification was finalized late in 2008. It represents a big step in modernizing cross-platform 3D graphics support and brings applications closer to the torrential power of the modern graphics chip. OpenGL 3.0 will enable applications to continue pushing the edge of the graphics envelope while maintaining portability across all major OSes.
AMD has supported the core GL3 features as extensions for some time, allowing developers to get a head-start in developing for the new API. That means developers have had access to floating point color buffers, instanced rendering, updated GPU shader language features, new texture formats, and much more on Radeon and FirePro hardware.
The first official driver release with full support for OpenGL 3.0 on Radeon HD 2400 and above is ATI Catalyst Release 9.1, set for public release on January 28, 2009. It will be available for download on the AMD Support & Drivers page. This release will enable full and forward-looking GL3.0 contexts on Windows XP, Windows Vista, and Linux. Look for official FirePro support later in Q1. The great news is that, with a driver update, you will be able to use GL3 on current ATI FirePro V3700 and above cards as well as ATI Radeon HD 2400 and above.
Stay tuned for more updates on how AMD and OpenGL are enhancing application capabilities, speeding up workloads, and helping to move the CAD industry forward.
Update Jan 29, 2009 - 9.1 Driver is available for download. Check out the release notes for more info.
FireUser.com is a community resource for visualization, 3D, video and engineering professionals to learn about the latest acceleration and display technologies, discuss support issues, as well as influence the features and direction of the FireGL and FirePro accelerator line.