Short Bytes: NVIDIA GeForce GTX 980 in 1000 Words

To call the launch of NVIDIA’s Maxwell GM204 part impressive is something of an understatement. You can read our full coverage of the GTX 980 for the complete story, but here’s the short summary. Without the help of a manufacturing process shrink, NVIDIA and AMD are both looking at new ways to improve performance. The Maxwell architecture initially launched earlier this year with GM107 and the GTX 750 Ti and GTX 750, and with it we had our first viable mainstream GPU of the modern era that could deliver playable frame rates at 1080p while using less than 75W of power. The second generation Maxwell ups the ante by essentially tripling the CUDA core count of GM107, all while adding new features and still maintaining the impressive level of efficiency.

It’s worth pointing out that “Big Maxwell” (or at least “Bigger Maxwell”) is enough of a change that NVIDIA has bumped the model numbers from the GM100 series to GM200 series this round. NVIDIA has also skipped the desktop 800 line completely and is now in the 900 series. Architecturally, however, there’s enough change going into GM204 that calling this “Maxwell 2” is certainly warranted.

NVIDIA is touting a 2X performance per Watt increase over GTX 680, and they’ve delivered exactly that. Through a combination of architectural and design improvements, NVIDIA has moved from 192 CUDA cores per SMX in Kepler to 128 CUDA cores per SMM in Maxwell, and a single SMM is still able to deliver around 90% of the performance of an SMX of equivalent clocks. Put another way, NVIDIA says the new Maxwell 2 architecture is around 40% faster per CUDA core than Kepler. What that means in terms of specifications is that GM204 only needs 2048 CUDA cores to compete with – and generally surpass! – the performance of GK110 with its 2880 CUDA cores, which is used in the GeForce GTX 780 Ti and GTX Titan cards.
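The per-core figure can be sanity-checked from the per-SM numbers above. The Python sketch below is only back-of-the-envelope arithmetic: the function name is ours, and the 90% input is the approximate figure quoted above rather than a measured result.

```python
def per_core_speedup(smm_cores, smx_cores, smm_relative_perf):
    """Per-core throughput of an SMM core relative to an SMX core.

    smm_relative_perf is the performance of one whole SMM expressed as a
    fraction of one whole SMX at the same clock speed.
    """
    per_core_smm = smm_relative_perf / smm_cores
    per_core_smx = 1.0 / smx_cores
    return per_core_smm / per_core_smx


# 128 cores per SMM delivering roughly 90% of a 192-core SMX.
speedup = per_core_speedup(smm_cores=128, smx_cores=192, smm_relative_perf=0.90)
print(f"Per-core speedup: {speedup:.2f}x")

# Kepler-equivalent core count for GM204's 2048 cores at equal clocks.
print(f"Kepler-equivalent cores: {2048 * speedup:.0f}")
```

With these inputs the estimate works out to about 1.35x per core, in the same ballpark as NVIDIA's "around 40%" claim. The sketch ignores clock speed and memory differences, so it is not a prediction of actual game performance.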

In terms of new features, some of the changes with GM204 come on the software/drivers side of things while other features have been implemented in hardware. Starting with the hardware side, GM204 now implements the full set of D3D 11.3/D3D 12 features, where previous designs (Kepler and Maxwell 1) stopped at full Feature Level 11_0 with partial FL 11_1. The new features include Rasterizer Ordered Views, Typed UAV Load, Volume Tiled Resources, and Conservative Rasterization. Along with these, NVIDIA is also adding hardware features to accelerate what they’re calling VXGI – Voxel accelerated Global Illumination – a forward-looking technology that brings GPUs one step closer to doing real-time path tracing. (NVIDIA has more details available if you’re interested.)

NVIDIA also has a couple new techniques to improve anti-aliasing, Dynamic Super Resolution (DSR) and Multi-Frame Anti-Aliasing (MFAA). DSR essentially renders a game at a higher resolution and then down-sizes the result to your native resolution using a high-quality 13-tap Gaussian filter. It’s similar to super sampling, but the great benefit of DSR over SSAA is that the game doesn’t have any knowledge of DSR; as long as the game can support higher resolutions, NVIDIA’s drivers take care of all of the work behind the scenes. MFAA (please, no jokes about “mofo AA”) is supposed to offer essentially the same quality as 4x MSAA with the performance hit of 2x MSAA through a combination of custom filters and looking at previously rendered frames. MFAA can also function with a 4xAA mode to provide an alternative to 8x MSAA.
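The render-high-then-filter-down idea behind DSR can be illustrated in a few lines of Python. This is a minimal sketch under our own assumptions, not NVIDIA's implementation: it uses a small separable Gaussian kernel rather than the 13-tap filter mentioned above (whose exact layout isn't described here), works on a grayscale image stored as nested lists, and all function names are hypothetical.

```python
import math


def gaussian_kernel(radius, sigma):
    """1D Gaussian weights, normalized so they sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def downsample(image, factor, radius=2, sigma=1.0):
    """Shrink a 2D grayscale image by an integer factor using Gaussian weights."""
    kernel = gaussian_kernel(radius, sigma)
    height, width = len(image), len(image[0])
    out = []
    for oy in range(height // factor):
        row = []
        for ox in range(width // factor):
            # Approximate centre of the output pixel in high-res coordinates.
            cy = oy * factor + factor // 2
            cx = ox * factor + factor // 2
            acc = 0.0
            for ky, wy in enumerate(kernel):
                for kx, wx in enumerate(kernel):
                    # Clamp to the image edge so border pixels stay defined.
                    sy = min(max(cy + ky - radius, 0), height - 1)
                    sx = min(max(cx + kx - radius, 0), width - 1)
                    acc += wy * wx * image[sy][sx]
            row.append(acc)
        out.append(row)
    return out


# A hard vertical edge "rendered" at 8x8, then shown on a 4x4 "display".
high_res = [[0.0] * 3 + [1.0] * 5 for _ in range(8)]
low_res = downsample(high_res, factor=2)
print([round(v, 3) for v in low_res[0]])
```

The pixels next to the edge come out as intermediate grays instead of a hard 0-to-1 step, which is the anti-aliasing effect. In the real feature the game renders the larger image unmodified and the driver performs the filtering step.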

The above is all well and good, but what really matters at the end of the day is the actual performance that GM204 can offer. We’ve averaged results from our gaming benchmarks at our 2560×1440 and 1920×1080 settings, as well as our compute benchmarks, with all scores normalized to the GTX 680. Here’s how the new GeForce GTX 980 compares with other GPUs. (Note that we’ve omitted the overclocking results for the GTX 980, as it wasn’t tested across all of the games, but on average it’s around 18% faster than the stock GTX 980 while consuming around 20% more power.)

Average Gaming Performance - 2560x1440

Average Gaming Performance - 1920x1080

Compute Performance

Wow. Obviously there’s not quite as much to be gained by running such a fast GPU at 1920×1080, but at 2560×1440 we’re looking at a GPU that’s a healthy 74% faster on average compared to the GTX 680. Perhaps more importantly, the GTX 980 is also on average 8% faster than the GTX 780 Ti and 13.5% faster than AMD’s Radeon R9 290X (in Uber mode, as that’s what most shipping cards use). Compute performance sees some even larger gains over previous NVIDIA GPUs, with the 980 besting the 680 by 132%; it’s also 16% faster than the 780 Ti but “only” 1.5% faster than the 290X – though the 290X still beats the GTX 980 in Sony Vegas Pro 12 and SystemCompute.

If we look at the GTX 780 Ti, performance hasn’t improved so much that we’d recommend upgrading from that card, though you do get some new features that might prove useful over time. For those that didn’t find the price/performance offered by GTX 780 Ti a compelling reason to upgrade, the GTX 980 sweetens the pot by dropping the MSRP down to $549, and what’s more it also uses quite a bit less power:

Gaming Power Consumption

This is what we call the trifecta of graphics hardware: better performance, lower power, and lower prices. When NVIDIA unveiled the GTX 750 Ti back in February, it was ultimately held back by performance while its efficiency was a huge step forward; it seemed almost too much to hope for that sort of product in the high performance GPU arena. NVIDIA doesn’t disappoint, however, dropping power consumption by 18% relative to the GTX 780 Ti while improving performance by roughly 10% and dropping the launch price by just over 20%. If you’ve been waiting for a reason to upgrade, GeForce GTX 980 is about as good as it gets, though the much less expensive GTX 970 might just spoil the party. We’ll have a look at the 970 next week.

Microsoft Details Direct3D 11.3 & 12 New Rendering Features

Back at GDC 2014 in March, Microsoft and its hardware partners first announced the next full iteration of the Direct3D API. Now on to version 12, this latest version of Direct3D would be focused on low level graphics programming, unlocking the greater performance and greater efficiency that game consoles have traditionally enjoyed by giving seasoned programmers more direct access to the underlying hardware. In particular, low level access would improve performance both by reducing the overhead high level APIs incur, and by allowing developers to better utilize multi-threading by making it far easier to have multiple threads submitting work.

At the time Microsoft offered brief hints that there would be more to Direct3D 12 than just the low level API, but the low level API was certainly the focus for the day. Now as part of NVIDIA’s launch of the second generation Maxwell based GeForce GTX 980, Microsoft has opened up to the press and public a bit more on what their plans are for Direct3D. Direct3D 12 will indeed introduce new features, but there will be more in development than just Direct3D 12.

Direct3D 11.3

First and foremost then, Microsoft has announced that there will be a new version of Direct3D 11 coinciding with Direct3D 12. Dubbed Direct3D 11.3, this new version of Direct3D is a continuation of the development and evolution of the Direct3D 11 API and like the previous point updates will be adding API support for features found in upcoming hardware.

At first glance the announcement of Direct3D 11.3 would appear to be at odds with Microsoft’s development work on Direct3D 12, but in reality there is a lot of sense in this announcement. Direct3D 12 is a low level API – powerful, but difficult to master and very dangerous in the hands of inexperienced programmers. The development model envisioned for Direct3D 12 is that a limited number of code gurus will be the ones writing the engines and renderers that target the new API, while everyone else will build on top of these engines. This works well for the many organizations that are licensing engines such as UE4, or for the smaller number of organizations that can justify having such experienced programmers on staff.

However for these reasons a low level API is not suitable for everyone. High level APIs such as Direct3D 11 do exist for a good reason after all; their abstraction not only hides the quirks of the underlying hardware, but it makes development easier and more accessible as well. For these reasons there is a need to offer both high level and low level APIs. Direct3D 12 will be the low level API, and Direct3D 11 will continue to be developed to offer the same features through a high level API.

Direct3D 12

Today’s announcement of Direct3D 11.3 and the new feature set that Direct3D 11.3 and 12 will be sharing will have an impact on Direct3D 12 as well. We’ll get to the new features in a moment, but at a high level it should be noted that this means that Direct3D 12 is going to end up being a multi-generational (multi-feature level) API similar to Direct3D 11.

In Direct3D 11 Microsoft introduced feature levels, which allowed programmers to target different generations of hardware using the same API, instead of having to write their code multiple times for each associated API generation. In practice this meant that programmers could target D3D 9, 10, and 11 hardware through the D3D 11 API, restricting their feature use accordingly to match the hardware capabilities. This functionality was exposed through feature levels (ex: FL9_3 for D3D9.0c capable hardware) which offered programmers a neat segmentation of feature sets and requirements.

Direct3D 12 in turn will also be making use of feature levels, allowing developers to exploit the benefits of the low level nature of the API while being able to target multiple generations of hardware. It’s through this mechanism that Direct3D 12 will be usable on GPUs as old as NVIDIA’s Fermi family or as new as their Maxwell family, all the while still being able to utilize the features added in newer generations.

Ultimately for users this means they will need to be mindful of feature levels, just as they are today with Direct3D 11. Hardware that is Direct3D 12 compatible does not mean it supports all of the latest feature sets, and keeping track of feature set compatibility for each generation of hardware will still be important going forward.
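To make the feature level idea concrete, here is a minimal Python sketch of how an application might choose a rendering path from the level a device reports. The feature level names are the real Direct3D 11 ones, but the render path names, the table, and the helper functions are hypothetical illustrations rather than any part of the Direct3D API.

```python
# Direct3D 11 feature levels in ascending order of capability.
FEATURE_LEVELS = ["9_1", "9_2", "9_3", "10_0", "10_1", "11_0", "11_1"]

# Hypothetical render paths, best first, each tagged with the minimum
# feature level it needs (tessellation needs 11_0, geometry shaders 10_0).
RENDER_PATHS = [
    ("tessellated", "11_0"),
    ("geometry_shader", "10_0"),
    ("basic", "9_1"),
]


def supports(device_level, required_level):
    """True if the device's feature level is at least the required one."""
    return FEATURE_LEVELS.index(device_level) >= FEATURE_LEVELS.index(required_level)


def pick_render_path(device_level):
    """Return the most capable render path the device can run."""
    for name, required in RENDER_PATHS:
        if supports(device_level, required):
            return name
    raise ValueError(f"no render path for feature level {device_level}")


for level in ("9_3", "10_1", "11_0"):
    print(level, "->", pick_render_path(level))
```

The same API and the same application code serve all three devices; only the chosen path differs. Optional capabilities that sit outside a feature level, such as those exposed through cap bits, would need a separate per-feature check on top of this.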

11.3 & 12: New Features

Getting to the heart of today’s announcement from Microsoft, we have the newly announced features that will be coming to Direct3D 11.3 and 12. Note that this is not an exhaustive list of all of the new features that we will see, and Microsoft is still working to define a new feature level to go with them (in the interim they will be accessed through cap bits), but nonetheless this is our first detailed look at what are expected to be the major new features of 11.3/12.

Rasterizer Ordered Views

First and foremost of the new features is Rasterizer Ordered Views (ROVs). As hinted at by the name, ROVs are focused on giving the developer control over the order in which elements are rasterized in a scene, so that elements are drawn in the correct order. This feature specifically applies to Unordered Access Views (UAVs) being generated by pixel shaders, which by their very definition are initially unordered. ROVs offer an alternative to the unordered nature of UAVs, which would result in elements being rasterized simply in the order they were finished. For most rendering tasks unordered rasterization is fine (deeper elements would be occluded anyhow), but for a certain category of tasks having the ability to efficiently control the access order to a UAV is important to correctly render a scene quickly.

The textbook use case for ROVs is Order Independent Transparency, which allows for elements to be rendered in any order and still blended together correctly in the final result. OIT is not new – Direct3D 11 gave the API enough flexibility to accomplish this task – however these earlier OIT implementations would be very slow due to sorting, restricting their usefulness outside of CAD/CAM. The ROV implementation however could accomplish the same task much more quickly by getting the order correct from the start, as opposed to having to sort results after the fact.

Along these lines, since OIT is just a specialized case of a pixel blending operation, ROVs will also be usable for other tasks that require controlled pixel blending, including certain cases of anti-aliasing.
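Why ordering matters for transparency can be shown with a small Python sketch. It blends two translucent fragments over a background using the standard "over" operator, first in arrival order and then after sorting by depth. Note that this demonstrates the sort-based approach described above as slow, not an ROV implementation; the fragment layout (depth, color, alpha) and the function names are our own assumptions.

```python
def blend_over(dst, src_color, src_alpha):
    """Standard 'over' blend of one translucent fragment onto the current color."""
    return src_alpha * src_color + (1.0 - src_alpha) * dst


def composite(fragments, background=0.0):
    """Blend (depth, color, alpha) fragments in exactly the order given."""
    color = background
    for _depth, frag_color, frag_alpha in fragments:
        color = blend_over(color, frag_color, frag_alpha)
    return color


def composite_sorted(fragments, background=0.0):
    """Blend back to front (largest depth first), whatever the arrival order."""
    ordered = sorted(fragments, key=lambda f: f[0], reverse=True)
    return composite(ordered, background)


near = (1.0, 1.0, 0.5)  # depth 1, white, 50% opaque
far = (2.0, 0.2, 0.5)   # depth 2, dark gray, 50% opaque

print(f"far then near: {composite([far, near]):.2f}")   # 0.55
print(f"near then far: {composite([near, far]):.2f}")   # 0.35
print(f"sorted either way: {composite_sorted([near, far]):.2f}")  # 0.55
```

The unsorted results differ depending on which fragment happens to arrive first, which is exactly the artifact that unordered UAV access can produce. Sorting fixes it at the cost of extra work per pixel.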

Typed UAV Load

The second feature coming to Direct3D is Typed UAV Load. Unordered Access Views (UAVs) are a special type of buffer that allows multiple GPU threads to access the same buffer simultaneously without generating memory conflicts. Because of this disorganized nature of UAVs, certain restrictions are in place that Typed UAV Load will address. As implied by the name, Typed UAV Load deals with cases where UAVs are data typed, and how to better handle their use.

Volume Tiled Resources

The third feature coming to Direct3D is Volume Tiled Resources. VTR builds off of the work Microsoft and partners have already done for tiled resources (AKA sparse allocation, AKA hardware megatexture) by extending it into the 3rd dimension.

VTRs are primarily meant to be used with volumetric pixels (voxels), with the idea being that with sparse allocation, volume tiles that do not contain any useful information can avoid being allocated, avoiding tying up memory in tiles that will never be used or accessed. This kind of sparse allocation is necessary to make certain kinds of voxel techniques viable.
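The memory-saving idea can be sketched in Python as a volume that only allocates a tile the first time something is written into it. This is a CPU-side illustration under our own assumptions, not how the GPU or the Direct3D API manages tiles: the class name, the 4x4x4 tile size, and the dictionary-backed storage are all hypothetical choices made to keep the example small.

```python
class SparseVolume:
    """A voxel grid split into cubic tiles that are allocated on first write."""

    def __init__(self, tile_size=4):
        self.tile_size = tile_size
        self.tiles = {}  # (tile_x, tile_y, tile_z) -> flat list of voxel values

    def _locate(self, x, y, z):
        """Map a voxel coordinate to its tile key and the offset inside that tile."""
        t = self.tile_size
        key = (x // t, y // t, z // t)
        offset = ((z % t) * t + (y % t)) * t + (x % t)
        return key, offset

    def write(self, x, y, z, value):
        key, offset = self._locate(x, y, z)
        if key not in self.tiles:
            # Allocate backing storage only when a tile is actually used.
            self.tiles[key] = [0] * (self.tile_size ** 3)
        self.tiles[key][offset] = value

    def read(self, x, y, z):
        key, offset = self._locate(x, y, z)
        tile = self.tiles.get(key)
        # Unallocated tiles read back as empty space.
        return 0 if tile is None else tile[offset]


volume = SparseVolume(tile_size=4)
volume.write(1, 2, 3, 7)        # lands in tile (0, 0, 0)
volume.write(2, 2, 3, 9)        # same tile, no new allocation
volume.write(100, 100, 100, 5)  # a distant tile (25, 25, 25)
print(len(volume.tiles), "tiles allocated")
```

Only two tiles (128 voxels) are backed by memory here, even though the addressed region spans more than a million voxels. For largely empty voxel scenes that difference is what makes the data set fit at all.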

Conservative Rasterization

Last but certainly not least among Direct3D’s new features will be conservative rasterization. Conservative rasterization is essentially a more accurate but more performance-intensive solution to figuring out whether a polygon covers part of a pixel. Instead of doing a quick and simple test to see if the center of the pixel is bounded by the lines of the polygon, conservative rasterization checks whether the polygon covers any part of the pixel by testing it against the corners of the pixel. This means that conservative rasterization will catch cases where a polygon was too small to cover the center of a pixel, which results in a more accurate outcome, be it better identifying the pixels a polygon resides in, or finding polygons too small to cover the center of any pixel at all. This in turn is where the “conservative” aspect of the name comes from, as the rasterizer is conservative by including every pixel touched by a triangle as opposed to just the pixels where the triangle covers the center point.

Conservative rasterization is being added to Direct3D in order to allow new algorithms to be used which would fail under the imprecise nature of point sampling. Like VTR, voxels play a big part here as conservative rasterization can be used to build a voxel. However it also has use cases in more accurate tiling and even collision detection.
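The difference between the two coverage rules can be sketched in Python. The example below compares a center-sample test with a conservative overlap test for a triangle much smaller than a pixel. It is one possible software formulation (a separating-axis test against the pixel square), not a description of how the hardware does it, and the function names are our own.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: positive when P lies to the left of the edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covers_center(tri, px, py):
    """Standard rule: the pixel counts only if its center is inside the triangle."""
    (ax, ay), (bx, by), (cx, cy) = tri
    x, y = px + 0.5, py + 0.5
    w0 = edge(ax, ay, bx, by, x, y)
    w1 = edge(bx, by, cx, cy, x, y)
    w2 = edge(cx, cy, ax, ay, x, y)
    # Accept either winding order.
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def touches_pixel(tri, px, py):
    """Conservative rule: the pixel counts if the triangle overlaps it at all."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    # Separating axes 1 and 2: the pixel's own x and y axes (bounding boxes).
    if max(xs) < px or min(xs) > px + 1 or max(ys) < py or min(ys) > py + 1:
        return False
    corners = [(px, py), (px + 1, py), (px, py + 1), (px + 1, py + 1)]
    # Remaining axes: each triangle edge. If all four pixel corners lie
    # strictly on the outer side of one edge, the shapes cannot overlap.
    for i in range(3):
        ax, ay = tri[i]
        bx, by = tri[(i + 1) % 3]
        cx, cy = tri[(i + 2) % 3]
        inside_sign = edge(ax, ay, bx, by, cx, cy)
        if all(edge(ax, ay, bx, by, x, y) * inside_sign < 0 for x, y in corners):
            return False
    return True


# A tiny triangle sitting in one corner of pixel (2, 2).
tiny = [(2.1, 2.1), (2.4, 2.1), (2.1, 2.4)]
grid = [(x, y) for y in range(4) for x in range(4)]
center_hits = [p for p in grid if covers_center(tiny, *p)]
conservative_hits = [p for p in grid if touches_pixel(tiny, *p)]
print("center sampling:", center_hits)
print("conservative:   ", conservative_hits)
```

Center sampling reports no covered pixels, so the triangle vanishes entirely, while the conservative test reports pixel (2, 2). For voxelization or collision detection, silently dropping small geometry like this is the failure the feature is meant to prevent.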

Final Words

Wrapping things up, today’s announcement of Direct3D 11.3 and its new features offers a solid roadmap for both the evolution of Direct3D and the hardware that will support it. By confirming that they are continuing to work on Direct3D 11 Microsoft has answered one of the lingering questions surrounding Direct3D 12 – what happens to Direct3D 11 – and at the same time this highlights the hardware features that the next generation of hardware will need to support in order to be compliant with the latest D3D feature level. And with Direct3D 12 set to be released sometime next year, these new features won’t be too far off either.

Microsoft Details Direct3D 11.3 & 12 New Rendering Features

Microsoft Details Direct3D 11.3 & 12 New Rendering Features

Back at GDC 2014 in March, Microsoft and its hardware partners first announced the next full iteration of the Direct3D API. Now on to version 12, this latest version of Direct3D would be focused on low level graphics programming, unlocking the greater performance and greater efficiency that game consoles have traditionally enjoyed by giving seasons programmers more direct access to the underlying hardware. In particular, low level access would improve performance both by reducing the overhead high level APIs incur, and by allowing developers to better utilize multi-threading by making it far easier to have multiple threads submitting work.

At the time Microsoft offered brief hints that there would be more to Direct3D 12 than just the low level API, but the low level API was certainly the focus for the day. Now as part of NVIDIA’s launch of the second generation Maxwell based GeForce GTX 980, Microsoft has opened up to the press and public a bit more on what their plans are for Direct3D. Direct3D 12 will indeed introduce new features, but there will be more in development than just Direct3D 12.

Direct3D 11.3

First and foremost then, Microsoft has announced that there will be a new version of Direct3D 11 coinciding with Direct3D 12. Dubbed Direct3D 11.3, this new version of Direct3D is a continuation of the development and evolution of the Direct3D 11 API and like the previous point updates will be adding API support for features found in upcoming hardware.

At first glance the announcement of Direct3D 11.3 would appear to be at odds with Microsoft’s development work on Direct3D 12, but in reality there is a lot of sense in this announcement. Direct3D 12 is a low level API – powerful, but difficult to master and very dangerous in the hands of inexperienced programmers. The development model envisioned for Direct3D 12 is that a limited number of code gurus will be the ones writing the engines and renderers that target the new API, while everyone else will build on top of these engines. This works well for the many organizations that are licensing engines such as UE4, or for the smaller number of organizations that can justify having such experienced programmers on staff.

However for these reasons a low level API is not suitable for everyone. High level APIs such as Direct3D 11 do exist for a good reason after all; their abstraction not only hides the quirks of the underlying hardware, but it makes development easier and more accessible as well. For these reasons there is a need to offer both high level and low level APIs. Direct3D 12 will be the low level API, and Direct3D 11 will continue to be developed to offer the same features through a high level API.

Direct3D 12

Today’s announcement of Direct3D 11.3 and the new features set that Direct3D 11.3 and 12 will be sharing will have an impact on Direct3D 12 as well. We’ll get to the new features in a moment, but at a high level it should be noted that this means that Direct3D 12 is going to end up being a multi-generational (multi-feature level) API similar to Direct3D 11.

In Direct3D 11 Microsoft introduced feature levels, which allowed programmers to target different generations of hardware using the same API, instead of having to write their code multiple times for each associated API generation. In practice this meant that programmers could target D3D 9, 10, and 11 hardware through the D3D 11 API, restricting their feature use accordingly to match the hardware capabilities. This functionality was exposed through feature levels (ex: FL9_3 for D3D9.0c capable hardware) which offered programmers a neat segmentation of feature sets and requirements.

Direct3D 12 in turn will also be making use of feature levels, allowing developers to exploit the benefits of the low level nature of the API while being able to target multiple generations of hardware. It’s through this mechanism that Direct3D 12 will be usable on GPUs as old as NVIDIA’s Fermi family or as new as their Maxwell family, all the while still being able to utilize the features added in newer generations.

Ultimately for users this means they will need to be mindful of feature levels, just as they are today with Direct3D 11. Hardware that is Direct3D 12 compatible does not mean it supports all of the latest feature sets, and keeping track of feature set compatibility for each generation of hardware will still be important going forward.

11.3 & 12: New Features

Getting to the heart of today’s announcement from Microsoft, we have the newly announced features that will be coming to Direct3D 11.3 and 12. It should be noted at this point in time this is not an exhaustive list of all of the new features that we will see, and Microsoft is still working to define a new feature level to go with them (in the interim they will be accessed through cap bits), but none the less this is our first detailed view at what are expected to be the major new features of 11.3/12

Rasterizer Ordered Views

First and foremost of the new features is Rasterizer Ordered Views (ROVs). As hinted at by the name, ROVs is focused on giving the developer control over the order that elements are rasterized in a scene, so that elements are drawn in the correct order. This feature specifically applies to Unordered Access Views (UAVs) being generated by pixel shaders, which buy their very definition are initially unordered. ROVs offers an alternative to UAV’s unordered nature, which would result in elements being rasterized simply in the order they were finished. For most rendering tasks unordered rasterization is fine (deeper elements would be occluded anyhow), but for a certain category of tasks having the ability to efficiently control the access order to a UAV is important to correctly render a scene quickly.

The textbook use case for ROVs is Order Independent Transparency, which allows for elements to be rendered in any order and still blended together correctly in the final result. OIT is not new – Direct3D 11 gave the API enough flexibility to accomplish this task – however these earlier OIT implementations would be very slow due to sorting, restricting their usefulness outside of CAD/CAM. The ROV implementation however could accomplish the same task much more quickly by getting the order correct from the start, as opposed to having to sort results after the fact.

Along these lines, since OIT is just a specialized case of a pixel blending operation, ROVs will also be usable for other tasks that require controlled pixel blending, including certain cases of anti-aliasing.

Typed UAV Load

 

The second feature coming to Direct3D is Typed UAV Load. Unordered Access Views (UAVs) are a special type of buffer that allows multiple GPU threads to access the same buffer simultaneously without generating memory conflicts. Because of this disorganized nature of UAVs, certain restrictions are in place that Typed UAV Load will address. As implied by the name, Typed UAV Load deals with cases where UAVs are data typed, and how to better handle their use.

Volume Tiled Resources

 

The third feature coming to Direct3D is Volume Tiled Resources. VTR builds off of the work Microsoft and partners have already done for tiled resources (AKA sparse allocation, AKA hardware megatexture) by extending it into the 3rd dimension.

VTRs are primarily meant to be used with volumetric pixels (voxels), with the idea being that with sparse allocation, volume tiles that do not contain any useful information can avoid being allocated, avoiding tying up memory in tiles that will never be used or accessed. This kind of sparse allocation is necessary to make certain kinds of voxel techniques viable.

Conservative Rasterization

 

Last but certainly not least among Direct3D’s new features will be conservative rasterization. Conservative rasterization is essentially a more accurate but performance intensive solution to figuring out whether a polygon covers part of a pixel. Instead of doing a quick and simple test to see if the center of the pixel is bounded by the lines of the polygon, conservative rasterization checks whether the pixel covers the polygon by testing it against the corners of the pixel. This means that conservative rasterization will catch cases where a polygon was too small to cover the center of a pixel, which results in a more accurate outcome, be it better identifying pixels a polygon resides in, or finding polygons too small to cover the center of any pixel at all. This in turn being where the “conservative” aspect of the name comes from, as a rasterizer would be conservative by including every pixel touched by a triangle as opposed to just the pixels where the tringle covers the center point.

Conservative rasterization is being added to Direct3D in order to allow new algorithms to be used which would fail under the imprecise nature of point sampling. Like VTR, voxels play a big part here as conservative rasterization can be used to build a voxel. However it also has use cases in more accurate tiling and even collision detection.

Final Words

Wrapping things up, today’s announcement of Direct3D 11.3 and its new features offers a solid roadmap for both the evolution of Direct3D and the hardware that will support it. By confirming that they are continuing to work on Direct3D 11 Microsoft has answered one of the lingering questions surrounding Direct3D 12 – what happens to Direct3D 11 – and at the same time this highlights the hardware features that the next generation of hardware will need to support in order to be compliant with the latest D3D feature level. And with Direct3D 12 set to be released sometime next year, these new features won’t be too far off either.

Microsoft Details Direct3D 11.3 & 12 New Rendering Features

Microsoft Details Direct3D 11.3 & 12 New Rendering Features

Back at GDC 2014 in March, Microsoft and its hardware partners first announced the next full iteration of the Direct3D API. Now on to version 12, this latest version of Direct3D would be focused on low level graphics programming, unlocking the greater performance and greater efficiency that game consoles have traditionally enjoyed by giving seasoned programmers more direct access to the underlying hardware. In particular, low level access would improve performance both by reducing the overhead high level APIs incur, and by allowing developers to better utilize multi-threading by making it far easier to have multiple threads submitting work.

At the time Microsoft offered brief hints that there would be more to Direct3D 12 than just the low level API, but the low level API was certainly the focus for the day. Now as part of NVIDIA’s launch of the second generation Maxwell based GeForce GTX 980, Microsoft has opened up to the press and public a bit more on what their plans are for Direct3D. Direct3D 12 will indeed introduce new features, but there will be more in development than just Direct3D 12.

Direct3D 11.3

First and foremost then, Microsoft has announced that there will be a new version of Direct3D 11 coinciding with Direct3D 12. Dubbed Direct3D 11.3, this new version of Direct3D is a continuation of the development and evolution of the Direct3D 11 API and, like the previous point updates, will be adding API support for features found in upcoming hardware.

At first glance the announcement of Direct3D 11.3 would appear to be at odds with Microsoft’s development work on Direct3D 12, but in reality there is a lot of sense in this announcement. Direct3D 12 is a low level API – powerful, but difficult to master and very dangerous in the hands of inexperienced programmers. The development model envisioned for Direct3D 12 is that a limited number of code gurus will be the ones writing the engines and renderers that target the new API, while everyone else will build on top of these engines. This works well for the many organizations that are licensing engines such as UE4, or for the smaller number of organizations that can justify having such experienced programmers on staff.

For the same reasons, however, a low level API is not suitable for everyone. High level APIs such as Direct3D 11 exist for a good reason after all; their abstraction not only hides the quirks of the underlying hardware, but makes development easier and more accessible as well. There is consequently a need to offer both high level and low level APIs. Direct3D 12 will be the low level API, and Direct3D 11 will continue to be developed to offer the same features through a high level API.

Direct3D 12

Today’s announcement of Direct3D 11.3 and the new feature set that Direct3D 11.3 and 12 will be sharing will have an impact on Direct3D 12 as well. We’ll get to the new features in a moment, but at a high level it should be noted that this means that Direct3D 12 is going to end up being a multi-generational (multi-feature level) API similar to Direct3D 11.

In Direct3D 11 Microsoft introduced feature levels, which allowed programmers to target different generations of hardware using the same API, instead of having to write their code multiple times for each associated API generation. In practice this meant that programmers could target D3D 9, 10, and 11 hardware through the D3D 11 API, restricting their feature use accordingly to match the hardware capabilities. This functionality was exposed through feature levels (ex: FL9_3 for D3D9.0c capable hardware) which offered programmers a neat segmentation of feature sets and requirements.

Direct3D 12 in turn will also be making use of feature levels, allowing developers to exploit the benefits of the low level nature of the API while being able to target multiple generations of hardware. It’s through this mechanism that Direct3D 12 will be usable on GPUs as old as NVIDIA’s Fermi family or as new as their Maxwell family, all the while still being able to utilize the features added in newer generations.

Ultimately for users this means they will need to be mindful of feature levels, just as they are today with Direct3D 11. Being Direct3D 12 compatible does not mean that a piece of hardware supports all of the latest feature sets, and keeping track of feature set compatibility for each generation of hardware will still be important going forward.
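To make the feature level mechanism a little more concrete, here is a minimal sketch of the selection logic: an application lists the levels it can work with, and gets back the highest one the hardware can run. This is an illustration in Python rather than real Direct3D code; the `FeatureLevel` enum and `pick_feature_level` helper are hypothetical names invented for this example, and the numeric values are only there to give the levels an ordering.

```python
from enum import IntEnum


class FeatureLevel(IntEnum):
    """Illustrative feature levels; a higher value means a newer feature set."""
    FL9_3 = 0x9300
    FL10_0 = 0xA000
    FL10_1 = 0xA100
    FL11_0 = 0xB000
    FL11_1 = 0xB100


def pick_feature_level(requested, hardware_max):
    """Return the highest requested level the hardware can run, or None.

    Hypothetical helper: `requested` is the list of levels the application
    is prepared to target, `hardware_max` is the best level the GPU offers.
    """
    usable = [level for level in requested if level <= hardware_max]
    return max(usable) if usable else None


# An application that can scale from D3D9-class up to FL11_1 hardware.
requested = [
    FeatureLevel.FL11_1,
    FeatureLevel.FL11_0,
    FeatureLevel.FL10_0,
    FeatureLevel.FL9_3,
]

# On a GPU that tops out at FL11_0, the application falls back to FL11_0.
chosen = pick_feature_level(requested, FeatureLevel.FL11_0)
print(chosen.name)
```

The point of the sketch is the fallback behavior: the same code path runs on every generation of hardware, and the application restricts its feature use to whatever level comes back.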

11.3 & 12: New Features

Getting to the heart of today’s announcement from Microsoft, we have the newly announced features that will be coming to Direct3D 11.3 and 12. It should be noted that at this point in time this is not an exhaustive list of all of the new features that we will see, and Microsoft is still working to define a new feature level to go with them (in the interim they will be accessed through cap bits), but nonetheless this is our first detailed look at what are expected to be the major new features of 11.3/12.

Rasterizer Ordered Views

First and foremost of the new features is Rasterizer Ordered Views (ROVs). As hinted at by the name, ROVs are focused on giving the developer control over the order in which elements are rasterized in a scene, so that elements are drawn in the correct order. This feature specifically applies to Unordered Access Views (UAVs) being generated by pixel shaders, which by their very definition are initially unordered. ROVs offer an alternative to the unordered nature of UAVs, which would result in elements being rasterized simply in the order they were finished. For most rendering tasks unordered rasterization is fine (deeper elements would be occluded anyhow), but for a certain category of tasks having the ability to efficiently control the access order to a UAV is important to correctly render a scene quickly.

The textbook use case for ROVs is Order Independent Transparency, which allows for elements to be rendered in any order and still blended together correctly in the final result. OIT is not new – Direct3D 11 gave the API enough flexibility to accomplish this task – however these earlier OIT implementations would be very slow due to sorting, restricting their usefulness outside of CAD/CAM. The ROV implementation however could accomplish the same task much more quickly by getting the order correct from the start, as opposed to having to sort results after the fact.

Along these lines, since OIT is just a specialized case of a pixel blending operation, ROVs will also be usable for other tasks that require controlled pixel blending, including certain cases of anti-aliasing.
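As a rough illustration of why ordering matters for transparency, the sketch below blends two half-transparent fragments onto one pixel using the standard "over" operator, once in the order they happened to finish and once in back-to-front order. It only shows that the result depends on the order; it does not model how ROVs or the hardware enforce that order. The function names and the `(depth, rgb, alpha)` fragment layout are assumptions made for this example.

```python
def blend_over(dst, src_color, src_alpha):
    """Standard 'over' blend of one fragment onto the current pixel color."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_color, dst))


def shade_pixel(fragments, background=(0.0, 0.0, 0.0)):
    """Blend fragments in the order given; each one is (depth, rgb, alpha)."""
    color = background
    for _depth, rgb, alpha in fragments:
        color = blend_over(color, rgb, alpha)
    return color


# Two half-transparent fragments covering the same pixel.
red_far = (0.9, (1.0, 0.0, 0.0), 0.5)    # farther from the camera
blue_near = (0.1, (0.0, 0.0, 1.0), 0.5)  # closer to the camera

# Order the fragments happened to finish in (the unordered case).
arrival_order = [blue_near, red_far]
# Back-to-front order, which is what correct blending requires.
back_to_front = sorted(arrival_order, key=lambda f: f[0], reverse=True)

unordered = shade_pixel(arrival_order)
ordered = shade_pixel(back_to_front)
print(unordered, ordered)
```

The two results differ: blending in arrival order leaves the far red fragment dominating the pixel, while the back-to-front order correctly lets the near blue fragment dominate.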

Typed UAV Load

The second feature coming to Direct3D is Typed UAV Load. Unordered Access Views (UAVs) are a special type of buffer that allows multiple GPU threads to access the same buffer simultaneously without generating memory conflicts. Because of this disorganized nature of UAVs, certain restrictions are in place that Typed UAV Load will address. As implied by the name, Typed UAV Load deals with cases where UAVs are data typed, and how to better handle their use.

Volume Tiled Resources

The third feature coming to Direct3D is Volume Tiled Resources. VTR builds off of the work Microsoft and partners have already done for tiled resources (AKA sparse allocation, AKA hardware megatexture) by extending it into the 3rd dimension.

VTRs are primarily meant to be used with volumetric pixels (voxels), with the idea being that with sparse allocation, volume tiles that do not contain any useful information never need to be allocated, so that memory is not tied up in tiles that will never be used or accessed. This kind of sparse allocation is necessary to make certain kinds of voxel techniques viable.
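The idea behind sparse allocation can be sketched in a few lines: split the volume into fixed-size tiles and only allocate a tile the first time something is written into it. This is a CPU-side analogy rather than how Direct3D exposes tiled resources; the `SparseVoxelVolume` class and the 32-voxel tile edge are hypothetical choices for this example.

```python
class SparseVoxelVolume:
    """Voxel grid that only allocates the tiles that are actually written."""

    def __init__(self, tile_size=32):
        self.tile_size = tile_size  # hypothetical tile edge length, in voxels
        self.tiles = {}             # (tx, ty, tz) -> bytearray for one tile

    def _locate(self, x, y, z):
        """Map a voxel coordinate to its tile key and offset within the tile."""
        t = self.tile_size
        key = (x // t, y // t, z // t)
        offset = (x % t) + t * ((y % t) + t * (z % t))
        return key, offset

    def write(self, x, y, z, value):
        key, offset = self._locate(x, y, z)
        tile = self.tiles.get(key)
        if tile is None:
            # First write into this tile: allocate it now.
            tile = self.tiles[key] = bytearray(self.tile_size ** 3)
        tile[offset] = value

    def read(self, x, y, z):
        key, offset = self._locate(x, y, z)
        tile = self.tiles.get(key)
        # Tiles that were never written read back as empty space.
        return tile[offset] if tile is not None else 0


volume = SparseVoxelVolume(tile_size=32)
volume.write(5, 5, 5, 255)
volume.write(1000, 20, 3, 128)
print(len(volume.tiles), volume.read(5, 5, 5), volume.read(500, 500, 500))
```

However large the addressable volume is, only the two tiles that were written to take up memory, and reading empty space does not allocate anything.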

Conservative Rasterization

Last but certainly not least among Direct3D’s new features will be conservative rasterization. Conservative rasterization is essentially a more accurate but more performance intensive solution to figuring out whether a polygon covers part of a pixel. Instead of doing a quick and simple test to see if the center of the pixel is bounded by the lines of the polygon, conservative rasterization checks whether the polygon covers any part of the pixel by testing it against the corners of the pixel. This means that conservative rasterization will catch cases where a polygon was too small to cover the center of a pixel, which results in a more accurate outcome, be it better identifying the pixels a polygon resides in, or finding polygons too small to cover the center of any pixel at all. This in turn is where the “conservative” aspect of the name comes from, as the rasterizer is being conservative by including every pixel touched by a triangle as opposed to just the pixels where the triangle covers the center point.

Conservative rasterization is being added to Direct3D in order to allow new algorithms to be used which would fail under the imprecise nature of point sampling. Like VTR, voxels play a big part here, as conservative rasterization can be used to build voxels. However it also has use cases in more accurate tiling and even collision detection.
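The difference between the two coverage tests can be sketched in software. The example below compares a center-sample test against a conservative overlap test for a thin sliver triangle that passes through four pixels without covering any pixel center. It is a simplified illustration under stated assumptions, not a description of what the hardware does: pixels are unit squares on an integer grid, and the conservative test combines a bounding-box check with a check of each triangle edge against the pixel corners. All function names are hypothetical.

```python
def edge(a, b, p):
    """Signed area test: positive if p lies to the left of the edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers_center(tri, px, py):
    """Point sampling: is the center of pixel (px, py) inside the triangle?"""
    c = (px + 0.5, py + 0.5)
    w = [edge(tri[i], tri[(i + 1) % 3], c) for i in range(3)]
    return all(v >= 0 for v in w) or all(v <= 0 for v in w)


def touches_pixel(tri, px, py):
    """Conservative test: does the triangle overlap the pixel square at all?"""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    # Reject pixels that lie outside the triangle's bounding box.
    if max(xs) < px or min(xs) > px + 1 or max(ys) < py or min(ys) > py + 1:
        return False
    area = edge(tri[0], tri[1], tri[2])
    if area == 0:
        return False  # degenerate triangle, nothing to cover
    sign = 1 if area > 0 else -1
    corners = [(px, py), (px + 1, py), (px, py + 1), (px + 1, py + 1)]
    # Reject pixels whose four corners all fall outside one triangle edge.
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        if all(sign * edge(a, b, c) < 0 for c in corners):
            return False
    return True


def rasterize(tri, width, height, test):
    """Return the list of pixels in a width x height grid that pass `test`."""
    return [(x, y) for y in range(height) for x in range(width)
            if test(tri, x, y)]


# A thin sliver that crosses four pixels but never reaches a pixel center.
sliver = [(0.2, 0.2), (3.8, 0.4), (0.2, 0.4)]
point_sampled = rasterize(sliver, 4, 4, covers_center)
conservative = rasterize(sliver, 4, 4, touches_pixel)
print(len(point_sampled), len(conservative))
```

With point sampling the sliver disappears entirely, while the conservative test reports all four pixels it passes through, which is the behavior voxelization and collision detection depend on.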

Final Words

Wrapping things up, today’s announcement of Direct3D 11.3 and its new features offers a solid roadmap for both the evolution of Direct3D and the hardware that will support it. By confirming that they are continuing to work on Direct3D 11, Microsoft has answered one of the lingering questions surrounding Direct3D 12 – what happens to Direct3D 11 – and at the same time this highlights the features that the next generation of hardware will need to support in order to be compliant with the latest D3D feature level. And with Direct3D 12 set to be released sometime next year, these new features won’t be too far off either.