Updated: AMD Announces Radeon Pro SSG: Fiji With M.2 SSDs On-Board

As part of this evening’s AMD Capsaicin event (more on that later), AMD’s Chief Architect and SVP of the Radeon Technologies Group has announced a new Radeon Pro card unlike anything else. Dubbed the Radeon Pro Solid State Graphics (SSG), this card includes M.2 slots for adding NAND SSDs, with the goal of vastly increasing the amount of local storage available to the video card.

Details are a bit thin and I’ll update this later this evening, but in short the card utilizes a Fiji GPU and includes 2 PCIe 3.0 M.2 slots for adding flash drives to the card. These slots are then attached to the GPU (it’s unclear if there’s a PCIe switch involved or if it’s wired directly), which the GPU can then use as an additional tier of storage. I’m told that the card can fit at least 1TB of NAND – likely limited by M.2 MLC SSD capacities – which massively increases the amount of local storage available on the card.

As AMD explains it, the purpose of going this route is to offer another solution to the workset size limitations of current professional graphics cards. Even AMD’s largest card currently tops out at 32GB, and while this is a fair amount, there are workloads that can use more. This is particularly the case for workloads with massive datasets (oil & gas), or as AMD demonstrated, scrubbing through an 8K video file.

Current cards can spill over to system memory, and while the PCIe bus is fast, it’s still much slower than local memory, plus it is subject to the latency of the relatively long trip and waiting on the CPU to address requests. Local NAND storage, by comparison, offers much faster round trips, though on paper the bandwidth isn’t as good, so I’m curious to see just how it compares with real-world datasets that spill over to system memory. Meanwhile actual memory management/usage/tiering is handled by a combination of the drivers and developer software, so developers will need to code specifically for it as things stand.

For the moment, AMD is treating the Radeon Pro SSG as a beta product, and will be selling developer kits for it directly, with full availability set for 2017. For now developers need to apply for a kit from AMD, and I’m told the first kits are available immediately. Interested developers will need to have saved up their pennies though: a dev kit will set you back $9,999.

Update:

Now that AMD’s presentation is over, we have a bit more information on the Radeon Pro SSG and how it works.

In terms of hardware, the Fiji-based card is outfitted with a PCIe bridge chip – the same PEX8747 bridge chip used on the Radeon Pro Duo, I’m told – with the bridge connecting the two PCIe x4 M.2 slots to the GPU, and allowing both cards to share the PCIe system connection. Architecturally the prototype card is essentially a PCIe SSD adapter and a video card on a single board, with no special connectivity in use beyond what the PCIe bridge chip provides.

The SSDs themselves are a pair of 512GB Samsung 950 Pros, which are about the fastest thing available on the market today. These SSDs are operating in RAID-0 (striped) mode to provide the maximum amount of bandwidth. Meanwhile it turns out that due to how the card is configured, the OS actually sees the SSD RAID-0 array as well, at least for the prototype design.

To use the SSDs, applications need to be programmed using AMD’s APIs to recognize the existence of the local storage and that it is “special,” being on the same board as the GPU itself. Ultimately the trick for application developers is directly streaming resources from the SSDs, treating them as a level of cache between the DRAM and system storage. The use of NAND in this manner does not fit into the traditional memory hierarchy very well, as while the SSDs are fast, on paper accessing system memory is faster still. But it should be faster than accessing system storage, even if it’s PCIe SSD storage elsewhere on the system. Similarly, don’t expect to see frame buffers spilling over to NAND any time soon. This is about getting large, mostly static resources closer to the GPU for more efficient resource streaming.

To showcase the potential benefits of this solution, AMD had an 8K video scrubbing demonstration going, comparing performance between using a source file on the SSG’s local SSDs, and using a source file on the system SSD (also a 950 Pro).

See what the Radeon™ Pro SSG can do to help drastically improve professional workload enablement #AMDCapsaicin https://t.co/ZkcfffSScN

— Radeon Pro (@RadeonPro) July 26, 2016

The performance differential was actually more than I expected; reading a file from the SSG SSD array was over 4GB/sec, while reading that same file from the system SSD was only averaging under 900MB/sec, which is lower than what we know the 950 Pro can do in sequential reads. After putting some thought into it, I think AMD has hit upon the fact that most M.2 slots on motherboards are routed through the system chipset rather than being directly attached to the CPU. This not only adds another hop of latency, but it means crossing the relatively narrow DMI 3.0 (~PCIe 3.0 x4) link that is shared with everything else attached to the chipset.
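
To put rough numbers on the link widths involved, here is a back-of-the-envelope sketch in Python. It uses the standard PCIe 3.0 figures (8 GT/s per lane, 128b/130b encoding) and treats DMI 3.0 as a PCIe 3.0 x4 link, as described above. The function and variable names are made up for illustration, and the results are theoretical one-way ceilings before protocol overhead, not measurements.

```python
# Theoretical PCIe 3.0 link bandwidth, before packet/protocol overhead.
PCIE3_GT_PER_SEC = 8.0       # PCIe 3.0 signalling rate per lane, in GT/s
PCIE3_ENCODING = 128 / 130   # 128b/130b line encoding efficiency


def pcie3_bandwidth_gbs(lanes: int) -> float:
    """One-way theoretical bandwidth in GB/s for a PCIe 3.0 link."""
    return PCIE3_GT_PER_SEC * PCIE3_ENCODING * lanes / 8  # bits -> bytes


# DMI 3.0 is treated here as equivalent to a PCIe 3.0 x4 link.
dmi3_link = pcie3_bandwidth_gbs(4)
# The SSG prototype stripes two x4 M.2 SSDs (RAID-0) behind its bridge chip.
ssg_array = 2 * pcie3_bandwidth_gbs(4)

print(f"DMI 3.0, shared by the whole chipset: ~{dmi3_link:.2f} GB/s")
print(f"SSG on-card array, two x4 slots:      ~{ssg_array:.2f} GB/s")
```

The "over 4GB/sec" figure from the demo sits just above what a single x4 link could carry even in theory, which fits with the two striped drives each having their own x4 connection. The DMI ceiling by itself doesn't explain the sub-900MB/sec result, though, so contention and the extra hop are presumably doing the rest.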

Though by and large this is all at the proof of concept stage. The prototype, though impressive in some ways in its own right, is really just a means to get developers thinking about the idea and writing their applications to be aware of the local storage. And this includes not just what content to put on the SSG’s SSDs, but also how to best exploit the non-volatile nature of its storage, and how to avoid unnecessary thrashing of the SSDs and burning valuable program/erase cycles. The SSG serves an interesting niche, albeit a limited one: scenarios where you have a large dataset and you are somewhat sensitive to latency and want to stay off of the PCIe bus, but don’t need more than 4-5GB/sec of read bandwidth. So it’ll be worth keeping an eye on this to see what developers can do with it.

In any case, while AMD is selling dev kits now, expect some significant changes by the time we see the retail hardware in 2017. Given the timeframe I expect we’ll be looking at much more powerful Vega cards, where the overall GPU performance will be much greater, and the difference in performance between memory/storage tiers is even more pronounced.

NVIDIA Announces Quadro Pascal Family: Quadro P6000 & P5000

If there was one word to describe the launch of NVIDIA’s Pascal generation products, it’s “expedient.” On the consumer side of the business the company has launched 3 different GeForce cards and announced a fourth (Titan X), while on the HPC side the company has already launched their Tesla P100 accelerator, with the PCIe version due next quarter. With the company moving so quickly it was only a matter of time until a Quadro update was announced, and now today at SIGGRAPH 2016 the company is doing just that.

Being announced today are the two Quadro models that will fill out the high-end of the Quadro family, the P6000 and P5000. As hinted at by the name, these are based on NVIDIA’s latest Pascal generation GPUs, marking the introduction of Pascal to the Quadro family. And like NVIDIA’s consumer counterparts, these new cards should offer significant performance and feature upgrades over their Maxwell 2 based predecessors.

NVIDIA Quadro Specification Comparison

| | P6000 | P5000 | M6000 | M5000 |
|---|---|---|---|---|
| CUDA Cores | 3840 | 2560 | 3072 | 2048 |
| Texture Units | 240? | 160 | 192 | 128 |
| ROPs | 96? | 64 | 96 | 64 |
| Core Clock | ? | ? | N/A | N/A |
| Boost Clock | ~1560MHz | ~1730MHz | ~1140MHz | ~1050MHz |
| Memory Clock | 9Gbps GDDR5X | 9Gbps GDDR5X | 6.6Gbps GDDR5 | 6.6Gbps GDDR5 |
| Memory Bus Width | 384-bit | 256-bit | 384-bit | 256-bit |
| VRAM | 24GB | 16GB | 24GB | 8GB |
| FP64 | 1/32 FP32 | 1/32 FP32 | 1/32 FP32 | 1/32 FP32 |
| TDP | 250W | 180W | 250W | 150W |
| GPU | GP102 | GP104 | GM200 | GM204 |
| Architecture | Pascal | Pascal | Maxwell 2 | Maxwell 2 |
| Manufacturing Process | TSMC 16nm | TSMC 16nm | TSMC 28nm | TSMC 28nm |
| Launch Date | October 2016 | October 2016 | 03/22/2016 | 08/11/2015 |
| Launch Price (MSRP) | TBD | TBD | $5000 | $2000 |

We will start, as always, at the top, with the Quadro P6000. As NVIDIA’s impending flagship Quadro card, this is based on the just-announced GP102 GPU. The direct successor to the GM200 used in the Quadro M6000, the GP102 mixes a larger number of SMs/CUDA cores and higher clockspeeds to significantly boost performance.

Paired with P6000 is 24GB of GDDR5X memory, running at a conservative 9Gbps, for a total memory bandwidth of 432GB/sec. This is the same amount of memory as in the 24GB M6000 refresh launched this spring, so there’s no capacity boost at the top of NVIDIA’s lineup. But for customers who didn’t jump on the 24GB – which is likely a lot of them, including most 12GB M6000 owners – this is a doubling (or more) of memory capacity compared to past Quadro cards. The largest capacity GDDR5X memory chips we know of right now are 8Gb, so this is as large a capacity as P6000 can be built with at this time. Meanwhile this is so far the first and only Pascal card with GDDR5X to support ECC, with NVIDIA implementing an optional soft-ECC method for the DRAM only, just as was the case on M6000.
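
For anyone who wants to check the memory math, the following Python sketch reproduces the 432GB/sec figure from the bus width and data rate given above. The capacity half of the sketch rests on assumptions the announcement doesn't confirm: that each GDDR5X chip has a 32-bit interface, and that the board uses a clamshell layout with two chips sharing each 32-bit channel. All names are illustrative.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def max_capacity_gb(bus_width_bits: int, chip_density_gbit: int,
                    chips_per_channel: int = 2, channel_bits: int = 32) -> float:
    """Capacity in GB. Assumes 32-bit channels and, by default, a clamshell
    layout with two chips per channel (an assumption, not a confirmed spec)."""
    channels = bus_width_bits // channel_bits
    return channels * chips_per_channel * chip_density_gbit / 8  # Gbit -> GB


p6000_bandwidth = memory_bandwidth_gbs(384, 9.0)
p6000_capacity = max_capacity_gb(384, 8)

print(f"P6000 bandwidth: {p6000_bandwidth:.0f} GB/s")
print(f"P6000 capacity with 8Gb chips: {p6000_capacity:.0f} GB")
```

Under those assumptions, 12 channels with two 8Gb chips each works out to 24GB, which is why larger chips would be needed to go any higher.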

NVIDIA has also sent over pictures of the card design, and confirmed that the card ships with the Quadro 6000-series standard TDP of 250W. Utilizing the same basic metal shroud and blower design as the M6000 cards, the P6000 should be suitable as a drop-in replacement for older M6000 cards. Do note however that like M6000, external power is pulled via a single 8-pin power connector, so technically this card is out of spec (not that this was a problem for M6000).

Unfortunately in their zeal to get this announcement out in time for SIGGRAPH – a frequent venue for Quadro announcements – we don’t have specific performance numbers available. NVIDIA has not locked down the GPU clockspeeds, and as a result we don’t know just how P6000’s clockspeeds and total throughput will compare to M6000’s. It goes without saying that it should be higher, but how much higher remains to be seen.

For overall expected performance, NVIDIA has published that the P6000 is rated for 12 TFLOPs FP32. Given that it’s a fully enabled GP102 we’re looking at, this works out to a clockspeed of around 1560MHz. On paper this gives P6000 around 71% more shading performance and 37% more ROP throughput than the older Maxwell 2 M6000. This also puts the P6000 around 9% ahead of the recently announced NVIDIA Titan X.
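
The arithmetic behind those figures is easy to reproduce. The Python sketch below assumes the usual convention of 2 FP32 operations per CUDA core per clock (one fused multiply-add) and that ROP throughput scales with ROP count times clockspeed. The P6000 and M6000 inputs come from the spec table; the Titan X core count and boost clock (3584 cores, 1531MHz) are NVIDIA's published specs rather than anything stated in this article. Function names are illustrative.

```python
FLOPS_PER_CORE_PER_CLOCK = 2  # one fused multiply-add counts as 2 FP32 ops


def fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Peak FP32 throughput in TFLOPs at the given clock."""
    return cuda_cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6 / 1e12


def clock_for_tflops(cuda_cores: int, tflops: float) -> float:
    """Clock in MHz needed to hit a given FP32 rating."""
    return tflops * 1e12 / (cuda_cores * FLOPS_PER_CORE_PER_CLOCK) / 1e6


p6000_clock = clock_for_tflops(3840, 12.0)           # implied boost clock
m6000_tflops = fp32_tflops(3072, 1140)
titan_x_tflops = fp32_tflops(3584, 1531)             # specs from outside the article

shading_gain = 12.0 / m6000_tflops - 1
rop_gain = (96 * p6000_clock) / (96 * 1140) - 1      # both cards have 96 ROPs
titan_x_lead = 12.0 / titan_x_tflops - 1

print(f"Implied P6000 clock: {p6000_clock:.1f} MHz")
print(f"Shading vs. M6000:   +{shading_gain:.0%}")
print(f"ROP vs. M6000:       +{rop_gain:.0%}")
print(f"Lead over Titan X:   +{titan_x_lead:.0%}")
```

The implied clock comes out at 1562.5MHz, which is where the "around 1560MHz" figure comes from; the 96 ROP count for P6000 is itself still unconfirmed, as flagged in the table.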

On a quick technical note, as this announcement comes just 4 days after NVIDIA announced the GP102 GPU used on this card, this Quadro announcement does confirm a few more things about GP102. Quadro P6000 ships with 3840 CUDA cores (30 SMs), confirming our earlier suspicions that GP102 was (at least) a 30 SM part. Meanwhile this also confirms that GP102 can be outfitted with 24GB of GDDR5X. Finally, NVIDIA has confirmed that there’s no high-speed FP64 support on GP102, which is why we’re looking at a 1/32 rate for even the top Quadro card.

P5000

Moving on, let’s talk about Quadro P5000. Based on NVIDIA’s GP104 GPU, this is the smaller, cheaper, lower power sibling to the P6000. This is a fully enabled part with all 2560 CUDA cores (20 SMs) active, so the performance gains versus M5000 should be similar to what we saw with the consumer GeForce GTX 1080. Clockspeeds are also comparable, so we’re looking at a sizable boost in shading/compute/texture performance of 2.06x, and ROP throughput has increased by 65%. Of the two cards, P5000 is going to be the bigger upgrade versus its direct predecessor.

Meanwhile on the memory front, P5000 is equipped with 16GB of GDDR5X memory. This is attached to GP104’s 256-bit memory bus, and like P6000 is clocked at 9Gbps. P5000’s predecessor, M5000, maxed out at just 8GB of memory, so along with a 36% increase in memory bandwidth, this doubles the amount of memory available for a Quadro 5000 tier card.
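
The P5000 versus M5000 ratios quoted above fall straight out of the spec table. As a rough sketch in Python, assuming throughput scales as unit count times clockspeed (and memory bandwidth as bus width times data rate), with the dictionary layout and helper name being purely illustrative:

```python
from math import prod

# Values taken from the spec comparison table; boost clocks are approximate.
SPECS = {
    "P5000": {"cuda_cores": 2560, "rops": 64, "boost_mhz": 1730,
              "bus_bits": 256, "mem_gbps": 9.0},
    "M5000": {"cuda_cores": 2048, "rops": 64, "boost_mhz": 1050,
              "bus_bits": 256, "mem_gbps": 6.6},
}


def spec_ratio(new: str, old: str, *fields: str) -> float:
    """Ratio of the product of the named spec fields, new card over old card."""
    return (prod(SPECS[new][f] for f in fields)
            / prod(SPECS[old][f] for f in fields))


shading = spec_ratio("P5000", "M5000", "cuda_cores", "boost_mhz")
rop = spec_ratio("P5000", "M5000", "rops", "boost_mhz")
bandwidth = spec_ratio("P5000", "M5000", "bus_bits", "mem_gbps")

print(f"Shading/compute/texture: {shading:.2f}x")
print(f"ROP throughput:          +{rop - 1:.0%}")
print(f"Memory bandwidth:        +{bandwidth - 1:.0%}")
```

Since the final clockspeeds aren't locked down yet, these are paper numbers that will shift a little if the shipping boost clock differs from ~1730MHz.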

Looking at the card design itself, to no surprise it strongly resembles the M5000, with its plastic blower dressed up in Quadro livery. The card’s TDP stands at 180W, which is a slight increase over M5000, but shouldn’t too significantly impact the drop-in replacement nature of the design.

Pascal Features & Availability

Along with the significant performance increase afforded by the Pascal architecture and TSMC’s 16nm FinFET manufacturing process, the other big news here is of course the functionality that comes to the Quadro P-series courtesy of Pascal. While for our regular readers there’s nothing new we haven’t seen already with GeForce, Pascal’s new functionality will apply a bit differently to the Quadro lineup.

Perhaps the biggest change here is Pascal’s new display controller. With both the P6000 and P5000 shipping with 4 DisplayPorts, the DisplayPort 1.4 capable controller means that both cards can now support higher resolutions and refresh rates. Whereas the M-series maxed out at 4 4K@60Hz monitors, the P-series can now handle 4 monitors running 5K@60Hz, 4 4K monitors running at 120Hz, or even 8K monitors with additional limitations. Do note however that the per-card monitor limit is still 4 displays, as this is as many displays as Pascal can support.

Speaking of multiple displays, alongside the Quadro card announcements NVIDIA is also announcing a new Quadro Sync card, the aptly named Quadro Sync 2. The multi-adapter/multi-display timing synchronization card is being updated to support the Pascal cards, and will support a larger number of adapters as well. The new Sync 2 will support 8 cards in sync, as opposed to 4 on the original Sync card. Coupled with the 4 display per card capability of Pascal, this means synchronized video walls and other systems can now be built out to 32 displays.

NVIDIA will also be heavily promoting Simultaneous Multi-Projection (SMP), the company’s multi-viewport technology. Like the consumer cards, VR is a big driver here, with NVIDIA looking to reach out to VR developers. NVIDIA is also pitching this at VR CAVE systems, as they can see similar benefits from SMP’s geometry reprojection.

Taking a look at the overall Quadro lineup, the P6000 and P5000 will at least for the time being be sitting alongside the existing M4000 and lower cards. Within the Quadro lineup these cards are meant for the most demanding workloads – massive memory sets and complex rendering/compute tasks – and they will be priced accordingly. Specific pricing has not been announced, but NVIDIA tells us to expect them to be priced similarly to the last generation cards. This would work out to $5000+ for Quadro P6000, and $2000+ for Quadro P5000 at launch.

Finally, as we mentioned before NVIDIA was announcing these cards early, before the final clockspeeds have been locked down. This means that while the cards are being announced today, they won’t launch for another two months; NVIDIA expects them to be available in early October. It’s not unusual for Quadro cards to be announced ahead of time, though as SIGGRAPH is also a popular venue for AMD pro card announcements, the earlier than usual announcement may have been for multiple reasons.

Ecosystem Announcements: New SDKs, Iray VR, & OptiX 4

Along with the announcement of the Quadro P-series, NVIDIA is also using SIGGRAPH to announce updates to various software and ecosystem initiatives within the company. Overall a number of the company’s SDKs are receiving an update in some form, ranging from rendering to video encode and capture, the latter taking advantage of Pascal’s 8K encode/decode capabilities.

Of particular note here, NVIDIA’s Iray physically based render plugin for 3D modeling applications is getting a significant update. As with other parts of their ecosystem, NVIDIA is doubling down on VR here as well. The next update to Iray will include support for generating panoramic VR lightfields – think high detail fixed position 3D panoramas – which can then be displayed on other devices. NVIDIA has been showing off an early version of this technology at GTC 2016 and other events, where it was used to show off renders of the company’s under-construction headquarters.

The Iray update will also be part of a larger focus on integrating the company’s software with their DGX-1 server, which incorporates 8 Tesla P100 accelerators. Iray will be coming to DGX-1 this fall, supporting the same features that are already available in multi-GPU setups with the older Quadro VCA. Longer term, in 2017, the company will be adding NVLink support for better multi-GPU scaling.

NVIDIA’s OptiX ray tracing engine is the other product that’s getting a DGX-1 update. OptiX 4.0, which is being released this week, adds support for the DGX-1, including NVLink support. It is interesting to note though that the company is only supporting clusters of 4 GPUs, despite the fact that DGX-1 has 8 GPUs (the other 4 GPUs form a second cluster). This may mean that OptiX needs direct GPU links to perform best – as in an 8-way configuration, some GPUs are 2 hops away – or it may just be that OptiX naturally doesn’t scale well beyond 4 GPUs.

Finally, NVIDIA is also announcing a change to how mental ray support is handled for Maya. Previously, integrating the ray tracer with Maya was handled by Autodesk, but NVIDIA is currently in the process of taking that over. The goal of doing so is to allow mental ray to be updated and have features added at the more brisk pace that NVIDIA tends to work at. The new plugin is currently scheduled to ship in September, and as one of their first actions, NVIDIA will be integrating a new global illumination engine, GI-Next.
