GPUs


Some Thoughts on Apple’s Metal API

Though it seems like Apple’s hardware divisions can hardly keep a secret these days due to the realities of mass production, the same is fortunately not true for their software divisions. Broad strokes aside, Apple managed to pack in a number of surprises in their OS X and iOS presentations at WWDC yesterday, and there’s nothing that ended up being quite as surprising to me as the announcement of the Metal API for iOS.

Later this week Apple will be holding their Metal developers sessions, at which time we’ll hopefully get some further details on the API and just how Apple intends to have developers use it. In the meantime with the preliminary Metal programming guide posted over on Apple’s developer website, I wanted to spend a few minutes musing over yesterday’s announcement, how Apple ended up developing their own API, and what this may mean for users and game developers.

Why Low-Overhead APIs?

First and foremost, let’s quickly recap just what exactly Apple has announced. Metal is Apple’s forthcoming low-overhead/low-level graphics and compute API for iOS. Metal is primarily geared towards gaming on iOS, and is intended to offer better graphics performance than the existing OpenGL ES API by curtailing driver overhead and giving developers more direct control over the GPU.

As our regular readers are no doubt well aware, Metal is the latest in a wave of low-level graphics APIs to be introduced over the last year in the GPU space, joining the ranks of AMD’s Mantle and Microsoft’s DirectX 12. In the case of Metal, as with all of these APIs, the idea is rooted in the fact that while high level APIs provide a number of important features, from libraries to hardware abstraction, the overhead of that functionality is not always worth the benefits, especially in the hands of highly seasoned programmers who have the experience and the means to go close-to-metal and bang on the hardware directly. The situation facing these developers is that at a time when GPU performance growth is rapidly outpacing CPU performance growth, API and driver overhead has gone from problematic to intolerable, leading developers to want to access the hardware directly.


How The Low-Level Mantle API Benefitted DICE’s Frostbite Engine

Metal in turn is the API through which Apple will provide this access. By peeling back the driver and API stack to the bare minimum, developers get to tell the GPU exactly what they’re doing and how they want it done, bypassing large chunks of CPU-intensive code that would previously do this for the developer. Whenever we’re talking about these low-level APIs it’s important to note that they’re merely ways to improve efficiency and are not miracle workers, but when faced with the most applicable bottleneck, the draw call – what’s essentially a single function call for the GPU – the increase in throughput can be remarkable. We won’t spend too much more time on the whys of Metal, as we’ve written much longer outlines on low-level APIs before that don’t need to be repeated here, but it’s important to establish a baseline for evaluating Metal.
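
To make the draw call point concrete, below is a minimal sketch of what a per-frame draw loop looks like under an explicit, command-buffer style API, written here in Swift against Metal’s eventual public class names (MTLCommandQueue, MTLRenderCommandEncoder, and so on) purely for illustration; the drawFrame function and its parameters are hypothetical, and details may differ from what Apple’s preliminary guide describes. The takeaway is that the application, rather than the driver, assembles the command stream, so each additional draw call adds very little CPU work.

```swift
import Metal
import QuartzCore // CAMetalLayer

// A minimal sketch of a per-frame draw loop under an explicit,
// command-buffer based API. The expensive objects (pipeline state,
// vertex buffers) are assumed to have been created once at load time;
// that is where Metal front-loads the validation work a traditional
// driver would otherwise redo on every draw call.
func drawFrame(layer: CAMetalLayer,
               queue: MTLCommandQueue,
               pipeline: MTLRenderPipelineState,
               meshes: [(vertices: MTLBuffer, vertexCount: Int)]) {
    guard let drawable = layer.nextDrawable() else { return }

    let pass = MTLRenderPassDescriptor()
    pass.colorAttachments[0].texture = drawable.texture
    pass.colorAttachments[0].loadAction = .clear
    pass.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
    pass.colorAttachments[0].storeAction = .store

    let commandBuffer = queue.makeCommandBuffer()!
    let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass)!
    encoder.setRenderPipelineState(pipeline)

    // Each iteration below is one draw call. Because the application is
    // appending commands to a buffer directly, the per-call CPU cost is
    // small compared to issuing the same draw through a thick driver.
    for mesh in meshes {
        encoder.setVertexBuffer(mesh.vertices, offset: 0, index: 0)
        encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: mesh.vertexCount)
    }

    encoder.endEncoding()
    commandBuffer.present(drawable)
    commandBuffer.commit()
}
```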

Are SoCs Draw Call Limited?

Upon hearing Apple’s Metal announcement, perhaps the greatest surprise was that iOS developers were in a position where they needed and could benefit from a low-level API like Metal. In the PC space we’ve been seeing low-level APIs rolled out as a solution to the widening gap between CPU and GPU performance; however, the SoC class processors in Apple’s iOS devices are a very different beast. As one would expect for a mobile product, neither the CPU nor the GPU is high performance by PC standards, so why should a low-level API be necessary?

The answer is that while SoCs are lower performance devices, the same phenomenon that has driven low-level APIs on the PC has driven them on mobile devices, just on a smaller scale. GPU performance is outgrowing CPU performance at the SoC level just as it has at the PC level, and even worse, SoC class CPUs are so slow that even small amounts of driver overhead can have a big impact. While we take 4000 draw calls for granted on desktop hardware – overhead and all – it’s something of a sobering reminder that this isn’t possible on even a relatively powerful SoC like the A7 with OpenGL ES, and that it took Metal for Crytek to get that many draw calls in motion, never mind other CPU savings such as precompiled shaders. If Apple intends to further gaming on iOS (and all signs are that they do), then capable programmers are going to want low level GPU access to maximize their graphical quality, the same as they do on the desktop and on game consoles.


Apple Metal Thread Model (Note that no Apple SoC has more than 2 CPU cores yet)

Ecosystems & Portability

But on that note there’s quite a bit that goes into providing developers with these kinds of tools, which puts Apple in a very interesting position among hardware and OS vendors. Of the other low-level APIs we’ve seen so far – AMD’s Mantle and Microsoft’s DirectX 12 – the former comes from a hardware vendor who has to ride on top of other companies’ CPUs and OSes, and the latter from an OS vendor who has to ride on top of third party CPUs and GPUs. Apple on the other hand is in the enviable position of being as close as anyone can be to offering a fully vertical ecosystem. Apple designs their own CPUs, configures their own SoCs, and writes their own OS. The only portion of the chain that Apple doesn’t control is the GPU, and even then the company has exclusively used Imagination Technologies’ PowerVR GPUs for the last 7 years with no signs of this changing. So for all practical purposes Apple has a closed ecosystem that they control from top to bottom, and can design for accordingly.

A closed ecosystem in turn means that Apple can achieve a level of OS, hardware, and programming language integration that no one else can achieve. Metal doesn’t need to take into consideration any other GPU architectures (though Apple in all likelihood has left it generic enough to be portable if the situation arises) and the OS around it can be tailored to the API, rather than making the API fit within the confines of the OS. This doesn’t necessarily mean Apple is going to make significant use of this integration, but it will be interesting to see just what Apple does do with so much control.


A7 SoC Floorplan (Image Courtesy Chipworks)

Another interesting thing to watch as Metal plays out is how Apple handles portability from OpenGL ES, that is if they try to handle it at all. On the whole, it’s accepted that a low-level API like Metal will have minimal portability from higher level APIs such as OpenGL ES. The exception thus far has been shader programs, which due to their fundamentally low level nature have proven more portable. In the case of AMD’s Mantle, for example, we have seen AMD specifically support DirectX’s shader language – HLSL – to make porting to Mantle easier. Shader programs are just one part of a bigger picture, but their growing complexity and low level nature means that there are still benefits to being able to port them among APIs even when the API commands themselves are not portable.

At least for the moment, Apple’s Metal programming guide makes no mention of porting from the existing OpenGL ES API. Looking at the Metal shading language and comparing it to the OpenGL ES shading language (GLSL ES), while it’s initially promising since both languages are C-derived, it’s also clear that for better or worse Apple hasn’t held back from eclipsing OpenGL ES here. Metal’s shading language is based on C++11 and consequently includes features not available in GLSL ES. Furthermore, comparing the standard function libraries, there are a number of identical functions, but also many more functions that the two shading languages do not have in common. Portability out of Metal aside, it’s not at all clear whether GLSL ES shaders are meaningfully portable into Metal; if they aren’t then that means additional work for developers, a specific concern if Apple is trying to land console-like games for iOS devices. So it will be interesting to see how this plays out.

Of course Android portability is also going to raise a flag here, though at first glance it actually seems unlikely that this will be a concern. Without an equivalent API – and the OpenGL AZDO concept isn’t going to be fully applicable to OpenGL ES – the games that benefit the most from Metal are also the games least likely to be on Android, so while portability from Android looks far from easy, there also appears to be little need to handle it. Android portability would seem to be best handled by traditional porting methods using OpenGL ES, which retains its common API status and will be sufficient for the kinds of games that will run on both ecosystems.

Metal Computing

On a final note, while we’ve discussed graphics almost exclusively thus far, it’s interesting to note that Apple is pitching Metal as an API for GPU compute as well as graphics. Despite being one of the initial promoters of the OpenCL API, Apple has never implemented OpenCL or any other GPU compute API on iOS, even after adopting the compute-friendly PowerVR Rogue GPU for their A7 SoC. As a result GPU compute on iOS has been limited to what OpenGL ES can be coaxed into, which, while not wholly incapable, is an API designed for dealing with images as opposed to free form data.

The low-level nature of Metal on the other hand means that it’s a good (or at least better) fit for GPU computing, as the lack of graphics-centric abstraction makes it more capable of handling the workflows and data types of compute tasks. This is one area in particular where the Metal shading language being based on a subset of C++11 is a benefit to Apple, as it provides a solid foundation for writing compute kernels. Nonetheless it remains to be seen just how adaptable Metal is – can it match the compute functionality of OpenCL 1.2 or even OpenGL 4.x compute shaders? – but even if it’s only of limited use it means Apple is finally ready to approach GPU computing on iOS devices.
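
As a rough illustration of what this could look like in practice, here is a minimal sketch of a GPU compute dispatch written in Swift against Metal’s eventual public API. The add_arrays kernel, the buffer names, and the threadgroup sizes are all hypothetical choices made for this example, not anything taken from Apple’s preliminary guide; the point is simply that plain data buffers go in and out without the work having to be dressed up as a rendering operation.

```swift
import Metal

// Hypothetical compute kernel, written in the Metal shading language and
// compiled at runtime. The C++11 heritage shows in the address space
// qualifiers and attribute syntax.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void add_arrays(device const float* a   [[buffer(0)]],
                       device const float* b   [[buffer(1)]],
                       device float*       out [[buffer(2)]],
                       uint i [[thread_position_in_grid]])
{
    out[i] = a[i] + b[i];
}
"""

let device = MTLCreateSystemDefaultDevice()!
let library = try! device.makeLibrary(source: kernelSource, options: nil)
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "add_arrays")!)

// Plain float buffers – no textures, framebuffers, or image formats involved.
let count = 1024
let a = [Float](repeating: 1.0, count: count)
let b = [Float](repeating: 2.0, count: count)
let byteLength = count * MemoryLayout<Float>.stride
let bufA = device.makeBuffer(bytes: a, length: byteLength, options: [])!
let bufB = device.makeBuffer(bytes: b, length: byteLength, options: [])!
let bufOut = device.makeBuffer(length: byteLength, options: [])!

let queue = device.makeCommandQueue()!
let commandBuffer = queue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(bufA, offset: 0, index: 0)
encoder.setBuffer(bufB, offset: 0, index: 1)
encoder.setBuffer(bufOut, offset: 0, index: 2)

// One thread per element, grouped into threadgroups of 64.
encoder.dispatchThreadgroups(MTLSize(width: count / 64, height: 1, depth: 1),
                             threadsPerThreadgroup: MTLSize(width: 64, height: 1, depth: 1))
encoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

let results = bufOut.contents().bindMemory(to: Float.self, capacity: count)
print(results[0]) // 3.0
```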

ASUS Launches ROG Ares III for Water Cooling Builds

The Radeon R9 295X2, or any graphics card that comes pre-liquid cooled, comes up against a barrier.  There will be a market segment that cares more about the card than the cooling, and would rather not have to spend the extra on the cooling be…

GeForce Experience 2.1 Released

It has been a bit over 2 years since NVIDIA first announced GeForce Experience, and while it took them a bit longer to get off the ground than they had planned on, since then they’ve been quickly iterating on the utility to add features and f…

Best Video Cards: May 2014

We’re back once again with our monthly guide to video cards and video card industry recap, this time for May of 2014.

All things considered, the month of May has been extremely quiet in the land of video cards, even more so than April was. There have been no major product announcements from either NVIDIA or AMD this month – not that we were expecting any – so the video card market hasn’t shifted very much this month compared to the bigger shake-ups of earlier this year.

The biggest change for the month of May is of course yesterday’s launch of NVIDIA’s GeForce GTX Titan Z. NVIDIA’s dual-GPU flagship was originally scheduled for release in April but was held back to this month, leaving a two month gap between announcement and release. The GTX Titan Z is NVIDIA’s most powerful card yet, packing a pair of GK110 GPUs and all of the functionality and features that have come to define the Titan family, including uncapped (1/3rd rate) double precision performance and 6GB of VRAM per GPU. However at a price tag of $3000 it’s twice the price of AMD’s dual-GPU R9 295X2, and NVIDIA’s more conservative power consumption and clockspeeds mean that it faces an uphill battle when it comes to performance. Consequently the GTX Titan Z is being treated as more of a compute card than a gaming card by NVIDIA, though if money is no object then it can certainly be used as a gaming card and should turn in some impressive numbers.

The launch of the GTX Titan Z also coincides with the first WHQL release of NVIDIA’s R337 drivers, 337.88. R337 is a performance optimization heavy driver branch for NVIDIA, incorporating a number of smaller optimizations with an apparent focus on cutting down on CPU driver overhead. 337.88 also includes support for the new generation of single tile (SST) 4K monitors, which will eventually render the existing multiple tile (MST) 4K monitors obsolete.

Meanwhile in the AMD camp, AMD engaged in a small amount of price shuffling among their product lineup. The Radeon R9 280, AMD’s 200 series analogue to the venerable 7950, received a price cut from $279 to $249. This is in reflection of the fact that the R9 280X has been back to its MSRP for several weeks now, and as such the R9 280 was overpriced for the performance it delivered. At $249 it’s now much more competitive against AMD’s other cards while capable of putting the squeeze on NVIDIA’s GeForce GTX 760.

And not to be outdone on the driver front, AMD had their own major driver release this month with the release of the Catalyst 14.6 betas. These drivers overhaul AMD’s Eyefinity functionality, incorporating support for new modes that can be used with mixed resolution monitors. AMD can’t work magic, but between the new Fit and Expand modes they have greatly increased the usability of Eyefinity with disparate monitors, making it possible to work with Eyefinity when mixing old and new monitors, cheap and expensive monitors, etc.

Anyhow, market summaries behind us, let’s look at individual recommendations. As always, we’ve laid out our ideas of price/performance bands and recommendations in our table below, with our full explanations and alternative options to follow. As always, in the case of the sub-$200 market it’s worth pointing out that there’s a video card for roughly every $10, so picking a good video card is as much about budgets as it is finding an especially strong card.

May 2014 GPU Performance Guide
Performance Band          Price Range    Recommendation
1080p (Low)               $99-$149       AMD Radeon R7 250X
1080p (Med)               $149-$199      AMD Radeon R7 265
1080p (High)              $199-$289      AMD Radeon R9 270X
1440p (Med)               $289-$399      AMD Radeon R9 280X
1440p (High)              $399-$649      AMD Radeon R9 290
1440p (Max)               $649+          NVIDIA GeForce GTX 780 Ti
4K/Multi-Monitor (High)   $1000+         2x AMD Radeon R9 290X

As a general recommendation for gaming, we suggest starting at $99. There are cards below this price, but the amount of performance you have to give up below $99 far outweighs the savings. Even then, performance gains will generally exceed the price increases up to $150 or so.

Meanwhile for gamers looking for high quality 1080p gaming or better, that will start at $199. Going above that will find cards that are good for 1440p, 4K, and multi-monitor, while going below that will find cards that will require some quality sacrifices to stay at 1080p.

Finally, this guide is by its very nature weighted towards price/performance, based on the passionate feedback we’ve received from our readers. For these purposes we consider AMD and NVIDIA to be equal from a functionality and compatibility perspective, but it should be said that both parties have been building out their ecosystem in the past year, and this will only continue to grow as the two companies try to differentiate themselves. So if you need or want functionality beyond the core functionality a video card offers, it may be worthwhile to familiarize yourself with the NVIDIA and AMD ecosystems, including Gameworks, Eyefinity, Mantle, GeForce Experience, and more.

Budget (<$100): AMD Radeon R7 250X

At $99 there is no other card even worth considering besides AMD’s Radeon R7 250X. NVIDIA’s closest cards remain more expensive for the performance they offer, and nothing else in AMD’s own lineup comes close at this price, leaving the R7 250X as the fastest card available for $99.

From a performance perspective the R7 250X isn’t going to quite hit the sweet spot we outlined earlier, but for those gamers on a strict budget it will get the job done. In the long run it should be able to run most games even at 1080p with medium-to-low settings, along with keeping texture quality down a notch to account for its 1GB of VRAM. Battlefield, GRID 2, and even Total War: Rome 2 can easily hum along on this card at decent settings at 1080p.

Mainstream Sweet Spot ($149): AMD Radeon R7 265

Among the crowded $149 market our primary recommendation is the Radeon R7 265, AMD’s recently launched Pitcairn card designed to lock in this price point. Essentially a 7850 with a higher GPU clockspeed and a revised memory bus allowing for higher memory clockspeeds, the R7 265 is a very capable card for the price.

From a performance standpoint the R7 265 is not going to be able to play every game at 1080p at high settings, but it will be fast enough for medium-to-high depending on the game, which is a couple of notches higher than what the $99 cards can do. Meanwhile the 2GB of VRAM means that future games shouldn’t bog down the card quite as badly; higher graphical fidelity games will slow it down like any other card, but there’s enough VRAM to keep up with the demands of higher resolution textures and heavier use of intermediate buffers.

Runner Up: NVIDIA GeForce GTX 750 Ti

Since our guide is written on the assumption that most buyers are looking for the best performance at a given price, our performance recommendations are going to favor AMD, as they’re more willing to throw out larger, more powerful cards at these sub-$200 price bands. NVIDIA on the other hand isn’t going to be able to directly compete with AMD on price/performance, but they do have an interesting technological advantage for gamers who are looking for a different set of tradeoffs.

Powered by NVIDIA’s Maxwell architecture, at a price these days closer to $139 NVIDIA is able to offer the GeForce GTX 750 Ti, a card that offers performance approaching the R7 265 with much lower power consumption. The GTX 750 Ti is a sub-75W card – no external PCIe power connector required – allowing it to work in cases and systems where the near-150W R7 265 cannot, while also offering the improved acoustics that come with lower power consumption. So for OEM upgrades, or buyers just looking for something even quieter, the GTX 750 Ti is an interesting alternative. Just keep in mind that from a performance standpoint it trails the R7 265 by about 16%.

There are also a pair of options between here and the R7 250X that bear mentioning. The R7 260X resides at $119 and goes up against the GTX 750, with AMD holding a performance advantage similar to R7 265 vs. GTX 750 Ti. We’re fans of stepping up to the greater performance of the $149 cards, but these cards do offer something between $99 and $149.

1080p Gaming ($199): AMD Radeon R9 270X

Moving up our product lists, at $199 we’re finally up to cards that are fast enough to play most games at 1080p with high-to-ultra settings. More powerful and more expensive cards will offer a further edge for the most demanding games, along with offering a bit more longevity, but for most games at the extremely common resolution of 1080p, it only takes $199 to hit a great level of graphical fidelity.

To that end there is no better card at this price than AMD’s Radeon R9 270X. Based on a fully enabled Pitcairn GPU, the 270X easily offers the most bang for the buck, keeping its distance from NVIDIA’s GeForce GTX 660 while getting rather close to NVIDIA’s more expensive GeForce GTX 760.

Runner Up: NVIDIA GeForce GTX 760: The GeForce GTX 760 offers a small but respectable performance lead over the Radeon R9 270X. On a pure price/performance basis it doesn’t make sense, and at $239 sits in an odd gap between the $199 270X and the more capable $300 cards, but for buyers looking for an NVIDIA option for 1080p gaming around $199, it’s as close as one can get.

Reaching For 1440p ($289): AMD Radeon R9 280X

Based on AMD’s venerable Tahiti GPU, the Radeon R9 280X offers the performance of the Radeon HD 7970 at around half the launch price of the aforementioned card. Since coming back down to its MSRP the 280X has continued to fall in price some, and lower-end cards can now regularly be found on sale for less than $299.

For the 280X we’re looking at a card that will straddle 1080p and 1440p. It’s not quite fast enough to work in every game at 1440p with high quality settings, but it’s fast enough for many of them. Alternatively, it’s fast enough at 1080p that it has no problem at that resolution with everything cranked up, including high levels of MSAA and even SSAA in some games. Plus the 3GB of VRAM will give it some leg room if future games demand more VRAM.

On a competitive basis the R9 280X performs very similarly to NVIDIA’s GeForce GTX 770, generally trailing by a few percent. However with AMD making the conscious decision to undercut NVIDIA on pricing here, it gets our nod for this bracket.

Runner Up: NVIDIA GeForce GTX 770

NVIDIA’s counterpart to the R9 280X is the GeForce GTX 770. With the cheaper GTX 770 cards now available for around $319 it still holds a price premium over AMD, though a smaller one than it once was. The GTX 770 is ever so slightly faster than the 280X – leading by a few percent on average – which generally isn’t enough to offset the price difference, but it does leave the two cards close. The GTX 770 is a perfectly practical alternative to the 280X in this case, trading a slightly higher price tag for entry into NVIDIA’s ecosystem, something that may become more important as G-Sync monitors are scheduled to become available next month.

Extreme Performance for Cheap ($399): AMD Radeon R9 290

For gamers who want top-tier performance without completely breaking the bank, AMD’s Radeon R9 290 is easily going to be the card of choice. Offering performance that rivals the more expensive R9 290X and GeForce GTX 780, the 290 is unparalleled in performance for its price. At this level of performance it should be able to run virtually anything at 1440p with high-to-extreme settings, and 1080p gamers should have no trouble hitting 120fps in anything that isn’t CPU limited to begin with.

As the vast majority of reference-style cards have gone out of circulation by now, there is thankfully a good selection of superior semi-custom and fully-custom cards at $399 (and sometimes on sale for even less). These cards offer all of the fantastic performance of the 290 without the noise or throttling drawbacks of the reference 290. Just keep in mind that these are all open air cooled cards, and they will want a case environment that can dissipate an additional 250W-300W of heat.

Extreme Performance with Refinement ($499): NVIDIA GeForce GTX 780

Positioned as NVIDIA’s lowest tier GK110 card, the high performance offered by the GeForce GTX 780 means it should be fast enough to run virtually anything at 1440p with high-to-extreme settings, and 1080p gamers should have no trouble hitting 120fps in anything that isn’t CPU limited to begin with. To that end the GTX 780 is the cheapest card that can drive all sub-4K single-monitor setups, giving it a sweet spot position of its own in the current market.

As an added bonus, even the $499 base models get access to NVIDIA’s impressive metal shrouded blower, which in our tests is enough to keep noise levels under 50dB. So for gamers looking for a balance between performance, cooling effectiveness, and noise, the GTX 780 is a star. Meanwhile gamers looking at open air coolers will find that GTX 780 cards with alternative coolers, such as EVGA’s ACX, can be even quieter, with the usual tradeoffs between a blower and an open air cooler.

Runner Up: Radeon R9 290X

With prices on the R9 290X falling as low as $499 these days, the 290X has shifted over to being direct competition for the GTX 780. Like the R9 290 the supply of reference cards is nearly gone, so everything from top to bottom is an acoustically superior semi/fully-custom open air cooled card. Looking primarily at the cheapest cards – what are essentially near-reference in clockspeeds and features – the R9 290X puts up a good fight with the GTX 780, capable of edging it out at the same retail price. The only issue the 290X faces is that the 290 is so close in performance that it doesn’t have much of a lead here, making it harder to stand apart from the 290 and the GTX 780.

Taking the Single-GPU Crown For Gaming ($649): NVIDIA GeForce GTX 780 Ti

For the fastest single-GPU card on the market for gamers, NVIDIA’s top tier GK110 part, GeForce GTX 780 Ti, stands alone. The performance advantage over the Radeon R9 290X (or even the 290 for that matter) is not incredible, but there’s admittedly nothing new about paying a notable price premium for the very best card.

4K for Me ($1000+): 2x AMD Radeon R9 290X

Though the Radeon R9 290X doesn’t make a ton of sense on its own in light of competition from the GTX 780 and R9 290, if we want to move into 4K gaming and the extreme load it presents, a pair of 290Xs becomes a very tantalizing option. Thanks to AMD’s XDMA engine the 290X has no problem scaling up to 4K in Crossfire, taking AMD’s decent single-card 4K performance and scaling it up to something that allows for 4K without the quality compromises. Considering that 60Hz 4K monitors still run for $800+ and demand incredible GPU performance to drive them, to get the most out of such a monitor it doesn’t make sense to pair it with anything less than a pair of 290Xs.

Of course the Radeon R9 295X2 deserves a mention here. As we discussed in our recap of April, AMD’s recently launched dual-GPU flagship card offers all of the performance of the 290X in Crossfire with much better acoustics and in a smaller package. Since our guide is based first and foremost on price/performance our primary recommendation for this bracket is going to be the 290X CF, but if you can make the steep climb to $1500, the 295X2 and its liquid cooler is a very impressive product whose vastly improved acoustics make it a superior option to 290X CF.

Meanwhile the GTX 780 Ti in SLI is also going to be a viable alternative here. From a performance perspective it will trail the AMD setups by 5% or so at 4K, so while it can’t match the AMD setups hit-for-hit it doesn’t significantly fall behind, making it practical to get similar performance in the NVIDIA ecosystem. The catch is that at $1300 for the dual card setup it’s closer to the 295X2 than the 290X CF in price, so it doesn’t have a distinct sweet spot on price or acoustics like either AMD configuration. But unlike either AMD option, the GTX 780 Ti is available in a high quality blower configuration, allowing a third option between the open air cooled cards of the R9 290X and the unconventional closed loop liquid cooler of the R9 295X2.

AMD Posts Mantle Whitepaper

As part of a larger Mantle promotion, AMD has posted a number of blogs on their site detailing their low level API. The blog posts themselves are unabashedly closer to advertising than technical writing, but as something of a diamond in the rough AMD has also published a whitepaper on Mantle.

At 11 pages long the Mantle whitepaper offers a solid high level overview of the technology. In it AMD delves into further detail about several aspects of the API without getting buried in the kind of minutiae that only seasoned programmers can appreciate. Among the subjects covered are Mantle’s memory model, execution model, pipeline model, and the basic question of where low-level APIs can reduce overhead and improve performance over high level APIs.

The bulk of this information is a repeat from AMD’s earlier developer presentations, so we won’t spend any time going over the materials in-depth here, but for a more approachable look at the API from AMD’s perspective this is a great start.