Benchmarked: Lords of the Fallen

Officially launched on October 28, Lords of the Fallen had a bit of a rocky start so it took longer for me to finish running benchmarks on the game, but I’ll get into that momentarily. At its core, Lords of the Fallen is a melee third-person action-RPG similar in many ways to the Dark Souls games. There are periodic boss fights to shake things up, as you play the convicted criminal Harkyn (his face has tattoos to publicly declare his sins) trying to stop the invasion of the demonic Rhogar. The game has received decent reviews, with a current Metacritic score of 72%.

Like many recent releases, Lords of the Fallen is a multi-platform title that launched simultaneously for the PC, PS4, and Xbox One. The updated consoles now sport more memory than the old PS3 and Xbox 360, which gives developers opportunities to do a lot more in terms of textures and graphics quality, and at launch I think that ended up creating some problems on the PC. The short summary is that GPUs with 2GB of VRAM or less tended to have quite poor performance. CrossFire also had issues: it wasn’t just functioning poorly but it actually caused the game to crash to the desktop.

The first update to the game was released about a week later, and it fixed a few bugs and instability issues, but more importantly it offered much better performance on GPUs with limited VRAM. CrossFire is also “working” now – and I put that in quotes because CrossFire is actually causing degraded performance in some cases and is basically not scaling well enough to be worth the hassle so far. Need I mention that this is an NVIDIA “The Way It’s Meant To Be Played” title? Not that the developers have intentionally crippled AMD performance, but I don’t think AMD has spent as much time optimizing their drivers for the game. It runs well enough with the right hardware and settings, but this is definitely a game that favors NVIDIA.

We tested using the game’s built-in Ultra and High settings, but there’s not a significant difference in performance or quality between the two modes so I’m in the process of running another set of performance figures at Medium quality. (I’m also testing Assassin’s Creed: Unity performance, which will be the next Benchmarked article, so it will be a bit before I can post the full 1080p Medium results.) Before we get to the performance, here’s a quick look at image quality using the four presets:

The major difference between Ultra and High seems to be a minor change in the handling of shadows; I’m not sure you could definitively call Ultra “better”, and performance is so close that it’s mostly a moot point. Medium appears to disable Ambient Occlusion on the shadows, resulting in a much more noticeable change to the graphics, while Low also disables the Post Processing effect. I’m not sure that’s actually a bad thing, though – the effect warps the image a bit, particularly on the right and left thirds of the screen, and tends to make everything look a little blurry/weird.

Lords of the Fallen also features support for some NVIDIA technologies, including PhysX APEX particle support. Many of the effects use PhysX on the CPU and thus work with all GPUs (as well as running on the PS4 and Xbox One), but there’s an additional effect called Turbulence that’s only available on NVIDIA GPUs. I didn’t try to do thorough testing of performance with and without Turbulence enabled since it’s an NVIDIA exclusive, but informally it looks like the performance hit is relatively small – around 5-10% – so if you’re running an NVIDIA GPU it’s probably worth enabling.

One final thing to note before we get to the benchmarks is that Lords of the Fallen is very demanding when it comes to GPUs. Moderate hardware (e.g. Radeon R7 and similar, or NVIDIA GTX 750 Ti and lower) is going to struggle to break 30FPS at 1080p Ultra or High settings, so 1080p Medium or even 1600×900 Medium might be required. I’ll add the Medium results in the next day or two once I finish retesting. And once more as a quick overview, here’s the hardware used for our Benchmarked articles:

Gaming Benchmarks Test Systems

| Component | Details |
| --- | --- |
| CPU | Intel Core i7-4770K (4x 3.5-3.9GHz, 8MB L3), overclocked to 4.1GHz |
| Motherboard | Gigabyte G1.Sniper M5 Z87 |
| Memory | 2x8GB Corsair Vengeance Pro DDR3-1866 CL9 |
| GPUs (desktop) | Sapphire Radeon R9 280; Sapphire Radeon R9 280X; Gigabyte Radeon R9 290X; EVGA GeForce GTX 770; EVGA GeForce GTX 780; Zotac GeForce GTX 970; Reference GeForce GTX 980 |
| GPUs (laptops) | GeForce GTX 980M (MSI GT72 Dominator Pro); GeForce GTX 880M (MSI GT70 Dominator Pro); GeForce GTX 870M (MSI GS60 Ghost 3K Pro); GeForce GTX 860M (MSI GE60 Apache Pro) |
| Storage | Corsair Neutron GTX 480GB |
| Power Supply | Rosewill Capstone 1000M |
| Case | Corsair Obsidian 350D |
| Operating System | Windows 7 64-bit |

Lords of the Fallen Average FPS

As far as target FPS for a decent experience, Lords of the Fallen isn’t quite as twitch-heavy as some games, so I’d recommend shooting for anything above 40FPS average. If you have a G-SYNC display with an NVIDIA GPU, that will also allow you to still experience “smooth” gameplay without tearing. For our testing, however, we disable VSYNC as usual.

Lords of the Fallen 4K Ultra

Lords of the Fallen QHD Ultra

Lords of the Fallen 1080p Ultra

Lords of the Fallen 1080p High

Starting with average frame rates, 4K is basically a stretch at best for even the fastest single GPU configurations. The GTX 980 can technically break 30FPS (barely), but it’s not as smooth as I’d like so dropping down a notch is recommended. SLI reportedly works well, though I don’t have the hardware to test this (yet), so two high-end NVIDIA GPUs might be enough to get into the playable frame rate territory at 4K. At present CrossFire R9 290X still falls well short, but that’s also due to the fact that CrossFire scaling is very low right now.

There’s a sizeable jump in performance going from 4K to QHD as expected, with most of the GPUs basically doubling their performance – not too surprising as 4K has 2.25X as many pixels to render as QHD. I mentioned earlier how the patch changed performance in some cases, particularly for GPUs with 2GB or less VRAM. The big beneficiary for higher performance GPUs ends up being the GTX 770, which saw a jump in QHD performance of over 70% with the patch (and a still significant increase of 30% at 1080p Ultra/High).
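As a quick check on the 2.25X figure, here is a minimal sketch of the pixel arithmetic. The pixel dimensions are the standard definitions of each mode and are an assumption on my part, since the text only names the modes.

```python
# Standard pixel dimensions for each mode (assumed; the article names the
# modes but does not spell out their dimensions).
RESOLUTIONS = {
    "4K": (3840, 2160),
    "QHD": (2560, 1440),
    "1080p": (1920, 1080),
}


def pixel_count(mode):
    """Total pixels rendered per frame at the given mode."""
    width, height = RESOLUTIONS[mode]
    return width * height


def pixel_ratio(mode_a, mode_b):
    """How many times more pixels mode_a has than mode_b."""
    return pixel_count(mode_a) / pixel_count(mode_b)


print(pixel_ratio("4K", "QHD"))  # 2.25
print(round(pixel_ratio("QHD", "1080p"), 2))  # 1.78
```

The same helper shows that QHD itself carries about 1.78X the pixels of 1080p, which is useful context when comparing the QHD and 1080p charts.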

On the AMD side of the equation, the R9 GPUs don’t do all that well compared to NVIDIA. We’re used to seeing the 780/970 trade blows with the 290X in most games, but here the 290X is closer to a 770, with the 780/970 offering a solid 15-20% increase in performance. Meanwhile the 280X is mostly playable at QHD but certainly not ideal, and the 280 has to drop to 1080p before it can achieve “acceptable” performance. Overall, the R9 290X along with all of the GTX desktop GPUs I tested can handle QHD Ultra and provide a good experience.

Moving to the 1080p results and looking at the laptops, the GTX 980M is clearly a force to be reckoned with, essentially matching the R9 290X and the GTX 770 and easily handling 1080p Ultra. The next step down to the GTX 880M is a pretty big one – the 980M is about 35% faster than the 880M – but the 880M is still able to handle 1080p Ultra. The 870M meanwhile is in that “questionable” range, and dropping to High settings is only good for about 3-5% more performance on most of our GPUs, so a bit more tweaking is going to be required. Last but not least, the 860M falls short of even the 30FPS mark, and it will need some tuning or Medium quality before it’s really acceptable at 1080p.

Our sole “low-end” GPU is the R7 250X, and as you can see it’s really not doing well at 1080p High, falling below 20FPS. It also benefited quite a bit from the patch, improving by around 35% at 1080p High, but going from 13.7 FPS to 18.5 FPS still means it’s unplayable. I also tested an Intel HD 4600 just for fun (though it’s not shown in the charts since it only managed 6.5 FPS); even at 1366×768 and Low quality, it’s still far short of being playable with frame rates of around 17 FPS.

Lords of the Fallen Minimum FPS

As with Civilization: Beyond Earth, for the “minimum” FPS I’m actually using an average of the bottom 1% of frame rates. What that means is that this is a realistic look at minimum frame rates, as our benchmark run typically consists of a couple thousand frames of data so we’re looking at an average of 20+ frames. Thus, a single frame that took a long time to render won’t have as great of an impact as consistently low frame rates. The goal here is to give you a better idea of what performance will be like in the most graphically intense situations.
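To make the "bottom 1%" idea concrete, here is a rough sketch of how such a figure could be computed from a frame-time log. This is not the actual tooling used for these charts; the function names and the sample data are hypothetical, and averaging per-frame FPS values (rather than frame times) is just one reasonable reading of the description above.

```python
def average_fps(frame_times_ms):
    """Average FPS over the whole run: frames rendered / elapsed seconds."""
    total_seconds = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / total_seconds


def bottom_percent_fps(frame_times_ms, percent=1.0):
    """Mean of the per-frame FPS values for the slowest `percent` of frames."""
    fps_values = sorted(1000.0 / t for t in frame_times_ms)  # slowest first
    count = max(1, int(len(fps_values) * percent / 100.0))
    return sum(fps_values[:count]) / count


# Hypothetical 2000-frame run: mostly 20 ms frames (50 FPS), a handful of
# 40 ms frames (25 FPS), and one 200 ms hitch (5 FPS).
frames = [20.0] * 1980 + [40.0] * 19 + [200.0]

print(round(average_fps(frames), 1))
print(bottom_percent_fps(frames))         # 24.0, averaged over 20 frames
print(min(1000.0 / t for t in frames))    # 5.0, the single worst frame
```

In this made-up run the single 200 ms hitch would give an absolute minimum of 5 FPS, while the bottom 1% average lands at 24 FPS, which is closer to what the slow stretches actually feel like.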

Lords of the Fallen 4K Ultra Minimums

Lords of the Fallen QHD Ultra Minimums

Lords of the Fallen 1080p Ultra Minimums

Lords of the Fallen 1080p High Minimums

When we look at the minimum FPS, you can now see why I recommended at least 40FPS average frame rates for Lords of the Fallen to be “playable”. That translates into minimum frame rates of roughly 30FPS, so even in higher complexity scenes the game will still stay reasonably smooth. On the other hand, if you’re averaging closer to 30FPS, minimum FPS is going to drop into the low 20s, and that can be quite choppy.

The standings of the various GPUs don’t really change much in our minimum FPS results. In most cases the minimum is around 70-75% of the average FPS, with GPUs that have less RAM generally faring slightly worse than those with more RAM. NVIDIA’s lead over AMD seems to be a bit larger at 1080p than at QHD, but there aren’t any clear issues on any of the GPUs.
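The 40FPS recommendation follows directly from that 70-75% ratio. A small sketch of the arithmetic, treating the ratio as a rule of thumb taken from the charts rather than a guarantee:

```python
# Minimum FPS as a fraction of average FPS, per the range quoted in the text.
MIN_TO_AVG_RATIO = (0.70, 0.75)


def expected_minimum_range(average_fps):
    """Estimate the (low, high) minimum FPS for a given average FPS."""
    low_ratio, high_ratio = MIN_TO_AVG_RATIO
    return average_fps * low_ratio, average_fps * high_ratio


for avg in (40, 30):
    low, high = expected_minimum_range(avg)
    print(f"{avg} FPS average -> roughly {low:.1f} to {high:.1f} FPS minimum")
# 40 FPS average -> roughly 28.0 to 30.0 FPS minimum
# 30 FPS average -> roughly 21.0 to 22.5 FPS minimum
```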

Closing Thoughts

I never played any of the Dark Souls games for whatever reason (lack of time, mostly), so for me Lords of the Fallen is actually pretty fun. Of course, having benchmarked the same sequence I don’t know how many times (well over 100) does become rather tedious. With so many other games coming out right now, I don’t think I’d place Lords of the Fallen at the top of any recommendations list, but it has enough to warrant picking it up if it goes on sale. In the meantime, I’d suggest Middle-Earth: Shadow of Mordor or Assassin’s Creed: Unity as better games, at least in my opinion.

Now that we’ve had a few of these Benchmarked articles, let me also ask for reader feedback. The good thing about these Benchmarked articles is that once I’m done with the initial benchmarking, I won’t necessarily be retesting this same game on different systems for another year or two. It’s also useful to increase the number of games we benchmark, as it helps to keep the GPU manufacturers honest – they can’t just optimize drivers for the ten or so games that most sites use for benchmarking as an example. But what do you think – do you like these articles? Short of the desire to test even more configurations (it’s always something that would be nice to have but very time consuming to deliver), what else would you like to see? Are there any recently released games that you’d like to see us test? Let us know!

Apple A8X’s GPU - GXA6850, Even Better Than I Thought

Working on analyzing various Apple SoCs over the years has become a process of delightful frustration. Apple’s SoC development is consistently on the cutting edge, so it’s always great to see something new, but Apple has also developed a love for curveballs. Coupled with their infamous secrecy and general lack of willingness to talk about the fine technical details of some of their products, it’s easy to see how well Apple’s SoCs perform but it is a lot harder to figure out why this is.

Since publishing our initial iPad Air 2 review last week, a few new pieces of information have come in that have changed our perspective on Apple’s latest SoC. As it turns out I was wrong. Powered by what we’re going to call the GXA6850, the A8X’s GPU is even better than I thought.

Apple SoC Comparison

| | A8X | A8 | A7 | A6X |
| --- | --- | --- | --- | --- |
| CPU | 3x “Enhanced Cyclone” | 2x “Enhanced Cyclone” | 2x Cyclone | 2x Swift |
| CPU Clockspeed | 1.5GHz | 1.4GHz | 1.4GHz (iPad) | 1.3GHz |
| GPU | Apple/PVR GXA6850 | PVR GX6450 | PVR G6430 | PVR SGX554 MP4 |
| RAM | 2GB | 1GB | 1GB | 1GB |
| Memory Bus Width | 128-bit | 64-bit | 64-bit | 128-bit |
| Memory Bandwidth | 25.6GB/sec | 12.8GB/sec | 12.8GB/sec | 17.1GB/sec |
| L2 Cache | 2MB | 1MB | 1MB | 1MB |
| L3 Cache | 4MB | 4MB | 4MB | N/A |
| Transistor Count | ~3B | ~2B | >1B | N/A |
| Manufacturing Process | TSMC(?) 20nm | TSMC 20nm | Samsung 28nm | Samsung 32nm |

Briefly, without a public die shot of A8X we have been left to wander through the dark a bit more than usual on its composition. A8X’s three “Enhanced Cyclone” CPU cores and 2MB of L2 cache were easy enough to discover, as the OS will cheerfully report those facts. However the GPU is more of an enigma since the OS does not report the GPU configuration and performance is a multi-variable equation that is reliant on both GPU clockspeed and GPU width (the number of clusters). Given Apple’s performance claims and our own benchmarks we believed we had sufficient information to identify this as Imagination’s PowerVR GX6650, the largest of Imagination’s GPU designs.

Since then, we have learned a few things that have led us to reevaluate our findings and discover that A8X’s GPU is even more powerful than GX6650. First and foremost, on Monday Imagination announced the PowerVR Series7 GPUs. Though not shipping for another year, we learned from Imagination’s announcement that Series7XT scales up to 16 clusters, twice the number of clusters as Series6XT. This immediately raises a red flag since Imagination never released an 8 cluster design – and indeed is why we believed it was GX6650 in the first place – warranting further investigation. This revelation meant that an 8 cluster design was possible, though by no means assured.


PowerVR Series7XT: Up to 16 Clusters, Twice As Many As Series6XT

The second piece of information came from analyzing GFXBench 3.0 data to look for further evidence. While we don’t publish every single GFXBench subtest in our reviews, we still collect the data for Bench and for internal use. What we noticed is that the GFXBench fill rate test is showing more than double the performance of the A8 iPhone 6 Plus. Keeping in mind that performance here is a combination of width and clockspeed, fillrate alone does not prove an 8 cluster design or a 6 cluster design, only that the combination of width and clockspeeds leads to a certain level of performance. In other words, we couldn’t rule out a higher clocked GX6650.
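To illustrate why fill rate alone is ambiguous, here is a small sketch of the width-versus-clockspeed trade-off. The 2 pixels per clock per cluster figure matches the Series6XT numbers used elsewhere in this article; the clockspeeds are purely hypothetical, since Apple does not publish GPU clocks.

```python
# Series6XT writes 2 pixels per clock per cluster (16 pixels/clock at 8 clusters).
PIXELS_PER_CLOCK_PER_CLUSTER = 2


def fill_rate_mpixels(clusters, clock_mhz):
    """Peak pixel fill rate in megapixels per second."""
    return clusters * PIXELS_PER_CLOCK_PER_CLUSTER * clock_mhz


def clock_needed_mhz(clusters, target_mpixels):
    """Clockspeed a design of the given width needs to hit a target fill rate."""
    return target_mpixels / (clusters * PIXELS_PER_CLOCK_PER_CLUSTER)


HYPOTHETICAL_CLOCK_MHZ = 450  # illustrative only, not a measured A8X clock

target = fill_rate_mpixels(8, HYPOTHETICAL_CLOCK_MHZ)
print(target)                       # 7200
print(clock_needed_mhz(6, target))  # 600.0
```

Under these assumed numbers an 8 cluster part at 450MHz and a 6 cluster part at 600MHz would post identical fill rates, so the benchmark result by itself cannot tell the two apart.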

GFXBench 3.0 Fill Rate Test (Offscreen)

At the same time in the PC space the closest equivalent fillrate test, 3DMark Vantage’s pixel fill test, is known to be constrained by memory bandwidth as much as or more than it is GPU performance (this leading to the GTX 980’s incredible fillrate). However as we have theorized and since checked with other sources, GFXBench 3.0’s fillrate test is not bandwidth limited in the same way, at least not on Apple’s most recent SoCs. Quite possibly due to the 4MB of SRAM that is A7/A8/A8X’s L3 cache, this is a relatively “pure” test of pixel fillrate, meaning we can safely rule out any other effects.

With this in mind, normally Apple has a strong preference for wide-and-slow architectures in their GPUs. High clockspeeds require higher voltages, so going wide and staying with lower clockspeeds allows Apple to conserve power at the cost of some die space. This is the basic principle behind Cyclone and it has been the principle in Apple’s GPU choices as well. Given this, one could reasonably argue that A8X was using an 8 cluster design, but even with this data we were not entirely sure.

The final piece of the puzzle came in this afternoon when after some additional poking around we were provided with a die shot of A8X. Unfortunately at this point we have to stop and clarify that as part of our agreement with our source we are not allowed to publish this die shot. The die shot itself is legitimate, coming from a source capable of providing such die shots, however they didn’t wish to become involved in the analysis of the A8X and as a result we were only allowed to see it so long as we didn’t publish it.

Update: Chipworks has since published their A8X die shot, which we have reproduced below

To get right down to business then, the die shot confirms what we had begun suspecting: that A8X has an 8 cluster Series6XT configuration. All 8 GPU clusters are clearly visible, and perhaps unsurprisingly it looks a lot like the GPU layout of the GX6450. To put it in words, imagine A8’s GX6450 with another GX6450 placed right above it, and that would be the A8X’s 8 cluster GPU.


Chipworks A8X Die Shot

With 8 clearly visible GPU clusters, there is no question at this point that A8X is not using a GX6650, but rather something more. And this is perhaps where the most interesting point comes up, due to the fact that Imagination does not have an official 8 cluster Series6XT GPU design. While Apple licenses PowerVR GPU cores, not unlike their ARM IP license they are free to modify the Imagination designs to fit their needs, resulting in an unusual semi-custom aspect to their designs (and explaining what Apple has been doing with so many GPU engineers over the last couple of years). In this case it appears that Apple has taken the GX6450 design and created a new design from it, culminating in an 8 cluster Series6XT design. Officially this design has no public designation – while it’s based on an Imagination design it is not an official Imagination design, and of course Apple doesn’t reveal codenames – but for the sake of simplicity we are calling it the GXA6850.

Imagination/Apple PowerVR Series6XT GPU Comparison

| | GXA6850 | GX6650 | GX6450 | GX6250 |
| --- | --- | --- | --- | --- |
| Clusters | 8 | 6 | 4 | 2 |
| FP32 ALUs | 256 | 192 | 128 | 64 |
| FP32 FLOPs/Clock | 512 | 384 | 256 | 128 |
| FP16 FLOPs/Clock | 1024 | 768 | 512 | 256 |
| Pixels/Clock (ROPs) | 16 | 12 | 8 | 4 |
| Texels/Clock | 16 | 12 | 8 | 4 |
| OpenGL ES | 3.1 | 3.1 | 3.1 | 3.1 |
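Every row in that comparison scales linearly with cluster count, which a short sketch makes explicit. The per-cluster values below are simply the table's GX6250 column divided by two; the dictionary and function names are my own, not Imagination's.

```python
# Per-cluster throughput, derived by dividing any column of the table by its
# cluster count (e.g. GX6250: 64 FP32 ALUs / 2 clusters = 32).
PER_CLUSTER = {
    "FP32 ALUs": 32,
    "FP32 FLOPs/Clock": 64,    # 2 FLOPs per ALU per clock
    "FP16 FLOPs/Clock": 128,   # twice the FP32 rate
    "Pixels/Clock (ROPs)": 2,
    "Texels/Clock": 2,
}

CLUSTERS = {"GXA6850": 8, "GX6650": 6, "GX6450": 4, "GX6250": 2}


def scale(clusters):
    """Scale the per-cluster figures up to a full GPU configuration."""
    return {name: value * clusters for name, value in PER_CLUSTER.items()}


for gpu, clusters in CLUSTERS.items():
    print(gpu, scale(clusters))
```

Running this reproduces the table, including the 256 FP32 ALUs and 16 pixels/clock of the 8 cluster configuration.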

Other than essentially doubling up on GX6450s, the GXA6850 appears to be unchanged from the design we saw in the A8. Apple did the necessary interconnect work to make an 8 cluster design functional and made their own power/design optimizations throughout the core, but there do not appear to be any further surprises in this GPU design. So what we have is an Apple variant on a Series6XT design, but something that is clearly a semi-custom Series6XT design and not a full in-house custom GPU design.


Unofficial GXA6850 Logical Diagram

Meanwhile the die shot places the die size of A8X at roughly 128mm². This is in line with our estimates – though certainly on the lower end – making A8X only a hair larger than the 123mm² A6X. At roughly 3 billion transistors, Apple has been able to increase their transistor count by nearly 50% over A8 while increasing the die size by only 40%, meaning Apple achieved better than linear scaling and A8X packs a higher average transistor density. On a size basis, A8X is a bit bigger than NVIDIA’s 118mm² GK107 GPU or a bit smaller than Intel’s 2C+GT2 Haswell CPU, which measures in at 130mm². Meanwhile on a transistor basis, as expected the 20nm A8X packs a far larger number of transistors than those 28nm/22nm products, with 3B transistors being larger than even Intel’s 4C+GT3 Haswell design (1.7B transistors) and right in between NVIDIA’s GK104 (3.5B) and GK106 (2.5B) GPUs.

Apple iPad SoC Evolution

| | Die Size | Transistors | Process |
| --- | --- | --- | --- |
| A5 | 122mm² | <1B | 45nm |
| A5X | 165mm² | ? | 45nm |
| A6X | 123mm² | ? | 32nm |
| A7 | 102mm² | >1B | 28nm |
| A8X | 128mm² | ~3B | 20nm |

GXA6850 occupies 30% of A8X’s die, putting the GPU size at roughly 38mm². This isn’t sufficient to infer the GPU transistor count, but in terms of absolute die size it’s still actually quite small thanks to the 20nm process. Roughly speaking an Intel Haswell GT2 GPU is 87mm², but of course Apple has better density.

Moving on, the bigger question at this point remains why Apple went with an 8 cluster GPU over a 6 cluster GPU. From a performance standpoint this is greatly appreciated, but comparing iPad Air 2 to iPhone 6 Plus, the iPad Air 2 has nowhere near twice as many pixels as the iPhone 6 Plus. So the iPad Air 2 is “overweight” on GPU performance on a per-pixel basis versus its closest phone counterpart, offering roughly 30% better performance per pixel. Apple certainly has gaming ambitions with the iPad Air 2, and this will definitely help with that. But I believe there may also be a technical reason for such a large die.
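The per-pixel figure can be roughly reproduced as follows. This sketch assumes the published panel resolutions (2048x1536 for iPad Air 2, 1920x1080 for iPhone 6 Plus), which are not stated in the text above, and further assumes that GPU throughput scales with cluster count at similar clockspeeds.

```python
# Panel resolutions are public specs, assumed here rather than taken from the
# article. The iPhone 6 Plus figure is the physical panel, not its internal
# render target.
IPAD_AIR_2 = (2048, 1536)
IPHONE_6_PLUS = (1920, 1080)

# Cluster counts from the article: 8 in A8X, 4 in A8 (GX6450).
A8X_CLUSTERS = 8
A8_CLUSTERS = 4


def pixels(resolution):
    """Total pixel count for a (width, height) pair."""
    width, height = resolution
    return width * height


pixel_ratio = pixels(IPAD_AIR_2) / pixels(IPHONE_6_PLUS)
gpu_ratio = A8X_CLUSTERS / A8_CLUSTERS  # assumes similar GPU clockspeeds
per_pixel_advantage = gpu_ratio / pixel_ratio

print(f"{pixel_ratio:.2f}x the pixels")                           # 1.52x the pixels
print(f"{(per_pixel_advantage - 1) * 100:.0f}% more per pixel")   # 32% more per pixel
```

The result lands at about 32%, in line with the "roughly 30%" estimate; measured benchmark ratios would shift it somewhat in either direction.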

The 128-bit DDR3 memory bus used by the A8X requires pins, quite a lot in fact. Couple that with all of the other pins that need to come off of the SoC – NAND, display, audio, USB, WiFi, etc. – and this is a lot of pins in a not very large area of space. At this point I am increasingly suspicious that Apple is pad limited, and that in order to fit a 128-bit memory interface A8X needs to reach a minimum die size. With only a small organic substrate to help spread out pads, Apple has only as many pads as they can fit on the die, making a larger die a potential necessity. Ultimately if this were the case, Apple would have some nearly-free die space to spend on additional features if a 6 cluster A8X came in at under 128mm², making the addition of 2 more clusters (~10mm²) a reasonable choice in this situation.
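The ~10mm2 estimate for two extra clusters falls out of the die-area figures quoted earlier. A back-of-the-envelope sketch, assuming the GPU's area is split evenly across its 8 clusters (which ignores shared logic, so treat it as an upper-end approximation per cluster):

```python
A8X_DIE_MM2 = 128.0   # die size from the die shot
GPU_DIE_SHARE = 0.30  # GPU's share of the die, per the article
GPU_CLUSTERS = 8

gpu_area = A8X_DIE_MM2 * GPU_DIE_SHARE
# Simplifying assumption: area divides evenly across clusters.
area_per_cluster = gpu_area / GPU_CLUSTERS
two_clusters = 2 * area_per_cluster
six_cluster_die = A8X_DIE_MM2 - two_clusters

print(f"GPU area: {gpu_area:.1f} mm2")                          # GPU area: 38.4 mm2
print(f"Two clusters: {two_clusters:.1f} mm2")                  # Two clusters: 9.6 mm2
print(f"Hypothetical 6 cluster die: {six_cluster_die:.1f} mm2")  # 118.4 mm2
```

Under that assumption a 6 cluster A8X would have come in at around 118mm2, which is the gap the pad-limit argument suggests Apple would have had to fill anyway.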

Finally, while we’re digging around in A8X’s internals, let’s quickly talk about the CPU block. There are no great surprises – nor did we expect to find any – but viewing the A8X die has confirmed that A8X is indeed an asymmetrical 3 CPU core design, and that there is no 4th (disabled) CPU core on the SoC. An odd number of CPU cores is unusual, though by no means unheard of. In this case Apple laid down a 3rd Enhanced Cyclone core, doubled the L2 cache, and left it at that.

Wrapping things up, it has become clear that with A8X Apple has once again thrown us a curveball. By drawing outside of the lines and building an eight cluster GPU configuration where none previously existed, the A8X and its GXA6850 GPU are more powerful than even we first suspected. Apple traditionally aims high with its SoCs, but this ended up being higher still.

As far as performance is concerned this doesn’t change our initial conclusions – iPad Air 2 performs the same no matter how many GPU clusters we think are in it – but it helps to further explain iPad Air 2’s strong GPU performance. With 256 FP32 ALUs Apple has come very close to implementing a low-end desktop class GPU on a tablet SoC, and perhaps just as impressively can sustain that level of performance for hours. Though I don’t want to reduce this to a numbers war between A8X and NVIDIA’s TK1, it’s clear that these two SoCs stand apart from everything else in the tablet space.