Vik
March 2, 2015
Comments are Closed

Imagination Announces PowerVR G6020 GPU & PowerVR Series 5 Video Encoder

With Mobile World Congress 2015 now in full swing, Imagination Technologies is taking to the show today to announce a couple of new additions to the PowerVR family of video products.

First off is a new low-end GPU in the PowerVR Series6XE family, the G6020. The G6020 is aimed at entry-level mobile devices, embedded computers, and high-end wearables, and is intended to be Imagination’s new entry-level Series6XE part.

From a design perspective, the G6020 is aimed at very simple desktop workloads – the Android UI, wearable interfaces, etc. Imagination has essentially built the bare minimum GPU needed to drive a 720p60 display, taking out any hardware not necessary to that goal such as compute and quite a bit of geometry throughput. What remains is enough of a ROP backend (pixel co-processor) to drive 720p, and the FP16 shading resources to go with it.

Meanwhile from a hardware perspective this is basically a significantly cut down 6XE part. G6020 drops to just a single 4 pipeline USC, versus the 8 pipeline USC found in the G6050, and 16 pipelines as found in a “complete” USC. The number of FP32 ALUs in each pipeline has also been reduced, going from the 6XE standard of 2 per pipeline to 1 for G6020, while the number of FP16 ALUs remains unchanged at 4. Along with scaling down the USCs, Imagination has also stripped down the G6020 in other ways, such as by taking out the compute data master.

PowerVR Series6/6XE “Rogue”
GPU	# of Clusters	# of FP32 Ops	# of FP16 Ops	Optimization
G6020	0.25	8	32	Area + Bandwidth
G6050	0.5	32	64	Area
G6060	0.5	32	64	Area + Bandwidth
G6100	1	64	96	Area
G6100 (XE)	1	64	128	Area
G6110	1	64	128	Area + Bandwidth

The end result of their efforts is designed to be an incredibly small and incredibly low power OpenGL ES 3.0 GPU for devices that fall in the cheap/small range. G6020 is only 2.2mm2 in size on 28nm, making it similar in size to ARM’s Cortex-A7 CPU cores (a likely pairing target). And power consumption is low enough that it should be able to just fit into high-end wearables.

PowerVR Series 5 Video Encoder

Meanwhile Imagination’s second PowerVR announcement of the day is the announcement of their new PowerVR Series 5 family of video encoders. This is Imagination’s entry into the HEVC (H.265) hardware encoder market, offering scalable designs for encoding H.264 and HEVC video.

In terms of designs Imagination will be offering 3 designs, the E5800, E5505, and E5300, targeted at progressively lower-end markets. The E5800 is the largest configuration and is aimed at the prosumer market, offering 4Kp60 encoding with 10-bit color and 4:2:2 chroma sampling (twice the sampling of standard 4:2:0 video). Below that is the E5505, the mainstream/premium mobile part with support for encoding up to 4Kp30, along with VP8 encoding and even MJPEG for certain legacy applications. Finally at the bottom of the list is the E5300, which is a small, low power encoder for 1080p30 applications (cameras/sensors/IoT and the like).

PowerVR Series 5 HEVC Encoders
Encoder	Max Resolution	Chroma Subsampling	Market
E5800	4Kp60	4:2:2	Prosumer/Pro-Cameras
E5505	4Kp30	4:2:0	Mobile
E5300	1080p30	4:2:0	Sensor/IoT/Security Cameras

From a competitive standpoint, along with the expected synergy between the PowerVR encoders and PowerVR GPUs –support for directly handing off compressed memory, in particular – Imagination is also banking on being able to win a quality war with other mobile HEVC encoders. By Imagination’s estimates they can offer equivalent quality at just 70% of the bitrate, which would give them a significant advantage. The company says that this is a result of having a newer encoder that is better tuned than competing encoders, and one that implements more HEVC features (e.g. 10-bit color), allowing them to achieve better compression and the resulting reduction in bitrates.

While Imagination’s testing methodology and resulting numbers to get here are open to interpretation – PSNR is important, though not the end-all of encoder measurements – HEVC encoders are still a fledgling field. There is still ample opportunity to improve on HEVC encoders and reach the same kind of highly tuned status that H.264 encoders have evolved to.

Wrapping things up, both new PowerVR products are now available for licensing.

Vik
February 26, 2015
Comments are Closed

DirectX 12 Performance Preview, Part 3: Star Swarm & Intel’s iGPUs

We’re back once again for the 3^rd and likely final part to our evolving series previewing the performance of DirectX 12. After taking an initial look at discrete GPUs from NVIDIA and AMD in part 1, and then looking at AMD’s integrated GPUs in part 2, today we’ll be taking a much requested look at the performance of Intel’s integrated GPUs. Does Intel benefit from DirectX 12 in the same way the dGPUs and AMD’s iGPU have? And where does Intel’s most powerful Haswell GPU configuration, Iris Pro (GT3e) stack up? Let’s find out.

As our regular readers may recall, when we were initially given early access to WDDM 2.0 drivers and a DirectX 12 version of Star Swarm, it only included drivers for AMD and NVIDIA GPUs. Those drivers in turn only supported Kepler and newer on the NVIDIA side and GCN 1.1 and newer on the AMD side, which is why we haven’t yet been able to look at older AMD or NVIDIA cards, or for that matter any Intel iGPUs. However as of late last week that changed when Microsoft began releasing WDDM 2.0 drivers for all 3 vendors through Windows Update on Windows 10, enabling early DirectX 12 functionality on many supported products.

With Intel WDDM 2.0 drivers now in hand, we’re able to take a look at how Intel’s iGPUs are affected in this early benchmark. Driver version 10.18.15.4098, these drivers enable DirectX 12 functionality on Gen 7.5 (Haswell) and newer GPUs, with Gen 7.5 being the oldest Intel GPU generation that will support DirectX 12.

Today we’ll be looking at all 3 Haswell GPU tiers, GT1, GT2, and GT3e. We also have our AMD A10 and A8 results from earlier this month to use as a point of comparison (though please note that this combination of Mantle + SS is still non-functional on AMD APUs). With that said, before starting we’d like to once again remind everyone that this is an early driver on an early OS running an early DirectX 12 application, so everything here is subject to change. Furthermore Star Swarm itself is a very directed benchmark designed primarily to showcase batch counts, so what we see here should not be considered a well-rounded look at the benefits of DirectX 12. At the end of the day this is a test that more closely measures potential than real-world performance.

CPU:	AMD A10-7800 AMD A8-7600 Intel Core i3-4330 Intel Core i5-4690 Intel Core i7-4770R Intel Core i7-4790K
Motherboard:	GIGABYTE F2A88X-UP4 for AMD ASUS Maximus VII Impact for Intel LGA-1150 Zotac ZBOX EI750 Plus for Intel BGA
Power Supply:	Rosewill Silent Night 500W Platinum
Hard Disk:	OCZ Vertex 3 256GB OS SSD
Memory:	G.Skill 2x4GB DDR3-2133 9-11-10 for AMD G.Skill 2x4GB DDR3-1866 9-10-9 at 1600 for Intel
Video Cards:	AMD APU Integrated Intel CPU Integrated
Video Drivers:	AMD Catalyst 15.200 Beta Intel 10.18.15.4098
OS:	Windows 10 Technical Preview 2 (Build 9926)

Since we’re looking at fully integrated products this time around, we’ll invert our usual order and start with our GPU-centric view first before taking a CPU-centric look.

As Star Swarm was originally created to demonstrate performance on discrete GPUs, these integrated GPUs do not perform well. Even at low settings nothing cracks 30fps on DirectX 12. None the less there are a few patterns here that can help us understand what’s going on.

Right off the bat then there are two very apparent patterns, one of which is expected and one which caught us by surprise. At a high level, both AMD APUs outperform our collection of Intel processors here, and this is to be expected. AMD has invested heavily in iGPU performance across their entire lineup, where most Intel desktop SKUs come with the mid-tier GT2 GPU.

However what’s very much not expected is the ranking of the various Intel processors. Despite having all 3 Intel GPU tiers represented here, the performance between the Intel GPUs is relatively close, and this includes the Core i7-4770R and its GT3e GPU. GT3e’s performance here immediately raises some red flags – under normal circumstances it substantially outperforms GT2 – and we need to tackle this issue first before we can discuss any other aspects of Intel’s performance.

As long-time readers may recall from our look at Intel’s Gen 7.5 GPU architecture, Intel scales up from GT1 through GT3 by both duplicating the EU/texture unit blocks (the subslice) and the ROP/L3 blocks (the slice common). In the case of GT3/GT3e, it has twice as many slices as GT2 and consequently by most metrics is twice the GPU that GT2 is, with GT3e’s Crystal Well eDRAM providing an extra bandwidth kick. Immediately then there is an issue, since in none of our benchmarks does the GT3e equipped 4770R surpass any of the GT2 equipped SKUs.

The explanation, we believe, lies in the one part of an Intel GPU that doesn’t get duplicated in GT3e, which is the front-end, or as Intel calls it the Global Assets. Regardless of which GPU configuration we’re looking at – GT1, GT2, or GT3e – all Gen 7.5 configurations share what’s essentially the same front-end, which means front-end performance doesn’t scale up with the larger GPUs beyond any minor differences in GPU clockspeed.

Star Swarm for its part is no average workload, as it emphasizes batch counts (draw calls) above all else. Even though the low quality setting has much smaller batch counts than the extreme setting we use on the dGPUs, it’s still over 20K batches per frame, a far higher number than any game would use if it was trying to be playable on an iGPU. Consequently based on our GT2 results and especially our GT3e result, we believe that Star Swarm is actually exposing the batch processing limits of Gen 7.5’s front-end, with the front-end bottlenecking performance once the CPU bottleneck is scaled back by the introduction of DirectX 12.

The result of this is that while the Intel iGPUs are technically GPU limited under DirectX 12, it’s not GPU limited in a traditional sense; it’s not limited by shading performance, or memory bandwidth, or ROP throughput. This means that although Intel’s iGPUs benefit from DirectX 12, it’s not by nearly as much as AMD’s iGPUs did, never mind the dGPUs.

Update: Between when this story was written and when it was published, we heard back from Intel on our results. We are publishing our results as-is, but Intel believes that the lack of scaling with GT3e stems in part from a lack of optimizations for lower performnace GPUs in our build of Star Swarm, which is from an October branch of Oxide’s code base. Intel tells us that newer builds do show much better overall performance and more consistent gains for the GT3e, all the while the Oxide engine itself is in flux with its continued development. In any case this reiterates the fact that we’re still looking at early code here from all parties and performance is subject to change, especially on a test as directed/non-standard as Star Swarm.

So how much does Intel actually benefit from DirectX 12 under Star Swarm? As one would reasonably expect, with their desktop processors configured for very high CPU performance and much more limited GPU performance, Intel is the least CPU bottlenecked in the first place. That said, if we take a look at the mid quality results in particular, what we find is that Intel still benefits from DX12. The 4770R is especially important here, as it’s a relatively weaker GPU (base frequency 3.2GHz) coupled with a more powerful GPU. It starts out trailing the other Core processors in DX11, only to reach parity with them under DX12 when the bottleneck shifts from the CPU to the GPU front-end. The performance gain is only 25% – and at framerates in the single digits – but conceptually it shows that even Intel can benefit from DX12. Meanwhile the other Intel processors see much smaller, but none the less consistent gains, indicating that there’s at least a trivial benefit from DX12.

Taking a look under the hood at our batch submission times, we can much more clearly see the CPU usage benefits of DX12. The Intel CPUs actually start at a notable deficit here under DX11, with batch submission times worse than the AMD APUs and their relatively weaker CPUs, and 4770R in particular taking nearly 200ms to submit a batch. Enabling DX12 in turn causes the same dramatic reduction in batch submission times we’ve seen elsewhere, with Intel’s batch submission times dropping to below 20ms. Somewhat surprisingly Intel’s times are still worse than AMD’s, though at this point we’re so badly GPU limited on all platforms that it’s largely academic. None the less it shows that Intel may have room for future improvements.

With this data in hand, we can finally make better sense of the results we’re seeing today. Just as with AMD and NVIDIA, using DirectX 12 has a noticeable and dramatic reduction in batch submission times for Intel’s iGPUs. However in the case of Star Swarm the batch counts are so high that it appears GT2 and GT3e are bottlenecked by their GPU front-ends, and as a result the gains from enabling DX12 at very limited. In fact at this point we’re probably at the limits of Star Swarm’s usefulness, since it’s meant more for discrete GPUs.

The end result though is that one way or another Intel ends up shifting from being CPU limited to GPU limited under DX12. And with a weaker GPU than similar AMD parts, performance tops out much sooner. That said, it’s worth pointing out that we are looking at desktop parts here, where Intel goes heavy on the CPU and light on the GPU; in mobile parts where Intel’s CPU and GPU configurations are less lopsided, it’s likely that Intel would benefit more than they do on the desktop, though again probably not as much as AMD has.

As for real world games, just as with our other GPUs we’re in a wait-and-see situation. An actual game designed to be playable on Intel’s iGPUs is very unlikely to push as many batch calls as Star Swarm, so the front-end bottleneck and GT3e’s poor performance are similarly unlikely to recur. But at the same time with Intel generally being the least CPU bottlenecked in the first place, their overall gains under DX12 may be the smallest, particularly when exploiting the API’s vastly improved draw call performance.

In the meantime GDC 2015 will be taking place next week, where we will be hearing more from Microsoft and its GPU partners about DirectX 12. With last year’s unveiling being an early teaser of the API, the sessions this year will be focusing on helping programmers ramp up for its formal launch later this year, and with any luck we’ll find the final details on feature level 12_0 and whether any current GPUs are 12_0 compliant. Along with more on OpenGL Next (aka glNext), it should make for an exciting show for GPU events.

Vik
February 26, 2015
Comments are Closed

DirectX 12 Performance Preview, Part 3: Star Swarm & Intel’s iGPUs

We’re back once again for the 3^rd and likely final part to our evolving series previewing the performance of DirectX 12. After taking an initial look at discrete GPUs from NVIDIA and AMD in part 1, and then looking at AMD’s integrated GPUs in part 2, today we’ll be taking a much requested look at the performance of Intel’s integrated GPUs. Does Intel benefit from DirectX 12 in the same way the dGPUs and AMD’s iGPU have? And where does Intel’s most powerful Haswell GPU configuration, Iris Pro (GT3e) stack up? Let’s find out.

As our regular readers may recall, when we were initially given early access to WDDM 2.0 drivers and a DirectX 12 version of Star Swarm, it only included drivers for AMD and NVIDIA GPUs. Those drivers in turn only supported Kepler and newer on the NVIDIA side and GCN 1.1 and newer on the AMD side, which is why we haven’t yet been able to look at older AMD or NVIDIA cards, or for that matter any Intel iGPUs. However as of late last week that changed when Microsoft began releasing WDDM 2.0 drivers for all 3 vendors through Windows Update on Windows 10, enabling early DirectX 12 functionality on many supported products.

With Intel WDDM 2.0 drivers now in hand, we’re able to take a look at how Intel’s iGPUs are affected in this early benchmark. Driver version 10.18.15.4098, these drivers enable DirectX 12 functionality on Gen 7.5 (Haswell) and newer GPUs, with Gen 7.5 being the oldest Intel GPU generation that will support DirectX 12.

Today we’ll be looking at all 3 Haswell GPU tiers, GT1, GT2, and GT3e. We also have our AMD A10 and A8 results from earlier this month to use as a point of comparison (though please note that this combination of Mantle + SS is still non-functional on AMD APUs). With that said, before starting we’d like to once again remind everyone that this is an early driver on an early OS running an early DirectX 12 application, so everything here is subject to change. Furthermore Star Swarm itself is a very directed benchmark designed primarily to showcase batch counts, so what we see here should not be considered a well-rounded look at the benefits of DirectX 12. At the end of the day this is a test that more closely measures potential than real-world performance.

CPU:	AMD A10-7800 AMD A8-7600 Intel Core i3-4330 Intel Core i5-4690 Intel Core i7-4770R Intel Core i7-4790K
Motherboard:	GIGABYTE F2A88X-UP4 for AMD ASUS Maximus VII Impact for Intel LGA-1150 Zotac ZBOX EI750 Plus for Intel BGA
Power Supply:	Rosewill Silent Night 500W Platinum
Hard Disk:	OCZ Vertex 3 256GB OS SSD
Memory:	G.Skill 2x4GB DDR3-2133 9-11-10 for AMD G.Skill 2x4GB DDR3-1866 9-10-9 at 1600 for Intel
Video Cards:	AMD APU Integrated Intel CPU Integrated
Video Drivers:	AMD Catalyst 15.200 Beta Intel 10.18.15.4098
OS:	Windows 10 Technical Preview 2 (Build 9926)