

NVIDIA Announces Jetson TX2: Parker Comes To NVIDIA’s Embedded System Kit


For a few years now, NVIDIA has been offering their line of Jetson embedded system kits. Originally launched using Tegra K1 in 2014, the first Jetson was designed to be a dev kit for groups looking to build their own Tegra-based devices from scratch. Instead, what NVIDIA found, somewhat to their surprise, was that groups would use the Jetson board as-is and build their devices around it. This unexpected market led NVIDIA to pivot a bit on what Jetson would be, resulting in the second-generation Jetson TX1, a proper embedded system board that can be used for both development purposes and production devices.

This relaunched Jetson came at an interesting time for NVIDIA, which was right when their fortunes in neural networking/deep learning took off in earnest. Though the Jetson TX1 and underlying Tegra X1 SoC lack the power needed for high-performance use cases – these are after all based on an SoC designed for mobile applications – they have enough power for lower-performance inferencing. As a result, the Jetson TX1 has become an important part of NVIDIA’s neural networking triad, offering their GPU architecture and its various benefits for devices doing inferencing at the “edge” of a system.

Now, about a year and a half after the launch of the Jetson TX1, NVIDIA is giving the Jetson platform a significant update in the form of the Jetson TX2. This updated Jetson is not as radical a change as the TX1 was before it – NVIDIA seems to have found a good place in terms of form factor and the platform’s core feature set – but NVIDIA is looking to take what worked with the TX1 and further ramp up the platform’s performance.

The big change here is the upgrade to NVIDIA’s newest-generation Parker SoC. While Parker never made it into third-party mobile designs, NVIDIA has been leveraging it internally for the Drive system and other projects, and now it will finally become the heart of the Jetson platform as well. Relative to the Tegra X1 in the previous Jetson, Parker is a bigger and better version of the SoC. The GPU architecture is upgraded to NVIDIA’s latest-generation Pascal architecture, and on the CPU side NVIDIA adds a pair of Denver 2 CPU cores to the existing quad-core Cortex-A57 cluster. Equally important, Parker finally goes back to a 128-bit memory bus, greatly boosting the memory bandwidth available to the SoC. The resulting SoC is fabbed on TSMC’s 16nm FinFET process, giving NVIDIA a much-welcomed improvement in power efficiency.

Paired with Parker on the Jetson TX2 as supporting hardware is 8GB of LPDDR4-3733 DRAM, a 32GB eMMC flash module, a 2×2 802.11ac + Bluetooth wireless radio, and a Gigabit Ethernet controller. The resulting board is still 50mm x 87mm in size, with NVIDIA intending it to be drop-in compatible with Jetson TX1.
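
As a quick sanity check on those specs, the peak theoretical bandwidth of LPDDR4-3733 on Parker's 128-bit bus works out as follows. This is a back-of-envelope sketch; the per-pin transfer rate and bus-width interpretation are the standard ones, not figures NVIDIA has broken out here:

```python
# Peak theoretical DRAM bandwidth for the Jetson TX2's memory subsystem.
# Assumptions (standard interpretations, not from the announcement):
# LPDDR4-3733 means 3733 MT/s per pin, and the 128-bit bus moves 16 bytes
# per transfer.
transfers_per_sec = 3733e6   # 3733 MT/s
bus_width_bytes = 128 // 8   # 128-bit bus -> 16 bytes per transfer

bandwidth_gbs = transfers_per_sec * bus_width_bytes / 1e9
print(f"Peak bandwidth: {bandwidth_gbs:.1f} GB/s")  # ~59.7 GB/s
```

That ~60GB/s figure is roughly double what a 64-bit LPDDR4 configuration could deliver, which is why the return to a 128-bit bus matters so much for GPU-heavy workloads.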

Given these upgrades to the core hardware, unsurprisingly NVIDIA’s primary marketing angle with the Jetson TX2 is on its performance relative to the TX1. In a bit of a departure from the TX1, NVIDIA is canonizing two performance modes on the TX2: Max-Q and Max-P. Max-Q is the company’s name for TX2’s energy efficiency mode; at 7.5W, this mode clocks the Parker SoC for efficiency over performance – essentially placing it right before the bend in the power/performance curve – with NVIDIA claiming that this mode offers 2x the energy efficiency of the Jetson TX1. In this mode, TX2 should have similar performance to TX1 in the latter’s max performance mode.

Meanwhile the board’s Max-P mode is its maximum performance mode. In this mode NVIDIA sets the board TDP to 15W, allowing the TX2 to hit higher performance at the cost of some energy efficiency. NVIDIA claims that Max-P offers up to 2x the performance of the Jetson TX1, though as GPU clockspeeds aren’t double TX1’s, it’s going to be a bit more sensitive on an application-by-application basis.

NVIDIA Jetson TX2 Performance Modes
                       Max-Q    Max-P                Max Clocks
GPU Frequency          854MHz   1122MHz              1302MHz
Cortex-A57 Frequency   1.2GHz   Stand-Alone: 2GHz    2GHz+
                                w/Denver: 1.4GHz
Denver 2 Frequency     N/A      Stand-Alone: 2GHz    2GHz
                                w/A57: 1.4GHz
TDP                    7.5W     15W                  N/A

In terms of clockspeeds, NVIDIA has disclosed that in Max-Q mode, the GPU is clocked at 854MHz while the Cortex-A57 cluster is at 1.2GHz. Going to Max-P increases the GPU clockspeed further to 1122MHz, and allows for multiple CPU options; either the Cortex-A57 cluster or Denver 2 cluster can be run at 2GHz, or both can be run at 1.4GHz. Though when it comes to all-out performance, even Max-P mode is below the TX2’s limits; the GPU clock can top out at just over 1300MHz and CPU clocks can reach 2GHz or better. Power states are configurable, so customers can dial in the TDPs and clockspeeds they want; however, NVIDIA notes that using the maximum clocks pushes further outside of the Parker SoC’s efficiency range.
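
The clock figures alone illustrate why the 2x performance claim is application-sensitive. A quick comparison, noting that the TX1's ~998MHz GPU clock is an assumption drawn from its published specs rather than from this announcement:

```python
# Ratios from the disclosed TX2 GPU clocks. The TX1 figure is an assumed
# reference point (~998MHz), not part of the TX2 announcement.
tx2_max_q, tx2_max_p, tx2_max = 854, 1122, 1302  # MHz
tx1_gpu = 998  # MHz (assumed TX1 max GPU clock)

print(f"Max-P over Max-Q:      {tx2_max_p / tx2_max_q:.2f}x")  # ~1.31x
print(f"Max-P over TX1 (est.): {tx2_max_p / tx1_gpu:.2f}x")    # ~1.12x
```

With only a ~12% clockspeed advantage over the TX1 under these assumptions, the bulk of any 2x gain would have to come from the Pascal architecture, the extra memory bandwidth, and workload-specific factors rather than raw frequency.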

Finally, along with announcing the Jetson TX2 module itself, NVIDIA is also announcing a Jetson TX2 development kit. The dev kit will actually ship first – it ships next week in the US and Europe, with other regions in April – and contains a TX2 module along with a carrier board to provide I/O breakout and interfaces to various features such as USB, HDMI, and Ethernet. Judging from the pictures NVIDIA has sent over, the TX2 carrier board is very similar (if not identical) to the TX1 carrier board, so like the TX2 itself it should be familiar to existing Jetson developers.

With the dev kit leading the charge for Jetson TX2, NVIDIA will be selling it for $599 retail/$299 education, the same price the Jetson TX1 dev kit launched at back in 2015. Meanwhile the stand-alone Jetson TX2 module will be arriving in Q2’17, priced at $399 in 1K unit quantities. In the case of the module, this means prices have gone up a bit since the last generation; the TX2 is hitting the market at $100 higher than where the TX1 launched.

Samsung Announces Exynos 8895 SoC: 10nm, Mali G71MP20, & LPDDR4x


Even though Mobile World Congress doesn’t kick off for another few days, Samsung isn’t wasting any time in getting started. This morning the company is announcing their latest generation high-end ARM SoC, the Exynos 8895. It’s the company’s first in-house 10nm SoC, and while Samsung isn’t saying what it will go into, based on the context of the announcement it’s a safe bet we’re looking at the SoC for at least some SKUs of the next Galaxy S phone.

While Samsung has been in the SoC game with the Exynos series for a number of years now, it’s been in the last few years that they’ve really cemented their position as a market leader at the high-end. Thanks in part to the company’s 14nm process, the Exynos 7420 proved to be a very capable and powerful SoC from the company. Last year Samsung followed that up with the Exynos 8890, which among other firsts marked Samsung’s entry into designing their own CPU cores with the M1.

Now for 2017 Samsung wants to repeat their success over the past couple of years with the Exynos 9 Series 8895. As you can likely infer from the name, it’s not meant to be radically different from the preceding 8890, but there are still some pretty important changes here that should affect performance across the board.

Samsung Exynos SoCs Specifications
                 Exynos 8895           Exynos 8890             Exynos 7420
CPU              4x Exynos M2(?)       4x Exynos M1 @ 2.3GHz   4x A57 @ 2.1GHz
                 4x A53                4x A53 @ 1.6GHz         4x A53 @ 1.5GHz
GPU              Mali G71MP20          Mali T880MP12           Mali T760MP8
                                       @ 650MHz                @ 770MHz
Memory           2x 32-bit(?)          2x 32-bit               2x 32-bit
Controller       LPDDR4x               LPDDR4 @ 1794MHz        LPDDR4 @ 1555MHz
                                       (28.7GB/s b/w)          (24.8GB/s b/w)
Storage          eMMC 5.1, UFS 2.1     eMMC 5.1, UFS 2.0       eMMC 5.1, UFS 2.0
Modem            Down: LTE Cat16       Down: LTE Cat12         N/A
                 Up: LTE Cat13         Up: LTE Cat13
ISP              Rear: 28MP            Rear: 24MP              Rear: 16MP
                 Front: 28MP           Front: 13MP             Front: 5MP
Mfc. Process     Samsung 10nm LPE      Samsung 14nm LPP        Samsung 14nm LPE

The big deal for Samsung of course is that the Exynos 8895 is their first 10nm SoC, designed by Samsung LSI and fabbed by Samsung. Semantics of what is or isn’t 10nm aside, Samsung’s 10nm LPE process is cutting-edge for a mobile SoC, and relative to the current 14nm process offers better density and better performance characteristics. Samsung has talked about the process a bit in the past, and for the Exynos 8895 announcement they are reiterating that the 10nm LPE process offers “up to 27% higher performance while consuming 40% less power” relative to 14nm. However this may be an error in phrasing on Samsung’s part, as last year the claim was “27-percent higher performance or 40-percent lower power consumption”, which is a more realistic statement. Either way, for the 8895 in particular, Samsung isn’t talking about performance quite yet.

Diving into the specs, the CPU situation looks a great deal like the previous 8890. Samsung has gone with 8 cores – 4 high-power, 4 low-power – with a mix of custom and licensed silicon. The high-power cores are composed of what Samsung is calling a “2nd generation” custom CPU core. This would presumably be a newer iteration of the M1 (so the M2?), but Samsung isn’t offering up much in the way of details at this time on what’s changed from the M1. What we do know is that Samsung is touting that it offers both better performance and improved energy efficiency. Meanwhile low-power work is once again being provided by ARM’s Cortex-A53. (ed: which, on 10nm, must be absolutely tiny, considering that a core was sub-1mm² on 14nm)

Meanwhile on the GPU side, Samsung has significantly upgraded their graphics capabilities by tapping ARM’s latest-generation Mali-G71 GPU in an MP20 configuration. Based on ARM’s new Bifrost GPU architecture, the G71 radically overhauls the internal workings of the GPU to match the contemporary thread level parallelism (TLP)-centric nature of desktop GPUs and modern workloads. ARM has previously discussed that they expect G71-based devices to offer around 50% better graphics performance than T880 devices, and Samsung is going one step further by touting it as 60% faster performance.

In another first for Samsung, the 8895 is also their first Heterogeneous System Architecture (HSA) compliant SoC. This requires that the CPU, GPU, and interconnect all support HSA, and indeed all of the necessary pieces have come together for 8895. We’ve previously seen that the Mali-G71 GPU is HSA-compliant, and meanwhile for the 8895 Samsung has rolled out a new version of their interconnect (the Samsung Coherent Interconnect) to support HSA. This isn’t a development that I expect will have immediate ramifications, but HSA is ultimately at the core of making it easier for developers to program applications that use the GPU in a compute context, thanks to the common (and common-sense) architecture rules for HSA.

To feed the resulting beast, Samsung has added support for LPDDR4x memory. An extension of the original LPDDR4 standard, LPDDR4x is designed to reduce DRAM power consumption by up to 20% by reducing the output driver power (I/O VDDQ voltage) by 45%, from 1.1 V to 0.6 V. LPDDR4x memory has just started shipping, so along with the previously announced Snapdragon 835, the Exynos 8895 is the other high-performance SoC coming out this year to support the new memory.
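
The voltage figures check out, and if anything they understate the saving on the output drivers themselves, since CMOS driver switching power scales roughly with the square of the supply voltage. The V² scaling below is a standard first-order approximation, not a JEDEC or Samsung figure:

```python
# Checking the LPDDR4x numbers quoted above: dropping I/O VDDQ from 1.1V
# to 0.6V. The quadratic power scaling is a rough CMOS approximation
# (P ~ C * V^2 * f), used here only for illustration.
vddq_lpddr4, vddq_lpddr4x = 1.1, 0.6  # volts

v_reduction = 1 - vddq_lpddr4x / vddq_lpddr4            # voltage drop
p_reduction = 1 - (vddq_lpddr4x / vddq_lpddr4) ** 2     # approx. driver power drop

print(f"Voltage reduction:            {v_reduction:.0%}")  # ~45%
print(f"Approx. driver power savings: {p_reduction:.0%}")  # ~70%
```

The overall "up to 20%" DRAM power saving is smaller than the driver-level figure because the I/O drivers are only one contributor to total DRAM power.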

The Exynos 8895 is also getting an upgraded ISP. The latest ISP supports 28MP for both the front and rear cameras, while a bit more nebulously, Samsung’s spec sheet also lists support for “28MP+16MP Dual Camera” mode, an unsurprising development given the recent popularity of dual camera phone designs. Diving a bit deeper, we find that the 8895’s ISP is actually two ISPs: a high-performance ISP and a low-power ISP, with the low-power ISP presumably providing the aforementioned 16MP capability. Samsung is touting this combination as allowing them to offer dual camera functionality while still keeping power consumption in check.

On the flip side of the coin, the Exynos 8895 also gets a new version of Samsung’s video decode block, which the company calls their Multi-Format Codec (MFC). This latest MFC supports all the bells and whistles you’d expect, with both HEVC and VP9 decoding up to 4Kp120. Samsung’s press release also briefly mentions a “video processing technology that enables a higher quality experience by enhancing the image quality” that’s capable of “enhancing the image quality of a specific portion that is perceived more sensitive to the human eye.” Given the VR applications – and Samsung wants to be able to do 4K VR – this sounds a bit like a variation on the idea of foveated rendering, but there aren’t any further details on the technology at this time.

Also appearing for the first time on the Exynos 8895 is Samsung’s Cat16 LTE modem design. With their modem Samsung is using 5x Carrier Aggregation to achieve up to 1Gbps down, while uploading is rated at LTE Cat 13, using 2 carriers to get 150Mbps up. What’s notable here is that, as best as I can tell, this is the first modem using 5x CA; Qualcomm’s equivalent modem, the X16, uses 3 or 4x CA depending on the scenario. Unfortunately with the limited details Samsung offers right now, I’m not sure whether they have to use 5x CA to get Cat 16 bandwidth, or this is just another optional mode.
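
As a rough sketch of how 5x CA reaches the ~1Gbps neighborhood: the per-carrier figure below assumes 20MHz carriers with 2x2 MIMO and 256-QAM, which Samsung has not confirmed for this modem:

```python
# Back-of-envelope aggregate downlink for 5x carrier aggregation.
# Assumption (not from Samsung): each carrier is 20MHz with 2x2 MIMO,
# which peaks around 150Mbps at 64-QAM; 256-QAM carries 8 bits/symbol
# vs 6, scaling that to ~200Mbps per carrier.
per_carrier_64qam = 150                          # Mbps
per_carrier_256qam = per_carrier_64qam * 8 / 6   # ~200 Mbps
carriers = 5

peak_down = carriers * per_carrier_256qam
print(f"Aggregate downlink: {peak_down:.0f} Mbps")  # ~1000 Mbps
```

Under these assumptions the math lands almost exactly on the Cat 16 ~1Gbps figure, which is why Qualcomm's X16 gets there with fewer carriers only by using wider aggregated bandwidth or 4x4 MIMO on some of them.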

Finally, the Exynos 8895 also includes what Samsung is calling an “enhanced security sub-system with a separate security processing unit” for use with user authentication, mobile payments, and the like. Based on Samsung’s description this sounds a heck of a lot like Apple’s Secure Enclave, which would be a very welcome development, as in Apple’s case it has made their phones a lot harder to break into.

Wrapping things up, along with today’s product announcement of the Exynos 8895, Samsung is also announcing that the SoC is in mass production; and indeed I would be surprised if this isn’t the SoC they announced back in October, which would mean it’s been in production for some time now. We still don’t know when we’re going to see the next Samsung Galaxy S phone, but given how Samsung is announcing the SoC in this fashion, clearly it’s going to be sooner rather than later. In the meantime, hopefully we’ll get some additional SoC details next week at MWC.


Semi-Critical Intel Atom C2000 SoC Flaw Discovered, Hardware Fix Required


Last week, Paul Alcorn over at Tom’s Hardware picked up on an interesting statement made by Intel in their Q4 2016 earnings call. The company, whose Data Center group’s profits had slipped a bit year-over-year, was “observing a product quality issue in the fourth quarter with slightly higher expected failure rates under certain use and time constraints.” As a result the company had set up a reserve fund as part of their larger effort to deal with the issue, which would include a “minor” design (i.e. silicon) fix to permanently resolve the problem.

A bit more digging by Paul further turned up that the problem was with Intel’s Atom C2000 family, better known by the codenames Avoton and Rangeley. As a refresher, the Silvermont-based server SoCs were launched in Q3 of 2013 – about three and a half years ago – and are offered with 2, 4, and 8 cores. These chips are, in turn, meant for use in lower-power and reasonably highly threaded applications such as microservers, communication/networking gear, and storage. As a result the C2000 is an important part of Intel’s product lineup – especially as it directly competes with various ARM-based processors in many of its markets – but it’s a name that’s better known to device manufacturers and IT engineers than it is to consumers. Consequently, an issue with the C2000 family doesn’t immediately raise any eyebrows.

Jumping a week into the present, since their earnings call Intel has posted an updated spec sheet for the Atom C2000 family. More importantly, device manufacturers have started posting new product errata notices; and while they are keeping their distance from naming the C2000 directly, all signs point to the affected products being C2000 based. As a result we finally have some insight into what the issue is with the C2000. And while the news isn’t anywhere close to dire, it’s certainly not good news for Intel. As it turns out, there’s a degradation issue with at least some (if not all) parts in the Atom C2000 family, which over time can cause chips to fail only a few years into their lifetimes.

The Problem: Early Circuit Degradation

To understand what’s going on and why C2000 SoCs can fail early, let’s start with Intel’s updated spec sheet, which contains the new errata for the problem.

AVR54. System May Experience Inability to Boot or May Cease Operation

Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.

Implication: If the LPC clock(s) stop functioning the system will no longer be able to boot.

Workaround: A platform level change has been identified and may be implemented as a workaround for this erratum.

At a high-level, the problem is that the operating clock for the Low Pin Count bus can stop working. Essentially a type of legacy bus, the LPC bus is a simple bus for simple peripherals, best known for supporting legacy devices such as serial and parallel ports. It is not a bus that’s strictly necessary for the operation of a computer or embedded device, and instead its importance depends on what devices are being hung off of it. Along with legacy I/O devices, the second most common device type to hang off of the LPC is the boot ROM/BIOS – owing to the fact that it’s a simple device that needs little bandwidth – and this is where the C2000 flaw truly rears its head.

As Intel’s errata succinctly explains, if the LPC bus breaks, then any system using it to host the boot ROM will no longer be able to boot, as the system would no longer be able to access said boot ROM. The good news is that Intel has a workaround (more on that in a second), so it’s an avoidable failure, but it’s a hardware workaround, meaning the affected boards have to be reworked to fix them. Complicating matters, since the Atom C2000 is a BGA chip being used in an embedded fashion, an LPC failure means that the entire board (if not the entire device) has to be replaced.

Diving deeper, the big question of course is how the LPC bus could break in this fashion. To that end, The Register reached out to Intel and has been able to get a few more details. As quoted by The Register, Intel is saying that the problem is “a degradation of a circuit element under high use conditions at a rate higher than Intel’s quality goals after multiple years of service.”

Though we tend to think of solid-state electronics as just that – solid and unchanging – circuit degradation is a normal part of the lifecycle of a complex semiconductor like a processor. Quantum tunneling and other effects on a microscopic scale will wear down processors while they’re in use, leading to eventual performance degradation or operational failure. However even with modern processors the effect should take a decade or longer, much longer than the expected service lifetime of a chip. So when something happens to speed up the degradation process, if severe enough it can cut the lifetime of a chip to a fraction of what it was planned for, causing a chip (or line of chips) to fail while still in active use. And this is exactly what’s happening with the Atom C2000.

For Intel, this is the second time this decade that they’ve encountered a degradation issue like this. Back in 2011 the company had to undertake a much larger and more embarrassing repair & replacement program for motherboards using early Intel 6-series chipsets. On those boards an overbiased (overdriven) transistor controlling some of the SATA ports could fail early, disabling those SATA ports. And while Intel hasn’t clarified whether something similar to this is happening on the Atom C2000, I wouldn’t be too surprised if it was. Which isn’t to unnecessarily pick on Intel here; given the geometries at play (bear in mind just how small a 22nm transistor is) transistor reliability is a significant challenge for all players. Just a bit too much voltage on a single transistor out of billions can be enough to ultimately break a chip.

The Solution: New Silicon & Reworked Motherboards

Anyhow, the good news is that Intel has developed both a silicon workaround and a platform workaround. The long-term solution is of course rolling out a new revision of the C2000 silicon that incorporates a fix for the issue, and Intel has told The Register they’ll be doing just that. This will actually come somewhat late in the lifetime of the processor, as the current B0 revision was launched three and a half years ago and will be succeeded by Denverton this year. At the same time though, as an IT-focused product Intel will still need to offer the Atom C2000 series to customers for a number of years to come, so even with the cost of a new revision of the silicon, it’s in Intel’s long-term interest.

More immediately, the platform fix can be used to prevent the issue on boards with the B0 silicon. Unfortunately Intel isn’t disclosing just what the platform fix is, but if it is a transistor bias issue, then the fix is likely to involve reducing the voltage to the transistor, essentially bringing its degradation back to expected levels. Some individual product vendors are also reporting that the fix can be reworked into existing (post-production) boards, though it sounds like this can only prevent the issue, not fix an already-unbootable board.

Affected Products: Routers, Servers, & NASes

As a result of the nature of the problem, the situation is a mixed bag for device manufacturers and owners. First and foremost, while most manufacturers have used the LPC bus to host the boot ROM, not all of them have. For the smaller number of manufacturers who are using SPI Flash, this wouldn’t impact them unless they were using the LPC bus for something else. Otherwise, for those manufacturers who are impacted, transistor degradation is heavily dependent on ambient temperature and use: the hotter a chip is and the harder it’s run, the faster a transistor will degrade. Consequently, while all C2000 chips have the flaw, not all C2000 chips will have their LPC clock fail before a device reaches the end of its useful lifetime. And certainly not all C2000 chips will fail at the same time.

Cisco, whose routers are impacted, estimates that while issues can occur as early as 18 months in, they don’t expect a meaningful spike in failures until 3 years (36 months) in. This of course happens to be just a bit shorter than the age of the first C2000 products, which is likely why this issue hasn’t come to light until now. Failures would then become increasingly likely as time goes on, and accordingly Cisco will be replacing the oldest affected routers first, as they’re the most vulnerable to the degradation issue.
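
This pattern, where failures stay rare early and then ramp up with age, is the classic wear-out shape that a Weibull model with shape parameter k > 1 captures: the failure rate rises over time, so the oldest units are always the most at risk. The parameters below are invented purely for illustration and are not derived from Cisco's or Intel's data:

```python
import math

# Illustrative Weibull wear-out model. Shape k > 1 gives an increasing
# failure rate with age; the shape and scale values here are arbitrary
# example parameters, not fitted to any vendor's failure data.
k, lam = 3.0, 6.0  # shape (dimensionless), scale (years)

def weibull_failed_fraction(t_years: float) -> float:
    """Fraction of units failed by time t under Weibull(k, lam)."""
    return 1 - math.exp(-((t_years / lam) ** k))

for t in (1.5, 3.0, 5.0):
    print(f"{t:.1f} years: {weibull_failed_fraction(t):.1%} failed")
```

Even with made-up numbers the qualitative behavior matches Cisco's guidance: failures are possible but rare at 18 months, start to become meaningful around 3 years, and accelerate from there, which is why replacement programs prioritize the oldest hardware.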

As for other vendors shipping Atom C2000-based products, those vendors are setting up their own support programs. Patrick Kennedy over at ServeTheHome has already started compiling a list of vendor responses, including Supermicro and Netgate. However as it stands a lot of vendors are still developing their response to the issue, so this will be an ongoing process.

Finally, what’s likely to be most affected on the consumer side of matters is the Network Attached Storage front. As pointed out to me by our own Ganesh TS, Seagate, Synology, ASRock, Advantronix, and other NAS vendors have all shipped devices using the flawed chips, and as a result all of these products are vulnerable to early failures. These vendors are still working on their respective support programs, but for covered devices the result is going to be the same: the affected NASes will need to be swapped for models with fixed boards/silicon. So NAS owners will want to pay close attention here, as while these devices aren’t necessarily at risk of immediate failure, they are at risk of failure in the long term.

Sources: Tom’s Hardware, The Register, & ServeTheHome

Semi-Critical Intel Atom C2000 SoC Flaw Discovered, Hardware Fix Required

Last week, Paul Alcorn over at Tom’s Hardware picked up on an interesting statement made by Intel in their Q4 2016 earnings call. The company, whose Data Center group’s profits had slipped a bit year-over-year, was “observing a product quality issue in the fourth quarter with slightly higher expected failure rates under certain use and time constraints.” As a result the company had set up a reserve fund as part of their larger effort to deal with the issue, which would include a “minor” design (i.e. silicon) fix to permanently resolve the problem.

A bit more digging by Paul further turned up that the problem was with Intel’s Atom C2000 family, better known by the codenames Avoton and Rangeley. As a refresher, the Silvermont-based server SoCs were launched in Q3 of 2013 – about three and a half years ago – and are offered with 2, 4, and 8 cores. These chips are, in turn, meant for use in lower-power and reasonably highly threaded applications such as microservers, communication/networking gear, and storage. As a result the C2000 is an important part of Intel’s product lineup – especially as it directly competes with various ARM-based processors in many of its markets – but it’s a name that’s better known to device manufacturers and IT engineers than it is to consumers. Consequently, an issue with the C2000 family doesn’t immediately raise any eyebrows.

Jumping a week into the present, since their earnings call Intel has posted an updated spec sheet for the Atom C2000 family. More importantly, device manufacturers have started posting new product errata notices; and while they are keeping their distance from naming the C2000 directly, all signs point to the affected products being C2000 based. As a result we finally have some insight into what the issue is with C2000. And while the news isn’t anywhere close to dire, it’s certainly not good news for Intel. As it turns out, there’s a degradation issue with at least some (if not all) parts in the Atom C2000 family, which over time can cause chips to fail only a few years into their lifetimes.

The Problem: Early Circuit Degradation

To understand what’s going on and why C2000 SoCs can fail early, let’s start with Intel’s updated spec sheet, which contains the new errata for the problem.

AVR54. System May Experience Inability to Boot or May Cease Operation

Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.

Implication: If the LPC clock(s) stop functioning, the system will no longer be able to boot.

Workaround: A platform level change has been identified and may be implemented as a workaround for this erratum.

At a high-level, the problem is that the operating clock for the Low Pin Count bus can stop working. Essentially a type of legacy bus, the LPC bus is a simple bus for simple peripherals, best known for supporting legacy devices such as serial and parallel ports. It is not a bus that’s strictly necessary for the operation of a computer or embedded device, and instead its importance depends on what devices are being hung off of it. Along with legacy I/O devices, the second most common device type to hang off of the LPC bus is the boot ROM/BIOS – owing to the fact that it’s a simple device that needs little bandwidth – and this is where the C2000 flaw truly rears its head.

As Intel’s errata succinctly explains, if the LPC bus breaks, then any system using it to host the boot ROM will no longer be able to boot, as the system would no longer be able to access said boot ROM. The good news is that Intel has a workaround (more on that in a second), so it’s an avoidable failure, but it’s a hardware workaround, meaning the affected boards have to be reworked to fix them. Complicating matters, since Atom C2000 is a BGA chip being used in an embedded fashion, an LPC failure means that the entire board (if not the entire device) has to be replaced.

Diving deeper, the big question of course is how the LPC bus could break in this fashion. To that end, The Register reached out to Intel and has been able to get a few more details. As quoted by The Register, Intel is saying that the problem is “a degradation of a circuit element under high use conditions at a rate higher than Intel’s quality goals after multiple years of service.”

Though we tend to think of solid-state electronics as just that – solid and unchanging – circuit degradation is a normal part of the lifecycle of a complex semiconductor like a processor. Quantum tunneling and other effects on a microscopic scale will wear down processors while they’re in use, leading to eventual performance degradation or operational failure. However even with modern processors the effect should take a decade or longer, much longer than the expected service lifetime of a chip. So when something happens to speed up the degradation process, if severe enough it can cut the lifetime of a chip to a fraction of what it was planned for, causing a chip (or line of chips) to fail while still in active use. And this is exactly what’s happening with the Atom C2000.

For Intel, this is the second time this decade that they’ve encountered a degradation issue like this. Back in 2011 the company had to undertake a much larger and more embarrassing repair & replacement program for motherboards using early Intel 6-series chipsets. On those boards an overbiased (overdriven) transistor controlling some of the SATA ports could fail early, disabling those SATA ports. And while Intel hasn’t clarified whether something similar to this is happening on the Atom C2000, I wouldn’t be too surprised if it was. Which isn’t to unnecessarily pick on Intel here; given the geometries at play (bear in mind just how small a 22nm transistor is) transistor reliability is a significant challenge for all players. Just a bit too much voltage on a single transistor out of billions can be enough to ultimately break a chip.

The Solution: New Silicon & Reworked Motherboards

Anyhow, the good news is that Intel has developed both a silicon workaround and a platform workaround. The long-term solution is of course rolling out a new revision of the C2000 silicon that incorporates a fix for the issue, and Intel has told The Register they’ll be doing just that. This will actually come somewhat late in the lifetime of the processor, as the current B0 revision was launched three and a half years ago and will be succeeded by Denverton this year. At the same time though, as an IT-focused product Intel will still need to offer the Atom C2000 series to customers for a number of years to come, so even with the cost of a new revision of the silicon, it’s in Intel’s long-term interest.

More immediately, the platform fix can be used to prevent the issue on boards with the B0 silicon. Unfortunately Intel isn’t disclosing just what the platform fix is, but if it is a transistor bias issue, then the fix is likely to involve reducing the voltage to the transistor, essentially bringing its degradation back to expected levels. Some individual product vendors are also reporting that the fix can be reworked into existing (post-production) boards, though it sounds like this can only prevent the issue, not fix an already-unbootable board.

Affected Products: Routers, Servers, & NASes

Given the nature of the problem, the situation is a mixed bag for device manufacturers and owners. First and foremost, while most manufacturers have used the LPC bus to host the boot ROM, not all of them have. For the smaller number of manufacturers who are using SPI Flash, this wouldn’t impact them unless they were using the LPC bus for something else. Otherwise, for those manufacturers who are impacted, transistor degradation is heavily dependent on ambient temperature and use: the hotter a chip and the harder it’s run, the faster a transistor will degrade. Consequently, while all C2000 chips have the flaw, not all C2000 chips will have their LPC clock fail before a device reaches the end of its useful lifetime. And certainly not all C2000 chips will fail at the same time.
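To give a rough sense of why temperature matters so much here, reliability engineers typically model thermally-driven wear-out with an Arrhenius acceleration factor. The sketch below is purely illustrative – the activation energy is an assumed placeholder value common in the literature, as Intel has published no figures for this particular mechanism – but it shows how a modest rise in operating temperature can multiply the degradation rate several times over.

```python
import math

# Boltzmann constant in eV/K
K_B = 8.617e-5

def acceleration_factor(t_cool_c, t_hot_c, ea_ev=0.7):
    """Arrhenius acceleration factor between two operating temperatures.

    ea_ev is an assumed activation energy (0.7 eV is a common textbook
    placeholder for silicon wear-out mechanisms); the real value for the
    C2000 flaw is not public.
    """
    t_cool = t_cool_c + 273.15  # convert Celsius to kelvin
    t_hot = t_hot_c + 273.15
    return math.exp((ea_ev / K_B) * (1.0 / t_cool - 1.0 / t_hot))

# Under these assumptions, a chip running at 85C degrades roughly
# 8x faster than the same chip at 55C
af = acceleration_factor(55, 85)
```

The exact multiplier depends entirely on the assumed activation energy, but the qualitative point holds for any wear-out mechanism: identical chips in a hot, busy appliance and a cool, idle one will fail on very different schedules.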

Cisco, whose routers are impacted, estimates that while issues can occur as early as 18 months in, they don’t expect a meaningful spike in failures until 3 years (36 months) in. This of course happens to be just a bit shorter than the age of the first C2000 products, which is likely why this issue hasn’t come to light until now. Failures would then become increasingly likely as time goes on, and accordingly Cisco will be replacing the oldest affected routers first, as they’re the most vulnerable to the degradation issue.
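Cisco’s timeline – few failures before 18 months, a meaningful spike by 36 – is the classic signature of a wear-out mechanism, which is usually modeled with a Weibull distribution. The numbers below are invented for illustration only (Cisco has not published its failure-rate parameters); the point is the shape of the curve, not the values.

```python
import math

def weibull_cdf(t_months, scale, shape):
    """Cumulative probability that a unit has failed by time t,
    for a Weibull life distribution."""
    return 1.0 - math.exp(-((t_months / scale) ** shape))

# Illustrative parameters only: a shape parameter well above 1 models
# wear-out (failure rate rising with age); the scale sets the
# characteristic life. These are NOT Cisco's or Intel's figures.
scale, shape = 60.0, 4.0

p18 = weibull_cdf(18, scale, shape)  # failures are rare at 18 months
p36 = weibull_cdf(36, scale, shape)  # an order of magnitude more by 36
```

With a steep shape parameter, doubling the age multiplies the cumulative failure probability many times over, which is why replacing the oldest boards first is the sensible triage order.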

As for other vendors shipping Atom C2000-based products, those vendors are setting up their own support programs. Patrick Kennedy over at ServeTheHome has already started compiling a list of vendor responses, including Supermicro and Netgate. However as it stands a lot of vendors are still developing their response to the issue, so this will be an ongoing process.

Finally, what’s likely to be the most affected on the consumer side of matters will be on the Network Attached Storage front. As pointed out to me by our own Ganesh TS, Seagate, Synology, ASRock, Advantronix, and other NAS vendors have all shipped devices using the flawed chips, and as a result all of these products are vulnerable to early failures. These vendors are still working on their respective support programs, but for covered devices the result is going to be the same: the affected NASes will need to be swapped for models with fixed boards/silicon. So NAS owners will want to pay close attention here, as while these devices aren’t necessarily at risk of immediate failure, they are at risk of failure in the long term.

Sources: Tom’s Hardware, The Register, & ServeTheHome