ARM Launches DynamIQ: big.Little to Eight Cores Per Cluster
Most users delving into SoCs know about ARM core designs over the years. Initially we had single CPUs, then paired CPUs and then quad-core processors, using early ARM cores to help drive performance. In October 2011, ARM introduced big.Little – the ability to use two different ARM cores in the same design by typically pairing a two or four core high-performance cluster with a two or four core high-efficiency cluster design. From this we have offshoots, like MediaTek’s tri-cluster design, or just wide core mesh designs such as Cavium’s ThunderX. As the tide of progress washes against the shore, ARM is today announcing the next step on the sandy beach with DynamIQ.
The underlying theme with DynamIQ is heterogeneous scalability. Those two words hide a lot of ecosystem jargon, but as ARM predicts that another 100 billion ARM chips will be sold in the next five years, they pin key areas such as automotive, artificial intelligence and machine learning at the interesting end of that growth. As a result, performance, efficiency, scalability, and latency are all going to be key metrics moving forward, and DynamIQ aims to address each of them.
The first stage of DynamIQ is a larger cluster paradigm – which means up to eight cores per cluster. But in a twist, there can be a variable core design within a cluster. Those eight cores could be different cores entirely, from different ARM Cortex-A families in different configurations.
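To make the cluster rule concrete, here is a minimal sketch of a mixed cluster as a data structure. Only the eight-core limit comes from ARM's announcement; the `Core` and `Cluster` classes and the "big"/"little" labels are hypothetical names used for illustration, not anything ARM has published.

```python
from collections import Counter
from dataclasses import dataclass, field

MAX_CORES_PER_CLUSTER = 8  # per-cluster limit stated in the announcement


@dataclass(frozen=True)
class Core:
    name: str  # hypothetical label for a core type, e.g. "big" or "little"


@dataclass
class Cluster:
    cores: list = field(default_factory=list)

    def add(self, core, count=1):
        """Add `count` cores of one type, refusing to exceed the cluster limit."""
        if len(self.cores) + count > MAX_CORES_PER_CLUSTER:
            raise ValueError("a cluster holds at most 8 cores")
        self.cores.extend([core] * count)

    def summary(self):
        """Return a mapping of core type name to number of cores."""
        return dict(Counter(core.name for core in self.cores))


cluster = Cluster()
cluster.add(Core("big"), 2)
cluster.add(Core("little"), 6)
print(cluster.summary())
```

The point of the sketch is only that core types are mixed inside one cluster object rather than being split across two clusters, which is the difference from big.Little described above.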
Many questions come up here, such as how the cache hierarchy will allow threads to migrate between cores within a cluster (perhaps similar to how threads migrate between clusters on big.Little today), even when cores have different cache arrangements. ARM did not yet go into that level of detail, but we were told that more information will be provided in the coming months.
Each variable core-configuration cluster will be a part of a new fabric, which uses additional power saving modes and aims to provide much lower latency. The underlying design also allows each core to be controlled independently for voltage and frequency, as well as sleep states. Based on the slide diagrams, various other IP blocks, such as accelerators, should be able to plug into this fabric and benefit from that low latency. ARM cited safety-critical automotive decisions as one example of a workload that can benefit from this.
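Why independent voltage and frequency control matters can be shown with the standard approximation for dynamic power, P ≈ C·V²·f. The sketch below compares a shared domain, where every core must run at the highest operating point any one core needs, with per-core control. The operating points and the unit capacitance are made-up values chosen for illustration, and the model ignores leakage and sleep states; none of it comes from ARM.

```python
# Hypothetical operating points: (frequency in GHz, voltage in V)
OPP = {"low": (0.6, 0.70), "mid": (1.2, 0.85), "high": (2.0, 1.00)}
LEVELS = ["low", "mid", "high"]  # ordered from slowest to fastest


def dynamic_power(level, capacitance=1.0):
    """Relative dynamic power using P = C * V^2 * f."""
    freq, volt = OPP[level]
    return capacitance * volt * volt * freq


def shared_domain_power(demands):
    """All cores run at the highest level that any single core demands."""
    top = max(demands, key=LEVELS.index)
    return len(demands) * dynamic_power(top)


def per_core_power(demands):
    """Each core runs at exactly the level it demands."""
    return sum(dynamic_power(level) for level in demands)


demands = ["high", "low", "low", "low"]  # one busy core, three lightly loaded
print(shared_domain_power(demands), per_core_power(demands))
```

With one busy core and three lightly loaded ones, the per-core figure comes out at roughly a third of the shared-domain figure in these relative units, which is the kind of saving finer-grained control is meant to enable.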
One focus area of ARM’s presentation was redundancy. The new fabric will allow a seemingly unlimited number of clusters to be used, such that if one cluster (or an accelerator) fails, the others might take its place. That being said, the sort of redundancy that some customers of ARM chips might require is fail-over in the event of physical damage: for example, control of a car is retained if there are more than two ‘brains’ in the vehicle and an impact disables one of them. It will be interesting to see if ARM’s vision for DynamIQ extends to that level of redundancy at the SoC level, or if it will be up to ARM’s partners to develop it on top of DynamIQ.
Along with the new fabric, ARM stated that a new memory sub-system design is in place to assist with the compute capabilities, although nothing specific was mentioned. Along the lines of additional compute, ARM did state that new dedicated processor instructions (such as limited precision math) for artificial intelligence and machine learning will be integrated into a variant of the ARMv8 architecture. We’re unsure if this is an extension of ARMv8.2-A, which introduced half-precision for data processing, or a new version. ARMv8.2-A also adds RAS features and memory model enhancements, which coincide with the ‘new memory sub-system design’ mentioned earlier. When asked about which cores can use DynamIQ, ARM stated that new cores would be required. Future cores will be ARMv8.2-A compliant and will be able to be part of DynamIQ.
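For readers unfamiliar with what limited precision costs, the sketch below rounds ordinary floating-point values to IEEE 754 half precision (16 bits) in software, using Python's standard `struct` module and its `e` format. This only illustrates the number format; it says nothing about the ARM instructions themselves, which ARM has not yet detailed.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


for value in (1.0, 0.1, 2049.0):
    half = to_half(value)
    print(f"{value!r} -> {half!r} (error {abs(half - value):.3g})")
```

Half precision keeps 11 significant bits, so 0.1 picks up an error in the fifth decimal place and integers above 2048 can no longer all be represented. Many machine learning workloads tolerate that loss in exchange for halved storage and bandwidth, which is the usual argument for such instructions.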
ARM’s presentation focused mainly on DynamIQ for new and upcoming technologies, such as AI, automotive and mixed reality, although it was clear that DynamIQ can be used with other existing edge-case use models, such as tablets and smartphones. This will depend on how ARM supports current core designs in the market (such as updates to A53, A72 and A73) or whether DynamIQ requires separate ARM licenses. We fully expect any new cores announced from this point on will support the technology, in the same way that current ARM cores support big.Little.
So here’s some conjecture. Imagine a future tablet SoC using DynamIQ that consists of two high-powered cores, four mid-range cores, and two low-power cores in a single cluster, without a dual cluster / big.Little design. Alternatively, all three types of cores could sit on different clusters altogether using the new topology. Actually, the latter sounds more feasible from a silicon design standpoint, as well as for software management. That being said, the spec sheet of any future design using DynamIQ will now have to list the cores in each cluster. ARM did state that it should be fairly easy to control which cores are processing which instruction streams in order to get either the best performance or the best efficiency as needed.
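As a rough illustration of steering work to cores for either performance or efficiency, the sketch below picks a core type from the 2 + 4 + 2 layout conjectured above. The relative performance and power figures, the policy names, and the `pick_core` function are all hypothetical; ARM has not described how placement will actually be controlled.

```python
# Hypothetical core types: (relative performance, relative power)
CORE_TYPES = {"high": (3.0, 4.0), "mid": (2.0, 1.5), "low": (1.0, 0.5)}

# Free cores in the conjectured 2 + 4 + 2 design
FREE_CORES = {"high": 2, "mid": 4, "low": 2}


def pick_core(free, policy):
    """Return the core type a new task should run on, or None if all are busy."""
    candidates = [kind for kind, count in free.items() if count > 0]
    if not candidates:
        return None
    if policy == "performance":
        score = lambda kind: CORE_TYPES[kind][0]
    elif policy == "efficiency":
        # performance per unit of power
        score = lambda kind: CORE_TYPES[kind][0] / CORE_TYPES[kind][1]
    else:
        raise ValueError(f"unknown policy: {policy}")
    return max(candidates, key=score)


print(pick_core(FREE_CORES, "performance"))
print(pick_core(FREE_CORES, "efficiency"))
```

A real scheduler would also weigh load, thermal headroom and migration cost; the sketch only shows that having three core types gives a policy more than two points to choose from.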
ARM states that more information is to come over the next few months.