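big.LITTLE processing is designed to assign the right processor to the right job. The Cortex-A15 processor is the highest-performance low-power ARM processor developed to date, and the Cortex-A7 processor is the most energy-efficient ARM application processor developed to date. The performance of the Cortex-A15 can be used for heavy workloads, while the Cortex-A7 can most efficiently handle the bulk of a smartphone's workload. This includes operating system activity, the user interface, and other always-on, always-connected tasks.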
For big.LITTLE processing to be invisible to software and fast enough to migrate execution opportunistically to the right-sized core, the paired big and LITTLE processors must be fully architecturally compatible: they must run exactly the same instructions and support the same extensions, such as virtualization and large physical addressing.
The first such pairing is between the Cortex-A15 and Cortex-A7 processors, where the big cluster and the LITTLE cluster can each contain one to four CPUs. This enables eight-core big.LITTLE designs, smart quad-core designs with two of each processor type, or asymmetric mixes such as four LITTLE cores and two big cores.
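To make the size of this design space concrete, the sketch below enumerates every cluster-size combination allowed by the one-to-four-CPUs-per-cluster rule and labels the example designs mentioned above. The function names `cluster_configs` and `describe` are hypothetical and exist only for this illustration.

```python
from itertools import product

# Per-cluster limits taken from the text: each cluster holds one to four CPUs.
MIN_CPUS_PER_CLUSTER = 1
MAX_CPUS_PER_CLUSTER = 4

def cluster_configs():
    """Return every (big, LITTLE) CPU-count pair allowed by the 1-4 rule."""
    sizes = range(MIN_CPUS_PER_CLUSTER, MAX_CPUS_PER_CLUSTER + 1)
    return list(product(sizes, sizes))

def describe(big, little):
    """Label a configuration as symmetric or asymmetric, with its total core count."""
    kind = "symmetric" if big == little else "asymmetric"
    return f"{big + little}-core {kind} ({big} big + {little} LITTLE)"

configs = cluster_configs()
print(len(configs))      # 16
print(describe(4, 4))    # 8-core symmetric (4 big + 4 LITTLE)
print(describe(2, 2))    # 4-core symmetric (2 big + 2 LITTLE)
print(describe(2, 4))    # 6-core asymmetric (2 big + 4 LITTLE)
```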
Figure: big.LITTLE system diagram
Both the Cortex-A15 and Cortex-A7 processors are available to partners now and are in production separately, with the first big.LITTLE silicon now being demonstrated by lead licensees. The second big.LITTLE pairing is between the Cortex-A57 and Cortex-A53 processors, the successors to the Cortex-A15 and Cortex-A7 respectively. These cores, announced in 2012, will be available to ARM lead licensees in mid-2013 and can be combined in the same way over the ARM CoreLink™ CCI-400 or another cache coherent interconnect. Both increase performance while retaining the power efficiency of their predecessors, and both introduce 64-bit support through the ARMv8 architecture, along with full backward compatibility with the 32-bit ARMv7 architecture, including the Virtualization and Large Physical Address Extensions of the latest version of ARMv7.
Future ARM cores will also be able to pair with these first four cores in big.LITTLE SoCs.
Software can control the allocation of threads to the appropriate core or, in some versions of the software, simply move the whole processor context up to big or down to LITTLE based on measured load. There are two software approaches to making this CPU selection decision, described below. Both require cache coherence so that software can quickly move execution from LITTLE to big and back again as appropriate. Cache coherence allows one CPU cluster to look up data in the caches of the other cluster, and full hardware cache coherence between the two clusters is key to making big.LITTLE software fast and transparent. It can be provided by the ARM CCI-400 cache coherent interconnect or by any interconnect that follows the AMBA 4 ACE protocol.
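The toy model below illustrates why coherence matters when execution moves from LITTLE to big. It is a deliberately simplified sketch, not the ACE protocol: caches are plain dictionaries, and coherence states, line ownership, and what happens to the outbound cluster afterwards are not modelled. With coherence, the inbound big cluster's misses are served from the LITTLE cluster's cache; without it, software must first clean the LITTLE cache to memory, and every read then goes to DRAM. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    """Toy per-cluster cache: maps line address -> data (hypothetical model)."""
    name: str
    cache: dict = field(default_factory=dict)

@dataclass
class Memory:
    """Toy off-chip DRAM that counts accesses so the two cases can be compared."""
    data: dict = field(default_factory=dict)
    accesses: int = 0

    def read(self, addr):
        self.accesses += 1
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.accesses += 1
        self.data[addr] = value

def read(cpu_cluster, other_cluster, memory, addr, coherent):
    """Read a line on cpu_cluster, looking in the other cluster's cache if coherent."""
    if addr in cpu_cluster.cache:
        return cpu_cluster.cache[addr]
    if coherent and addr in other_cluster.cache:
        value = other_cluster.cache[addr]  # served cache-to-cache, no DRAM access
    else:
        value = memory.read(addr)
    cpu_cluster.cache[addr] = value
    return value

def migrate_and_run(coherent, working_set):
    """Move a task from LITTLE to big, then re-read its working set on big."""
    little, big, mem = Cluster("LITTLE"), Cluster("big"), Memory()
    # The task's (dirty) data currently lives only in the LITTLE cluster's cache.
    little.cache = {addr: addr * 10 for addr in working_set}
    if not coherent:
        # Without hardware coherence, software must clean the outbound cache
        # to memory before the inbound cluster can safely read the data.
        for addr, value in little.cache.items():
            mem.write(addr, value)
        little.cache.clear()
    values = [read(big, little, mem, addr, coherent) for addr in working_set]
    return values, mem.accesses

print(migrate_and_run(True, range(8))[1])   # 0 DRAM accesses
print(migrate_and_run(False, range(8))[1])  # 16 DRAM accesses (8 writes + 8 reads)
```

Both runs return the same data; only the number of off-chip memory accesses differs, which is the time and energy cost that hardware coherence avoids.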
In a big.LITTLE SoC, the OS kernel dynamically and seamlessly moves tasks between the ‘big’ and ‘LITTLE’ CPUs. In practice, this is an extension of the OS power management software already in wide use on mobile phone SoCs today.
Most OS kernels already support symmetric multiprocessing (SMP), and those techniques can easily be extended to support big.LITTLE systems. There are two main variants of big.LITTLE software scheduling.
big.LITTLE CPU Migration
In CPU migration, the entire workload of a CPU is moved to a different CPU once the OS detects that it requires more or less performance. This builds on generic OS techniques for waking up CPUs and putting them to sleep in an SMP system. The key extension is detecting that a CPU is running at its maximum frequency while still requesting more performance, which means the workload needs to move to a ‘bigger’ CPU. Once the workload has decreased, it can move back to a ‘smaller’ CPU.
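A minimal sketch of that decision, assuming each big CPU is paired with one LITTLE CPU and the OS sees the pair as a single logical CPU. The up-migration rule follows the text (LITTLE at maximum frequency and still busy); the down-migration rule and both threshold values are assumptions, since the text only says the workload moves back once it has reduced. `LogicalCpu`, `UP_UTIL` and `DOWN_UTIL` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical tuning values, chosen only for illustration.
UP_UTIL = 0.90    # LITTLE at max frequency and busier than this -> move to big
DOWN_UTIL = 0.30  # on big and less busy than this -> move back to LITTLE

@dataclass
class LogicalCpu:
    """One big/LITTLE pair seen by the OS as a single CPU (assumed pairing)."""
    on_big: bool = False

    def update(self, freq_mhz, max_freq_mhz, util):
        """Decide which physical core of the pair should run the whole workload."""
        if not self.on_big and freq_mhz >= max_freq_mhz and util >= UP_UTIL:
            # LITTLE is already at maximum frequency and still wants more.
            self.on_big = True
        elif self.on_big and util <= DOWN_UTIL:
            # Demand has dropped: move the whole context back to LITTLE.
            self.on_big = False
        return "big" if self.on_big else "LITTLE"

cpu = LogicalCpu()
trace = [(600, 1000, 0.95), (1000, 1000, 0.95), (1000, 1000, 0.60), (1000, 1000, 0.20)]
print([cpu.update(*sample) for sample in trace])  # ['LITTLE', 'big', 'big', 'LITTLE']
```

Note that the first sample stays on LITTLE despite high utilization: frequency scaling on the LITTLE core is tried first, and migration only happens once that headroom is exhausted.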
This CPU migration software is available today from Linaro, and is being actively developed by multiple ARM partners.
big.LITTLE Task Migration
Task migration (also known as big.LITTLE MP) detects a high-intensity task and schedules it onto a ‘big’ CPU. Similarly, it detects a low-intensity task and moves it back to a ‘LITTLE’ CPU.
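The text does not say how task intensity is measured. The sketch below assumes an exponentially decaying average of the fraction of each sampling period a task was runnable, compared against hypothetical up and down thresholds. `Task`, `place` and the constants are illustrative names, not a kernel API.

```python
from dataclasses import dataclass

# Hypothetical values on a 0..1 task-load scale, for illustration only.
UP_THRESHOLD = 0.70
DOWN_THRESHOLD = 0.25
DECAY = 0.5  # weight given to history in the load average (assumed)

@dataclass
class Task:
    name: str
    load: float = 0.0        # smoothed fraction of time the task is runnable
    cluster: str = "LITTLE"  # new tasks start on LITTLE in this sketch

    def account(self, runnable_fraction):
        """Fold the latest sampling period into an exponentially decaying average."""
        self.load = DECAY * self.load + (1 - DECAY) * runnable_fraction

def place(task):
    """Move high-intensity tasks to big and low-intensity tasks back to LITTLE."""
    if task.cluster == "LITTLE" and task.load > UP_THRESHOLD:
        task.cluster = "big"
    elif task.cluster == "big" and task.load < DOWN_THRESHOLD:
        task.cluster = "LITTLE"
    return task.cluster

task = Task("game")
history = []
for sample in [1.0, 1.0, 1.0, 0.0, 0.0]:
    task.account(sample)
    history.append(place(task))
print(history)  # ['LITTLE', 'big', 'big', 'big', 'LITTLE']
```

The gap between the two thresholds provides hysteresis, so a task whose load hovers near one threshold does not bounce between clusters on every sample.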
The advantage of task migration over CPU migration is that, when processing demands are extremely high, the system can use all of its CPUs at the same time. For example, in a system with two ‘big’ and two ‘LITTLE’ CPUs, all four CPUs can be used at peak demand, whereas CPU migration could use only two.
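The peak-demand arithmetic in that example can be written as a small function. It assumes that CPU migration pairs each big CPU with one LITTLE CPU, so only one core per pair runs at a time, and for simplicity it only models equal-sized clusters in that case. `usable_cpus` is a hypothetical name.

```python
def usable_cpus(n_big, n_little, model):
    """Number of CPUs that can execute work at the same time at peak demand."""
    if model == "task":
        # Task migration (big.LITTLE MP): the scheduler sees every physical CPU.
        return n_big + n_little
    if model == "cpu":
        # CPU migration, assumed to pair each big CPU with one LITTLE CPU:
        # only one core of each pair is active, so the count is the pair count.
        if n_big != n_little:
            raise ValueError("this sketch only models equal-sized clusters for CPU migration")
        return n_big
    raise ValueError(f"unknown model: {model!r}")

for model in ("cpu", "task"):
    print(model, usable_cpus(2, 2, model))
# cpu 2
# task 4
```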
ARM and Linaro have been developing Linux support for both migration models. For more information, see:
big.LITTLE CPU migration: https://wiki.linaro.org/WorkingGroups/KernelArchived/Big.Little.Switcher
big.LITTLE task migration: https://wiki.linaro.org/WorkingGroups/PowerManagement/Big.Little.MP
The ARM CoreLink™ CCI-400 Cache Coherent Interconnect provides full cache coherency between two clusters of multi-core CPUs, such as the ARM Cortex-A15 and Cortex-A7 processors, enabling big.LITTLE processing.
The CoreLink CCI-400 enables system coherency in heterogeneous multicore and multi-cluster CPU/GPU systems, such as those required for the networking and high-performance computing markets, by allowing each processor in the system to access the other processors' caches. This reduces the need to access off-chip memory, saving time and energy, and is a key enabler for systems based on ARM big.LITTLE™ processing.
The ARM CoreLink CCN-504 Cache Coherent Network scales to 16 processor cores, giving system architects an optimal solution for enterprise applications, including servers and network infrastructure.
CoreLink CCN-504 can deliver up to one terabit per second of usable system bandwidth. It will enable designers to provide high-performance, cache coherent interconnect for ‘many-core’ enterprise solutions built using the ARM Cortex-A15 MPCore processor and the latest ARM Cortex-A50 series processors with 64-bit support.
The ARM Development Studio 5 (DS-5™) toolchain is a suite of professional software development tools for ARM processors, and it extends its world-leading capabilities to big.LITTLE performance analysis and debug.
The DS-5™ toolchain enables engineers to develop robust and highly optimized embedded software for ARM application processors. It comprises tools such as the best-in-class ARM C/C++ Compiler, a powerful Linux/Android™/RTOS-aware debugger, the ARM Streamline™ system-wide performance analyzer, and real-time system model simulators.
ARM Fast Models provide the models needed to construct virtual platforms of ARM big.LITTLE processing-based systems, along with templates of popular configurations. Customization of model content, configuration of items such as the memory map and interrupt map, and export of the platform to SystemC/TLM environments are all supported.
Fast Models are available for the Cortex-A15 and Cortex-A7 processors and the CoreLink CCI-400.
big.LITTLE processing based on the Cortex-A15 and Cortex-A7 processors (150Kb PDF)