Cortex-A57 Processor

The ARM® Cortex®-A57 processor is ARM’s highest-performing processor, designed to further extend the capabilities of future mobile and enterprise computing applications, including compute-intensive 64-bit applications in high-end computer, tablet and server products.

The processor can be implemented individually or paired with the Cortex-A53 processor in an ARM big.LITTLE configuration that enables scalable performance and optimal energy efficiency.


Smartphones are transitioning from content-consumption devices to content-creation devices. Now that smartphones can capture high-quality video and photographs, consumers want to edit and share this content, driving the need for the additional processing power delivered by the Cortex-A57 processor. Content creation is not limited to multimedia; it extends to documents as well.

A Cortex-A57 processor-based smartphone, wirelessly connected to a screen, keyboard and mouse, delivers the full laptop experience that consumers get from a typical laptop today.

The Cortex-A57 processor:

  • Can deliver all the compute capability a typical consumer needs, in innovative portable form factors that can replace anything from a gaming console to a laptop
  • Efficiently runs legacy ARM 32-bit applications
  • Features cache-coherent interoperability with ARM Mali™ family graphics processing units (GPUs) for GPU compute applications
  • Offers optional reliability and scalability features for high-performance enterprise applications
  • Connects seamlessly to ARM interconnect IP in configurations of up to 16 cores, with more planned in the future

The Cortex-A57 delivers significantly more performance than the Cortex-A15, at a higher level of power efficiency. The improvements on web browsing and integer workloads are shown below. For memory-intensive workloads, performance is further improved by the enhanced microarchitecture of the Cortex-A57. The inclusion of the cryptography extensions improves performance on cryptography algorithms by up to 10 times over the current generation of processors.

[Figure: Cortex-A57 performance]

Cortex-A57 MPCore
Architecture: ARMv8-A
  • 1-4X SMP within a single processor cluster
  • Multiple coherent SMP processor clusters through AMBA® 5 CHI or AMBA® 4 ACE technology
ISA Support:
  • AArch32 for full backward compatibility with ARMv7
  • AArch64 for 64-bit support and new architectural features
  • TrustZone® security technology
  • NEON™ Advanced SIMD
  • DSP & SIMD extensions
  • VFPv4 Floating point
  • Hardware virtualization support
Debug & Trace: CoreSight™ DK-A57
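As a rough illustration of the NEON™ Advanced SIMD support listed above, the sketch below adds two float arrays four lanes at a time using the ACLE intrinsics from arm_neon.h (vld1q_f32, vaddq_f32, vst1q_f32). It is guarded by the compiler-defined __ARM_NEON macro and falls back to a scalar loop elsewhere, so it also builds on non-ARM hosts; add_f32 is a hypothetical helper, not an ARM API.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
static void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    assert(n == 0 || (dst && a && b));
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Four single-precision lanes per 128-bit NEON register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, or the whole loop on targets without NEON. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Compilers targeting the Cortex-A57 (for example with -mcpu=cortex-a57 in GCC or Clang) will often auto-vectorize a loop this simple, so hand-written intrinsics are mainly worth it where the compiler does not.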

Cortex-A57 Architectural Features
Feature | Benefits | AArch32 | AArch64
ARMv8 architecture | 64-bit and 32-bit execution states for scalable high performance | Yes | Yes
Hardware-accelerated cryptography | 3-10x better software encryption performance. Useful for small-granule encryption/decryption that is too small to offload efficiently to a hardware accelerator (e.g. HTTPS) | Yes | Yes
Floating point | Hardware support for half-, single- and double-precision floating-point operations, now with IEEE 754-2008 enhancements | Yes | Yes
Load-acquire, store-release instructions | Designed for the C++11, C11 and Java memory models. Improve the performance of thread-safe code by eliminating explicit memory barrier instructions | Yes | Yes
Hardware virtualization | Enables multiple software environments and their applications to access the system capabilities simultaneously | Yes | Yes
Large physical address reach | Enables the processor to access more than 4GB of physical memory | Yes | Yes
Automatic event signalling | Power-efficient, high-performance spinlocks | Yes | Yes
Double-precision floating-point SIMD | Allows SIMD vectorization to be applied to a much wider set of algorithms (e.g. scientific computing, high-performance computing (HPC) and supercomputing) | No | Yes
64-bit virtual address reach | Enables virtual memory beyond the 4GB limit of 32-bit addressing. Important for modern desktop and server software that uses memory-mapped file I/O and sparse addressing | No | Yes
Larger register files | 31 x 64-bit general-purpose registers increase performance and reduce stack use: fewer stack spills allow more aggressive compiler optimization. SIMD becomes usable for more applications, e.g. HPC | No | Yes
Efficient 64-bit immediate generation | Less need for literal pools | No | Yes
Large PC-relative addressing range (+/-4GB) | Efficient data addressing within shared libraries and position-independent executables | No | Yes
Tagged pointers | Useful for dynamically typed languages such as JavaScript, and for garbage collection | No | Yes
64KB pages | Reduce TLB miss rates and the depth of page walks | No | Yes
New exception model | Reduces OS and hypervisor software complexity | No | Yes
Enhanced cache management | User-space cache operations improve the efficiency of dynamic code generation; Data Cache Zero clears memory quickly | No | Yes
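The load-acquire/store-release and automatic event signalling rows both concern locks. A minimal sketch of a test-and-set spinlock using C11 stdatomic.h ordering, assuming a C11 compiler: the acquire exchange and release store are what allow an AArch64 compiler to use the architecture's acquire/release instructions instead of separate barriers. The type and function names are hypothetical. Production AArch64 locks (the Linux kernel's, for example) also put a waiting core to sleep with WFE after an exclusive load, so that it wakes when the lock's cache line is written; that needs inline assembly and is deliberately left out of this portable version.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: held == true while some thread owns it. */
typedef struct {
    atomic_bool held;
} simple_spinlock;

#define SIMPLE_SPINLOCK_INIT { false }

static void spin_lock(simple_spinlock *l)
{
    for (;;) {
        /* Acquire: accesses inside the critical section cannot move
           before a successful lock. */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;
        /* Wait with read-only loads so the line is not written while spinning. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
    }
}

static bool spin_trylock(simple_spinlock *l)
{
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void spin_unlock(simple_spinlock *l)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    /* Release: accesses inside the critical section cannot move after this. */
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

On ARMv8.0 the acquire exchange may compile to an LDAXR/STXR loop, or to a call into an outline-atomics helper depending on the compiler and flags, and the release store to STLR.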

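The 64KB-page row can be made concrete with a little arithmetic. Assuming the ARMv8-A AArch64 translation scheme, where each translation table fills one granule and holds 8-byte descriptors, each level resolves log2(granule / 8) address bits: 9 bits with 4KB pages and 13 bits with 64KB pages. The helpers below (hypothetical names) compute the resulting walk depth, plus TLB reach under the simplifying assumption that each entry maps exactly one page.

```c
#include <assert.h>

/* Translation-table levels needed for a va_bits-bit virtual address with a
 * 2^granule_shift-byte granule (12 = 4KB, 16 = 64KB). The low granule_shift
 * bits are the page offset; each level resolves granule_shift - 3 bits. */
static int walk_levels(int va_bits, int granule_shift)
{
    assert(granule_shift > 3 && va_bits > granule_shift);
    int bits_per_level = granule_shift - 3;   /* 8-byte descriptors */
    int bits_to_resolve = va_bits - granule_shift;
    /* Round up: a partially used top level still costs one lookup. */
    return (bits_to_resolve + bits_per_level - 1) / bits_per_level;
}

/* Memory covered by a TLB if every entry maps one page of page_bytes. */
static unsigned long long tlb_reach(unsigned entries, unsigned long long page_bytes)
{
    return entries * page_bytes;
}
```

For a 48-bit virtual address, walk_levels(48, 12) gives 4 levels and walk_levels(48, 16) gives 3; with a 42-bit address and 64KB pages only 2 levels are needed. For a 1024-entry TLB, tlb_reach gives 4MB of coverage with 4KB pages and 64MB with 64KB pages.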
Cortex-A57 Microarchitectural Features
Feature | Benefit
Deeply out-of-order pipeline | Increases sustained instruction throughput across a broader range of scenarios: when instructions are blocked on a dependency, the processor can look for other instructions to run. Full out-of-order scheduling on all execution paths allows more types of instructions to be reordered, keeping the back end of the pipeline full more of the time. Supports a high-bandwidth out-of-order back end with 128 in-flight instructions, and instruction-result handling optimized for 32-bit and 64-bit operands
Wide multi-issue capability | Increases peak instruction throughput by duplicating execution resources. Power-optimized instruction decode with localized decoding and 3-wide decode bandwidth. High-capacity register renaming provides 3-wide rename bandwidth across a large instruction window. Supports 8 issue slots and up to 128 instructions in flight
16-way set-associative, banked L2 cache | The performance-optimized L2 cache design allows more than one CPU in the cluster to access the L2 at the same time. Sophisticated per-core hardware prefetch units improve memory loads into the L2. A balanced design approach reduces latency and lowers power in the L2 subsystem
1024-entry main TLB | Improves performance on code with complex memory access patterns, e.g. web browsing
Large uTLBs | A 48-entry I-side uTLB allows a large set of pages to be handled very quickly by the memory management unit. 32-entry fully associative D-TLBs (with large-page support) are more responsive to modern memory access patterns
Advanced branch predictor | A 2K-4K Branch Target Buffer (BTB) with a zero-cycle taken-branch penalty minimizes pipeline flushes. A sophisticated indirect predictor with path history increases the branch hit rate. A dedicated branch resolution unit enables fully out-of-order branch execution. Also includes a high-performance mispredict-recovery microarchitecture
Optimized D-side memory system | A sophisticated multi-stream L1 hardware prefetcher and exhaustive store and data forwarding capabilities increase data throughput to the main datapath
Extensive power-saving features | Way prediction, tag reduction, cache-lookup suppression and other features minimize dynamic power
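To illustrate what 16-way set associativity means for the L2, the sketch below maps a physical address to its cache set. The geometry is an assumption for illustration: a 2MB L2 with 64-byte lines (the L2 size is an implementation choice), which gives 2048 sets of 16 ways. Real hardware may also bank or hash the index, which this ignores; l2_set_index is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for illustration: 2MB, 16-way, 64-byte lines. */
#define L2_SIZE_BYTES (2u * 1024u * 1024u)
#define L2_WAYS       16u
#define L2_LINE_BYTES 64u
#define L2_SETS       (L2_SIZE_BYTES / (L2_WAYS * L2_LINE_BYTES))  /* 2048 */

static_assert((L2_SETS & (L2_SETS - 1)) == 0, "set count must be a power of two");

/* Drop the line-offset bits, then keep log2(L2_SETS) bits as the set index. */
static uint32_t l2_set_index(uint64_t paddr)
{
    return (uint32_t)((paddr / L2_LINE_BYTES) & (L2_SETS - 1));
}
```

Addresses 128KB apart (2048 sets x 64 bytes) land in the same set, so up to 16 such lines can be cached together before one has to be evicted.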

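The deep out-of-order window and wide issue only help when independent work is available. A common way to expose it, sketched below with hypothetical function names, is to split a reduction into several independent accumulators: the single-accumulator loop forms one long chain of dependent floating-point adds, while the four-accumulator version gives the scheduler four chains to overlap. Compilers generally do not make this transformation for floating point on their own, because it changes rounding, unless flags such as -ffast-math are used. Any speedup depends on add latency and the rest of the loop, so treat this as something to measure rather than a guaranteed win.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each add must wait for the previous one to finish. */
static double sum_serial(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four independent accumulators: four dependency chains that an
 * out-of-order, multi-issue core can execute in parallel. */
static double sum_4way(const double *v, size_t n)
{
    assert(n == 0 || v != NULL);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```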
Advanced MultiCore Features
The processor also uses the widely established ARM MPCore multicore technology, which provides performance scalability and control over power consumption, allowing it to exceed the performance of today's comparable high-performance devices while remaining within tight mobile power constraints. Any of the four processors within a cluster can be shut down when not in use, for instance when the device is in standby mode, to save power. When higher performance is required, all processors are brought into use to meet the demand, while the workload is shared among them to keep power consumption as low as possible.
Snoop Control Unit: The SCU is responsible for managing the interconnect, arbitration, communication, cache-to-cache and system memory transfers, cache coherence and other capabilities for the processor. The Cortex-A57 processor also exposes these capabilities to other system accelerators and non-cached, DMA-driven peripherals to increase performance and reduce system-wide power consumption. This system coherence also reduces the software complexity of maintaining coherence within each OS driver.
Accelerator Coherence Port: This AMBA 4 AXI™ compatible slave interface on the SCU provides an interconnect point for masters that are interfaced directly with the Cortex-A57 processor. This interface supports all standard read and write transactions without additional coherence requirements. However, any read transaction to a coherent region of memory interacts with the SCU to test whether the information is already stored in the L1 caches. The SCU enforces write coherence before the write is forwarded to the memory system and may allocate into the L2 cache, removing the power and performance impact of writing directly to off-chip memory.
Generic Interrupt Controller: Implementing the standardized and architected interrupt controller, the GIC provides a rich and flexible approach to inter-processor communication and to the routing and prioritization of system interrupts. It supports up to 224 independent interrupts; under software control, each interrupt can be distributed across CPUs, hardware-prioritized, and routed between the operating system and the TrustZone software management layer. This routing flexibility, together with support for virtualizing interrupts into the operating system, provides one of the key features required to enhance the capabilities of a solution that uses a hypervisor.
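The coherence that the Snoop Control Unit maintains works at cache-line granularity, so data written frequently by different cores should not share a line. A minimal sketch of per-core counters padded to separate lines, assuming 64-byte cache lines and exactly one writing core per slot; the names are hypothetical. Reading the total while other cores are still writing would need atomic accesses, which are omitted here for brevity.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed coherence granule in bytes */
#define MAX_CPUS   4    /* a single cluster has 1-4 cores */

/* Each counter occupies its own cache line, so a write by one core does not
 * invalidate the line holding another core's counter. */
typedef struct {
    alignas(CACHE_LINE) uint64_t count;
} percpu_counter;

static_assert(sizeof(percpu_counter) == CACHE_LINE, "one counter per line");

static percpu_counter counters[MAX_CPUS];

static void counter_add(unsigned cpu, uint64_t n)
{
    assert(cpu < MAX_CPUS);
    counters[cpu].count += n;   /* touches only this core's line */
}

static uint64_t counter_total(void)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < MAX_CPUS; i++)
        total += counters[i].count;
    return total;
}
```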

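The per-interrupt enabling, prioritization and CPU routing described for the Generic Interrupt Controller are programmed through memory-mapped distributor registers. The sketch below computes where those fields sit for a given interrupt ID, assuming the GICv2 distributor layout (as used by the GIC-400); it only computes offsets and does not touch hardware. A GICv3 distributor keeps the same enable and priority layout for shared interrupts, but with affinity routing enabled it routes them through GICD_IROUTER rather than GICD_ITARGETSR. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* GICv2 distributor register blocks, as byte offsets from the distributor base. */
#define GICD_ISENABLER  0x100u  /* set-enable: 1 bit per interrupt */
#define GICD_IPRIORITYR 0x400u  /* priority: 1 byte per interrupt */
#define GICD_ITARGETSR  0x800u  /* CPU target mask: 1 byte per interrupt */

typedef struct {
    uint32_t reg_offset;  /* byte offset of the 32-bit register */
    uint32_t bit;         /* bit position within that register */
} gic_bit_loc;

/* Location of the enable bit for interrupt `id` (IDs 1020-1023 are special). */
static gic_bit_loc gic_enable_bit(uint32_t id)
{
    assert(id < 1020);
    gic_bit_loc loc = { GICD_ISENABLER + 4u * (id / 32u), id % 32u };
    return loc;
}

/* Priority and target fields are byte-addressable, one byte per interrupt. */
static uint32_t gic_priority_byte(uint32_t id)
{
    assert(id < 1020);
    return GICD_IPRIORITYR + id;
}

static uint32_t gic_target_byte(uint32_t id)
{
    assert(id < 1020);
    return GICD_ITARGETSR + id;
}
```

For example, interrupt ID 33 has its enable bit at bit 1 of the register at offset 0x104, its priority byte at 0x421 and its target byte at 0x821.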
The Cortex-A57 MPCore processor is supported by a broad range of ARM technology, including System IP, Physical IP and development tools. A broad range of SoC and software design solutions, tools and services from the ARM Connected Community™ complements this technology, providing ARM Partners with a smooth path through the development, verification and production of full-function, compelling devices while significantly reducing time-to-market.

System IP

The ARM CoreLink™ interconnect and memory controller system IP addresses the critical challenge of efficiently moving and storing data between up to 16 Cortex-A series processors, high-performance media processors and dynamic memories, optimizing the performance and power consumption of the SoC. The CoreLink system IP enables SoC designers to maximize the utilization of system memory bandwidth and reduce static and dynamic latencies. ARM CoreSight technology, meanwhile, provides complete on-chip debug and correlated, real-time trace visibility for all cores of the Cortex-A57 MPCore processor, reducing risk and speeding the development of high-quality multiprocessing software.

The new ARM CoreLink CCN-504 Cache Coherent Network provides optimum system bandwidth and latency. The CCN-504 provides AMBA 4 AXI™ Coherency Extensions (ACE) compliant ports for full coherency between multiple Cortex-A series processors, making better use of caches and simplifying software development. This feature is essential for high-bandwidth applications, including gaming, servers and networking, that require clusters of coherent single-core and multicore processors. Combined with the ARM CoreLink network interconnect and memory controller IP, the CCN increases system performance and power efficiency.

The CoreLink CCI-400 Cache Coherent Interconnect provides the big.LITTLE interconnect for use with Cortex-A53 for client applications including smartphone and tablet application processors.

Physical IP

ARM Physical IP Platforms deliver process-optimized IP for best-in-class implementations of the Cortex-A57 processor at 20nm and below. A set of high-performance POP™ IP containing advanced ARM Physical IP for 28nm technologies supports the Cortex-A57, enabling rapid development of leading physical implementations. ARM is also working early to secure a roadmap to 20nm optimizations. POP IP supports the ARM strategy of offering specifically targeted Physical IP that enables Partners to achieve tuned implementations of ARM cores. ARM is uniquely able to design the optimization packs in parallel with the Cortex-A57 MPCore processor architecture, enabling the processor and physical IP combination to deliver workstation-class performance in a mobile power envelope while facilitating rapid time-to-market.

Tools Support

ARM DS-5 Development Studio fully supports all ARM processors as well as a wide range of third-party tools, operating systems and EDA flows. DS-5 is unique in its ability to provide solutions that take full advantage of the complete ARM technology portfolio, offering a comprehensive range of software tools to create, debug and optimize systems based on the Cortex-A57 MPCore processor.

It incorporates DS-5 Debugger, whose powerful and intuitive graphical environment enables fast debugging of bare-metal, Linux and Android native applications. DS-5 Debugger provides pre-defined configurations for Fixed Virtual Platforms and ARM Versatile Express boards, enabling early software development before silicon availability.

ARM Compiler 6, available in DS-5 Ultimate Edition, provides a next-generation LLVM-based toolchain for ARMv8 development.

In addition, Streamline performance analyzer simplifies the identification of hot spots in software and load balancing between cores and clusters with a brilliantly intuitive graphical display.

Graphics Processors

The Mali™ family of products combine to provide the complete graphics stack for all embedded graphics needs, enabling device manufacturers and content developers to deliver the highest quality, cutting-edge graphics solutions across the broadest range of consumer devices.


ARM training courses and Active Assist on-site system-design advisory services enable licensees to efficiently integrate the Cortex-A57 MPCore processor into their designs, realizing maximum system performance with the lowest risk and fastest time-to-market.

