Cortex-M4 Processor

The ARM® Cortex®-M4 processor is the latest embedded processor by ARM specifically developed to address digital signal control markets that demand an efficient, easy-to-use blend of control and signal processing capabilities.

The combination of high-efficiency signal processing functionality with the low-power, low-cost and ease-of-use benefits of the Cortex-M family of processors is designed to satisfy the emerging category of flexible solutions specifically targeting the motor control, automotive, power management, embedded audio and industrial automation markets.


Energy efficient digital signal control

The Cortex-M4 processor has been designed with a large variety of highly efficient signal processing features applicable to digital signal control markets.  The Cortex-M4 processor features extended single-cycle multiply accumulate (MAC) instructions, optimized SIMD arithmetic, saturating arithmetic instructions and an optional single precision Floating Point Unit (FPU).  These features build upon the innovative technology that characterizes the ARM Cortex-M processor family.
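As a rough illustration of what saturating arithmetic and a multiply-accumulate loop compute, the following is a minimal host-side sketch in portable C. The function names `sat_add32` and `dot_q15_sat` are hypothetical and chosen only for this example; the code models the arithmetic and makes no claim about which instructions a given compiler would emit for the Cortex-M4.

```c
#include <stdint.h>

/* Portable model of a 32-bit signed saturating add: the result clamps
   at INT32_MAX / INT32_MIN instead of wrapping around on overflow. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;  /* widen so the sum cannot overflow */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Dot product of two 16-bit vectors with a saturating 32-bit accumulator.
   Each loop iteration is one multiply-accumulate (MAC) step. */
int32_t dot_q15_sat(const int16_t *x, const int16_t *y, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        int32_t prod = (int32_t)x[i] * (int32_t)y[i];  /* 16x16 -> 32-bit product */
        acc = sat_add32(acc, prod);
    }
    return acc;
}
```

Clamping rather than wrapping matters in control loops and audio paths, where a wrapped value would flip sign and produce a much larger error than a clipped one.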

Responsiveness and low power

In common with the other members of the Cortex-M family of processors, the Cortex-M4 has integrated sleep modes and optional state retention capabilities which enable high performance at a low level of power consumption. The processor executes the Thumb®-2 instruction set for optimal performance and code size, including hardware division, single-cycle multiply, and bit-field manipulation. The Cortex-M4 Nested Vectored Interrupt Controller is highly configurable at design time to deliver up to 240 system interrupts with individual priorities, dynamic reprioritization and integrated system clock.

Easy-to-use technology

The Cortex-M4 makes signal processing algorithm development easy through an excellent ecosystem of software tools and the Cortex Microcontroller Software Interface Standard (CMSIS).



ARM Cortex-M4 Specification

ARM Cortex-M4 Features
ISA Support: Thumb® / Thumb-2
DSP Extensions: Single-cycle 16/32-bit MAC; single-cycle dual 16-bit MAC; 8/16-bit SIMD arithmetic; hardware divide (2-12 cycles)
Floating Point Unit: Single-precision floating point unit, IEEE 754 compliant
Pipeline: 3-stage + branch speculation
Performance Efficiency (CoreMark): 3.40 CoreMark/MHz*
Performance Efficiency (Dhrystone): Without FPU: 1.25 / 1.52 / 1.91 DMIPS/MHz**; with FPU: 1.27 / 1.55 / 1.95 DMIPS/MHz**
Memory Protection: Optional 8-region MPU with sub-regions and background region
Interrupts: Non-maskable Interrupt (NMI) + 1 to 240 physical interrupts
Interrupt Priority Levels: 8 to 256 priority levels
Wake-up Interrupt Controller: Up to 240 wake-up interrupts
Sleep Modes: Integrated WFI and WFE instructions and Sleep On Exit capability; Sleep and Deep Sleep signals; optional Retention Mode with ARM Power Management Kit
Bit Manipulation: Integrated instructions and bit banding
Debug: Optional JTAG and Serial-Wire Debug ports; up to 8 breakpoints and 4 watchpoints
Trace: Optional Instruction Trace (ETM), Data Trace (DWT), and Instrumentation Trace (ITM)

* see: http://www.eembc.org/benchmark/reports/benchreport.php?benchmark_seq=1448&suite=CORE

** The first result abides by all of the “ground rules” laid out in the Dhrystone documentation, the second permits inlining of functions, not just the permitted C string libraries, while the third additionally permits simultaneous (“multi-file”) compilation. All are with the original (K&R) v2.1 of Dhrystone.
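The bit banding listed in the feature table maps each bit of a small memory region to its own word in an alias region, so a single bit can be changed with one ordinary store. The sketch below computes the alias address on a host machine. It assumes the standard Cortex-M3/M4 bit-band regions (1 MB each, starting at 0x20000000 for SRAM and 0x40000000 for peripherals); these addresses are not stated on this page, and the function name `bitband_alias` is hypothetical.

```c
#include <stdint.h>

/* Assumed standard Cortex-M3/M4 bit-band map (not taken from this page). */
#define SRAM_BB_BASE     0x20000000u
#define SRAM_BB_ALIAS    0x22000000u
#define PERIPH_BB_BASE   0x40000000u
#define PERIPH_BB_ALIAS  0x42000000u
#define BB_REGION_SIZE   0x00100000u   /* each bit-band region is 1 MB */

/* Returns the alias word address for bit `bit` (0..7) of the byte at `addr`,
   or 0 if the byte lies outside both bit-band regions or the bit is invalid. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    if (bit > 7u) {
        return 0u;
    }
    /* Unsigned subtraction wraps for addr < base, so the range test stays valid. */
    if (addr - SRAM_BB_BASE < BB_REGION_SIZE) {
        return SRAM_BB_ALIAS + (addr - SRAM_BB_BASE) * 32u + bit * 4u;
    }
    if (addr - PERIPH_BB_BASE < BB_REGION_SIZE) {
        return PERIPH_BB_ALIAS + (addr - PERIPH_BB_BASE) * 32u + bit * 4u;
    }
    return 0u;
}
```

On a device that implements bit banding, writing 1 or 0 to the returned word address would set or clear the corresponding bit without a read-modify-write sequence in software.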

ARM Cortex-M4 Implementation Data***
7-track, typical 1.8 V, 25 °C: Dynamic Power 157 µW/MHz; Floorplanned Area 0.56 mm²
7-track, typical 1.2 V, 25 °C: Dynamic Power 33 µW/MHz; Floorplanned Area 0.17 mm²
9-track, typical 0.9 V, 25 °C: Dynamic Power 8 µW/MHz; Floorplanned Area 0.04 mm²

*** Base usable configuration includes DSP extensions, 1 IRQ + NMI, excludes ETM, MPU, FPU and debug
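The dynamic power figures are quoted per MHz, so an estimate for a particular clock rate is a simple multiplication. The sketch below shows that arithmetic; the function name `dynamic_power_uw` and the clock frequencies used in the usage note are hypothetical, and the result covers only the core's dynamic power in the base configuration described in the footnote.

```c
/* Estimated dynamic power in microwatts, given the quoted uW/MHz figure
   and a clock frequency in MHz. Leakage and other blocks are not modelled. */
unsigned dynamic_power_uw(unsigned uw_per_mhz, unsigned clock_mhz)
{
    return uw_per_mhz * clock_mhz;
}
```

For instance, a figure of 33 µW/MHz at an assumed 100 MHz clock gives 3300 µW (3.3 mW), and 157 µW/MHz at an assumed 50 MHz gives 7850 µW.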

ARM Cortex-M technologies

Each Cortex-M series processor delivers specific benefits, underpinned by fundamental technologies that make Cortex-M processors ideal for a broad range of embedded applications.

RISC processor core
  • High performance 32-bit CPU
  • Deterministic operation
  • Compact, low latency pipeline

Thumb-2® technology
  • Optimal blend of 16/32-bit instructions
  • 30% smaller code size than 8-bit devices
  • No compromise on performance

Tools and RTOS support

CoreSight debug and trace

Low power modes
  • Integrated sleep state support
  • Multiple power domains
  • Architected software control

Nested Vectored Interrupt Controller (NVIC)
  • Low latency, low jitter interrupt response
  • No need for assembly programming
  • Interrupt service routines in pure C


The ARM Cortex Microcontroller Software Interface Standard (CMSIS) is a vendor-independent hardware abstraction layer for the Cortex-M processor series. The CMSIS enables consistent and simple software interfaces to the processor for interface peripherals, real-time operating systems, and middleware, simplifying software re-use. With a reduced learning curve for new microcontroller developers, CMSIS shortens the time to market for new products.

In-depth: Nested Vectored Interrupt Controller (NVIC)

The NVIC is an integral part of all Cortex-M processors and provides the processors' outstanding interrupt handling abilities. In the Cortex-M0, Cortex-M0+ and Cortex-M1 processors, the NVIC supports up to 32 interrupts (IRQ), a Non-Maskable Interrupt (NMI) and various system exceptions. The Cortex-M3 and Cortex-M4 processors extend the NVIC to support up to 240 IRQs, 1 NMI and further system exceptions.

Most of the NVIC settings are programmable. The configuration registers are part of the memory map and can be accessed as C pointers. The CMSIS library also provides various helper functions to make interrupt control easier.
Inside the NVIC, each interrupt source is assigned an interrupt priority. A few of the system exceptions, such as NMI, have a fixed priority level, and others have programmable priority levels. By assigning different priorities to each interrupt, the NVIC can support nested interrupts automatically without any software intervention.
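To make the "registers as C pointers" idea concrete, the sketch below computes which NVIC register and bit correspond to a given IRQ number. It runs on a host machine and only does the address arithmetic. The base addresses are the ARMv7-M values for the Interrupt Set-Enable and Interrupt Priority registers as best understood here, not figures from this page, and the helper names are hypothetical. On a real device, the CMSIS-Core functions `NVIC_EnableIRQ` and `NVIC_SetPriority` wrap this kind of access.

```c
#include <stdint.h>

/* Assumed ARMv7-M NVIC register bases (not taken from this page). */
#define NVIC_ISER_BASE 0xE000E100u  /* Interrupt Set-Enable Registers, 32 IRQs per word */
#define NVIC_IPR_BASE  0xE000E400u  /* Interrupt Priority Registers, one byte per IRQ   */

/* Address of the set-enable word that holds the enable bit for `irq`. */
uint32_t nvic_iser_addr(unsigned irq)
{
    return NVIC_ISER_BASE + 4u * (irq / 32u);
}

/* Bit mask for `irq` inside its set-enable word. */
uint32_t nvic_iser_mask(unsigned irq)
{
    return 1u << (irq % 32u);
}

/* Address of the priority byte for `irq`. */
uint32_t nvic_ipr_addr(unsigned irq)
{
    return NVIC_IPR_BASE + irq;
}

/* On target hardware (not on a host), enabling an IRQ would then be a
   single store through a volatile pointer, for example:
       *(volatile uint32_t *)nvic_iser_addr(irq) = nvic_iser_mask(irq);   */
```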

The architecture provides 8 bits of priority level settings for each programmable interrupt or exception. To reduce gate count, only part of each of these registers is implemented. In the Cortex-M0, Cortex-M0+ and Cortex-M1 processors (ARMv6-M architecture), 4 programmable levels are provided. In the Cortex-M3 and Cortex-M4 processors (ARMv7-M architecture), the designs allow from 8 to 256 priority levels.
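The relationship between implemented priority bits and available levels can be sketched as follows. The example assumes that the implemented bits are the most significant bits of the 8-bit field, so a level has to be shifted left before it is written; the function names are hypothetical. CMSIS-Core performs an equivalent shift inside `NVIC_SetPriority` using the device's `__NVIC_PRIO_BITS` definition.

```c
#include <stdint.h>

/* Number of distinct priority levels for a given count of implemented bits
   (2 bits gives 4 levels, 3 bits gives 8, 8 bits gives 256). */
unsigned priority_levels(unsigned implemented_bits)
{
    return 1u << implemented_bits;
}

/* Value to place in the 8-bit priority field for a given level.
   Assumes 1 <= implemented_bits <= 8 and that the implemented bits
   occupy the top of the field; unimplemented low bits stay zero. */
uint8_t priority_field(unsigned level, unsigned implemented_bits)
{
    return (uint8_t)((level << (8u - implemented_bits)) & 0xFFu);
}
```

With 3 implemented bits, level 1 is therefore stored as 0x20 and the lowest-urgency level 7 as 0xE0.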

To make the Cortex-M processors easier to use, the Cortex-M processor uses a stack-based exception model. When an exception takes place, a number of registers are pushed onto the stack. These registers are restored to their original values when the exception handler completes. This allows the exception handlers to be written as normal C functions, and also reduces the hidden software overhead of interrupt processing.
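The stacked registers can be pictured as a C structure. The layout below is a sketch of the basic eight-word frame (R0-R3, R12, LR, PC, xPSR, from lowest to highest address) as commonly documented for Cortex-M processors; it is not spelled out on this page, it ignores the extended frame used when floating-point context is saved, and all type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic exception stack frame, lowest address first (assumed layout). */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* address execution resumes at after the handler */
    uint32_t xpsr;  /* program status register */
} exception_frame_t;

/* Returns the address the processor will resume at after the handler. */
uint32_t stacked_return_address(const exception_frame_t *frame)
{
    return frame->pc;
}

/* Because the hardware saves and restores the registers above, a handler
   can be an ordinary C function with no special prologue or epilogue. */
volatile uint32_t button_presses;

void button_irq_handler(void)
{
    button_presses++;
}
```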

In addition, the Cortex-M processors use a vector table that contains the address of the handler function to be executed for each interrupt. On accepting an interrupt, the processor fetches the address from the vector table. Again, this avoids software overhead and reduces interrupt latency.
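A vector table is, in essence, an array of function addresses indexed by exception number. The following host-side sketch models that lookup with an array of function pointers. It is a simplification: a real Cortex-M vector table also holds the initial stack pointer in its first entry and is read by hardware, not by a `dispatch` function. All names here are hypothetical.

```c
#include <stddef.h>

typedef void (*handler_t)(void);

/* State touched by the example handlers. */
int tick_count;
int uart_count;

void default_handler(void) { /* unexpected interrupt: do nothing in this model */ }
void systick_handler(void) { tick_count++; }
void uart_handler(void)    { uart_count++; }

#define NUM_VECTORS 4

/* Simplified table: one handler address per interrupt number. */
handler_t vector_table[NUM_VECTORS] = {
    default_handler,
    systick_handler,
    uart_handler,
    default_handler
};

/* Software stand-in for what the processor does in hardware:
   fetch the handler address for interrupt `n` and call it. */
void dispatch(unsigned n)
{
    if (n < NUM_VECTORS && vector_table[n] != NULL) {
        vector_table[n]();
    }
}
```

Because the lookup is a single indexed fetch, no software has to poll or decode the interrupt source before the correct handler runs.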

Various optimization techniques are also used in the Cortex-M processor implementations to make interrupt processing more efficient and the system more responsive:

Tail chaining – If another exception is pending when an ISR exits, the processor does not restore all saved registers from the stack and instead moves on to the next ISR. This reduces the latency when switching from one exception handler to another.

Stack pop pre-emption – If another exception occurs during the unstacking process of an exception, the processor abandons the stack pop and services the new interrupt immediately. By pre-empting and switching to the second interrupt without completing the state restore and save, the NVIC achieves lower latency in a deterministic manner.

Late arrival – If a higher priority interrupt arrives during the stacking of a lower priority interrupt, the processor fetches a new vector address and processes the higher priority interrupt first.

With these optimizations, the interrupt overhead reduces as the interrupt loading increases, allowing high interrupt processing throughput in embedded systems.
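The effect of tail chaining on overhead can be sketched with a simple cycle-count model. The cycle counts are parameters, and the values in the usage note (12 cycles to enter, 12 to exit, 6 to tail-chain) are illustrative assumptions that do not come from this page. The function names are hypothetical.

```c
/* Overhead for n interrupts handled one at a time, each paying a full
   stacking (entry) and unstacking (exit) cost. */
unsigned overhead_separate(unsigned n, unsigned entry_cycles, unsigned exit_cycles)
{
    return n * (entry_cycles + exit_cycles);
}

/* Overhead for n interrupts served back to back with tail chaining:
   one entry, one exit, and a shorter chain step between handlers. */
unsigned overhead_tail_chained(unsigned n, unsigned entry_cycles,
                               unsigned exit_cycles, unsigned chain_cycles)
{
    if (n == 0u) {
        return 0u;
    }
    return entry_cycles + (n - 1u) * chain_cycles + exit_cycles;
}
```

With the assumed values, four back-to-back interrupts cost 96 cycles of overhead when handled separately and 42 cycles when tail chained, so the overhead per interrupt falls as more interrupts queue up.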

System IP

System IP components are essential for building complex systems on chip. By using System IP components, developers can significantly reduce development and validation cycles, saving cost and reducing time to market.

AMBA Bus System IP Components
AMBA DMA Controllers: AHB DMA Controller

Physical IP
ARM® Physical IP Platforms deliver process-optimized IP for best-in-class implementations of the Cortex-M4 processor.
Standard Cell Logic Libraries: Available in a variety of different architectures, ARM Standard Cell Libraries support a wide performance range for all types of designs. Designers can choose between different libraries and optimize their designs for speed, power and/or area.
Memory Compilers and Registers: A broad array of silicon-proven SRAM, Register File and ROM memory compilers for all types of designs, ranging from performance-critical to cost-sensitive and low-power applications.
Interface Libraries: A broad portfolio of silicon-proven Interface IP designed to meet varying system architectures and standards. General Purpose I/O, Specialty I/O, High Speed DDR and Serial Interfaces are optimized to deliver high data throughput performance with low pin counts.

Tools Support for Cortex-M4

ARM DS-5 Development Studio supports all ARM processors and IP, including the Cortex-M4. It incorporates a powerful debugger whose intuitive graphical environment enables fast debugging of bare-metal, Linux and Android software, and it provides real-time operating system awareness for a range of popular RTOSs.

Ideal for SoC software development where an MCU is present alongside Cortex-A series processors, DS-5 also provides the Streamline performance analyzer for identifying code hot spots and bottlenecks.

DS-5 includes ARM Compiler 5, allowing efficient and highly optimized code generation for the Cortex-M4.

For software development on off-the-shelf Cortex-M4 MCUs, Keil offers a comprehensive tool suite, including the µVision IDE/debugger, ARM Compiler 5 and essential middleware components.

In this section you will find useful documentation, white papers and tutorials on ARM Cortex-M processors and related technologies. For further information on development tools, software, boards, a device database, CMSIS and mbed, visit the ARM Embedded Microsite.

ARM Connected Community

Cortex-M0 related blogs, discussions, technical content


Definitive Guide to the ARM Cortex-M0
A comprehensive guide to programming and implementing the groundbreaking ARM Cortex-M0 processor
Definitive Guide to the ARM Cortex-M3
A comprehensive guide to programming and implementing the groundbreaking ARM Cortex-M3 processor

Documentation for Cortex-M device users

Software development tools for Cortex-M device users

Find Cortex-M based microcontrollers


Related User Guides and App Notes

Cortex-M0/3/4 Devices Generic User Guides

Instruction Timing Information

Architecture (requires registration)

ARM Application Notes

Keil Application Notes

DesignStart for Processor IP


