ARM The Architecture For The Digital World  

Cortex-M3 Processor

The ARM Cortex™-M3 processor is the industry-leading 32-bit processor for highly deterministic real-time applications and has been specifically developed to enable partners to develop high-performance low-cost platforms for a broad range of devices including microcontrollers, automotive body systems, industrial control systems and wireless networking and sensors. The processor delivers outstanding computational performance and exceptional system response to events while meeting the challenges of low dynamic and static power constraints. The processor is highly configurable enabling a wide range of implementations from those requiring memory protection and powerful trace technology through to extremely cost sensitive devices requiring minimal area.
 


Why Cortex-M3

Delivering higher performance and richer features

Introduced in 2004 and recently updated with new technologies and configurability, the Cortex-M3 is the mainstream ARM processor developed specifically with microcontroller applications in mind.

Performance and Energy Efficiency 

With high performance and low dynamic power consumption, the Cortex-M3 processor delivers leading power efficiency: 12.5 DMIPS/mW on a 90nm G process. Coupled with integrated sleep modes and optional state retention capabilities, the Cortex-M3 processor ensures there is no compromise for applications requiring low power and excellent performance.

Full featured

The processor executes the Thumb®-2 instruction set for optimal performance and code size, including hardware division, single-cycle multiply, and bit-field manipulation. The Cortex-M3 NVIC is highly configurable at design time to deliver up to 240 system interrupts with individual priorities, dynamic reprioritization and integrated system clock.

Rich connectivity

The combination of features and performance enables Cortex-M3 based devices to efficiently handle multiple I/O channels and protocol standards such as USB OTG (On-The-Go).


Cortex-M3 Features
Architecture: ARMv7-M (Harvard)
ISA Support: Thumb® / Thumb-2
Pipeline: 3-stage + branch speculation
Dhrystone: 1.25 DMIPS/MHz
Memory Protection: Optional 8 region MPU with sub regions and background region
Interrupts: Non-maskable Interrupt (NMI) + 1 to 240 physical interrupts
Interrupt Latency: 12 cycles
Inter-Interrupt Latency: 6 cycles
Interrupt Priority Levels: 8 to 256 priority levels
Wake-up Interrupt Controller: Up to 240 Wake-up Interrupts
Sleep Modes: Integrated WFI and WFE Instructions and Sleep On Exit capability. Sleep & Deep Sleep Signals. Optional Retention Mode with ARM Power Management Kit.
Bit Manipulation: Integrated Instructions & Bit Banding
Enhanced Instructions: Hardware Divide (2-12 Cycles) & Single-Cycle (32x32) Multiply.
Debug: Optional JTAG & Serial-Wire Debug Ports. Up to 8 Breakpoints and 4 Watchpoints.
Trace: Optional Instruction Trace (ETM), Data Trace (DWT), and Instrumentation Trace (ITM)

Performance characteristics quoted for a 100MHz target implementation on the TSMC 0.18G process

* Does not include optional system peripherals (MPU & ETM) or integration level components
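
The bit banding listed among the features maps each bit of a memory region to its own word in an alias region, so a single bit can be read or written with an ordinary word access. The following is a minimal sketch of the alias address calculation. It assumes the standard ARMv7-M SRAM bit-band addresses (region at 0x20000000, alias at 0x22000000), which this page does not state, and the helper name bitband_alias is hypothetical. It only computes an address, so it can be checked on any host.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: standard ARMv7-M SRAM bit-band addresses (not given on this page). */
#define SRAM_BITBAND_BASE 0x20000000u  /* first byte of the 1MB bit-band region */
#define SRAM_BITBAND_END  0x200FFFFFu  /* last byte of the bit-band region */
#define SRAM_ALIAS_BASE   0x22000000u  /* start of the 32MB alias region */

/* Hypothetical helper: address of the alias word that controls one bit.
 * Each byte of the region expands to 32 alias bytes (8 bits x 4-byte words). */
static uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
{
    assert(byte_addr >= SRAM_BITBAND_BASE && byte_addr <= SRAM_BITBAND_END);
    assert(bit < 8u);
    uint32_t byte_offset = byte_addr - SRAM_BITBAND_BASE;
    return SRAM_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}
```

On a real device, writing 1 or 0 to the returned address would set or clear the bit without a read-modify-write sequence in software.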


Cortex-M3 Performance, Power & Area
Process | TSMC 180nm G | TSMC 180nm G | TSMC 90nm G | TSMC 90nm G
Optimization Type | Speed Optimized | Area Optimized | Speed Optimized | Area Optimized
Standard Cell Library | ARM SC7 | ARM SC7 | ARM SC9 | ARM SC9
Performance (Total DMIPS) | 125 | 75 | 340 | 75
Frequency (MHz) | 100 | 50 | 275 | 50
Power Efficiency (DMIPS/mW) | 3.75 | 6.25 | TBD | 12.5
Area (mm²) | 0.37 | 0.25 | 0.083 | 0.047

Core area, frequency range and power consumption are dependent on process, libraries and optimizations. The numbers quoted above are illustrative of synthesized cores using general purpose TSMC process technologies and ARM Physical IP standard cell libraries and RAMs. Area numbers include the CM3Core, the Nested Vectored Interrupt Controller (NVIC) and Bus Matrix but not the optional components including the Memory Protection Unit, Embedded Trace Macrocell, Breakpoint Unit, Data Watchpoint Unit and Trace Port Interface Unit.

The speed optimized implementations refer to the library choices and synthesis flow decisions and tradeoffs made in order to achieve the target frequency performance. The area optimized implementations refer to the library choices and synthesis flow decisions and tradeoffs made in order to achieve a target area density.

Frequency and area measured for worst-case conditions: 0.18µm process at 1.62V, 125C, slow silicon; 0.13µm process at 1.08V, 125C, slow silicon.

Power measured for typical-case conditions: 0.18µm process at 1.8V, 25C, typical silicon; 0.13µm process at 1.2V, 25C, typical silicon.
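
As a rough cross-check of the figures above, the core power implied by each configuration can be estimated by dividing total DMIPS by the quoted power efficiency. This is a sketch of that arithmetic only. The function name is hypothetical, and the result is a derived estimate rather than a figure published on this page.

```c
#include <assert.h>

/* Hypothetical helper: implied core power in mW for one table column,
 * computed as (total DMIPS) / (DMIPS per mW). */
static double implied_power_mw(double total_dmips, double dmips_per_mw)
{
    assert(dmips_per_mw > 0.0);
    return total_dmips / dmips_per_mw;
}
```

With the quoted values, 125 DMIPS at 3.75 DMIPS/mW implies roughly 33.3 mW for the speed-optimized 180nm core, 75 DMIPS at 6.25 DMIPS/mW implies 12 mW for the area-optimized 180nm core, and 75 DMIPS at 12.5 DMIPS/mW implies 6 mW for the area-optimized 90nm core.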

 


ARM Cortex-M technologies

Each Cortex-M series processor delivers specific benefits, but all are underpinned by fundamental technologies that make Cortex-M processors ideal for a broad range of embedded applications.

 

RISC processor core

  • High performance 32-bit CPU
  • Deterministic operation
  • Low latency 3-stage pipeline

Thumb-2® technology

  • Optimal blend of 16/32-bit instructions
  • 3x smaller code size than 8-bit devices
  • No compromise on performance

Low power modes

  • Integrated sleep state support
  • Multiple power domains
  • Architected software control

Nested Vectored Interrupt Controller (NVIC)

  • Low latency, low jitter interrupt response
  • No need for assembly programming
  • Interrupt service routines in pure C

Tools and RTOS support

CoreSight debug and trace

CMSIS

The ARM Cortex Microcontroller Software Interface Standard (CMSIS) is a vendor-independent hardware abstraction layer for the Cortex-M processor series.  The CMSIS enables consistent and simple software interfaces to the processor for interface peripherals, real-time operating systems, and middleware, simplifying software re-use. With CMSIS the learning curve for new microcontroller developers is reduced, shortening time to market for new products.

In-depth: Nested Vectored Interrupt Controller (NVIC)

The NVIC is an integral part of Cortex-M processors and provides the processors' outstanding interrupt handling abilities.

The Cortex-M processor uses a vector table that contains the address of the function to be executed for a particular interrupt handler. On accepting an interrupt, the processor fetches the address from the vector table.

To reduce gate count and enhance system flexibility, the Cortex-M processor uses a stack-based exception model. When an exception takes place, critical general purpose registers are pushed on to the stack. Once the stacking and instruction fetch are completed, the interrupt service routine or fault handler is executed, followed by the automatic restoration of the registers to enable the interrupted program to resume normal execution. This approach removes the need to write assembler wrappers that are required to perform stack manipulation for traditional C-based interrupt service routines, making application development significantly easier. The NVIC supports nesting (stacking) of interrupts, allowing a higher-priority interrupt to pre-empt a lower-priority one and be serviced earlier.
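
The exception model described above can be illustrated with a small host-side model. This is a sketch under stated assumptions, not the hardware implementation. All names (exception_frame_t, take_exception, timer_isr and so on) are hypothetical, and the eight-register frame layout is general ARMv7-M background rather than something this page specifies. The point it shows is that the handler is an ordinary C function, because the save and restore happen outside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers stacked automatically on exception entry. The page says only
 * "critical general purpose registers"; this eight-word layout is the basic
 * ARMv7-M frame and is an assumption brought in as background. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

typedef void (*handler_t)(void);

#define NUM_VECTORS 4u   /* hypothetical size; real vector tables are larger */
#define STACK_DEPTH 8u   /* hypothetical depth of the modelled stack */

static exception_frame_t cpu;                 /* modelled register state */
static exception_frame_t stack[STACK_DEPTH];  /* modelled stack of frames */
static size_t frames_used;
static unsigned timer_ticks;

static void default_handler(void) { }

/* An ordinary C function used as an interrupt service routine. */
static void timer_isr(void)
{
    timer_ticks++;
    cpu.r0 = 0xDEADBEEFu;   /* a handler may freely clobber stacked registers */
}

/* The vector table: one handler address per exception number. */
static const handler_t vector_table[NUM_VECTORS] = {
    default_handler, default_handler, timer_isr, default_handler
};

/* Software model of what the hardware does around a handler call. */
static void take_exception(unsigned vector)
{
    assert(vector < NUM_VECTORS);
    assert(frames_used < STACK_DEPTH);
    stack[frames_used++] = cpu;   /* stacking on entry */
    vector_table[vector]();       /* vector fetch and branch to the handler */
    cpu = stack[--frames_used];   /* unstacking on exit */
}
```

Calling take_exception(2) runs timer_isr and leaves the modelled registers as they were before the call. A handler that itself calls take_exception models a nested interrupt, since each level pushes its own frame.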

Complete response to interrupts in hardware

The interrupt response of a Cortex-M series processor is the number of cycles from interrupt signal to execution of the interrupt service routine. It includes:

  • Detecting the interrupt
  • Optimal handling of back-to-back or late arriving interrupts (see below)
  • Fetching the vector address
  • Stacking corruptible registers
  • Branching to the interrupt handler

These tasks are performed in hardware and are included in the interrupt response cycle time quoted for Cortex-M processors. In many other architectures these tasks must be performed in software in the interrupt handler, introducing latency and complexity.

 

Tail chaining in the NVIC

Figure: Back-to-back interrupt timing diagram

In the case of back-to-back interrupts, traditional systems would repeat the complete state save and restore cycle twice, resulting in higher latency. The Cortex-M processors simplify moving between active and pending interrupts by implementing tail-chaining technology in the NVIC hardware. The processor state is automatically saved on interrupt entry, and restored on interrupt exit, in fewer cycles than a software implementation, significantly enhancing performance in low MHz systems.
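
The saving from tail-chaining can be estimated with simple cycle arithmetic. The sketch below uses the 12-cycle interrupt latency and 6-cycle inter-interrupt latency quoted for the Cortex-M3 on this page. The 12-cycle exit cost is an assumption, since the page does not quote one, and the function names are hypothetical. Handler execution time is excluded because it is the same in both cases.

```c
#include <assert.h>

#define ENTRY_CYCLES      12u  /* interrupt latency quoted on this page */
#define TAIL_CHAIN_CYCLES  6u  /* inter-interrupt latency quoted on this page */
#define EXIT_CYCLES       12u  /* ASSUMPTION: exit cost is not quoted on this page */

/* Overhead for n back-to-back interrupts when every interrupt performs a
 * full state save on entry and a full restore on exit. */
static unsigned overhead_without_tail_chaining(unsigned n)
{
    return n * (ENTRY_CYCLES + EXIT_CYCLES);
}

/* Overhead for n back-to-back interrupts when only the first entry and the
 * last exit pay full cost and each hand-over in between is a tail-chain. */
static unsigned overhead_with_tail_chaining(unsigned n)
{
    assert(n > 0u);
    return ENTRY_CYCLES + (n - 1u) * TAIL_CHAIN_CYCLES + EXIT_CYCLES;
}
```

Under these assumptions, two back-to-back interrupts cost 48 overhead cycles with full save and restore and 30 cycles with tail-chaining, and the gap widens by 18 cycles for every further pending interrupt.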

 

Response of the NVIC to late arrival of higher priority interrupts

Figure: Late interrupt arrival timing diagram

In the case of the late arrival of a higher priority interrupt during the execution of the stack push for a previous interrupt, the NVIC immediately fetches a new vector address to service the pending interrupt, as shown above. The Cortex-M NVIC provides deterministic response to these possibilities with support for late arrival and pre-emption.

 

Stack pop pre-emption by the NVIC

Figure: Pre-emption timing diagram

Similarly, the NVIC abandons a stack Pop if an exception arrives and services the new interrupt immediately as shown above. By pre-empting and switching to the second interrupt without completing the state restore and save, the NVIC achieves lower latency in a deterministic manner.

 


Moving from 8/16-bit to ARM Cortex-M

ARM Cortex-M code size advantage explained

ARM Cortex-M processors offer superior code density to 8-bit and 16-bit architectures. This has significant advantages in terms of reduced memory requirements and maximizing the usage of precious on-chip Flash memory. In this section we examine the reasons for this advantage.

Figure: Code size comparison using relative EEMBC CoreMark test size.

Instruction width

It is a common misconception that 8-bit microcontrollers use 8-bit instructions and ARM Cortex-M processor-based microcontrollers use 32-bit instructions. In reality, the PIC18 and PIC16 instruction sizes are 16-bit and 14-bit respectively. For the 8051 architecture, although some instructions are 1 byte long, many others are 2 or 3 bytes long.  The same generally applies to 16-bit architectures, where some instructions can take 6 bytes or more of memory.

The ARM Cortex-M3 and Cortex-M0 processors utilize the ARM Thumb®-2 technology which provides excellent code density. With Thumb-2 technology, the Cortex-M processors support a fundamental base of 16-bit Thumb instructions, extended to include more powerful 32-bit instructions. In many cases a C compiler will use the 16-bit version of the instruction unless the operation can be carried out more efficiently using a 32-bit version.
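
The mix of 16-bit and 32-bit instructions can be seen directly in the encoding: the first halfword of each instruction tells the processor whether a second halfword follows. The sketch below shows that rule as a host-side C function. The function names are hypothetical, and the encoding rule (top five bits 0b11101, 0b11110 or 0b11111 mark a 32-bit instruction) is background from the Thumb-2 encoding rather than something stated on this page.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes (2 or 4) of the Thumb instruction that starts with the given
 * halfword. Top five bits of 0b11101, 0b11110 or 0b11111 mean a second
 * halfword follows; any other value is a complete 16-bit instruction. */
static unsigned thumb_insn_size(uint16_t first_halfword)
{
    unsigned top5 = (unsigned)first_halfword >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Total size in bytes of a sequence of instructions stored as halfwords. */
static unsigned code_size_bytes(const uint16_t *code, unsigned halfwords)
{
    unsigned bytes = 0u;
    unsigned i = 0u;
    while (i < halfwords) {
        unsigned size = thumb_insn_size(code[i]);
        assert(i + size / 2u <= halfwords);  /* reject a truncated instruction */
        bytes += size;
        i += size / 2u;                      /* skip the second halfword if any */
    }
    return bytes;
}
```

For example, 0x4348 (the 16-bit encoding of MULS r0,r1,r0) is reported as 2 bytes, while a first halfword of 0xF000 is reported as the start of a 4-byte instruction.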

Instruction efficiency

This picture is not complete without also considering that ARM Cortex-M processor instructions are more powerful. There are many circumstances where a single Thumb instruction equates to several 8/16-bit microcontroller instructions; this means that Cortex-M devices have smaller code and achieve the same task at lower bus speed.

Comparing 16-bit multiply operations across processor architectures
8-bit example

MOV A, XL ; 2 bytes
MOV B, YL ; 3 bytes
MUL AB; 1 byte
MOV R0, A; 1 byte
MOV R1, B; 3 bytes
MOV A, XL ; 2 bytes
MOV B, YH ; 3 bytes
MUL AB; 1 byte
ADD A, R1; 1 byte
MOV R1, A; 1 byte
MOV A, B ; 2 bytes
ADDC A, #0 ; 2 bytes
MOV R2, A; 1 byte
MOV A, XH ; 2 bytes
MOV B, YL ; 3 bytes
MUL AB; 1 byte
ADD A, R1; 1 byte
MOV R1, A; 1 byte
MOV A, B ; 2 bytes
ADDC A, R2 ; 1 byte
MOV R2, A; 1 byte
MOV A, XH ; 2 bytes
MOV B, YH ; 3 bytes
MUL AB; 1 byte
ADD A, R2; 1 byte
MOV R2, A; 1 byte
MOV A, B ; 2 bytes
ADDC A, #0 ; 2 bytes
MOV R3, A; 1 byte

16-bit example

MOV R4,&0130h
MOV R5,&0138h
MOV SumLo,R6
MOV SumHi,R7

(Operands are moved to and from a memory mapped hardware multiply unit)

ARM Cortex-M

MULS r0,r1,r0

N.B. The Cortex-M multiply in fact performs a 32-bit multiply; here we assume r0 and r1 contain 16-bit data.
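
The same comparison can be expressed in C. The sketch below is illustrative only and its function names are hypothetical. The first function is the form a compiler for a Cortex-M target can typically reduce to a single multiply instruction. The second builds the same 32-bit product from four 8x8 partial products, which mirrors the structure of the 8-bit listing above without reproducing it instruction for instruction.

```c
#include <assert.h>
#include <stdint.h>

/* 16x16 -> 32-bit multiply written directly. The casts keep the arithmetic
 * unsigned; without them both operands promote to signed int, and large
 * products would overflow it. */
static uint32_t mul16_direct(uint16_t x, uint16_t y)
{
    return (uint32_t)x * (uint32_t)y;
}

/* The same product assembled from four 8x8 partial products, as a CPU with
 * only an 8-bit multiplier has to do it. */
static uint32_t mul16_bytewise(uint16_t x, uint16_t y)
{
    uint32_t xl = x & 0xFFu, xh = (uint32_t)x >> 8;
    uint32_t yl = y & 0xFFu, yh = (uint32_t)y >> 8;
    return (xl * yl)
         + ((xl * yh) << 8)
         + ((xh * yl) << 8)
         + ((xh * yh) << 16);
}

/* Hypothetical self-check: both forms must agree for any pair of inputs. */
static void check_mul16(uint16_t x, uint16_t y)
{
    assert(mul16_direct(x, y) == mul16_bytewise(x, y));
}
```

Both functions return the same value for every input pair; the difference is in how much code and how many cycles each form needs on a given processor.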

Compact data footprint

It is important to note that Cortex-M processors support 8-bit and 16-bit data transfers, making efficient use of data memory. This means programmers can continue to use the same data types as they have in 8/16-bit targeted software.

 

Energy efficiency advantage

The demand for ever lower-cost products with increasing connectivity (e.g. USB, Bluetooth, IEEE 802.15) and sophisticated analog sensors (e.g. accelerometers, touch screens) has resulted in the need to more tightly integrate analog devices with digital functionality to pre-process and communicate data. Most 8-bit devices do not offer the performance to sustain these tasks without significant increases in MHz and therefore power, and so embedded developers are required to look for alternative devices with more advanced processor technology. The 16-bit devices have previously been used to address energy efficiency concerns in microcontroller applications. However, the relative performance inefficiencies of 16-bit devices mean they will generally require a longer active duty cycle or higher clock frequency to accomplish the same task as a 32-bit device.

 

Ease of software development

Software development for ARM Cortex processor-based microcontrollers can be much easier than for 8-bit microcontroller products. Not only is the Cortex processor fully C programmable, it also comes with various enhanced debug features to help locate problems in software. There are also plenty of examples and tutorials on the internet, including many from ARM processor-based MCU vendors' websites, alongside any additional resources included in MCU development kits.


Resources

In this section you will find useful documentation, white papers and tutorials on ARM Cortex-M processors and related technologies.

  

Documentation for Cortex-M device users

Software development tools for Cortex-M device users

Find Cortex-M based microcontrollers

Universities
