
Cortex-R4 Processor


The Cortex™-R4 processor is the first deeply embedded real-time processor to be based on the ARMv7-R architecture. It is intended for use in high-volume deeply-embedded System-on-Chip applications such as hard disk drive controllers, wireless baseband processors, consumer products and electronic control units for automotive systems.

Cortex-R4 delivers substantially higher performance, real-time responsiveness and more features than other processors in its class. This processor offers excellent energy efficiency and cost effectiveness for ASIC, ASSP and MCU embedded applications. Furthermore, the Cortex-R4 processor can be configured at synthesis time to optimize its feature set for a precise match with application requirements.

 


Cortex-R4 is a mature processor, launched in May 2006 and shipping today in millions of ASIC, ASSP and MCU devices. It is the standard for high-performance real-time SoCs, superseding many ARM9 and ARM11 processor-based designs.

Cortex-R4 was designed for implementation on advanced silicon processes from 90 nm down to 28 nm and beyond, with an emphasis on improved energy efficiency, real-time responsiveness, advanced features and ease of system design. On a 40 nm G process, Cortex-R4 can be implemented to run at almost 1 GHz, at which it delivers over 1,500 Dhrystone MIPS. The processor provides a highly flexible and efficient two-cycle local memory interface, enabling SoC designers to minimize system cost and energy consumption.

The figure below compares the Dhrystone performance of Cortex-R4 with that of classic ARM processors implemented on a 90 nm G process. Cortex-R4's configuration options can be chosen to minimize the processor's die area, which also minimizes leakage power.

[Figure: Cortex-R4: More Performance, More Power Efficient (Cortex-R PPA chart)]

Cortex-R4 has many other significant advantages over previous ARM9 and ARM11 processors:

Feature | ARM946E-S | ARM1156T2-S | Cortex-R4
Architecture | ARMv5TE | ARMv6T2 | ARMv7-R
Pre-fetch unit | No | Instruction pre-fetch and branch prediction | Instruction pre-fetch and branch prediction
Superscalar execution | No | No | Dual-issue instructions
Thumb-2 instructions | No | Yes | Yes
Floating-point support | VFP9 | VFP11 | Integrated (Cortex-R4F)
Bus interface | AMBA AHB | AMBA 3 AXI | AMBA 3 AXI
Tightly-Coupled Memory (TCM) | Basic | Code and data separate | Completely flexible
Interrupts | ARMv5 | ARMv6 enhancements, NMI | ARMv6 enhancements, NMI
Soft error management | No | Optional parity and ECC on all RAMs | Optional parity and ECC on all RAMs
Memory Protection Unit (MPU) | 8 regions | 16 regions | 12 regions
Minimum MPU region size | 4 KB | 32 bytes, overlapping regions | 32 bytes, overlapping regions
Synthesis configurability | No | I and D caches; 0 or 2 TCMs; soft error handling; MPU | I and D caches; 0, 1, 2 or 3 TCMs; FPU; soft error handling; MPU; AXI slave


ARM Cortex-R4 processor

Micro-architecture: Eight-stage pipeline with instruction pre-fetch, branch prediction and selected dual-issue execution. Parallel execution paths for load-store, MAC, shift-ALU, divide and floating point. 1.66 Dhrystone MIPS/MHz. Hardware divider. Binary compatibility with classic ARM9 and ARM11 embedded processors.
Instruction set: ARMv7-R architecture with Thumb-2 and Thumb. DSP extensions. Optional floating-point unit.
Cache controllers: Harvard memory architecture with optional integrated instruction and data cache controllers. Cache sizes are configurable from 4 to 64 KB. Cache lines can be either write-back or write-through.
Tightly-Coupled Memories: Optional Tightly-Coupled Memory interfaces. TCMs are used for highly deterministic or low-latency applications that may not respond well to caching, e.g. instruction code for interrupt service routines and data that requires intense processing. One or two logical TCMs, A and B, can be used for any mix of code and data. TCM size can be up to 8 MB. TCM B has two physical ports, B0 and B1, for interleaving incoming DMA data streams.
Interrupt interface: Standard interrupt (IRQ) and non-maskable fast interrupt (FIQ) inputs are provided, together with a vector port for a VIC interrupt controller. The GIC interrupt controller can also be used if more complex priority-based interrupt handling is required. The processor includes low-latency interrupt technology that allows long multi-cycle instructions to be interrupted and restarted. Lengthy memory accesses are also deferred in certain circumstances. Worst-case interrupt response can be as low as 20 cycles using FIQ alone.
Memory Protection Unit: Optional MPU that configures attributes for either eight or twelve regions, each with resolution down to 32 bytes. Regions can overlap, and the highest-numbered region has the highest priority.
Floating Point Unit: Optional Floating Point Unit (FPU) implementing the ARM Vector Floating Point architecture VFPv3 with 16 double-precision registers, compliant with IEEE 754. FPU performance is optimized for single-precision calculations, with full support for double precision. Operations include add, subtract, multiply, divide, multiply and accumulate, square root, conversions between fixed-point and floating-point, and floating-point constant instructions.
ECC: Optional single-bit error correction and two-bit error detection for cache and/or TCM memories with ECC bits. Single-bit soft errors are automatically corrected by the processor.
Parity: Optional support for parity-bit error detection in caches and/or TCMs.
Master AXI bus: 64-bit AMBA AXI bus master for Level-2 memory and peripheral access.
Slave AXI bus: Optional 64-bit AMBA AXI bus slave port that allows DMA masters to access the dual-port TCM B interface for high-speed streaming of data into and out of the processor.
Debug: A Debug Access Port is provided; its functionality can be extended with DK-R4.
Trace: An interface suitable for connecting a CoreSight Embedded Trace Macrocell (ETM) is provided.
Dual core: A dual-processor configuration implements a redundant Cortex-R4 CPU in lock-step, with offset clocks and comparison logic, for fault-tolerant and fault-detecting dependable systems.
Configuration: Synthesizable Verilog RTL with facility to configure options for synthesis.


Cortex-R4 Performance, Power and Area

Processor area, frequency and power consumption are highly dependent on process, libraries and optimizations. The table illustrates implementations on mainstream process technologies with high-density, standard-performance cell libraries and RAMs.

Implementation target | Performance optimized (1) | Power optimized (2) | Area optimized (2)
Process technology | 65 nm GP | 65 nm LP | 65 nm GP
Standard cell library | Artisan™ SC10 | Artisan SC10 | Artisan SC10
Clock frequency | 620 MHz (3) | 270 MHz (4) | 380 MHz (4)
Performance | 1,030 DMIPS | 450 DMIPS | 630 DMIPS
Core dynamic power (5) | 0.12 mW/MHz | 0.17 mW/MHz (6) | 0.09 mW/MHz
Core leakage power (5) | 4.4 mW | 0.02 mW | 1.4 mW
Core layout area (5) | 0.8 mm² | 0.5 mm² | 0.4 mm²
Core efficiency | 13.8 DMIPS/mW | 9.8 DMIPS/mW | 18.4 DMIPS/mW

  1. Configured with controllers for 8 KB I and D caches, three TCM ports, MPU with eight regions, no FPU, parity checking on Level-1 memories and AXI buses, and debug with one watchpoint and two breakpoints.
  2. Minimal configuration with controllers for 8 KB I and D caches; no TCM ports, AXI slave bus, MPU, FPU, ECC or parity, and minimum debug capability.
  3. Targeting maximum clock frequency under worst-case conditions, i.e. low voltage (nominal less 10%), high temperature (125 °C) and slow silicon.
  4. Clock frequency target is relaxed and under typical conditions, i.e. nominal voltage, 25 °C and typical silicon.
  5. Memory area and power are not included.
  6. The low-leakage process has higher dynamic power, but system-wide energy consumption over time is much lower.

The floorplan of a fully configured Cortex-R4 processor is illustrated here:

[Figure: Cortex-R4 Configuration Options Summary]


ARM System IP, Development Tools and Physical IP are used to implement complete Cortex-R4 systems.

CoreLink and CoreSight System IP

NIC-301: Configurable, hierarchical, low-latency interconnect for AMBA 3 AXI, AHB-Lite and APB components. Configurations can range from a single bridge component, such as an AHB-to-AXI protocol bridge, to a large infrastructure of 128 masters and 64 slaves combining different AMBA protocols.
QOS-301: Added to NIC-301 to minimize average latency and to guarantee worst-case latency and bandwidth for critical interfaces such as DDR memory.
DMC-34x: Dynamic memory controllers that provide highly efficient interfaces to DRAM by using AXI interconnect features to optimize memory request scheduling, and built-in Quality of Service controls to manage each initiator's latency and bandwidth requirements. Supported memory types include SDR, DDR, LPDDR (Mobile DDR), eDRAM, DDR2 and LPDDR2 (Mobile DDR2).
SMC-35x: Static memory controllers that interface AXI interconnects to SRAM and non-volatile memories, with highly configurable parameters. Supported memory types include SRAM, NAND Flash and NOR Flash.
L2C-310: Level-2 cache controller designed to boost performance while reducing overall traffic to system memory, and therefore SoC energy consumption. Reducing demand for off-chip memory bandwidth frees up resources for other masters.
DMA-330: A highly flexible, micro-programmable Direct Memory Access controller for high-performance, energy-efficient AXI-based processing systems.
PL192: An AMBA AHB Vectored Interrupt Controller (VIC) supporting up to 32 vectored interrupts with programmable priority levels and masking.
GIC390: A scalable, configurable, low-gate-count interrupt controller for AMBA AHB and AXI that stores vector addresses in memory. Options include multiprocessor and TrustZone support.
ETM-R4: The Embedded Trace Macrocell provides real-time instruction and data trace, and can be configured to capture information before and after a specified sequence of events while the processor runs at full speed.
DK-R4: A complete Debug Kit including ETM-R4 and a fully featured Debug Access Port (DAP), complementing the DAP-Lite shipped with every Cortex-R4. DK components include DAP, cross trigger, ETM, AMBA bus trace, serial wire debug, trace funnel, trace buffer, trace port interface and serial wire viewer.

 

Development Tools

All Cortex-R processors are fully supported by the ARM Development Studio 5 (DS-5™) tool suite, as well as by a wide range of third-party tools, operating systems and EDA flows. ARM DS-5 software development tools are unique in their ability to provide solutions that take full advantage of the complete ARM technology portfolio. Tools specific to Cortex-R4 are:

ARM DS-5: ARM Compiler 5.0, with Thumb-2 code generation optimized for Cortex-R4.
CoreTile: CT-R4M-BD-0243A, for performance evaluation and pre-silicon application development.
Versatile EB: The Versatile Emulation Baseboard VEREB-BD-0228A is required to host the CoreTile.
MCBTMS570 evaluation board (Keil): This Keil evaluation board hosts a Texas Instruments TMS570 Cortex-R4-based microcontroller. It carries SRAM and Flash memory and provides interfaces including USB, automotive CAN and FlexRay, Ethernet, a touch-screen display, JTAG and ETM.
RTX Real-Time Kernel (Keil): A royalty-free, deterministic RTOS supplied with source code, for high-speed real-time operation with low interrupt latency and flexible scheduling. It has a small footprint for resource-constrained systems, supports multithreading and thread-safe operation, and offers kernel-aware debug support in MDK-ARM.

 

Physical IP

ARM offers optimized Physical IP platforms for best-in-class implementations of Cortex-R4 on leading semiconductor process technologies.

Standard cell logic libraries: Available in a variety of architectures, ARM Standard Cell Libraries support a wide performance range for all types of SoC designs. Designers can choose between libraries to optimize their designs for speed, power and/or area.
Memory compilers and registers: A broad array of silicon-proven SRAM, Register File and ROM memory compilers for all types of SoC designs, from performance-critical to cost-sensitive and low-power applications.
Interface IP: A broad portfolio of silicon-proven Interface IP designed to meet varying system architectures and standards. General Purpose I/O, Specialty I/O, High Speed DDR and Serial Interfaces are optimized to deliver high data throughput with low pin counts.

