ARM NEON™ technology is an extensive set of new instructions for future ARM processors that provides powerful, flexible acceleration of media and DSP applications on ARM core-based platforms. Developed with vectorizing compiler technology in mind, NEON technology can be exploited from high-level code without the need for intrinsic functions or hand-written assembly.
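As a minimal sketch of what "high-level code without intrinsics" can look like, the plain C loop below performs a saturating add over two byte buffers. The function name `add_saturate_u8` and its parameters are hypothetical, chosen only for illustration. Whether a given compiler turns this loop into SIMD instructions depends on the toolchain and its options, so treat it as a candidate for vectorization rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating add of two byte arrays (for example, brightening pixels).
 * Plain C with no intrinsics: `restrict` promises the compiler that the
 * buffers do not overlap, which is the kind of information a vectorizer
 * needs before it can process several elements per instruction.
 * The function name is illustrative, not part of any ARM API. */
void add_saturate_u8(uint8_t *restrict dst,
                     const uint8_t *restrict a,
                     const uint8_t *restrict b,
                     size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum); /* clamp at 255 */
    }
}
```

Each iteration is independent of the others and works on 8-bit data, so a 128-bit vector could in principle cover 16 iterations at once.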
NEON technology is a 64/128-bit hybrid SIMD architecture developed by ARM to accelerate the performance of multimedia and signal processing applications including video encode/decode, 3D graphics, speech processing, compressed audio decoding, image processing, telephony, and sound synthesis. The NEON architecture provides at least 3x the performance of ARMv5 and 2x the performance of ARMv6 SIMD on a range of media and DSP applications.
NEON technology is an optimally defined architecture that works seamlessly with its own independent pipeline and register file. NEON technology will be implemented in selected next generation ARM processors. Key features include aligned and unaligned data access, support for integer and floating point data types, tight coupling to the ARM core, and a large register file with multiple views. A balance of aligned and unaligned data access allows for efficient vectorization of SIMD operations. The ability to operate on both integer and floating point data types ensures adaptability to a broad range of applications, from compression decoding to 3D graphics. Tight coupling to the ARM core provides for a single instruction stream with a unified view of memory, allowing a single development platform target with a simpler tool flow. NEON’s large register file with its multiple views enables efficient handling of data and minimizes access to memory, enhancing data throughput performance.
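The "multiple views" of the register file can be pictured with a small C model. This is an illustrative sketch only, not how hardware or any ARM header defines the registers: it assumes the ARMv7 arrangement of 32 doubleword (64-bit) registers aliased as 16 quadword (128-bit) registers, which is not stated in the text above, and the type and function names (`neon_regfile_model`, `write_d`, `read_q`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model: one 256-byte block of storage seen either as
 * 32 doubleword (64-bit) registers or as 16 quadword (128-bit) registers.
 * Assumption: quadword register n overlays doubleword registers 2n and 2n+1. */
typedef union {
    uint8_t d[32][8];   /* doubleword view */
    uint8_t q[16][16];  /* quadword view of the same bytes */
} neon_regfile_model;

/* Write 8 bytes into doubleword register `idx` of the model. */
void write_d(neon_regfile_model *rf, unsigned idx, const uint8_t bytes[8])
{
    assert(rf != NULL && idx < 32);
    memcpy(rf->d[idx], bytes, 8);
}

/* Return a pointer to the 16 bytes of quadword register `idx`. */
const uint8_t *read_q(const neon_regfile_model *rf, unsigned idx)
{
    assert(rf != NULL && idx < 16);
    return rf->q[idx];
}
```

In this model, writing doubleword registers 2 and 3 and then reading quadword register 1 returns the same 16 bytes, which is the trade-off between vector length and register count that the two views offer.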
Easy To Use

A NEON technology-enabled core’s single platform target allows developers to use familiar ARM RealView® compiler, debug and trace tools. The NEON architecture takes into account code generation strategies and the properties of compiled code. The result is a SIMD architecture which enables a simple, consistent mapping of algorithms that can be targeted effectively from a compiler.
NEON technology is also a planned target of the OpenMAX multimedia APIs. Code written using the OpenMAX API can be ported to NEON technology-enabled processors simply by changing the libraries. Finally, NEON technology shortens development time by eliminating the need to verify the integration of external media accelerators and by allowing a single design to serve multiple markets and customers.
| Feature | Benefit |
| --- | --- |
| 64/128-bit Hybrid SIMD Architecture | NEON SIMD instructions allow up to 16 elements to be processed in parallel, thus accelerating media and signal processing applications |
| Powerful, flexible performance | NEON provides at least 3x the performance of ARMv5 and 2x the performance of ARMv6 SIMD at the same frequency on a range of media and signal processing applications. Different implementations can target different performance points |
| Tightly coupled to core | Integration gives a unified view of memory which is shared with the ARM core. The resulting ability to use a single instruction stream gives a single platform target which speeds development |
| Support for aligned and unaligned data access | Enables efficient data vector loading in compiled code |
| SIMD Structure load/store architecture | Eliminates data arrangement overhead and optimizes data memory access to interleaved data |
| Dual View register file | Instructions defined across both views allow for efficient promotion and demotion of data for maximum efficiency of compiled code. Dual views also provide the ability to make tradeoffs between vector length and the number of registers available |
| Support for integer, fixed point, and single precision floating point data types | Ensures suitability for use in a wide range of applications from voice/audio compression to 3D graphics |
| Integer data sizes of 8, 16, 32 and 64 bits | Enables efficient packing of data vectors to maximize data processing cycles |
| Independent register file | Large register file enables many intermediate results to be stored internally, decreasing the number of data accesses to memory and increasing processing performance |
| Encoded in ARM and Thumb®-2 | Ensures high performance with optimum code density |
| OpenMAX target | Using NEON technology is as simple as programming to the OpenMAX API |
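The structure load/store feature in the table can be illustrated by splitting interleaved RGB pixels into separate planes. The following is a hedged sketch, not a reference implementation: the function name `deinterleave_rgb` is hypothetical, and the NEON path uses the `vld3q_u8` and `vst1q_u8` intrinsics from `arm_neon.h` only when the compiler defines `__ARM_NEON`. On other targets the same function falls back to a scalar loop so the example still builds and runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Split interleaved RGBRGB... bytes into three planar arrays.
 * `pixels` is the number of RGB triples; the name is illustrative. */
void deinterleave_rgb(const uint8_t *rgb,
                      uint8_t *r, uint8_t *g, uint8_t *b,
                      size_t pixels)
{
    assert(rgb != NULL && r != NULL && g != NULL && b != NULL);
    size_t i = 0;

#if defined(__ARM_NEON)
    /* One structure load reads 48 interleaved bytes and de-interleaves
     * them into three 16-lane vectors, one per colour channel. */
    for (; i + 16 <= pixels; i += 16) {
        uint8x16x3_t px = vld3q_u8(rgb + 3 * i);
        vst1q_u8(r + i, px.val[0]);
        vst1q_u8(g + i, px.val[1]);
        vst1q_u8(b + i, px.val[2]);
    }
#endif

    /* Scalar path: handles the tail, or everything on non-NEON targets. */
    for (; i < pixels; ++i) {
        r[i] = rgb[3 * i];
        g[i] = rgb[3 * i + 1];
        b[i] = rgb[3 * i + 2];
    }
}
```

Without a structure load, the de-interleaving would need separate shuffle or permute steps after a plain load; that is the data arrangement overhead the table refers to.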