OVERVIEW

Accelerate On-Device AI with Arm SME2

SME2 is the latest CPU extension in Arm Lumex CSS, the advanced compute subsystem for next-generation devices, designed to accelerate matrix-oriented workloads directly on device. It improves performance for AI and ML models, especially those that rely on operations like matrix multiplication, which are common in transformers, convolutional neural networks (CNNs), and large language models (LLMs).

BENEFITS

Why SME2 Matters

5x AI Performance Uplift*

Delivering more seamless, responsive on-device user experiences.

Fully On-Device AI Inference

Reducing reliance on cloud services, improving user privacy and minimizing latency.

Seamless Framework Support

Works natively through Arm KleidiAI in PyTorch, LiteRT, ExecuTorch, ONNX Runtime, and MNN, with no code changes required; a sketch of what this looks like follows this list.

Cross-Platform Ready

SME2-enhanced performance is portable across Arm-based platforms, from iOS and iPadOS to macOS and, soon, Android.

*Compared to the previous generation CPU cluster under the same conditions.
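
For a concrete sense of "no code changes", the snippet below is ordinary PyTorch inference. Nothing in it is SME2-specific: assuming an SME2-capable device and a PyTorch build whose CPU backend includes KleidiAI, the matmul-heavy layers are dispatched to SME2-optimized kernels automatically. The model and input shapes here are placeholder assumptions, not a reference workload.

import torch

# A plain PyTorch model; nothing here is SME2-specific. On an
# SME2-capable device with a KleidiAI-enabled PyTorch build, the
# matmul-heavy Linear layers below are dispatched to SME2-optimized
# kernels automatically, with no source changes.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).eval()

x = torch.randn(1, 1024)      # placeholder input
with torch.inference_mode():
    y = model(x)              # accelerated transparently where available
print(y.shape)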

FEATURES

Built for Modern AI

Advanced Matrix Processing

Accelerates complex AI computations, such as matrix-matrix multiplication and the outer-product operations critical for real-time inference; the math is sketched after this feature list.

Enhanced Data Handling

Introduces a dedicated predicate-as-counter mechanism, optimizing vector-register use and improving data throughput.

Compressed Neural Network Support

Efficiently processes compressed, low-precision neural network data formats, reducing memory bandwidth usage and improving performance.

Scalable by Design

Delivers flexible performance from entry-tier to flagship mobile devices, ensuring consistent developer and user experiences across the range.
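
To make the Advanced Matrix Processing and Compressed Neural Network Support features above concrete, here is a plain-NumPy sketch of the math involved: a matrix product computed as a sum of rank-1 outer products, with int8 inputs widened into an int32 accumulator. This mirrors the widening outer-product-and-accumulate pattern SME2 executes in hardware; it illustrates the operation, it is not SME2 code.

import numpy as np

# Matrix multiply C = A @ B expressed as a sum of rank-1 outer products:
# C = sum over k of outer(A[:, k], B[k, :]).
# SME2-class hardware accumulates such outer products into tile
# registers; low-precision (int8) inputs are widened so an int32
# accumulator preserves precision while cutting memory traffic.
rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(4, 8), dtype=np.int8)
B = rng.integers(-128, 128, size=(8, 4), dtype=np.int8)

C = np.zeros((4, 4), dtype=np.int32)
for k in range(A.shape[1]):
    # Widen one column of A and one row of B, then accumulate
    # their outer product into the int32 result tile.
    C += np.outer(A[:, k].astype(np.int32), B[k, :].astype(np.int32))

assert np.array_equal(C, A.astype(np.int32) @ B.astype(np.int32))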

See SME2 Documentation
DEVELOPER

Built for Developers

SME2 is already live on iPhone 16 and M4-based devices, allowing developers to begin optimizing AI applications today.


Thanks to native SME2 support across leading AI frameworks and runtime libraries—including PyTorch, ONNX Runtime, XNNPACK, and llama.cpp—developers can access SME2 benefits without changing a single line of code. SME2-enhanced performance is also portable across Arm-based platforms, from iOS and iPadOS to macOS and, soon, Android.
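
As one hedged illustration, the ONNX Runtime snippet below is completely ordinary: no SME2 flags or special APIs. Assuming an SME2-capable device and a runtime build whose CPU kernels include KleidiAI, the session's matrix-heavy operators pick up the acceleration on their own. The model path and input shape are placeholders.

import numpy as np
import onnxruntime as ort

# Ordinary ONNX Runtime inference: no SME2-specific flags or APIs.
# On SME2 hardware with KleidiAI-enabled CPU kernels, the heavy
# matmul/convolution operators inside the session run on SME2.
sess = ort.InferenceSession("model.onnx",  # placeholder model file
                            providers=["CPUExecutionProvider"])

input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
outputs = sess.run(None, {input_name: x})
print(outputs[0].shape)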


Explore the new Arm Developer Launchpad for SME2 for an overview of SME2 acceleration and use cases, step-by-step tutorials, and hands-on learning paths.

Start Building With SME2
Read Developer Blog
Success Story

Boosting Mobile AI with Arm SME2 and Google Android

Explore how Arm SME2 powers faster, more efficient AI across Android smartphones, enabling low-latency, real-time applications in vision, voice, and generative AI.

Read Full Story

Frequently Asked Questions: SME2

What is SME2?

SME2 (Scalable Matrix Extension 2) is an advanced set of CPU instructions in the Armv9.3-A architecture designed to accelerate AI and ML workloads, particularly matrix-heavy tasks like LLMs and computer vision. It integrates seamlessly with popular AI frameworks via Arm KleidiAI, delivering higher performance and efficiency without code changes.

How does SME2 improve AI performance on devices?

By executing matrix operations directly on the CPU, SME2 enables up to 6x faster inference for large language models and 3x improvements in vision and audio processing, without requiring separate NPUs or cloud resources.
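
Actual speedups are workload-dependent, so it is worth measuring on the target device. The sketch below is a generic micro-benchmark, not an official Arm methodology: it times the matmul-bound kernel that dominates LLM and vision inference, so the same script can be run on a baseline and an SME2-enabled configuration for comparison.

import time
import numpy as np

def time_matmul(n=1024, reps=20):
    # Time an n x n float32 matrix multiply, the operation class
    # that dominates transformer and CNN inference.
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up run
    t0 = time.perf_counter()
    for _ in range(reps):
        a @ b
    return (time.perf_counter() - t0) / reps

print(f"mean matmul time: {time_matmul() * 1e3:.2f} ms")
# Run under a baseline build/device and an SME2-enabled one;
# the ratio of the two times is your workload's uplift.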

Which devices will support SME2?

SME2 is already deployed in iPhone 16 and Apple M4 chips, with Android device support coming soon. It scales from entry-tier to flagship devices, ensuring consistent performance.

How does SME2 benefit developers?

SME2 integrates automatically with frameworks like PyTorch, ONNX Runtime, and XNNPACK, so developers can accelerate AI workloads without rewriting code. Developers can explore Arm AI on mobile resources for toolchains, SDKs, and training to get started quickly.

Can SME2 help with generative AI applications?

Absolutely. SME2 accelerates generative AI tasks, such as real-time translation, photo/video enhancement, audio generation, and motion analysis, directly on-device. This enables faster, more private, and more energy-efficient user experiences. Developers can learn how to implement these capabilities with Arm AI on mobile resources.