Accelerate On-Device AI with Arm SME2
SME2 is the latest CPU extension on Arm Lumex CSS, the advanced subsystem for next-gen devices, designed to accelerate matrix-oriented compute workloads directly on device. It improves performance for AI and ML models, especially those that rely on operations like matrix multiplication, common in transformers, convolutional neural networks (CNNs), and large language models (LLMs).
Why SME2 Matters
Delivers more seamless, responsive on-device user experiences.
Reduces reliance on cloud services, improving user privacy and minimizing latency.
Works natively through Arm KleidiAI in PyTorch, LiteRT, ExecuTorch, ONNX Runtime, and MNN, with no code changes required.
Built for Modern AI
Accelerates complex AI computations, such as matrix-matrix multiplication and outer-product operations critical for real-time inference.
Introduces a dedicated predicate-as-counter mechanism, optimizing vector-register use and improving data throughput.
Efficiently processes quantized, low-precision neural network data formats (including 4-bit and 2-bit), reducing memory bandwidth and improving performance and efficiency.
Delivers flexible performance from entry-tier to flagship mobile devices, ensuring consistent developer and user experiences across devices.
SME2 in Action
SME2 powers intelligent, low-latency, private on-device AI workloads, enabling use cases such as agentic calling, personalized workout coaching, immersive NPC interactions, and neural imaging.
On-Device AI for Enhanced User Experiences: AI Yoga Tutor
Powered by Arm Lumex CSS with SME2, smart TVs and smartphones deliver real-time movement analysis for activities like yoga or Tai Chi, giving accurate feedback with a 2.5x speed-up in full pipeline time. Users get high-performance, low-latency guidance with full on-device privacy and security.
Agentic AI Call Handling
When Alex tries to call Mike and he’s unavailable, Mike’s AI agent steps in. Using on-device mobile AI, it sends a text confirming Mike can’t take the call, offers to schedule a callback, and automatically creates a calendar event.
Live Translation
Real-time speech translation with live captions now runs entirely on-device, delivering high-performance, low-latency communication with full privacy and security.
Music Generation
SME2-enabled smartphones generate music on-device from simple prompts, enabling fast iteration with low latency and secure, local processing.
Neural Camera Denoising
Powered by SME2, neural camera denoising runs AI-based image restoration directly on the CPU, achieving 4K at around 30 fps on SME2-enabled C1 CPUs while requiring only around 1 W to perform the enhancement. It delivers sharp, low-noise images even at 1 lux while keeping power use low. Implemented via Arm C Language Extensions, SME2 gives developers a flexible, CPU-only path to ISP-class imaging without relying on NPUs or fixed-function hardware.
Built for Developers
SME2 is already live on iPhone 17 and M4-based devices, allowing developers to begin optimizing AI applications today.
Thanks to native SME2 support across leading AI frameworks and runtime libraries—including PyTorch, ONNX Runtime, XNNPACK, and llama.cpp—developers can access SME2 benefits without changing a single line of code. SME2-enhanced performance is also portable across Arm-based platforms, from iOS and iPadOS to macOS and, soon, Android.
Explore the new Arm Developer Launchpad for SME2 for an overview of SME2 acceleration and use cases, step-by-step tutorials, and hands-on learning paths.
Boosting Mobile AI with Arm SME2 and Google Android
Explore how Arm SME2 powers faster, more efficient AI across Android smartphones, enabling low-latency, real-time applications in vision, voice, and generative AI.
Latest News and Resources
Frequently Asked Questions: SME2
What is Arm SME2?
SME2 (Scalable Matrix Extension 2) is an advanced set of CPU instructions in the Armv9.3-A architecture designed to accelerate AI and ML workloads, particularly matrix-heavy tasks like LLMs and computer vision. It integrates seamlessly with popular AI frameworks via Arm KleidiAI, delivering higher performance and efficiency without code changes.
How does SME2 improve AI performance on devices?
By executing matrix operations directly on the CPU, SME2 enables up to 6X faster inference for large language models and 3X improvements in vision and audio processing—without requiring separate NPUs or cloud resources.
Which devices will support SME2?
Available now on iPhone 17 (A19), Apple M series devices, and flagship Android phones.
How does SME2 benefit developers?
SME2 integrates automatically with frameworks like PyTorch, ONNX Runtime, and XNNPACK, so developers can accelerate AI workloads without rewriting code. Developers can explore Arm AI on mobile resources for toolchains, SDKs, and training to get started quickly.
Can SME2 help with generative AI applications?
Absolutely. SME2 accelerates generative AI tasks, such as real-time translation, photo/video enhancement, audio generation, and motion analysis, directly on-device. This enables faster, more private, and more energy-efficient user experiences. Developers can learn how to implement these capabilities with Arm AI on mobile resources.