Accelerate On-Device AI with Arm SME2
SME2 is the latest CPU extension in Arm Lumex CSS, the advanced compute subsystem for next-generation devices, designed to accelerate matrix-oriented workloads directly on device. It improves performance for AI and ML models, especially those that rely on operations like matrix multiplication, common in transformers, convolutional neural networks (CNNs), and large language models (LLMs).
Why SME2 Matters
- Delivers smoother, more responsive on-device user experiences.
- Reduces reliance on cloud services, improving user privacy and minimizing latency.
- Works natively through Arm KleidiAI in PyTorch, LiteRT, ExecuTorch, ONNX Runtime, and MNN, with no code changes (see the sketch after this list).
- Available now on iPhone 16 and Apple M4 chips, with Android support coming soon.
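To make the "no code changes" point concrete, here is a minimal sketch of ordinary PyTorch inference code. Nothing in it is SME2-specific, and the layer sizes are arbitrary; on SME2-capable hardware with a KleidiAI-enabled PyTorch build, the framework dispatches the matrix multiply to SME2 micro-kernels on its own.

```python
# A minimal sketch: plain PyTorch code, nothing SME2-specific.
# On SME2 hardware with a KleidiAI-enabled PyTorch build, this matmul
# is routed to SME2 micro-kernels by the framework itself.
import torch

weights = torch.randn(4096, 4096)   # e.g. one transformer linear layer
activations = torch.randn(1, 4096)  # activations for a single token

with torch.inference_mode():
    out = activations @ weights     # accelerated transparently when available
print(out.shape)                    # torch.Size([1, 4096])
```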
Built for Modern AI
- Accelerates complex AI computations, such as matrix-matrix multiplication and the outer-product operations critical for real-time inference (illustrated in the sketch after this list).
- Introduces a dedicated predicate-as-counter mechanism, optimizing vector-register use and improving data throughput.
- Efficiently processes compressed, low-precision neural network data formats, reducing memory bandwidth and improving performance.
- Delivers flexible performance from entry-tier to flagship mobile devices, ensuring consistent developer and user experiences across devices.
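To illustrate the outer-product formulation, the sketch below uses plain NumPy (not SME2 code, and with arbitrary dimensions) to show matrix multiplication rewritten as a sum of rank-1 outer-product updates. This is the computation pattern that SME2's outer-product instructions accumulate into the ZA tile array in hardware.

```python
# Illustrative only (plain NumPy, not SME2 code): C = A @ B expressed as a
# sum of outer products, the formulation SME2 accelerates in hardware.
import numpy as np

M, K, N = 8, 16, 8
A = np.random.rand(M, K).astype(np.float32)
B = np.random.rand(K, N).astype(np.float32)

C = np.zeros((M, N), dtype=np.float32)  # plays the role of one ZA tile
for k in range(K):
    # one rank-1 update per step: a column of A times a row of B,
    # analogous to a single outer-product-accumulate instruction
    C += np.outer(A[:, k], B[k, :])

assert np.allclose(C, A @ B, atol=1e-4)
```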
Built for Developers
SME2 is already live on iPhone 16 and M4-based devices, allowing developers to begin optimizing AI applications today.
Thanks to native SME2 support across leading AI frameworks and runtime libraries—including PyTorch, ONNX Runtime, XNNPACK, and llama.cpp—developers can access SME2 benefits without changing a single line of code. SME2-enhanced performance is also portable across Arm-based platforms, from iOS and iPadOS to macOS and, soon, Android.
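As a concrete example of this framework-level integration, the hedged sketch below runs an ONNX model through ONNX Runtime's default CPU execution provider; the model path "model.onnx" and the input shape are placeholders. The application code contains nothing Arm-specific: on supported hardware, the runtime selects Arm-optimized kernels by itself.

```python
# A minimal sketch (model path and input shape are hypothetical):
# standard ONNX Runtime inference on the CPU execution provider, which
# picks up Arm-optimized kernels on supported hardware automatically.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input
outputs = session.run(None, {input_name: x})
print(outputs[0].shape)
```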
Explore the new Arm Developer Launchpad for SME2 for an overview of SME2 acceleration and use cases, step-by-step tutorials, and hands-on learning paths.
Boosting Mobile AI with Arm SME2 and Google Android
Explore how Arm SME2 powers faster, more efficient AI across Android smartphones, enabling low-latency, real-time applications in vision, voice, and generative AI.
Frequently Asked Questions: SME2
What is SME2?
SME2 (Scalable Matrix Extension 2) is an advanced set of CPU instructions in the Armv9.3-A architecture designed to accelerate AI and ML workloads, particularly matrix-heavy tasks like LLMs and computer vision. It integrates seamlessly with popular AI frameworks via Arm KleidiAI, delivering higher performance and efficiency without code changes.
How does SME2 improve AI performance on devices?
By executing matrix operations directly on the CPU, SME2 enables up to 6X faster inference for large language models and 3X improvements in vision and audio processing—without requiring separate NPUs or cloud resources.
Which devices will support SME2?
SME2 is already deployed in iPhone 16 and Apple M4 chips, with Android device support coming soon. It scales from entry-tier to flagship devices, ensuring consistent performance.
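For developers who want to confirm SME2 availability at runtime, the sketch below shows one hedged approach in Python: on Apple platforms the feature is reported through a sysctl key, and on Linux-based systems (including Android) the kernel lists it among the CPU feature flags. The exact key and flag spellings are assumptions that may vary by OS release, so verify them against your target.

```python
# A hedged sketch of runtime SME2 detection. The sysctl key and the
# /proc/cpuinfo flag spelling are assumptions to verify per OS release.
import platform
import subprocess

def has_sme2() -> bool:
    system = platform.system()
    if system == "Darwin":
        # Apple silicon reports architecture features via sysctl
        result = subprocess.run(
            ["sysctl", "-n", "hw.optional.arm.FEAT_SME2"],
            capture_output=True, text=True)
        return result.stdout.strip() == "1"
    if system == "Linux":  # covers Android shells as well
        try:
            # simple substring check against the kernel's feature flags
            with open("/proc/cpuinfo") as f:
                return " sme2" in f.read()
        except OSError:
            return False
    return False

print("SME2 available:", has_sme2())
```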
How does SME2 benefit developers?
SME2 integrates automatically with frameworks like PyTorch, ONNX Runtime, and XNNPACK, so developers can accelerate AI workloads without rewriting code. Developers can explore Arm AI on mobile resources for toolchains, SDKs, and training to get started quickly.
Can SME2 help with generative AI applications?
Absolutely. SME2 accelerates generative AI tasks, such as real-time translation, photo/video enhancement, audio generation, and motion analysis, directly on-device. This enables faster, more private, and more energy-efficient user experiences. Developers can learn how to implement these capabilities with Arm AI on mobile resources.