Enabling Generative AI at Scale
The explosion in generative AI is only just beginning. Boston Consulting Group predicts AI will drive an estimated three-fold increase in energy demand, with generative AI alone expected to account for 1% of this, challenging today’s electrical grids. Meanwhile, large language models (LLMs) will become more efficient over time, and inference deployed at the edge is expected to grow exponentially. This growth has already started, and to face the challenges ahead, the technology ecosystem is deploying generative AI on Arm.
The Future of Generative AI is Built on Arm
Efficient Code Generation Enabled by Small Language Models (SLMs)
Small language models (SLMs) offer tailored AI solutions with reduced costs, increased accessibility, and efficiency. They are easy to customize and control, making them ideal for a range of applications, such as content and code generation.
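As an illustrative sketch (the model name and parameters below are assumptions, not a recommendation), a compact code-generation SLM can be run on the CPU through the Hugging Face transformers library in a few lines:

```python
# Minimal sketch: code generation with a small language model on the CPU.
# The model name is illustrative; any compact, code-tuned SLM that fits the
# target device's memory could be substituted.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-Coder-0.5B-Instruct",  # illustrative small code model
    device=-1,  # -1 runs inference on the CPU
)

prompt = "Write a Python function that reverses the words in a sentence."
result = generator(prompt, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"])
```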
Generative AI on Smartphones
Innovative Voice Note Summarization
This demo shows how an LLM and a speech-to-text model can work together in a pipeline to transcribe and summarize voice notes. This is ideal for saving time in situations where it is not possible to listen to audio, such as a particularly loud environment.
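A minimal sketch of such a pipeline is shown below using Hugging Face transformers; the specific models are assumptions for illustration, not necessarily those used in the demo:

```python
# Two-stage voice-note pipeline: a speech-to-text model transcribes the
# recording, then a summarization model condenses the transcript.
from transformers import pipeline

transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-small")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

transcript = transcriber("voice_note.wav")["text"]              # stage 1: transcribe
summary = summarizer(transcript, max_length=60, min_length=20)  # stage 2: condense
print(summary[0]["summary_text"])
```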
Real-World Text Summarization Use Case
In this demo, group chat messages from multiple participants are quickly distilled into key points in an easily digestible format. The same approach can be applied to summarizing emails, or to multimodal use cases where pictures form part of the content being summarized.
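One way to prototype this with an instruction-tuned model (the model choice and prompt are illustrative assumptions) is to pass the chat log to a text-generation pipeline, which accepts chat-style message lists:

```python
# Sketch: distilling a group chat into key points with an instruct LLM.
from transformers import pipeline

chat_log = """\
Alice: Can we move Friday's standup to 9am?
Bob: Works for me. Also, the staging deploy is blocked on the API review.
Carol: I'll review the API PR today. 9am is fine with me.
"""

llm = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")
messages = [
    {"role": "user",
     "content": f"Summarize this group chat as short bullet points:\n{chat_log}"}
]
out = llm(messages, max_new_tokens=100)
print(out[0]["generated_text"][-1]["content"])  # the assistant's summary
```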
Evolving Chatbots to Real-Time Assistants
By combining an LLM with automatic speech recognition and speech generation models, it is possible to hold real-time conversations with context retention. Running this virtual assistant demo in flight mode demonstrates the Arm CPU’s ability to process generative AI workloads entirely on-device.
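The loop below is a simplified sketch of that combination (all model names are illustrative assumptions): recognized speech is appended to a message history so the LLM retains context, and the reply is voiced by a text-to-speech model.

```python
# Simplified assistant loop: ASR -> LLM (with history for context) -> TTS.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
llm = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")
tts = pipeline("text-to-speech", model="suno/bark-small")

history = [{"role": "system", "content": "You are a concise voice assistant."}]

def handle_turn(audio_path):
    history.append({"role": "user", "content": asr(audio_path)["text"]})
    reply = llm(history, max_new_tokens=80)[0]["generated_text"][-1]
    history.append(reply)                  # retain context across turns
    return tts(reply["content"])["audio"]  # waveform to play back to the user
```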
Generative AI Starts with the CPU
Arm technology offers an efficient foundation for AI acceleration at scale, enabling generative AI to run on phones, PCs, and in data centers. This is the result of two decades of innovation in vector and matrix processing within the Arm CPU architecture.
These investments have improved AI-accelerated compute, provided security that helps protect valuable models, and enabled low-friction deployment for developers.
Heterogeneous Solutions for GenAI Inference
For generative AI to scale at pace, we must ensure that AI is considered at the platform level, supporting all compute workloads.
Learn more about our leading AI compute platform, which includes our portfolio of CPUs and accelerators, such as GPUs and NPUs.
Software Collaboration Key for GenAI Innovation
Arm is engaged in several strategic partnerships to fuel AI-based experiences, providing extensive software libraries and tools and integrating with all major operating systems and AI frameworks. Our goal is to help ensure developers can optimize without wasting valuable resources.
Seamless Acceleration for AI Workloads
Discover more about how Arm ensures seamless acceleration for every developer, every model, and every workload. Arm Kleidi makes CPU inference accessible and easy, even for the most demanding generative AI workloads.
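Because Kleidi is integrated into the AI frameworks themselves, application code does not need to change to benefit from it. The sketch below is ordinary framework code (the model name is an illustrative assumption): on an Arm device, when the underlying framework build includes Kleidi-optimized kernels, the CPU inference picks them up transparently.

```python
# Ordinary PyTorch/transformers inference; no Kleidi-specific API calls.
# Acceleration comes from the framework's Kleidi-optimized CPU kernels,
# where available, without any changes to this code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative small model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Explain CPU inference in one sentence.", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```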
Run Generative AI Efficiently on Arm
Want advice on running GenAI-enhanced workloads efficiently on Arm? These resources on Hugging Face help you build, deploy, and accelerate faster across a range of models, including large and small language models and models for natural language processing (NLP).