
Partner Q&A: Snips Voice AI Platform

As consumer demand for the voice-enabled smart home grows, AI innovator Snips is looking beyond the cloud to incorporate ethics, privacy and intelligence right into the devices themselves.

Joseph Dureau, CTO, Snips

By 2022, demand for voice-enabled devices will have grown to 1.6 billion units in the U.S. alone. But today’s smart voice devices aren’t as smart as you’d think: all the AI-powered natural language processing (NLP) occurs in cloud datacenters, potentially thousands of miles away from the Amazon Echo in your kitchen.

Enter Snips, an Arm Partner and Innovator Program member working to revolutionize the way we live and interact with technology, and the star of our latest Made Possible series. Acutely aware of humans’ often exhausting relationship with technology, Snips is working to remove the distraction of physical interaction, with the bold mission of “Making Technology Disappear.”

A major part of this mission is keeping compute (and users’ information) on the endpoint device, reducing that device’s reliance on the cloud to understand what we say.

Watch Snips in action in our latest Arm Made Possible video

As Ecosystem Manager, it’s my job to support our Arm Innovator Program members in building groundbreaking solutions in a wide range of fields, from artificial intelligence (AI), security and the Internet of Things (IoT) to embedded devices such as robots and drones.

As part of Arm’s latest Made Possible series—designed to showcase the most inspiring innovation powered by Arm—I spoke to Snips CTO Joseph Dureau about his role in building the Snips voice AI platform, the brand mission and how Arm technology helps make it all possible.

What led you to where you are today?

Three things I’ve always tried to do: make an impact, work with great people and solve interesting technical questions. After that, it’s all a matter of opportunities, encounters, and curiosity for new topics. My career has evolved from really technical jobs to more entrepreneurial activities. I studied climate change at NASA, which was purely academic. My PhD in statistics was like my first startup, and as I drifted towards entrepreneurship, AI came up as an obvious direction and the Snips voice AI platform was born.

Tell me more about the Snips voice AI platform and your vision.

Artificial intelligence is changing the way we interact with our surroundings through voice, but existing voice assistants have a fundamental shortcoming: they are heavily centralized. Doing most of the computing in the cloud raises critical concerns regarding privacy, security, bandwidth, and of course, dependence on cloud connectivity.

The next generation of voice interfaces will process data locally because it’s the best way to build a trusted, transparent and intimate relationship between humans and their devices.

At Snips, we believe in an end-to-end, private-by-design solution. We run the entire Snips voice AI platform on the device instead of collecting user data and processing it in the cloud.

How did the question of ethics in AI shape Snips’ privacy-centered direction?

We are already seeing mass manipulation and privacy breaches as a result of the way we currently use technology. However, the data at stake until now was only whatever we willingly used our smart screens for. Think about the stakes when we start talking about sensors everywhere tracking every move we make and everything we say. With that reality, privacy and security concerns increase greatly.

This is a trend that was foreseeable five years ago when we started working on private AI. Since then, we’ve been developing on-device alternatives to the current way AI is implemented. We’re making steady progress: At IFA 2019, NXP launched a Snips-powered reference design for embedded voice interfaces. And that’s only the first step.

What devices does your Spoken Language Understanding engine run on?

Our Spoken Language Understanding engine can run on a wide variety of hardware. Our tiniest solution runs on 100MHz Arm Cortex-M4 processors. It’s the lightest MCU platform we’ve been able to integrate on. It’s quite prevalent in the small IoT space, along with the Cortex-M7 processor. Our solution, called Snips Commands, is able to identify a wake word and understand voice commands like “play,” “pause,” or “heavy wash.”

On an application processor, our Snips Flow solution can understand queries expressed in natural language, like “it’s dark in here,” “give me a recipe for pasta and zucchini,” or “throw me some Aretha Franklin on the radio.” The minimum requirement for Snips Flow is a dual-core chip at 1.2GHz. For large vocabulary use cases, we typically require a quad-core Cortex-A53 processor, which corresponds to a Raspberry Pi 3 or an NXP i.MX8.

On this kind of hardware we can achieve cloud-level performance, even on large vocabulary use cases, while keeping all the processing on the device. Last fall, we published a benchmark comparing Snips Flow running on a Raspberry Pi 3 to major cloud Speech APIs, for a music use case. The data revealed that Snips can achieve cloud-level accuracy on the device.

A lot of our work is pushing the limits of what can be done on a microcontroller. Our next generation of products for microcontrollers will be a miniaturized version of what we run on application processors, while still understanding natural language. I’m excited to see the progress we’ll make over the coming years.

Talk me through your patent-pending data-augmenting techniques.

We believe strongly that voice interfaces need to be personal. It’s only by being aware of context, identity, and a user’s past activity that interactions will become fluid and natural. The objective of our data-augmenting research is to generate sufficient volumes of formulation examples for how end users would talk to their voice assistant. It relies on a mix of crowdsourcing and machine learning-based disambiguation. We ask a large number of demographically diverse contributors to generate examples of user queries matching a given intent with given parameters.

Each generated query goes through a validation process taking the form of a second crowdsourcing task, where at least two out of three new contributors must confirm its formulation, spelling, and intent. We also train different machine learning models and apply majority voting to their predictions to detect missing parameters and wrong intents.
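
The two checks described above, crowd confirmation and model voting, can be sketched roughly as follows. This is a minimal Python illustration, not Snips’ actual pipeline: the function names, the intent labels, and the rule that a query is kept only when both checks pass are assumptions made for the example. Only the two-out-of-three threshold and the use of majority voting come from the interview, and the sketch covers wrong-intent detection only, not missing parameters.

```python
from collections import Counter


def crowd_validated(confirmations, required=2):
    """Return True when enough reviewers confirmed the query.

    `confirmations` holds one boolean per reviewer; the interview
    describes three reviewers, at least two of whom must confirm.
    """
    return sum(confirmations) >= required


def majority_label(predictions):
    """Return the label most models agree on (on a tie, first seen wins)."""
    return Counter(predictions).most_common(1)[0][0]


def keep_query(expected_intent, confirmations, model_predictions):
    """Assumed rule: keep a query only if the crowd confirms it and the
    model vote matches the intent the contributor was asked to express."""
    if not crowd_validated(confirmations):
        return False
    return majority_label(model_predictions) == expected_intent


# Hypothetical example data, not taken from Snips.
print(keep_query("PlayMusic", [True, True, False],
                 ["PlayMusic", "PlayMusic", "SetTimer"]))
print(keep_query("PlayMusic", [True, False, False],
                 ["PlayMusic", "PlayMusic", "PlayMusic"]))
```

The first query passes both checks; the second is rejected because only one of three reviewers confirmed it, even though every model predicted the expected intent.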

You put a huge emphasis on developer engagement. Why is this important for the Snips voice AI platform?

Our developer community plays a fundamental role in the development of our solution. Snips’ engineers work to constantly improve our product, incorporating the feedback we get from the community of 30,000 developers who design with Snips.

Whenever a developer builds a new voice app or integrates other technology with Snips, they give us further proof points that voice assistants can run reliably on the device, at a fraction of the computing power of the cloud. These efforts also help to build a bigger set of examples and use cases for our applications, most of which are shared openly on the web.

What’s the best project you’ve seen built by the Snips community?

One of the best community projects I’ve seen using the Snips voice AI platform is Project Alice by Laurent Chervet, our resident ‘supermaker’. Laurent earned this title, which we bestowed upon him, for his unrelenting commitment to building with Snips and being a source of inspiration, troubleshooting, and support for other developers that are getting started with the Snips voice AI platform.

Laurent’s house is now completely controlled by voice, and he is still shipping one new feature per day, always pushing the boundaries of what the Snips voice AI platform can do. It’s people like Laurent who will help us demonstrate to the world just how far Snips technology can go.

Watch Snips in action and learn more about the Arm-powered Snips voice AI platform that’s simplifying human interactions with connected devices.
