Accelerating Machine Learning

Matthew Mattina, Distinguished Engineer and Senior Director ML Research

Today’s advanced semiconductor technology can perform over one trillion multiplication operations every second in an area of around one square millimeter. Modern neural networks require trillions and trillions of multiplications every second to perform their magic.

Could it be possible to perform this level of compute in an area the size of a pencil tip while consuming less power than an LED lightbulb? This special combination of intelligent applications powered by neural networks, massive amounts of compute in a tiny area, and minimal power consumption has the potential to enable applications that can improve the lives of billions of people.

So how do we make this a reality? Our world-class researchers are developing advanced Arm hardware, software, and tools that provide the energy efficiency and performance required to support increasingly complex algorithms on embedded and mobile platforms, with projects spanning a wide spectrum of ML research.

On the algorithmic side, we’ve taken a different approach to the well-documented computational problems of executing the huge and complex neural networks demanded by new applications. We have shown that, by using residual number systems, it is possible to combine low-precision and complexity-reducing techniques without the usual high loss in prediction accuracy.
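To make the idea of a residual number system concrete, here is a minimal Python sketch of the underlying arithmetic. It is only an illustration and not the method used in the research: the moduli `(7, 8, 9)` and all function names are assumptions chosen for this example. Each integer is stored as a tuple of small residues, additions and multiplications run independently on each residue, and the Chinese Remainder Theorem recovers the ordinary integer at the end.

```python
from math import prod

# Illustrative, pairwise-coprime moduli (an assumption for this sketch).
# Every integer in the range [0, 7 * 8 * 9) = [0, 504) has a unique encoding.
MODULI = (7, 8, 9)

def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer as its residues modulo each modulus."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Add two encoded values; each residue channel is independent."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Multiply two encoded values using only small per-channel products."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Decode residues back to an integer with the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse (Python 3.8+)
    return total % M

def rns_dot(xs, ws, moduli=MODULI):
    """Dot product accumulated entirely in the residue domain."""
    acc = to_rns(0, moduli)
    for x, w in zip(xs, ws):
        term = rns_mul(to_rns(x, moduli), to_rns(w, moduli), moduli)
        acc = rns_add(acc, term, moduli)
    return from_rns(acc, moduli)
```

For example, `rns_dot([3, 5, 2], [4, 6, 7])` returns 56, the same value as the ordinary dot product, provided the true result stays below the product of the moduli (504 here). The appeal for hardware is that each channel only ever handles numbers smaller than its modulus.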

Deep neural network inference demands a large compute and memory budget, and these are not available on small, resource-constrained microcontrollers. Working with one of our long-time academic collaborators, we demonstrated the benefits of using Neural Architecture Search (NAS) algorithms to search for models with low memory usage and low operation count. The resulting MicroNets models demonstrated state-of-the-art results for industry-standard benchmark tasks: visual wake words, audio keyword spotting, and anomaly detection.
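The constraint side of such a search can be sketched in a few lines of Python. This is not the MicroNets search procedure: a real NAS algorithm also trains or estimates the accuracy of candidates, which is omitted here. The sketch only shows how candidate architectures might be filtered by operation count and memory before any accuracy evaluation. The fully connected network shape, the int8 (one byte per parameter) assumption, and every name and budget below are hypothetical.

```python
from itertools import product

def dense_cost(n_in, width, depth, n_out):
    """Return (macs, param_bytes) for a fully connected network with
    `depth` hidden layers of size `width`, assuming int8 parameters."""
    dims = [n_in] + [width] * depth + [n_out]
    macs = sum(a * b for a, b in zip(dims, dims[1:]))  # one MAC per weight
    params = macs + sum(dims[1:])                      # weights plus biases
    return macs, params                                # 1 byte per parameter

def feasible(space, n_in, n_out, max_macs, max_bytes):
    """Keep only the (width, depth) candidates that fit both budgets."""
    keep = []
    for width, depth in space:
        macs, nbytes = dense_cost(n_in, width, depth, n_out)
        if macs <= max_macs and nbytes <= max_bytes:
            keep.append((width, depth, macs, nbytes))
    return keep

# Hypothetical search space and budgets, for illustration only.
candidates = feasible(product([4, 8], [1, 2]), n_in=10, n_out=3,
                      max_macs=100, max_bytes=100)
```

With these illustrative budgets, only the two width-4 candidates survive; the width-8 networks exceed the operation budget. A search algorithm would then spend its accuracy evaluations only on the survivors.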

However, it’s not just about efficient model design. To execute convolutional neural networks (CNNs) on mobile and deeply embedded devices, efficient hardware acceleration may be required. Data sparsity exploitation is a promising approach, and our Sparse Systolic Tensor Array design helps address a key architectural challenge: how to provide support for a range of sparsity levels while maintaining high utilization of the hardware.
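The basic idea behind sparsity exploitation can be shown with a small Python sketch. It does not model the Sparse Systolic Tensor Array or any hardware; it only illustrates, in software, that storing the nonzero weights together with their positions lets a dot product skip every multiplication by zero. The function names and the example values are assumptions made for this illustration.

```python
def compress(weights):
    """Keep only nonzero weights, each paired with its original index."""
    return [(i, w) for i, w in enumerate(weights) if w != 0]

def sparse_dot(compressed, activations):
    """Dot product that performs one multiplication per nonzero weight."""
    return sum(w * activations[i] for i, w in compressed)

def dense_dot(weights, activations):
    """Reference dot product that multiplies every element, zeros included."""
    return sum(w * a for w, a in zip(weights, activations))

# Example values chosen for illustration: 5 of the 8 weights are zero.
weights = [0, 2, 0, 0, -1, 0, 3, 0]
activations = [1, 2, 3, 4, 5, 6, 7, 8]
packed = compress(weights)
```

Here `sparse_dot(packed, activations)` gives the same result as `dense_dot(weights, activations)` while performing 3 multiplications instead of 8. The architectural difficulty mentioned above arises because the number of nonzeros varies from one network to another, so fixed hardware must stay busy across that range.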

In a more applied setting and in collaboration with Bose, we demonstrated how new model compression techniques could successfully be applied in recurrent neural network (RNN) speech enhancement, raising hopes for the use of RNNs for noise suppression in resource-constrained settings, such as in hearing aid hardware.

Through this and other work, we’ve seen promising evidence supporting the deployment of neural networks in increasingly challenging areas, as well as insights into how ML can improve the efficiency of IP design and verification. I believe that ML truly has the potential to change our lives for the better, and we are just scratching the surface. My main hope for the near future is that healthcare applications, enabled by large, complex neural networks running on very small, very low-power processors, will usher in a new era of personalized healthcare that improves the lives of billions of people. And I hope that at Arm, we can play a part in making that a reality.