The lifecycle of an endpoint AI device may span years—even decades. Those that go the distance will be capable of processing the machine learning (ML) algorithms of the future.
While we don't yet know what those algorithms will look like, we can be sure they will be more complex and more demanding than the workloads endpoint AI devices handle today.
Most endpoint AI devices today are capable of around 4 or 5 tera operations per second (TOPS) per watt. That's enough for basic ML routines, but it falls far short of datacenter-class AI compute.
Reducing the power profile of endpoint AI
SiMa.ai began as an ambition to shrink this performance divide: to redefine what performance means for endpoint AI. Yet achieving anything close to cloud-like performance in an endpoint AI device would require a marked reduction in power consumption, or rather, a significant increase in TOPS per watt.
With this goal in mind, we developed the MLSoC™ (Machine Learning System on Chip) platform, targeting a peak of 10 TOPS per watt. Within an embedded power profile of 5 watts, our ML accelerator can achieve up to 50 TOPS. That's enough to enable AI workloads that would traditionally require cloud performance in a passively cooled endpoint AI device.
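The relationship between efficiency, power budget and throughput above is simple multiplication. As a back-of-the-envelope sketch (the numbers are the figures quoted in this post, not a published spec):

```python
def achievable_tops(efficiency_tops_per_watt: float, power_budget_watts: float) -> float:
    """Peak throughput available within a given power envelope."""
    return efficiency_tops_per_watt * power_budget_watts

# 10 TOPS/W within a 5 W embedded profile -> 50 TOPS
print(achievable_tops(10, 5))
```

The same arithmetic explains why raising TOPS per watt, rather than raising the power budget, is the only route to cloud-like performance in a passively cooled device: the power envelope is fixed by the form factor.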
We designed our heterogeneous MLSoC to process the workloads our customers have already created, but also to be future-proofed for upcoming workloads none of us have identified yet. Unlike a data center, which can be upgraded as new components come to market, the hardware embedded within an endpoint AI device is fixed the day it's baked into silicon.
Our solution to this challenge combines traditional compute IP from Arm with our own machine learning accelerator and dedicated vision accelerator. As Arm is the market leader in low-power compute, its IP was the obvious choice as a secure platform upon which to build our MLSoC. We chose the Arm Cortex-A65 CPU after working closely with our customers to define the compute requirements for their applications: it was a decision very much driven by customer needs, from performance down to software toolchain.
While it's capable of a wide range of ML workloads such as natural language processing (NLP), SiMa.ai's MLSoC is initially optimized for computer vision applications. Computer vision is already central to many endpoint AI use cases, from traffic cameras to selfie filters, and we believe its use will only increase in future applications such as high-end surveillance, crowd control and thermal scanning.
Computer vision unlocks future complex use cases for endpoint AI
Combining the vision accelerator with the ML accelerator also ensures MLSoC can handle complex workloads such as fusing data from multiple sensors, which enables it to play a role in autonomous systems from consumer autonomous vehicles to autonomous robots in industrial IoT settings. We also foresee a role for MLSoC in aerospace and defense.
Of course, these complex autonomous workloads require more than 50 TOPS. That's why we've designed MLSoC to be modular: by combining multiple machine learning accelerator mosaics via a proprietary interconnect, we can scale from 50 TOPS at 5 watts up to 400 TOPS at 40 watts.
Consider that today's Level 5 autonomous vehicle prototypes draw around 4 kilowatts. At 40 watts, that's potentially a 100x reduction in power consumption, a greatly reduced physical hardware footprint, and far less need for active cooling.
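The scaling and comparison above can be checked with a few lines of arithmetic. This is an illustrative sketch using the figures quoted in this post (50 TOPS and 5 W per mosaic are assumed per-unit numbers, not a published spec):

```python
# Each accelerator mosaic contributes 50 TOPS within 5 W, so the
# 10 TOPS/W efficiency is preserved as mosaics are combined.
TOPS_PER_MOSAIC = 50
WATTS_PER_MOSAIC = 5

def scaled_config(mosaics: int) -> tuple[int, int]:
    """Total throughput (TOPS) and power draw (W) for a mosaic count."""
    return mosaics * TOPS_PER_MOSAIC, mosaics * WATTS_PER_MOSAIC

tops, watts = scaled_config(8)   # top-end configuration: 400 TOPS at 40 W
print(tops, watts)

# Versus a ~4 kW Level 5 prototype:
print(4000 / watts)              # 100.0, i.e. a 100x power reduction
```

Note that the linear scaling assumes the interconnect overhead is negligible relative to the mosaics themselves; the point of the sketch is simply that efficiency per watt is unchanged as the configuration grows.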
There's another good reason to reduce power consumption in devices that will soon fill our world in their millions. Many of the OEMs and customers we talk to are keen to bring down their power profile so they can become carbon neutral by 2030 or earlier. That's reason enough for us to design for low power.
Giving developers the tools they need
I believe that MLSoCs will play a key role in enabling low-power AI in edge and endpoint devices. But I also know that it's not enough to simply provide a license to a solution benchmarked to achieve a certain number of TOPS.
Many of the solutions on the market today advertise their performance based on benchmarks such as ResNet-50. But quoting frames per second or TOPS per watt only matters if those numbers are achievable under real-world conditions, that is, on our customers' workloads.
Our customers want one thing: development velocity, or how quickly they can get to market. They don't want to spend months in development cycles trying to achieve the performance they've been promised; they want to license our solution and then add their own secret sauce using simple, comprehensive tools.
We’re planning to tape out our MLSoC early next year, with a view to delivering engineering samples and potentially customer samples towards the end of next year. However, we’re already working very closely with customers to define and build their applications and map them to our hardware, and the software development kit (SDK) will be available to customers in advance.
This means they’ll be able to work through the flows, develop their applications and run simulations so that when the silicon becomes available it’s simply a case of compile-and-go.
And because MLSoC is grounded in Arm technology, our customers can be sure that they will have the software, tools and ongoing support they need to build not only the next generation but many subsequent generations of highly capable, low power AI devices.