Last week was the 11th Embedded Vision Summit. That means the first one, back in 2011, was just a couple of years after what I regard as the watershed event in vision, the poster session (it didn't even make the main conference) announcing ImageNet. With millions of labeled images, neural network approaches, which had been regarded as a "tapped-out mine that would never produce any gold", raced past the more explicitly programmed approaches. Today, these networks are better than humans at recognizing dog breeds or traffic signs. For more about ImageNet, see my post ImageNet: The Benchmark that Changed Everything. You can read about three of the people who never gave up on the deep learning approach in my post Geoff Hinton, Yann LeCun, and Yoshua Bengio Win 2019 Turing Award (actually, it was the 2018 Turing Award, bestowed in 2019).
This year's Embedded Vision Summit opened, as usual, with a talk by Jeff Bier. He runs what used to be called the Embedded Vision Alliance, but is now the Edge AI and Vision Alliance. That reflects the fact that vision is increasingly about interpreting what is in the image and not just doing photographic processing. Cadence's Tensilica Vision processors have made the same transition, with more of the silicon given over to multiply-accumulators for neural network processing.
Jeff pointed out that we are in a once-in-a-generation convergence to solve real-world problems and grow real business in this space. As an example, Jeff had a video about Tempo Move, which uses video to monitor your exercise program (not to mention providing the weights).
In the 2021 survey you can see that most people are either using neural networks to process vision data, or are planning to do so.
The flow of image processing is more complex than most people think. Here is what people imagine the flow looks like.
But here is something more realistic.
Jeff then introduced the keynote for the morning, Ryad Benosman of the CMU Robotics Institute and the University of Pittsburgh. His presentation was titled Event-Based Neuromorphic Perception and Computation: The Future of Sensing and AI. However, he pointed out that "neuromorphic" is too general, like saying that you "play ball games".
Ryad started with the history of neural networks, which you probably know goes back to early work done in the 1950s.
It turns out that computer vision and its associated algorithms have been invented several times over the years. The big challenge is what Ryad calls the "impossible challenge".
If we sample the baseball player at a very high frame rate, we waste a lot of effort on the background, which is not changing. The challenge is to acquire only the relevant "events", just the meaningful parts of the image, so that when nothing happens, nothing is sent or processed. The matrices involved are therefore very sparse. But this comes at a cost.
Here's a camera incorporating these ideas. On the left, the optics. In the center, the circuitry looking for differences, and then, on the right, deciding whether the difference is enough to send it up for processing.
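The difference-then-threshold idea behind such a sensor can be sketched in a few lines. This is a simplified illustration, not any vendor's actual pipeline: real event cameras compare log intensity per pixel asynchronously in analog circuitry, whereas this sketch (with a hypothetical `detect_events` function) just diffs two frames.

```python
def detect_events(prev_frame, curr_frame, threshold=10):
    """Emit sparse change events between two frames.

    Each event is (row, col, polarity): +1 for a pixel that got
    brighter, -1 for one that got darker. Pixels that change by
    less than `threshold` produce no event at all, so a static
    background generates no data.
    """
    events = []
    for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            diff = q - p
            if diff >= threshold:
                events.append((r, c, +1))
            elif diff <= -threshold:
                events.append((r, c, -1))
    return events

# A mostly static 3x3 scene: one pixel brightens, one darkens.
prev = [[100, 100, 100],
        [100, 100, 100],
        [100, 100, 100]]
curr = [[100, 100, 100],
        [100, 140, 100],
        [100, 100,  60]]
print(detect_events(prev, curr))  # 2 events instead of 9 pixel values
```

Nine pixels collapse to two events, which is the sparsity the previous paragraph describes: the downstream processor only sees what changed.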
So going back to the very general idea of neuromorphic computing, it is applicable in many domains:
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.