How Is Google So Good at Recognizing Cats?

22 Oct 2015 • 3 minute read

Convolutional Neural Networks (CNNs) are a key technique for image recognition. As Wired Magazine put it a few years ago, “Google’s Artificial Brain Learns to Find Cat Videos.”

“Artificial Brain” may be going a bit far, but neural networks really are constructed using an architecture similar to the brain's, just on a much smaller scale. The brain contains billions of neurons, whereas typical CNNs contain thousands. Google has moved on from recognizing cat videos, but CNNs are still the basis of the image recognition and classification in Google Photos.

I first heard of CNNs at Yann LeCun’s keynote at the 2014 Embedded Vision Summit. He was describing work that was largely done at New York University, although he had since also taken a position in charge of AI research at Facebook. Yann talked about how CNNs had become the dominant methodology for image recognition. He had a demo where he pointed a camera attached to his laptop at various things (a shoe, a bagel, a pencil) and the computer interactively identified what it was seeing. Facebook obviously has a lot of interest in automatically categorizing the half a billion photographs that are uploaded there every day: what they are, who is in them. CNNs are and will be the basis of this.

Facebook and Google obviously have huge datacenters. But what if you want to do image recognition in your car? You may not want to recognize cat videos, but advanced driver assist, never mind autonomous vehicles, needs to recognize street signs and traffic lights, and to tell the difference between a child and a trash can.

Even more difficult, what about your phone? Predictions are that smartphones will soon have multiple front- and rear-facing cameras and will be able to do recognition. CNNs are the approach of choice here, too. The biggest issue is that there is a limited power budget in your phone and that, in turn, puts a cap on how much computation can be done.

Neural networks can be used for various purposes, such as speech recognition or capturing stock trading strategies, but to keep the description simple I’m going to assume we are just worried about image recognition.

A neural network consists of a number of artificial “neuron” circuits, with the inputs of some neurons connected to the outputs of some others. As in a real neuron, the connections have weights associated with them. Typically the values of the weights are determined during a training process and are then used during the recognition process to actually perform identification. If you think about it, your brain went through the same process when your mother told you “that is a cat”, “that is a dog (not a cat)”, and when you were older the DMV told you, “that is a stop sign”, “that is a speed limit sign.”
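As a rough illustration of weights being set during training and then used for recognition, here is a minimal sketch of a single artificial neuron that learns logical AND. The function names, the tiny training set, and the use of the classic perceptron update rule are all assumptions made for this example. Real image-recognition networks have many more neurons and are typically trained with backpropagation instead.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by a step activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


def train(samples, epochs=20, learning_rate=1):
    """Perceptron rule: nudge each weight in the direction that reduces the error."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - neuron(inputs, weights, bias)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical training data: "this is a 1" only when both inputs are on.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# Training phase: determine the weights.
weights, bias = train(AND_SAMPLES)

# Recognition phase: the weights are now fixed and only used.
predictions = [neuron(inputs, weights, bias) for inputs, _ in AND_SAMPLES]
print(weights, bias, predictions)
```

The split between `train` and the final list comprehension mirrors the two phases described above: the expensive learning happens once, and recognition afterwards is just multiply, add, and compare.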

A CNN is a special case of a general neural network, inspired by the visual cortex in the brain. It consists of a number of layers, each of which receives input from a small part of the previous layer (or from the image, for the first layer) and can extract primitive features such as edges and corners that can then be combined further. A number of these convolutional layers are followed by fully connected layers that perform the classification itself.
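To make "receives input from a small part of the previous layer" concrete, here is a minimal sketch of what one convolutional filter does, in plain Python. The 4x6 image and the 3x3 vertical-edge kernel are made-up values for illustration; in a trained CNN the kernel values would be learned weights. As in most CNN software, the kernel is slid over the image without being flipped.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image; each output value sees only a small patch."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    # One multiply-accumulate per kernel element.
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical image: dark on the left (0), bright on the right (9).
IMAGE = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(IMAGE, VERTICAL_EDGE)
print(feature_map)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The output is zero over the flat regions and large where the patch straddles the edge, which is the kind of primitive feature that later layers combine.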

There are a number of ways to implement CNNs, ranging from networks of analog cells that are much closer to real neurons, up to programs running on general-purpose computers (or lots of them). But for embedded, neither of those solutions is ideal. What is needed is a special-purpose core that gives the best tradeoff between the flexibility of software and the power/performance of custom silicon solutions. The requirement is billions of multiply-accumulates (MACs) per second, plus high bandwidth and low power.
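A back-of-the-envelope count shows how quickly the MACs add up. The sketch below counts the multiply-accumulates for one convolutional layer; the layer dimensions and the 30 frames per second are hypothetical figures chosen for illustration, not numbers from the Cadence white paper.

```python
def conv_layer_macs(out_h, out_w, in_channels, out_channels, kernel_h, kernel_w):
    """Each output value needs one MAC per kernel element per input channel."""
    macs_per_output = kernel_h * kernel_w * in_channels
    outputs = out_h * out_w * out_channels
    return outputs * macs_per_output


# Hypothetical first layer: 224x224 output, RGB input, 64 filters of 3x3.
macs_per_frame = conv_layer_macs(
    out_h=224, out_w=224, in_channels=3, out_channels=64, kernel_h=3, kernel_w=3
)

FRAMES_PER_SECOND = 30  # assumed video rate
macs_per_second = macs_per_frame * FRAMES_PER_SECOND

print(f"{macs_per_frame:,} MACs per frame")    # 86,704,128 MACs per frame
print(f"{macs_per_second:,} MACs per second")  # 2,601,123,840 MACs per second
```

Under these assumptions a single modest layer already needs about 2.6 billion MACs per second, and a full network has many layers.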

CNNs can be used to recognize traffic signs under impaired visual conditions (as in real life), for which the standard benchmark is GTSRB (the German Traffic Sign Recognition Benchmark). The Cadence hierarchical CNN approach achieves a correct detection rate of 99.58%, the best rate achieved on GTSRB to date. One day your car will thank you.

Want more details? A new Cadence white paper, Using Convolutional Neural Networks for Image Recognition, has just been published.