Paul McLellan
Fooling Neural Networks

6 Feb 2018 • 4 minute read

I wrote recently about various aspects of modeling, not just transistor models, but also deer and climate change. One of the things that I mentioned in passing was over-training. This is where the model is made so accurate, often by adding additional parameters, that it matches the training data perfectly. However, on other data it doesn't do as well as it seems it should. This was my experience modeling Scottish Highland deer populations, where more complex models would match the training forests better and better, but at the same time didn't improve (or got worse) against the test forests. Global warming models are similar, and can be made to match the existing data almost arbitrarily closely (known as hindcasting), only to diverge increasingly badly once the parameters are locked in place and time passes to find out how accurate the predictions were.
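To make the over-training effect concrete, here is a toy sketch (my own illustration, not the deer or climate models themselves): fit polynomials of increasing degree to noisy samples of a sine wave, and the training error keeps falling while the error on held-out data stops improving and then gets worse.

```python
# Toy illustration of over-training: more parameters fit the training data
# better and better, but the held-out (test) error eventually gets worse.
import numpy as np

rng = np.random.default_rng(0)

def noisy_samples(n):
    x = np.linspace(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = noisy_samples(20)    # the "training forests"
x_test, y_test = noisy_samples(200)     # the "test forests"

for degree in (1, 3, 9, 15):            # adding more and more parameters
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```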

When training neural networks, a similar phenomenon has been observed. In some ways it is more insidious. When adding another parameter to our deer model, we knew we were doing it. When the neural network is being trained and starts to give weight to a new parameter, which is pretty much the same thing, it is invisible.

This leads to networks that are not, to use Nassim Taleb's word, antifragile.

Sound

I came across a recently published paper (January 5, 2018), Audio Adversarial Examples: Targeted Attacks on Speech-to-Text, that does something similar with audio waveforms:

We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (at a rate of up to 50 characters per second). We apply our iterative optimization-based attack to Mozilla's implementation DeepSpeech end-to-end, and show it has a 100% success rate. The feasibility of this attack introduces a new domain to study adversarial examples.

Here is the first figure from the paper, changing "it was the best of times, it was the worst of times" into "it is a truth universally acknowledged that a single" (pop quiz: where are those two quotes from?).
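The attack is an iterative optimization: repeatedly adjust a small perturbation so that the recognizer's loss against the chosen target phrase goes down while the perturbation stays tiny. The sketch below is only a generic illustration of that loop, not the paper's implementation; the real attack optimizes against DeepSpeech's CTC loss, and `model`, `loss_fn`, and `target` here are placeholders.

```python
# Generic sketch of an iterative, optimization-based targeted attack.
# `model`, `loss_fn`, and `target` stand in for any differentiable
# recognizer and a "does it output the target phrase?" loss.
import torch

def targeted_attack(model, x, target, loss_fn, steps=1000, lr=1e-3, c=1e-2):
    """Search for a small delta so that model(x + delta) matches target."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = torch.clamp(x + delta, -1.0, 1.0)   # keep a valid waveform
        # Loss for transcribing as the target phrase, plus a penalty that
        # keeps the perturbation small (and therefore inaudible).
        loss = loss_fn(model(adv), target) + c * delta.abs().max()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach()
```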

Vision

The first major paper on this fragility of neural networks in vision was published in 2014 with the rather anodyne title of Intriguing Properties of Neural Networks.

The most astounding examples, I think, are the pictures below. The column on the left shows the original pictures, all correctly identified by AlexNet. The column in the middle shows the small amount of carefully constructed noise. When the noise is added to the original pictures, you get the pictures in the right column. These look identical to the original to the human eye, and certainly if we can identify the image in the left column, we can identify the corresponding image in the right column. AlexNet identifies all the images in the right column as ostriches!
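The perturbations in that paper come from a box-constrained optimization, but a later and simpler technique, the fast gradient sign method, captures the same idea: nudge every pixel a tiny amount in whichever direction moves the classifier toward a chosen class. Below is a small sketch along those lines (the torchvision AlexNet weights and ImageNet class index 9 for ostrich are my assumptions), not the original paper's exact procedure.

```python
# Sketch: one-step targeted perturbation (fast gradient sign method) that
# nudges an image toward the ImageNet "ostrich" class. `image` is assumed
# to be a normalized 1x3x224x224 tensor.
import torch
import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()

def make_ostrich(image, epsilon=0.007, target_class=9):   # 9 = ostrich
    image = image.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(
        model(image), torch.tensor([target_class]))
    loss.backward()
    # Step against the gradient of the target-class loss, so the prediction
    # moves toward "ostrich" while the pixels barely change.
    return (image - epsilon * image.grad.sign()).detach()
```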

Here is another famous example. On the left is Reese Witherspoon. By adding some carefully constructed glasses we get the picture in the middle that looks like...Reese Witherspoon with glasses. But the neural net identifies it as Russell Crowe (on the right).

Okay, those are amusing rather than scary. But, in another paper, a team of researchers from several universities worked out how to fool neural nets in the physical world with just a few pieces of tape:

Starting by analyzing the algorithm the vision system uses to classify images, they used a number of different attacks to manipulate signs in order to trick machine learning models into misreading them. For instance, they printed up stickers to trick the vision system an autonomous car would use into reading a stop sign as a 45-mile-per-hour sign, the consequences of which would obviously be very bad in the real world.

Summary

What is surprising to me is just how little the input data needs to be distorted to cause the neural networks to misidentify things. The stop signs with a few pieces of tape on them are clearly just that to a human: a stop sign with a few pieces of tape. The images on the right in the 3x3 grid above look nothing like ostriches. These are not borderline cases, like the signs in the traffic sign database in fog, for example, where even a human struggles to decide whether it is a 30mph or a 50mph speed-limit sign. I've not listened to the sound examples, but I'm sure that they sound normal to the human ear since they have just 0.1% additional noise.

So a bit scary, to say the least!

Pop quiz answers: "It was the best of times..." is the opening sentence of A Tale of Two Cities by Charles Dickens. "It is a truth..." is the opening sentence of Pride and Prejudice by Jane Austen.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.