Understanding Machine Learning: A Model with One Weight

18 Sep 2020 • 2 minute read

I recently wrote a post, HOT CHIPS: Scaling out Deep Learning Training, which I considered to be an introduction to deep learning training (in the first half) and scaling out (in the second). But even my boss told me it was "meaty", so I think it was not basic enough.

As it happened, I came across this video recently, which I think is a wonderful introduction to what all these terms in deep learning mean: inputs, weights, outputs, ground truth values. It's a neural network with a number of weights... in this case, just one.

The video has more in it, and the brilliant neural network is in the middle. But watch the whole thing (9 minutes).

Here's the heart of it. Three people go to an ice cream shop. How much is their bill?

Well, with zero information, you have pretty much zero idea. You don't know the price of ice cream, how hot the weather is, or whether these people are oldies or kids.

But you get some info about the previous three groups of people who went in. A single person paid ten dollars, two people paid twenty dollars, and four people paid forty dollars. The important thing to notice is that you have never seen a group of three people before. Whatever you think the three-person bill is going to be, it is a guess based on the groups you were told about, which are the training data. It is not just repeating some data you already saw and recorded.

You already know the answer to how much three people paid because your brain is a neural network training machine. Even though you have never seen a datapoint for three people, you are going to estimate thirty dollars.

This is a neural network. Simple, to be sure. The input is the number of people. The output is the estimated bill. And the weights are a single number, 10, which tells you how to get from the input (number of people) to the output (bill): multiply by 10. The labeled training dataset is the information about 1, 2, and 4 people. Obviously, this is over-simplified. But the basic idea is correct. Take the input, process it with the weights, and get the output. In this case, the training was simple since I just showed you three other examples.
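If you like to see things as code, the ice cream model fits in a few lines of Python. This is a minimal sketch and not something taken from the video: the names `fit_one_weight` and `predict` are made up for illustration, and it assumes the model is simply bill = weight × people, with the weight chosen by a least-squares fit to the three training examples.

```python
# Training data from the ice cream shop: (number of people, bill in dollars).
training_data = [(1, 10.0), (2, 20.0), (4, 40.0)]


def fit_one_weight(data):
    """Least-squares fit of a single weight w for the model bill = w * people.

    Hypothetical helper for illustration. Minimizing sum((w*x - y)^2)
    over w gives w = sum(x*y) / sum(x*x).
    """
    numerator = sum(x * y for x, y in data)
    denominator = sum(x * x for x, _ in data)
    return numerator / denominator


def predict(weight, people):
    """Apply the one-weight model: multiply the input by the weight."""
    return weight * people


weight = fit_one_weight(training_data)
print(weight)              # 10.0
print(predict(weight, 3))  # 30.0
```

The model never saw a group of three, yet it estimates thirty dollars, which is the same guess you made in your head.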

Realistic training is more like taking what a zillion people thought about a zillion movies and working out which movie each of them might like to watch next. That's what Netflix and HBO have to do to suggest something you might like. It is the same basic idea scaled up a zillion times.

That's what a neural network is. Of course, a real one has more than one weight. But it is still a statistical machine estimating what the weights should be. In my post about the training day at HOT CHIPS, I laid out how training actually happens. It was still oversimplified, but also more realistic since I was telling you about how the HOT CHIPS experts thought you should scale to lots of servers. They can't get away with just one weight.
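To give a flavor of what "estimating what the weights should be" can look like, here is a hedged sketch that starts with a wrong weight and nudges it toward the ice cream data a little at a time. It assumes gradient descent on mean squared error, which is a common way to train networks but is my choice for illustration, not something stated in the video. The function name `train_one_weight`, the learning rate, and the number of steps are all hypothetical.

```python
# Ice cream training data: (number of people, bill in dollars).
data = [(1, 10.0), (2, 20.0), (4, 40.0)]


def train_one_weight(data, learning_rate=0.05, steps=200):
    """Estimate w in bill = w * people by gradient descent on mean squared error.

    Hypothetical helper; learning_rate and steps are illustrative values.
    """
    w = 0.0  # start with no idea what ice cream costs
    for _ in range(steps):
        # Derivative of mean((w*x - y)^2) with respect to w
        # is mean(2 * (w*x - y) * x).
        gradient = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * gradient  # step against the gradient
    return w


w = train_one_weight(data)
print(round(w, 3))      # close to 10
print(round(w * 3, 2))  # close to 30
```

With one weight and three data points this is overkill, but the same loop of "guess, measure the error, adjust" is the basic shape of training when there are millions of weights.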

Usually, a deep learning network is drawn as a sort of horizontal flow, as in the image above for the one-weight ice cream model, or the image below, which comes from the HOT CHIPS tutorial on scaling out deep learning training.
