
Paul McLellan

Automotive
NVIDIA
nvidia drive
ADAS
jensen huang
Breakfast Bytes

NVIDIA's Jensen Huang on Accelerating the Race to Autonomous Cars

24 Jul 2017 • 5 minute read

Jensen (Jen-Hsun) Huang is CEO of NVIDIA. If you know anything about NVIDIA, it is probably that they have a strong position in the graphics processing unit (GPU) market, especially for video games. However, their business has been changing. A GPU is a specialized, highly parallel floating-point processor, and about a decade ago NVIDIA started to evangelize CUDA, a language for programming GPUs to do things other than graphics. This has led to GPUs being used to build supercomputers. For example, the DOE's Summit supercomputer, being built at Oak Ridge National Laboratory, combines NVIDIA Tesla GPUs with IBM Power servers and is designed to deliver around 200 petaflops.
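
CUDA itself didn't appear on Jensen's slides, but to make concrete what "programming a GPU for things other than graphics" means, here is a minimal data-parallel sketch of my own using Numba's CUDA bindings (an assumption on my part; it presumes an NVIDIA GPU and the numba package, and the kernel and array names are arbitrary):

```python
# A million independent additions, each handled by its own GPU thread.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)              # this thread's global index
    if i < out.size:              # guard threads that fall past the end of the array
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # launch thousands of threads at once
```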

But Jensen came to Ludwigsburg to tell the European automotive industry how this technology is being used to create autonomous vehicles.

Moore's Law Slowing

As everyone who has anything to do with semiconductors knows, some aspects of Moore's Law have stopped. You can still get more transistors. Whether they are continuing to get cheaper remains debatable, but that is not a discussion for today. What is certainly true is that you cannot keep pushing the clock rate higher, for power reasons. This means that higher performance comes from finding ways to use more cores in parallel, either by adding specialized processors to an SoC alongside the CPU, or, as with a GPU, simply spending the transistor budget on as many cores as possible. GPUs are among the most complex chips in terms of transistor count.

Another huge change is that in the last 10 years, and especially the last four or so, deep learning and neural networks have become well understood and effective. This is partly the result of work by AI researchers, and partly because the computing substrate is now powerful enough that, for some applications, neural networks outperform humans. And unless you have not been paying attention, I'm sure you know that convolutional neural networks are expected to be a key technology in autonomous driving.

At the other end of the scale from those datacenter GPUs is the NVIDIA DRIVE platform, which is intended to run on-vehicle. As I wrote just recently in my post CactusNet: Moving Neural Nets from the Cloud to Embed Them in Cars, training neural nets in the cloud (often using NVIDIA GPUs as co-processors) and then optimizing the coefficients for an embedded solution looks like it will be the approach for some aspects of automotive.

The diagram above shows 40 years of microprocessor trend data. The blue dots are general-purpose CPUs. You can see the falloff in improvement now that Dennard scaling and the other tricks for getting more performance without more power are over, and we can no longer wring any more instruction-level parallelism out of the general-purpose CPU. GPU computing offloads the parts that the microprocessor is not good at; those are the green dots. Without making this change, our performance would be nowhere near what we need (1.5X per year). The scale is logarithmic, so the difference is not trivial; it is a factor of 1000. In ten years' time, we would have lost the power of all the computers in the world today.
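
A quick back-of-the-envelope on that compounding (my arithmetic, not a figure from the talk): growing at 1.5X per year gives 1.5^10 ≈ 58X in a decade, and it takes only about 17 years for the accumulated gain to reach 1.5^17 ≈ 1000X, which is the factor-of-1000 difference visible on the log-scale chart.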

For deep learning, every top supercomputer is using NVIDIA GPUs. There are now 500,000 GPU developers. NVIDIA powers deep learning at Alibaba, Amazon, Baidu, Facebook, Google, IBM, Microsoft, Tencent, and more.

So three trends are coming together:

  • CUDA-based high-performance computing (HPC)
  • Deep learning
  • Autonomous vehicle (AV) platform

Deep Learning Is a New Computing Model

The best way to think about this is that deep learning is a new computing model. Data is the new source code. Deep learning on a GPU is the new compiler. The resulting deep neural net (DNN) is the new executable. This is how the NVIDIA DRIVE PX works.
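
To make the analogy concrete, here is a minimal sketch in PyTorch (my illustration, not NVIDIA's toolchain; the network, data, and file name are placeholders): labeled data plays the role of source code, GPU training plays the role of the compiler, and the exported network is the "executable" that gets deployed.

```python
import torch
import torch.nn as nn

# "Source code": labeled examples (random stand-ins for camera frames and labels)
images = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 10, (256,))

# The network whose weights the "compiler" (training) will produce
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),
)

# "Compilation": gradient descent, on a GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(model(images.to(device)), labels.to(device))
    loss.backward()
    opt.step()

# "Executable": the trained network, exported for embedded inference
scripted = torch.jit.script(model.cpu().eval())
scripted.save("perception_net.pt")
```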

DRIVE PX

DRIVE PX is a whole AV stack, and Jensen explained why NVIDIA is developing the whole stack. A driving application has three components: perception, localization, and planning. In other words: look, deduce where you are, and decide what to do. This is all part of DRIVE OS.

We did this so we can understand the computational load of autonomous vehicles. Until now, very few people have understood what algorithms and what performance will be needed. We decided to understand this problem deeply and completely. Second, we offer this entire stack to the industry as an option. Partners can use just the hardware and the lower levels and add their own OS, and so on. We offer the higher layers so that partners can choose whether or not to use them.
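
To make the three components concrete, here is a hypothetical sketch of one iteration of that loop (my pseudocode-flavored Python, not the DRIVE OS API; all the function bodies are stand-ins):

```python
from dataclasses import dataclass

@dataclass
class Perception:          # "look": what is around the car
    obstacles: list

@dataclass
class Pose:                # where the car is on the map
    x: float
    y: float
    heading: float

def perceive(camera_frame) -> Perception:
    # In the real stack this is where the DNNs run (detection, segmentation, ...)
    return Perception(obstacles=[])

def localize(perception: Perception, hd_map) -> Pose:
    # Fuse perception with the map (and other sensors) to estimate the vehicle pose
    return Pose(x=0.0, y=0.0, heading=0.0)

def plan(perception: Perception, pose: Pose) -> str:
    # Decide what to do: keep lane, brake, change lane, ...
    return "keep_lane" if not perception.obstacles else "brake"

# One iteration of the driving loop: look, deduce, decide
scene = perceive(camera_frame=None)
pose = localize(scene, hd_map=None)
action = plan(scene, pose)
```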

Jensen talked about the XAVIER chip, which has three types of core: a regular single-thread CPU, a CUDA GPU, and an AI accelerator (CUDA plus Tensor Cores). You can see the difference in performance of these approaches versus just running software on a CPU. The whole thing runs in 30W, compared with 100W for a high-end desktop CPU. There is also a lot of I/O to support connecting to the rest of the car, such as four 10G Ethernet ports. The goal is to do level-4 ADAS on a single chip in about 30W.

Perception Training

The next big challenge is perception training, much as we humans train our own perception. We need real-world data, and collecting data is cheap, but labeling it is expensive. However, we can use AI to label automatically, so the amount that needs to be done by humans decreases. We can do this not only in the real world but also in the virtual world, where NVIDIA's gaming expertise (which is often all about simulating virtual worlds) comes into play. The virtual world is self-labeling, since we already know what is a car and what is a pedestrian, which is a big advantage in getting everything going.
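
Here is a hedged sketch of that idea (my illustration, not NVIDIA's pipeline; the threshold and the model's return signature are assumptions): a trained model pre-labels frames, and only low-confidence frames are routed to human annotators, so the human share of the work shrinks as the model improves.

```python
def auto_label(frames, model, confidence_threshold=0.9):
    """Split frames into machine-labeled and needs-human piles.

    `model` is assumed to return a (label, confidence) pair for each frame.
    """
    machine_labeled, needs_human = [], []
    for frame in frames:
        label, confidence = model(frame)
        if confidence >= confidence_threshold:
            machine_labeled.append((frame, label))   # accept the machine label
        else:
            needs_human.append(frame)                # send to a human annotator
    return machine_labeled, needs_human
```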

Power

Jensen wrapped up by pointing out that achieving parallel computing with CPUs would work, but the energy efficiency is terrible, 10-20X worse. For a level-4 car, that might mean 300W. For level 5, with all the cameras and all the lidars, that goes up to perhaps 1000W. With all those CPUs, about one-third of the battery capacity would go just to running the computing infrastructure. That just doesn't fit with electric traction.
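
As a rough cross-check of those numbers (my arithmetic, not a slide from the keynote): the XAVIER target mentioned above is level-4 capability in about 30W, so a 10-20X efficiency penalty puts a CPU-only equivalent at roughly 300-600W, consistent with the 300W level-4 figure, and the extra sensors of level 5 push that toward the 1000W estimate.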

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.