
Author

Paul McLellan

Tesla Goes All-In on Vision...and Supercomputers

12 Jul 2021 • 3 minute read

If you have even a passing interest in autonomous vehicles, a.k.a. self-driving cars, then you probably know about the lidar controversy. I wrote about Tesla's Self-Driving Computer (SDC) when they announced it at their analyst day a couple of years ago in my post Tesla Drives into Chip Design. The last sentence of that post is:

There is a second design started about a year ago. They didn't say much about it other than it not being a chiplet-based design. Oh, and they still have no plans to use lidar.

This is controversial since every other autonomous vehicle technology company is using lidar. Of course, as Tesla points out, the fact that humans drive using just vision is a sort of existence proof that you don't need lidar. As far as I know, every other company is also using radar. All Tesla cars are equipped with radar.

The video at the end of this post is a presentation at CVPR, or more precisely at the CVPR Workshop on Autonomous Driving, by Tesla's Andrej Karpathy. He is the Senior Director of AI and heads up all their neural network development. CVPR stands for the Conference on Computer Vision and Pattern Recognition, but everyone just calls it CVPR. It is the premier conference in computer vision.

I don't know if this is the first time Tesla has announced it, but their latest software relies purely on vision. He refers to the software version that also incorporates radar as the "legacy stack". He maintains that it makes more sense to "bark up the right tree" and avoid the additional complication and investment of a radar stack and a sensor-fusion stack. In fact, during the presentation, he shows some examples where the legacy stack with radar gets confused (a bridge abutment, a parked truck) while the vision-only version behaves perfectly. Of course, he is not going to show an example the other way around, so this might just be cherry-picking. The biggest challenge for vision-only (without radar or lidar) is depth sensing, and he went into quite a bit of detail about this. He also showed examples of driving in rain and snow, another situation in which people often say radar is essential.
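To see why depth sensing is the hard part of camera-only perception, here is a small illustration using the textbook stereo relation depth = focal length × baseline / disparity. This is not Tesla's method (their depth estimation is learned by neural networks from camera data), just the classic geometry that shows why a one-pixel matching error is negligible up close but huge at range:

```python
# Geometric depth from stereo disparity: depth = focal_px * baseline_m / disparity_px.
# Illustrative only -- it is NOT Tesla's learned approach, but it shows how
# depth error grows rapidly as disparity shrinks with distance.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Return depth in meters for a matched pixel pair between two cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# With a 1000-px focal length and 0.3 m baseline:
near = depth_from_disparity(1000.0, 0.3, 30.0)  # about 10 m away
far = depth_from_disparity(1000.0, 0.3, 3.0)    # about 100 m away
# At 100 m, a mere 1-px disparity error moves the estimate by tens of meters:
far_off_by_one = depth_from_disparity(1000.0, 0.3, 2.0)  # about 150 m
```

The same 1-pixel error that barely matters at 10 m shifts the estimate by roughly 50 m at highway distances, which is exactly why vision-only depth is considered hard.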

To do vision-based self-driving requires:

  • A fleet of cars generating data (or else you don't have enough rare conditions)
  • A lot of storage (petabytes)
  • A lot of computer power (a supercomputer, see below)

One of Tesla's big advantages over its rivals is that it has deployed a large number of vehicles that are constantly recording data. Even when a car is not in any sort of self-driving mode, the self-driving software is running in parallel, seeing what it would have done and noticing differences such as "the human braked and we would not have done". This has generated an enormous amount of data, far more than, say, Waymo with a few hundred cars.
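The fleet-data loop described above can be sketched as follows. This is my reconstruction for illustration, not Tesla's actual implementation; the `Frame` fields and function names are assumptions:

```python
# A minimal sketch of "shadow mode" fleet data collection (an assumed design,
# not Tesla's real code): the self-driving stack runs in parallel with the
# human driver, and frames where the two disagree are flagged as training data.

from dataclasses import dataclass

@dataclass
class Frame:
    human_braked: bool          # what the driver actually did
    planner_would_brake: bool   # what the parallel self-driving stack would do

def interesting_frames(log: list[Frame]) -> list[Frame]:
    """Keep only frames where human and planner disagree -- the rare, valuable data."""
    return [f for f in log if f.human_braked != f.planner_would_brake]

drive_log = [
    Frame(human_braked=True, planner_would_brake=True),    # agreement: discard
    Frame(human_braked=True, planner_would_brake=False),   # "the human braked and we would not have done"
    Frame(human_braked=False, planner_would_brake=False),  # agreement: discard
]
flagged = interesting_frames(drive_log)  # one disagreement kept for training
```

The point of the sketch is the filter: with millions of cars, almost all frames are boring agreements, and only the disagreements (the rare conditions) need to be uploaded and learned from.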

The training of the neural networks is done offline. This means that it is not under real-time constraints, so much of the labeling of images can be done automatically, with only a small amount of human involvement, since the system can take its time.
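One reason offline labeling can be largely automatic is that the labeler is allowed to look at future frames, which a real-time system never sees. A minimal sketch, assuming a noisy object track that we clean up with a centered moving average (illustrative only; Tesla's actual auto-labeling is far more sophisticated):

```python
# Hedged sketch of hindsight labeling: a centered moving average needs future
# frames, so it is impossible online but trivial offline. The function name
# and the toy track are illustrative assumptions, not Tesla's pipeline.

def auto_label_track(noisy_positions: list[float], window: int = 2) -> list[float]:
    """Smooth an object's per-frame position using frames before AND after each point."""
    n = len(noisy_positions)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        smoothed.append(sum(noisy_positions[lo:hi]) / (hi - lo))
    return smoothed

# A jittery track from a perception model becomes a clean label for training:
labels = auto_label_track([0.0, 1.2, 1.9, 3.1, 4.0])
```

Because each output point averages over past and future observations, the offline labeler produces cleaner ground truth than any single online estimate, which is exactly the leverage of not being under real-time constraints.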

Training the vision neural networks requires not just a huge amount of data, but a huge amount of compute power. As such, Tesla has built itself a supercomputer. Andrej believes it is the fifth most powerful supercomputer in the world. (To read about the most powerful supercomputer in the world, see my post Japanese Arm-Powered Supercomputer Takes the TOP500 Crown.) Tesla is building an even more powerful supercomputer called Dojo, but Andrej said he wasn't ready to give any details yet.

Once the model is developed on the supercomputer, it is optimized and then deployed to the vehicles. For the last few months, Tesla has been deploying vision-only versions of the FSD software. You can read my earlier post for more details, but the FSD computer has two chips, adding a level of redundancy since only one is required. Inside each chip is a neural processing unit (NPU); in fact, there are two of those per chip.
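The two-chip redundancy might be sketched like this. This is an assumed arrangement for illustration, not Tesla's real software:

```python
# Sketch of dual-chip redundancy (an illustrative assumption, not Tesla's
# implementation): each chip can run the full workload, so a single chip
# fault does not take down the system.

def redundant_inference(chip_a, chip_b, frame):
    """Run the frame on both chips; return a plan even if one chip faults."""
    results = []
    for chip in (chip_a, chip_b):
        try:
            results.append(chip(frame))
        except RuntimeError:
            results.append(None)  # chip fault: fall back to the other chip
    good = [r for r in results if r is not None]
    if not good:
        raise RuntimeError("both chips failed")
    return good[0]

def healthy_chip(frame):
    return "brake"

def faulty_chip(frame):
    raise RuntimeError("chip fault")

plan = redundant_inference(healthy_chip, faulty_chip, frame={})  # survives one fault
```

The design choice is the standard one for safety-critical systems: duplicate the whole compute path so that correctness does not depend on any single chip staying up.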

I recommend watching the entire video to understand the tradeoffs involved and why Andrej believes not just that self-driving is better without lidar but that self-driving is better without radar, too.

Watch the video (38 minutes):

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.


© 2023 Cadence Design Systems, Inc. All Rights Reserved.
