On Friday, September 30th, from 6 pm until well after 9 pm, Tesla held its AI Day 2022. Working for three hours is not the best way to start the weekend, but that's what I found myself doing. Luckily, the whole evening was fascinating. It was not actually necessary to watch it live, since the video is on YouTube (and embedded at the very end of this post).
Tesla's first AI Day was last year, and I wrote about what I found the most impressive part in my post NOT CHIPS: Tesla's Project Dojo. Tesla is completely open about the fact that the main purpose of AI Day is not to present its progress to the public and Wall Street. It is too technical for that. Instead, it is to recruit world-class researchers and implementers in AI to join Tesla. As such, it reveals a lot of information about how everything is implemented, which for any other company might be considered secret sauce. I'm sure other companies, like Waymo (Google's self-driving car company) or Cruise, could put on a fascinating show if they were prepared to reveal secrets. However, par for the course is more like my experience in an Aptiv driverless taxi in Las Vegas, where I was not even allowed to take a photograph of the dashboard display (see the end of my post Cruising Through San Francisco with No Driver).
One notable feature of the entire presentation was that everyone on the stage was an engineer. I don't think there was anyone from marketing. There was a long free-for-all Q&A at the end, and there didn't appear to be anyone (like legal) limiting the answers that the engineers (and Elon) gave.
One thing last year I didn't even mention, since it seemed a bit silly, was Tesla's introduction of their humanoid robot. The reason that it was silly was that they didn't actually introduce a humanoid robot. It was just a dancer in a robot suit. However, in one year, they have made amazing progress, and a good part of the whole presentation was devoted to two versions of the robot, known as the Optimus bot. One was a prototype (see picture above) made with off-the-shelf actuators. In an act of either bravado or foolishness, that was the first time the robot had worked completely untethered, with no electrical umbilical or safety cables. The doors opened, and the robot walked out and waved to the audience. Later they showed a video of it doing actual tasks like moving a box from a shelf to a desk or picking up a watering can to water some plants. The "brains" of the robot (actually in the torso) is the same self-driving computer (SDC) that runs in Tesla cars (see my posts Tesla Drives into Chip Design and HOT CHIPS: The Tesla Full Self-Driving Computer).
The second was something much closer to what Tesla eventually intends to ship, where the actuators were all designed by Tesla. The first prototype could walk and do other things. The newer one could do a few things but apparently is still a few weeks away from walking. Elon Musk said that this robot in volume production should cost much less than a car, perhaps under $20,000. It has independently movable fingers, opposable thumbs, and more. The next hour or so of the presentation went into a lot of detail about how to make this work, how to build a knee, what is involved in balance during walking, and more. It's way too much for me to write up here, so if you are interested in a deep dive, watch the video at the end of this post.
The next step is to start using Optimus for some basic operations in Tesla's automobile factories.
Next up was a discussion of the Full Self-Driving (FSD) beta. In 2021 there were 2,000 customers using it. Now there are 160,000. Everything runs on the car, and nothing is coming back to the factory. Every Tesla shipped for the last several years has the hardware to be self-driving. The work going on is all in the software. There were rumors that Tesla would announce that all cars for the last few months have been shipping with the second generation of the SDC chip, but that did not happen.
Again, there was a deep dive into how the self-driving algorithms work. Don't forget Tesla only uses cameras, no radar or lidar. Their claim is that this makes the integration of all the inputs simpler, whereas having to do sensor fusion across all three (cameras, radar, and lidar) is just too complex. Today, the training is done using supercomputers Tesla built in-house, with 10,000 GPUs for training and another 4,000 GPUs for auto labeling. There are 1.4B video frames used for training, which requires 100,000 GPU hours.
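To get a feel for the scale of those numbers, here is some back-of-the-envelope arithmetic. The input figures (GPU counts, frame count, GPU hours) come from the presentation; the derived rates are just illustrative calculations of mine, not Tesla-published metrics.

```python
# Figures quoted in the presentation
frames = 1.4e9          # video frames in the training set
gpu_hours = 100_000     # GPU hours for a training run
training_gpus = 10_000  # GPUs in the in-house training cluster

# Wall-clock time if the run is spread across all training GPUs
# (an idealized assumption -- perfect parallel scaling)
wall_clock_hours = gpu_hours / training_gpus
print(f"~{wall_clock_hours:.0f} hours of wall-clock time per run")

# Frames processed per GPU-hour, assuming each frame is seen once
frames_per_gpu_hour = frames / gpu_hours
print(f"~{frames_per_gpu_hour:,.0f} frames per GPU-hour")
```

Under those (admittedly idealized) assumptions, that works out to about 10 hours of wall-clock time per run and 14,000 frames per GPU-hour.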
But the future for training is...
I've written about Dojo a couple of times before, the main post being NOT CHIPS: Tesla's Project Dojo. The head of the Dojo project is Ganesh Venkataramanan, who came on to update us all. He appeared just recently in my post HOT CHIPS: Beyond Compute – Enabling AI Through System Integration (the second day of the keynote from this year's HOT CHIPS).
Above is the timeline of the development that was used to structure the presentation. As in the other presentations, there was a lot of detail on things like the coefficient of thermal expansion and piezoelectric effects on the clock oscillators.
As I described in the earlier posts, Dojo starts from a D1 chip; 25 of these are integrated into a tile, and then 6 of those tiles are integrated into a tray. A tray is 75 mm (3") in height and weighs 135 kg (300 lbs). It consumes 100 kW of power and delivers 54 petaflops (in BF16 or CFP8).
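The chip-tile-tray hierarchy above can be tallied with a little arithmetic. The quantities are from the talk; the per-chip figure at the end is derived by me, not something Tesla quoted on stage.

```python
# Dojo hierarchy as described: D1 chips -> tile -> tray
d1_per_tile = 25
tiles_per_tray = 6
tray_pflops_bf16 = 54   # petaflops per tray (BF16/CFP8)
tray_power_kw = 100     # power per tray

# Total D1 chips in one tray
chips_per_tray = d1_per_tile * tiles_per_tray
print(f"{chips_per_tray} D1 chips per tray")

# Implied compute per D1 chip (derived, not an official figure)
tflops_per_chip = tray_pflops_bf16 / chips_per_tray * 1000
print(f"~{tflops_per_chip:.0f} TFLOPS BF16 per D1 chip")
```

That gives 150 D1 chips per tray and roughly 360 TFLOPS of BF16 per chip, which is consistent with the D1 figures Tesla presented at the first AI Day.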
There is a host "processor" associated with each tray. It is built underneath and provides 512 x86 Linux cores, 8 TB memory, and 640 GB/s of PCIe bandwidth. Two of these combined units go into each cabinet. This level of integration has only been done "a few times in the history of compute." If you watch the video, Tesla had a cabinet on stage.
The performance of a single Dojo tile matches that of 6 GPU boxes, at less than the cost of a single GPU box. As a result, networks that used to take more than a month to train now take less than a week. Another comparison: the auto-labeling performance of over 4,000 GPUs in 72 racks requires just 4 Dojo cabinets. Tesla plans to build 7 Exapods in Palo Alto "right here across the wall."
The future roadmap is a new compute chip, D2, a new tile V2, and so on. Tesla believes this will give another 10X improvement in performance.
After that there was a Q&A with some interesting questions from the audience (many of whom were students — the target audience of the whole presentation). But this post is too long already, so you'll just have to watch the video below yourself. Q&A starts at 2 hours and 26 minutes.
You can watch the whole Tesla AI Day event (over 3 hours). Skip to 15:00, where the event proper begins.