Last week was the Linley Fall Processor Conference. That still seems to be the name even though the Linley Group was acquired by TechInsights last year. As always, the conference opened with Linley Gwennap's keynote. As has been the case for several years now, the focus was on what was going on in processors for AI training and inference.
He started by looking at imaging. One issue with imaging is that everyone tends to use ImageNet, where the images are 224 pixels square. In comparison, HD (1920 by 1080) requires 40 times as much computation and memory, and some automotive applications are moving to 4K, which requires even more. Large models have diminishing returns, with ViT-4K requiring 100 billion weights.
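The "40 times" figure falls out of simple pixel counting, since compute and memory scale roughly with the number of input pixels. A back-of-envelope sketch:

```python
# Rough pixel-count comparison behind the "40x" claim. Compute and memory
# for vision models scale roughly with the number of input pixels.
imagenet = 224 * 224   # standard ImageNet crop
hd = 1920 * 1080       # full HD frame
uhd = 3840 * 2160      # 4K UHD frame

print(f"HD vs ImageNet: {hd / imagenet:.1f}x")   # ~41x
print(f"4K vs ImageNet: {uhd / imagenet:.1f}x")  # ~165x
```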
Natural Language Processing (NLP) models are ballooning even more since they are addressing more challenging tasks such as summarizing articles or translating English into Chinese. The largest models require 12 trillion weights! Model size is limited by training time. For example, the well-known GPT-3 model takes 1,024 NVIDIA A100 GPUs over a month to train. Cluster size is topping out too, since 1,024 GPUs cost around $25M. So there has not been any real growth in the largest trained models in the last year. Training separate models in a "mixture of experts" (MoE) is easier than training a single large model. Future growth will be paced by hardware progress such as the new NVIDIA Hopper H100 GPU.
Another trend is the move to smaller data types, in particular, FP8 (yes, 8-bit floating point) is emerging for both training and inference, but there are different flavors depending on how many exponent bits are used. I covered this in more detail in my post HOT CHIPS Day 2: AI...and More Hot Chiplets. NVIDIA's Hopper is the first commercial design that supports FP8. It seems that FP8 is most beneficial for large NLP models.
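The "different flavors" refer to how the 8 bits are split between exponent and mantissa. A minimal sketch of the two common formats, E4M3 and E5M2 (as described in the joint NVIDIA/Arm/Intel FP8 proposal; the max-finite values reflect how each format reserves encodings for NaN):

```python
# The two common FP8 flavors trade range for precision. E4M3 keeps an extra
# mantissa bit (favored for inference and forward-pass activations); E5M2
# keeps an extra exponent bit (favored for gradients in training).
fp8_formats = {
    # name: (exponent bits, mantissa bits, largest finite value)
    "E4M3": (4, 3, 448.0),
    "E5M2": (5, 2, 57344.0),
}

for name, (e, m, max_finite) in fp8_formats.items():
    print(f"{name}: {e} exponent bits, {m} mantissa bits, max finite = {max_finite}")
```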
Another tradeoff that some processors are making is sparse computing. NVIDIA's Ampere, for example, can preprocess the model to remove half the weights by eliminating values that are close to zero. This greatly improves performance, but there is sometimes a reduction in accuracy.
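Ampere's sparse tensor cores use a 2:4 structured pattern: at most two nonzero weights in every group of four, which is what lets the hardware skip half the multiplies. This toy function (an illustrative sketch, not NVIDIA's actual pruning tool) shows the idea by zeroing the two smallest-magnitude weights per group:

```python
# Illustrative 2:4 magnitude pruning: in each group of four weights, keep
# the two with the largest magnitude and zero the other two.
def prune_2_of_4(weights):
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude values in this group
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]))[-2:]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

print(prune_2_of_4([0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.03, 0.6]))
# → [0.9, 0.0, 0.4, 0.0, -0.7, 0.0, 0.0, 0.6]
```

The accuracy loss Linley mentioned comes from the zeroed weights: models are usually fine-tuned after pruning to recover most of it.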
Some novel approaches are spiking neural networks (SNNs), which are more like real brains. Analog and photonic computation promise large power savings. There are many startups working on some of these approaches (some of which presented at the conference), but none have achieved high-volume production.
Linley said that in previous years he has been criticized for being "too nice" to everyone. So this year he had a few slides sprinkled through the presentation titled "What does Linley really think?" The first one was:
There are new results from the MLPerf benchmarks. See my post MLPerf: Benchmarking Machine Learning for background. Some random notes:
NVIDIA's Hopper, aka H100, triples the A100's TOPS, and initial MLPerf results show an average gain of 1.8X (using the same datatypes). FP8 support improves training by another factor of 2X, and NVLink communications improve training of very large models by another 2X. But power rises to 700W, and the efficiency gain on most models is about 40%. However, production is delayed, with the PCIe card version shipping now, SXM modules around the end of the year, and DGX systems in Q1.
Qualcomm's Cloud AI 100 can outperform NVIDIA's A100, especially in power efficiency (75W per card versus 400W), and uses less rack space (2U vs 6U). Hopper/H100 will deliver greater performance but doesn't close the gap in power efficiency. But also note that Cloud AI 100 only handles inference, not training.
Datacenter processors are all adding AI engines:
So what does Linley really think?
Next, Linley moved on to the "edge", which he defined as everything outside the datacenter.
First, all PCs will have AI acceleration (Apple, AMD, Intel). Universal deployment will encourage more software development.
In automotive, NVIDIA's Orin offers 137 TOPS at 60W, and Thor, due in 2024, will offer 1,000 TOPS (INT8) at maybe 300W. Intel's Mobileye (well, technically not Intel's anymore) has Level 4 driving solutions, with EyeQ5 needing several chips at 40W, and EyeQ6 Ultra a single 100W chip but not out until 2025. Qualcomm Cloud AI 100 delivers up to 400 TOPS at 75W, or 1000 TOPS at 15W.
There are lots of startups offering scalable IP for edge SoCs. Literally too many to mention. Established IP vendors, such as Cadence, offer AI tiers ranging from basic (4-256 MAC units) to max (1K-16K MAC units, 32 TOPS). But pretty much all the established IP vendors offer some sort of AI acceleration.
That last bullet is the most telling. Given that the automotive market players have settled out, and all laptops and smartphones have (or will have) AI accelerators, there just is not a high-volume opportunity for licensing AI accelerator IP or selling AI accelerator chips. The conference reminded me of the old Linley Mobile Processor Conference, where lots of people were designing processors for the mobile handset industry. But it turned out that there wasn't really a market at all. The leaders in mobile built their own processors, and Qualcomm and MediaTek supplied pretty much everything else. It seems clear that most of the AI processor companies will not find large profitable markets.