HOT CHIPS: Esperanto's Dave Ditzel and 1000 Minions

24 Aug 2021 • 5 minute read

At HOT CHIPS this morning, Dave Ditzel of Esperanto is presenting their latest chip in a talk titled "Accelerating ML Recommendation with over a Thousand RISC-V/Tensor Processors on Esperanto’s ET-SoC-1 Chip". I actually had a pre-briefing with Dave the week before HOT CHIPS, which had been set up by Craig Cochran, who is working with them. It was Craig who brought me back into Cadence in October 2015, gosh, almost six years ago already.

I first heard about Esperanto back in 2018 at the first RISC-V Summit held at the San Jose Convention Center. Until then, the summits had been smaller and were held at the campuses of friendly companies like Google or Western Digital (SanDisk). I wrote about the big Esperanto core, Maxion, at the start of the following year in my post RISC-V Cores: SweRV and ET-Maxion. This is a "big" out-of-order implementation and was, at the time, the highest performance RISC-V processor that had been announced.

I actually wondered what had happened to Esperanto, since they made a fair bit of noise back in 2018 but not so much since. Dave told me that in the beginning, Esperanto publicized what it was doing mainly as a recruitment tool, but that it didn't want to say too much about what it was doing...until its coming-out party today.

If the big core is called the Maxion, you won't be surprised to learn that the small core is called the Minion. The chip is called the ET-SoC-1 and contains over 1000 RISC-V processors on a single 7nm chip including:

1088 energy-efficient ET-Minion 64-bit RISC-V in-order cores each with a vector/tensor unit
4 high-performance ET-Maxion 64-bit RISC-V out-of-order cores
>160 million bytes of on-chip SRAM
Interfaces for large external memory with low-power LPDDR4x DRAM and eMMC FLASH
PCIe x8 Gen4 and other common I/O interfaces
Innovative low-power architecture and circuit techniques allow the entire chip to:
- Compute at peak rates of 100 to 200 TOPS
- Operate using under 20 watts for ML recommendation workloads

Note that power number. Despite over 1000 cores on the chip, the power budget is under 20W.

Dave told me that their initial focus from a market point of view is doing recommendation inference in data centers (think of systems that recommend books or videos for you). Dave says that this is far and away the biggest market opportunity for machine learning chips. Many x86-based servers in data centers have an available PCIe card slot. What they don't have is a big power budget nor space for more than a single card. Dave's list of more detailed requirements is:

100 TOPS to 1000 TOPS peak rates to provide better performance than the x86 host CPU alone
Limited power budget per card, perhaps 75 to 120 watts, and must be air-cooled
Strong support for Int8, but must also support FP16 and FP32 data types
100 GB of memory capacity on the accelerator card to hold most embeddings, weights, and activations
100 MB of on-die memory
Handle both dense and sparse compute workloads (embedding look-up is sparse matrix by dense matrix multiplication)
Be programmable to deal with rapidly evolving workloads, rather than depending on overly specialized hardware
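
To make the "sparse matrix by dense matrix" item in the list above a bit more concrete, here is a minimal sketch in plain Python. The table contents, item IDs, and function names are all made up for illustration, and nothing here reflects Esperanto's software. It just shows that multiplying a one-hot selector matrix (almost all zeros) by a dense embedding table gives the same answer as simply fetching the rows.

```python
# Illustrative only: a tiny embedding table and a batch of lookups.
# All sizes, values, and names are hypothetical.

def one_hot_rows(ids, vocab_size):
    """Sparse selector matrix: one row per lookup, a single 1.0 per row."""
    return [[1.0 if col == i else 0.0 for col in range(vocab_size)] for i in ids]

def matmul(a, b):
    """Naive dense multiply of a (n x k) by b (k x m)."""
    return [[sum(a_row[k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for a_row in a]

def gather(ids, table):
    """The cheap equivalent: fetch the requested rows directly."""
    return [list(table[i]) for i in ids]

# Five catalogue items, each with a 3-element embedding vector.
embedding_table = [
    [0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2],
    [2.0, 2.1, 2.2],
    [3.0, 3.1, 3.2],
    [4.0, 4.1, 4.2],
]
item_ids = [3, 0, 3]

selector = one_hot_rows(item_ids, len(embedding_table))
via_matmul = matmul(selector, embedding_table)
via_gather = gather(item_ids, embedding_table)

# Fraction of selector entries that are non-zero (this is the "sparse" part).
nonzero = sum(1 for row in selector for x in row if x != 0.0)
density = nonzero / (len(selector) * len(selector[0]))
```

With a real table holding millions of rows, the selector is overwhelmingly zeros, which is why the work is dominated by memory access rather than arithmetic.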

Of course, this might seem a bit self-serving, just taking the Esperanto technology and claiming it is a list of requirements. But Dave had a list of independent references to papers from computer architecture conferences, and presentations by hyperscalers like Facebook at Linley.

There are two approaches to delivering this sort of AI technology. One is to build a single big "hot" chip (or maybe I should say HOT CHIP since we are at HOT CHIPS). This contains several cores and uses up the entire power budget, but it has a limited I/O budget to get data in and out of the system. Esperanto has, instead, designed a system with multiple low-power chips. The advantage is that performance, pins, memory, and bandwidth all scale up with more chips. Esperanto worked out that they could have six chips on a card if each was less than 20W. This means operating at a lower voltage, which in turn required circuit and architecture developments.
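
The six-chips-per-card figure is easy to sanity check against the 75 to 120 watt card budget from the requirements list. The sketch below is just that arithmetic; the function name and the overhead_w parameter (for DRAM, regulators, and so on) are my own hypothetical additions, and the overhead defaults to zero because the talk did not give a number for it.

```python
# Back-of-envelope card power budget using figures quoted in this post.
CHIP_POWER_W = 20.0               # upper bound per chip quoted by Esperanto
CARD_BUDGETS_W = (75.0, 120.0)    # "perhaps 75 to 120 watts" per card

def max_chips(card_budget_w, chip_power_w, overhead_w=0.0):
    """Whole chips that fit in the budget.

    overhead_w is a hypothetical allowance for everything on the card
    that is not an accelerator chip; it is not a figure from the talk.
    """
    return int((card_budget_w - overhead_w) // chip_power_w)

chips_at_low_budget = max_chips(CARD_BUDGETS_W[0], CHIP_POWER_W)
chips_at_high_budget = max_chips(CARD_BUDGETS_W[1], CHIP_POWER_W)
```

Six chips only fit at the top of the range, and only because each chip comes in under 20W, which leaves a little headroom for whatever else is on the card.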

The graph above shows the tradeoff of voltage versus performance. It is assumed that the 1000 Minions get half of the chip's 20W, so 10W for 1000 cores, or 10mW per core. This is very low. Too high a voltage means too much power; too low a voltage means not enough performance (plus, I'm guessing, more circuit design challenges).
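
Here is the per-core arithmetic spelled out, together with a rough illustration of why voltage matters so much. The scaling function uses the textbook approximation that CMOS dynamic power goes as voltage squared times frequency. That is my assumption for illustration, not Esperanto's model; it ignores leakage and the fact that achievable frequency also falls as voltage drops.

```python
# Per-core power budget, using the rounded numbers from the text.
CHIP_POWER_W = 20.0
MINION_SHARE = 0.5     # Minions are assumed to get half the chip power
MINION_COUNT = 1000    # rounded as in the text; the chip actually has 1088

per_core_w = CHIP_POWER_W * MINION_SHARE / MINION_COUNT
per_core_mw = per_core_w * 1000.0

def relative_dynamic_power(v_rel, f_rel):
    """Dynamic power relative to nominal, assuming P ~ V^2 * f.

    v_rel and f_rel are voltage and frequency as fractions of nominal.
    Textbook approximation only; leakage is ignored.
    """
    return v_rel ** 2 * f_rel

# Halving both voltage and clock frequency, as an illustrative data point.
half_v_half_f = relative_dynamic_power(0.5, 0.5)

# Energy per operation scales roughly with V^2 alone under the same assumption.
energy_per_op_at_half_v = relative_dynamic_power(0.5, 1.0)
```

Under that approximation, halving the voltage cuts the energy per operation to a quarter, which is the kind of lever you need to get a core down to around 10mW.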

I won't go through all the detailed architecture of the chip and how the Minions are linked up into neighborhoods and then shires. The diagram above is the obligatory HOT CHIPS blog diagram of the chip. There are 32 Minions in each of the little black blocks, giving 1088 in total. The reason for the slightly odd number is that the yellow blocks on the top right are the four Maxion processors and the PCIe interface to the chip.

But it's not just a block diagram. As planned, six chips fit on a card. This is a standard OCP Glacier Point v2 card. Peak performance is over 800 TOPS when all ET-Minions on six chips are operating at 1 GHz.
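
For anyone who wants to check how these numbers hang together, here is a quick back-of-envelope calculation using only figures quoted in this post. The variable names are mine, and since the card figure is "over 800 TOPS", everything derived from it is a lower bound rather than a spec.

```python
# Figures quoted in this post.
MINIONS_PER_CHIP = 1088
MINIONS_PER_BLOCK = 32        # per little black block in the die diagram
CHIPS_PER_CARD = 6
CLOCK_HZ = 1e9                # 1 GHz
CARD_PEAK_TOPS = 800.0        # "over 800", so treat as a floor

# How many 32-Minion blocks the die diagram implies.
blocks_per_chip = MINIONS_PER_CHIP // MINIONS_PER_BLOCK
leftover_minions = MINIONS_PER_CHIP % MINIONS_PER_BLOCK

# Per-chip share of the card's peak rate.
tops_per_chip = CARD_PEAK_TOPS / CHIPS_PER_CARD

# Implied operations per Minion per clock cycle at peak.
total_minions = CHIPS_PER_CARD * MINIONS_PER_CHIP
ops_per_minion_per_cycle = CARD_PEAK_TOPS * 1e12 / (total_minions * CLOCK_HZ)

print(f"{blocks_per_chip} blocks per chip")
print(f"{tops_per_chip:.1f} TOPS per chip (at least)")
print(f"{ops_per_minion_per_cycle:.1f} ops per Minion per cycle (at least)")
```

The per-chip figure lands inside the 100 to 200 TOPS range quoted for a single chip earlier, so the card-level and chip-level numbers are at least consistent with each other.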

Here are some more details of the design. It is 24 billion transistors in 7nm. Each of the 1088 Minions is an energy-efficient 64-bit in-order RISC-V processor with an attached vector/tensor unit. There is over 160MB of on-die SRAM used for caches and scratchpad. The package is 45mm x 45mm with 2494 balls to the PCB and over 30,000 bumps to the die. Each Minion shire has independent low-power supply inputs, which can be used to adjust for Vt variations and enable effective DVFS. The silicon is currently undergoing bringup and characterization.

In summary, the Esperanto ET-SoC-1 is the highest performance commercial RISC-V chip to date, with more RISC-V cores on a single chip, more RISC-V aggregate instructions per second on a single chip, and the highest TOPS driven by RISC-V cores.

Esperanto’s low-voltage technology provides differentiated RISC-V processors with the best performance per watt. Dave said that the hard part was making all the tradeoffs, combining processor and memory subsystem architectures, and innovative circuit and technology techniques for low-voltage operation.
