Paul McLellan

Linley: Habana and Cerebras

21 May 2021 • 4 minute read

In the recent Linley Spring Processor Conference, there were many processors for various kinds of AI, deep learning, and inference. As I said in my overview post Linley: Driving AI from the Cloud to the Edge, I don't think that all these products can be successful. Of course, I don't really know which ones are going to turn out to be the winners. However, in this post I'm going to look at two of the companies that presented: Habana and Cerebras. The reason I picked these is that neural network technology is very difficult to evaluate. It is obviously next to impossible to do from just a presentation, but there is plenty of anecdotal evidence that it is genuinely hard to assess how effective a given solution will be for a given real-world problem. Benchmarks give at least some independence, but it is not clear whether benchmark results carry over to corresponding results on real-world neural networks. See my post The Latest MLPerf Results for Inference for more details on that topic. But these two products are seeing adoption in data centers, and so have already achieved some level of success.

AI Training in the Data Center with Habana Gaudi-Based Amazon EC2 Instances

You probably know that Habana was acquired by Intel a couple of years ago. However, the Habana name seems to be living on (the Habana logo was still on the opening slides of the presentation). Habana was founded in 2016, launched its first inference processor in 2018 and its first training processor in 2019 (becoming part of Intel later that year). Habana's training processor is called Gaudi, and its inference processor is called Goya.

The reason that I picked Habana as one of the companies to write about is that Amazon's AWS has announced that it will make servers available with Habana designs as accelerators. Of course, that is not a guarantee of business success, but having the largest hyperscale cloud vendor as a customer is a great start. If you want to sell a lot of chips, you have to sell them to people who buy a lot. The two obvious candidates are mobile and hyperscale data centers, but many of the obvious customers in those markets design their own chips. In fact, even AWS/Annapurna has a chip of its own, Inferentia, so I'm not entirely sure what types of jobs these different systems are targeted at. More details of the AWS/Habana announcement, made at the end of last year, appeared in the AWS blog post Habana Gaudi AI Processors to bring lower cost-to-train to Amazon EC2 customers:

Today, in the re:Invent CEO Keynote, Amazon Web Services announced EC2 instances that will leverage up to eight Gaudi accelerators and deliver up to 40% better price performance than current GPU-based EC2 instances for machine learning workloads. Availability of Gaudi-based EC2 Instances is targeted to the first half of 2021.
...
An 8-card Gaudi solution can process about 12,000 images-per-second training the ResNet-50 model on TensorFlow. Each Gaudi processor integrates 32GB of HBM2 memory and features RoCE on-chip integration used for inter-processor connectivity inside the server.
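
To put that throughput in rough context, here is a quick back-of-envelope calculation. It is only a sketch: the standard ImageNet-1k training set size (about 1.28 million images) is my assumption, not something stated in the AWS post.

```python
# Back-of-envelope: how long one ImageNet epoch takes at the quoted throughput.
IMAGES_PER_SECOND = 12_000         # 8-card Gaudi, ResNet-50 on TensorFlow (from the AWS post)
IMAGENET_TRAIN_IMAGES = 1_281_167  # standard ImageNet-1k training set (my assumption)

seconds_per_epoch = IMAGENET_TRAIN_IMAGES / IMAGES_PER_SECOND
print(f"~{seconds_per_epoch:.0f} s per epoch (~{seconds_per_epoch / 60:.1f} minutes)")
# -> roughly 107 s per epoch, so a 90-epoch ResNet-50 run is on the order of 2.7 hours
```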

Eitan Medina, Habana's chief business officer, presented a little of the architecture and some results. The software stack is pretty standard, starting from PyTorch and TensorFlow, with a suite of compilers and libraries (in particular, for vision, natural language, and recommendation).
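
The pitch is that an existing framework-level training script needs only minimal changes to target Gaudi. Here is a minimal sketch of what that looks like in TensorFlow, assuming Habana's SynapseAI TensorFlow integration and its load_habana_module() entry point (the exact package layout is my assumption, not something shown in the presentation):

```python
import tensorflow as tf

# Assumption: Habana's SynapseAI TensorFlow plugin exposes load_habana_module(),
# which registers the Gaudi (HPU) device so standard TF ops run on it.
from habana_frameworks.tensorflow import load_habana_module
load_habana_module()

# From here on this is ordinary TensorFlow/Keras code, nothing Gaudi-specific.
model = tf.keras.applications.ResNet50(weights=None, classes=1000)
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# dataset = ...  # an ImageNet-style tf.data pipeline would go here
# model.fit(dataset, epochs=90)
```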

Cerebras Wafer Scale Engine

In August 2019, at HOT CHIPS, Cerebras announced the largest chip ever. I covered it at the time in my post HOT CHIPS: The Biggest Chip in the World and followed up in one of my update posts Weekend Update 2. I'm not going to repeat the information from those two posts, which cover a lot of detail about how the chip was designed and manufactured, and how the redundancy works.

At the Linley conference, Cerebras announced a much bigger chip. Well, technically it is exactly the same size at 46,225 mm² of silicon, the largest die you can get out of a 300mm wafer. But this time it is in 7nm, which allows it to pack 2.6 trillion transistors, giving 850,000 AI cores, 40 GB of on-chip memory, and 220 petabits per second of fabric bandwidth. The presentation included a table comparing the WSE-1 and WSE-2.
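
Those headline numbers are easier to grasp per core. Here is a quick sketch using only the figures quoted above; the even split of memory and bandwidth across cores is my simplifying assumption.

```python
# Back-of-envelope on the WSE-2 figures quoted above.
DIE_AREA_MM2 = 46_225
TRANSISTORS = 2.6e12
CORES = 850_000
ONCHIP_MEMORY_BYTES = 40e9              # 40 GB on-chip memory
FABRIC_BANDWIDTH_BITS_PER_S = 220e15    # 220 petabits per second

# Assumption: memory and bandwidth are spread evenly across the cores.
print(f"{TRANSISTORS / DIE_AREA_MM2 / 1e6:.0f} M transistors per mm^2")    # ~56 M/mm^2
print(f"{ONCHIP_MEMORY_BYTES / CORES / 1e3:.0f} KB of memory per core")    # ~47 KB
print(f"{FABRIC_BANDWIDTH_BITS_PER_S / CORES / 8 / 1e9:.0f} GB/s of fabric bandwidth per core")
# -> on the order of 32 GB/s per core
```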

Some deployments have been announced (all of them CS-1 systems based on the WSE-1), and, as is normal, there are others that are not public:

  • Argonne National Laboratory: Cancer therapeutics, epidemiological research, gravity wave detection, materials science/new materials discovery
  • Lawrence Livermore National Laboratory: Integrated into Lassen, the eighth largest supercomputer; cognitive simulation, traumatic brain injury research, fusion
  • Pittsburgh Supercomputing Center: A new supercomputer, Neocortex, built from Cerebras CS-1s and HPE Superdome Flex servers and offered as a cloud resource to researchers
  • Edinburgh Parallel Computing Centre (EPCC): NLP, genomics, epidemiological research
  • GlaxoSmithKline: Drug discovery, multilingual research synthesis
  • Other wins in heavy manufacturing, pharma, biotech, military, and intelligence

The rack, cooling, and power delivery are the same as in the first system, although apparently the new system draws less power. But I have to say, the systems certainly look cool.

The IEEE's Spectrum magazine covered the Cerebras announcement in Cerebras’ New Monster AI Chip Adds 1.4 Trillion Transistors.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.