Paul McLellan

Tags: PC, server, cloud, cadence cloud

Getting to Hyperscale Data Centers: PCs to Clouds

3 Apr 2020 • 8 minute read


This post is a continuation of yesterday's post, Getting to Hyperscale Data Centers: Mainframes to Minicomputers. It continues the story from minicomputers to PCs, and then to today's environment of hyperscale data centers for "big iron" computation, and smartphones for computation at the edge.

The PC Era

 The PC came along in the late 1970s. I use PC generically to mean small computers intended for one person, so I include the Apple II, the TRS-80, and, of course, the IBM PC itself. Initially, these were not really designed for scientific computation. They could run simple applications. One of the most important, the first "killer app", was the spreadsheet, which I wrote about in my post 40 Years Ago, "Spreadsheet" Didn't Mean Excel, It Meant VisiCalc.

One of the most significant things about the IBM PC is that IBM didn't use an IBM microprocessor (despite having some of the most leading-edge semiconductor design capability in the world); they purchased 8088 microprocessors from Intel. And they licensed MS-DOS from Microsoft instead of writing their own operating system (despite having some of the most leading-edge operating system development capability in the world). Famously, they didn't secure exclusive rights to MS-DOS, so Microsoft was free to license it for other platforms (what other platforms would it run on?). This meant that other manufacturers built PCs too, known as clones. There were some issues around creating the BIOS, since that was IBM's, so it had to be clean-roomed. If you want to see a thinly disguised version of Compaq building PC clones, watch season 1 of Halt and Catch Fire on Netflix.

This combination of lots of competing clone companies, along with Moore's Law driving microprocessor performance, meant that the power of PCs improved quickly. Engineering workstations also improved, but their volumes were much lower, so over time Intel was able to invest far more in driving microprocessor performance than, for example, Sun could with its SPARC architecture. In 1991, Linux was announced, a Unix-like operating system similar to the ones that ran on all the families of engineering workstations.

The Server Farm Era

The first server farm I came across was the one we built at Ambit in the late 1990s (Cadence acquired us soon after we built it, so it became the first Cadence server farm). Today, server farms are built from machines in form factors designed for standard racks. But when we built our test farm at Ambit, there were no "blade" or "rack" versions of Sun or HP workstations. The vendors wouldn't even give us a discount for not wanting a monitor with each workstation. I think we had 40 Sun workstations and 20 HP workstations in our farm, so every PC in the office ended up with a big engineering-quality monitor (and we still had a roomful of spare monitors). Since the workstations could not be mounted in racks, we built shelving and put them on shelves, with a big rack-mounted router in the center. It seems none of us involved ever thought to take a photograph of this server farm...at least, none of us can find one.

What I called PCs above, running the Linux operating system, became servers: single-board computers. The x86 microprocessor market bifurcated into low-power processors aimed mainly at laptops, and high-performance processors aimed at servers. This was also around the time that Dennard scaling ended, so we could no longer keep increasing the clock frequency of processors; instead, we had to take our increased compute power in the form of more cores. Laptop processors had only a few cores, since it was hard to make use of many. Server processors had as many cores as would fit on the die, since they were easy to make use of: if you have a thousand users accessing a web resource and a hundred cores, then put ten users on each core.
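To make that arithmetic concrete, here is a minimal sketch (in Python, with a hypothetical handle_request function standing in for whatever work each user generates) of the pattern that made many-core servers easy to exploit: the requests are independent, so a simple process pool spreads them across all available cores.

```python
from multiprocessing import Pool, cpu_count

def handle_request(user_id):
    # Hypothetical placeholder: do the independent work for one user.
    return f"served user {user_id}"

if __name__ == "__main__":
    users = range(1000)  # a thousand users...
    # ...spread across however many cores the server has;
    # with 100 cores that works out to roughly 10 users per core.
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(handle_request, users)
    print(f"{len(results)} requests served on {cpu_count()} cores")
```

Nothing in the work for one user depends on any other user, which is exactly why server cores were so easy to make use of.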

The form factor changed. No longer did you need to buy monitors and put the computers on shelving; servers were designed to be rack-mounted. The biggest users of data centers built their own servers. For example, the picture on the right is one of Google's first servers, now in the Computer History Museum in Mountain View. The servers are in the bottom part, and the network routers are across the top.

These servers became the compute fabric of choice for EDA and semiconductor design, in on-premises server farms. Other architectures for general-purpose computers fell by the wayside. Oracle acquired Sun and eventually wound down the SPARC architecture. HP acquired Apollo and Compaq and, after a brief excursion with Itanium, focused on x86. DEC declined and was acquired by Compaq. SGI spun out MIPS and vanished; MIPS was later acquired by Imagination, and then spun out again.

The one exception was Arm. They developed the ARM7TDMI, and it became the standard in the mobile industry. For that story, see my interview with Simon Segars in The Design that Made ARM (before he was CEO of Arm, he was the project manager for the ARM7TDMI). Mobile started out focused on devices that were clearly just phones, but with the arrival of the smartphone, they became general-purpose computers. For most of the world, they would become the primary means of accessing the internet. Most of the world never had a laptop (nor a wired phone), so this was a leapfrog in two technologies at once (and if you add mobile banking and finance into the mix, in three technologies, since most people never had bank accounts or credit cards either).

So the world arrived at a stage where big users of computation, such as semiconductor and system companies, built their own data centers. Meanwhile, taking a global perspective rather than a US one, the primary means of computation for most people was the smartphone.

The Cloud Era

Amazon built its infrastructure on its own data centers, just like everyone else. In 2002, AWS (Amazon Web Services) launched as a free service to allow other people to connect Amazon services into other web services. In 2006, AWS released its first cloud products, providing its infrastructure to paying customers (a model that would eventually become known as IaaS, or Infrastructure-as-a-Service). Those products were S3 and EC2, which are still workhorses of AWS today.

Google and Microsoft (under the name Azure) would take a similar approach. This became known as cloud computing. Some of the earliest users were software and internet startups: no longer was it necessary to build expensive-to-buy, expensive-to-maintain compute infrastructure. AWS would provide all of that. Even large internet companies, such as Netflix or Lyft, run on AWS. A few years ago, when Dropbox switched from the cloud to its own data centers, it made the news because it was such a rare transition. It seemed on a par with a fabless semiconductor company building a fab (which has never happened).

Most of the initial use of cloud computing was to get a lot of machines without having to buy a lot of machines, but they were largely used to scale services such as large websites. In the EDA and semiconductor worlds, they provided some easy scaling by running a lot of SPICE or SystemVerilog simulations in parallel. But EDA tools had not been architected to take advantage of a lot of cores, and so had limited ability to exploit the almost infinitely scalable capacity of the cloud. Keynote presentations would enthuse about using tens of thousands of servers. That was great if you wanted tens of thousands of simulations, but if you wanted a single big simulation to run tens of thousands of times as fast, that sort of scalability did not immediately exist.

Over the last several years, a lot of effort has gone into making EDA tools scale to take advantage of servers in the cloud in a way that was almost impossible in on-premises data centers. A data center might have 100,000 cores or more, but if you needed 1,000 of them at once, there was no real way to get them: either that was more than you were permitted to ask for, or else you would wait forever. So-called hyperscale data centers changed that dynamic by having enormous numbers of cores and servers available.

EDA tools vary in their ability to take advantage of hyperscale data centers. Some of this is simply that some algorithms scale well and some do not. It is straightforward to scale characterizing a standard cell library at lots of corners to huge numbers of processors; scaling SPICE to simulate millions of transistors is a lot harder, since there is a global timebase. For more on this topic, see my post Computational Software.
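As an illustration of that difference (a sketch, not any particular tool's flow), here is why library characterization parallelizes so easily: every (cell, corner) job is independent, so the whole sweep can be fanned out across as many workers, or cloud machines, as are available. The characterize_cell function and the corner names below are hypothetical placeholders.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def characterize_cell(job):
    # Hypothetical placeholder for one independent characterization run,
    # e.g. SPICE sweeps of a single cell at a single PVT corner.
    cell, corner = job
    return (cell, corner, "ok")

if __name__ == "__main__":
    cells = [f"cell_{i}" for i in range(1000)]
    corners = ["ss_0p72v_125c", "tt_0p80v_25c", "ff_0p88v_m40c"]

    # Every (cell, corner) pair can run on its own core or machine,
    # so throughput scales almost linearly with the number of workers.
    jobs = list(product(cells, corners))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(characterize_cell, jobs))

    print(f"{len(results)} characterization runs completed")
```

A single large SPICE simulation cannot be split up this way: every timestep depends on the state of the whole circuit at the previous timestep, which is what the global timebase refers to.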

Of course, Google, Amazon, Facebook, Apple, Microsoft, Alibaba, and more (especially in other countries) operate their own hyperscale data centers to scale their own applications to handle lots of parallel traffic. Amazon (AWS), Microsoft, Google, Alibaba, Tencent, and others also provide access to their infrastructure to anyone as a paid service.

A trend for the hyperscale data center owners is to improve performance by building their own chips. For examples of what AWS is doing, see my posts HOT CHIPS: The AWS Nitro Project and Xcelium Is 50% Faster on AWS's New Arm Server Chip. For my coverage of Google's Tensor Processing Unit (TPU), see Inside Google's TPU.

Cadence Cloud

 Cadence has provided an increasingly broad offering of tools and business models in the cloud. For more, see:

  • Cadence Cloud
  • Palladium Cloud
  • CloudBurst: The Best of Both Worlds
  • EDA in the Cloud: Astera Labs, AWS, Arm, and Cadence Report
  • Scaling EDA in the Cloud
  • Computational Software

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.