System Analysis: Computational Software at Scale

12 Feb 2020 • 8 minute read

In about 2000, when I was the VP of Strategic Marketing for Cadence, I got a strange voicemail that started "This is Regis McKenna...". If you are old enough, you know that Regis, along with his PR firm Regis McKenna Inc, was a legend in high-tech PR, in particular for working with Intel to win the IBM PC microprocessor socket, which has to be one of the most important design wins of all time. He also worked with Apple to launch the Apple II, and he was the author of several best-selling books on marketing, such as The Regis Touch. In fact, he was so legendary that his business card just gave his job title as "himself".

I called him back and had a brief chat with him. He was working with a startup company called Caliper Life Sciences (acquired by Perkin-Elmer, I see, in 2011). They did microfluidic medical diagnostics, which they called "lab on a chip". Regis wanted to know if Cadence's tools could help design these microfluidic devices, since doing them all by hand was a bottleneck for Caliper. Each test needed a new custom design. I agreed to go along and find out more, although I was skeptical. I've been in enough discussions at DAC and other places about whether EDA software can be used to design petroleum plants, industrial wiring, and the like, although microfluidics did at least sound closer to microelectronics. For a start, they both had "micro" in the name.

I forget all the details now, but I think the devices were silicon with etched grooves, and used electrical potential to move really, really tiny amounts of liquid around (picoliters). Then they had electronics on the same silicon perform measurements. They were designing everything by hand in the equivalent of a layout editor. What they wanted was something closer to a place and route system for a few dozen objects, so that they could effectively automate creating these devices. Although each different test required a new design, the devices were use-once, and so potentially the company might ship a lot of them.

Obviously, from a technical point of view, placing and routing a few dozen cells should be easy, but it would require building something special. From a business point of view, it made no sense since they didn't have the money to pay us to do the work, yet they would be the main customer. If you think semiconductor has a limited number of customers wanting to do design, let me introduce you to medical microfluidics. Even if they had deep pockets and could pay us, it would still have made very little sense, since there was little synergy with the advanced-node semiconductor design business.

So, on behalf of Cadence, I passed.

EDA and Computation

In retrospect, it was also the wrong end of the scale. There is a sense in which "anyone" can place and route a few dozen blocks. What EDA can do, and nobody else can, is place and route a few billion gates. What other industries consider large, a few hundred thousand things, or even a few million, in EDA we consider small. We had to do that size of design a decade or two ago. Now, a big design is 10 billion gates. A system may be another order of magnitude bigger. When I started work in Silicon Valley, the biggest designs were less than 10,000 gates. Of course, if we ran the Moore's Law curves out, we could see that one day we'd design a chip with a million transistors, but it was simply unimaginable. Now we can design chips a million times bigger than when I started. According to an interview with CNN, an Airbus A380, the biggest commercial aircraft in production, has 4M parts, going all the way down to the rivets. It's not a fair comparison of course, but I'll make it anyway: a modern microprocessor with 40B transistors has 10,000 times as many components.

The scale required has also made us leaders in using, firstly, large on-premises data centers at our customers, and more recently, the cloud. By "using the cloud" I don't just mean the equivalent of running Concur in the cloud instead of on a dedicated server, but scaling very large designs to very large numbers of cores for a single computation.

Another underestimated aspect of what Cadence does is to deliver those algorithms and scalability in a way that is supported. Sometimes, research teams in academia come up with clever algorithms (force-directed placement "springs" to mind). However, in the real world, with real design rules, they often turn out to be clever (aka publishable) rather than practical, and require a lot of work to turn them into something usable by real design teams working on real processes. As I discussed in my post Alberto and the Origins of the EDA Industry, even when industry was having success with academic algorithms and software, they also begged Alberto to set up a company (that would end up being Cadence) so that they could get professional support, and so that the software would continue to be developed when that batch of grad students moved on.

So, oversimplifying, Cadence is an expert at:

Algorithms that operate on unimaginably huge amounts of data
Scaling algorithms to be massively parallel in the cloud
Delivering and supporting software in an enterprise environment

Scaling Up to Systems

A lot of algorithms that operate at the chip level can extend to higher levels. Not just packages and boards, but entire signal traces with all their ugliness, including cables, connectors, backplanes, and whatever else is involved. The basic analysis requires a separate underpinning of 3D mechanical/physical modeling. Every detail of every trace and connector is important, especially at high frequencies. When I interviewed Brad Brin on his retirement, he pointed out that the signal frequencies we deal with now are what he called "microwave" at the start of his career.

From a business point of view, this also makes a lot more sense than designing refineries or labs on a chip, since it is a neighboring space of direct interest to many of our customers, even if not always the exact same groups we are already calling on.

However, even a 3D solver is often not enough. Power and signal integrity are affected by temperature, and temperature is affected by power. As is so often the case in EDA, it becomes impossible to "divide and conquer" the problem in the old way: do modeling, do power analysis, do thermal analysis. Instead, modeling, power analysis, and thermal analysis need to be done together. The "divide and conquer" has to be done in another dimension, partitioning the huge problem so that different cores can solve different parts of the problem, but handling power, thermal, and electrical simultaneously. Furthermore, even the partitioning needs to be done cleverly or the partitioning itself becomes the bottleneck, especially for a large design: if the partitioning itself cannot be partitioned, it can require a single server with an insane (and expensive) amount of memory.
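To make the power/temperature feedback concrete, here is a minimal sketch in Python. It is a toy, not how any Cadence solver works: the linear leakage model, the single thermal resistance, and every number in it are assumptions chosen purely for illustration. It compares the old "power analysis once, then thermal analysis once" flow with iterating the two together until they agree.

```python
# Toy electrothermal loop. All numbers and the linear leakage model are
# illustrative assumptions, not values from any real chip or tool.

T_AMBIENT = 25.0      # ambient temperature, deg C
R_THERMAL = 0.5       # thermal resistance, deg C per watt
P_NOMINAL = 50.0      # power at the reference temperature, watts
T_REFERENCE = 25.0    # temperature at which P_NOMINAL was characterized
K_LEAKAGE = 0.01      # fractional power increase per deg C


def power_at(temperature):
    """Power rises with temperature (leakage grows as the die heats up)."""
    return P_NOMINAL * (1.0 + K_LEAKAGE * (temperature - T_REFERENCE))


def temperature_at(power):
    """Temperature rises with dissipated power."""
    return T_AMBIENT + R_THERMAL * power


def one_pass():
    """Old-style flow: power analysis once, then thermal analysis once."""
    return temperature_at(power_at(T_AMBIENT))


def coupled(tolerance=1e-9, max_iterations=100):
    """Iterate power and thermal together until they agree."""
    temperature = T_AMBIENT
    for iteration in range(1, max_iterations + 1):
        updated = temperature_at(power_at(temperature))
        if abs(updated - temperature) < tolerance:
            return updated, iteration
        temperature = updated
    raise RuntimeError("power/thermal loop did not converge")


if __name__ == "__main__":
    print(f"one pass: {one_pass():.2f} C")
    final, steps = coupled()
    print(f"coupled : {final:.2f} C after {steps} iterations")
```

With these made-up values, the one-pass flow reports 50 C, while the coupled loop settles at about 58.3 C. The one-pass answer is optimistic because it never feeds the extra heat back into the power estimate.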

It turns out that many algorithms are, under the hood, solving large numbers of simultaneous linear equations. Even when the equations are not linear, they are often solved by simultaneous linear approximations. The algorithms involve partitioning very large sparse matrices across cores, and then solving each sub-matrix separately while doing just the right amount of communication between the different cores. Too little, and you get inaccuracy and the wrong answer. Too much, and the cores spend all their time communicating instead of solving, and the number of cores that can be used rapidly hits a ceiling. In fact, it can get so bad that adding more cores to an algorithm can make it go slower. It is the algorithmic equivalent of Brooks' Law: "adding manpower to a late software project makes it later".
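The ceiling on useful core count can be illustrated with a deliberately crude cost model in Python. This is an assumption-laden sketch, not a measurement: it pretends the work divides perfectly across cores and that every extra core adds a fixed communication cost, and both constants are invented.

```python
# Toy scaling model: the per-core compute shrinks as cores are added, but
# each extra core adds a fixed communication cost. Both constants are
# made-up illustrative values.

def run_time(cores, work=1000.0, comm_per_core=0.1):
    """Estimated wall-clock time for a given core count."""
    compute = work / cores                     # perfectly divided work
    communicate = comm_per_core * (cores - 1)  # overhead grows with cores
    return compute + communicate


def best_core_count(max_cores=1024):
    """Core count with the lowest estimated run time."""
    return min(range(1, max_cores + 1), key=run_time)


if __name__ == "__main__":
    for cores in (1, 10, 100, 400, 1000):
        print(f"{cores:5d} cores -> {run_time(cores):8.2f}")
    print("best:", best_core_count())
```

Under these assumptions the estimated run time bottoms out at 100 cores. Beyond that, communication dominates: 400 cores is slower than 100, and 1,000 cores is no faster than 10.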

3D analysis, finite element analysis, circuit simulation, signal integrity, even training neural networks, are fundamentally manipulating sparse matrices. A "sparse" matrix is one in which most of the elements are zero. For example, analyzing all the electrical nodes on a chip might require a matrix that represents how each node is capacitively coupled to every other one. For a million nodes, that is a million by a million matrix, a trillion elements. But each node is actually only capacitively coupled to a few others. Technically, each node has some capacitance to every other node, but most of those values are so close to zero that they can be ignored: a node only couples to other nodes that are nearby. So most of the elements in the trillion-element matrix are zero, and need neither be represented in memory nor processed. A further optimization is that, often, the relationship is symmetrical: the capacitance between one node and a second is the same as between the second and the first. This means that the matrix is symmetric (across its main diagonal) and so only half of it needs to be stored.
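The storage savings are easy to see in a small Python sketch. Everything here is an assumption made for illustration: the nodes are arranged in a simple chain so each one couples only to its immediate neighbor, the capacitance values are invented, and a plain dictionary stands in for the specialized sparse formats a production solver would use.

```python
# Sparse symmetric storage sketch: keep only nonzero entries with row <= col.
# The chain-of-nodes coupling pattern and the capacitance values are
# illustrative assumptions.

def build_coupling(node_count, self_cap=4.0, neighbor_cap=1.0):
    """Each node couples to itself and to its immediate neighbor only."""
    upper = {}
    for i in range(node_count):
        upper[(i, i)] = self_cap
        if i + 1 < node_count:
            upper[(i, i + 1)] = neighbor_cap  # (i+1, i) is implied by symmetry
    return upper


def multiply(upper, vector):
    """Compute matrix @ vector using only the stored upper triangle."""
    result = [0.0] * len(vector)
    for (row, col), value in upper.items():
        result[row] += value * vector[col]
        if row != col:                        # mirror the off-diagonal entry
            result[col] += value * vector[row]
    return result


if __name__ == "__main__":
    n = 1000
    matrix = build_coupling(n)
    print("dense entries :", n * n)
    print("stored entries:", len(matrix))
    print(multiply(build_coupling(3), [1.0, 2.0, 3.0]))
```

For 1,000 nodes, the dense matrix would have 1,000,000 entries, while this representation stores 1,999. The multiply routine shows that the work scales with the stored entries too, not with the full matrix.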

Another simplification is that things take time to happen. At the very least, a signal can't get across a chip faster than the speed of light. When I was at Virtutech, we simulated very large systems with hundreds of processors connected by Ethernet. We could distribute these simulations across a lot of host processors since Ethernet takes time to transmit a packet. If all the processors had to be synchronized at the level of a clock cycle, we would rapidly have reached the ceiling I mentioned above. But we only need to synchronize at the level of a packet transmission time. The granularity is a lot smaller, but similar considerations work at the chip and system levels.
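A back-of-the-envelope calculation in Python shows why the synchronization granularity matters so much. The clock rate and the packet latency below are assumed values picked for round numbers, not figures from Virtutech's simulator.

```python
# Count synchronization barriers for a distributed simulation. The clock
# rate and packet latency are illustrative assumptions.

def barrier_count(simulated_ns, quantum_ns):
    """Hosts must synchronize once per quantum of simulated time."""
    return -(-simulated_ns // quantum_ns)  # ceiling division on integers


CLOCK_CYCLE_NS = 1          # assumed 1 GHz simulated processors
PACKET_LATENCY_NS = 10_000  # assumed minimum network latency of 10 microseconds

if __name__ == "__main__":
    one_second = 1_000_000_000
    per_cycle = barrier_count(one_second, CLOCK_CYCLE_NS)
    per_packet = barrier_count(one_second, PACKET_LATENCY_NS)
    print(per_cycle, per_packet, per_cycle // per_packet)
```

With these assumptions, one simulated second needs a billion synchronization points at clock-cycle granularity but only 100,000 at packet granularity, a reduction of 10,000 times in the communication the hosts have to do.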

Tomorrow

Tomorrow we'll dive a little deeper, using Cadence's Clarity 3D Solver and Celsius Thermal Solver as examples.
