
Paul McLellan
Computation and the Data Explosion

31 Oct 2019 • 6 minute read


Everywhere you look these days, you are told that there is a data explosion going on. A Boeing 787 generates half a terabyte of data per flight. Facebook has 2.5 billion users. Five billion videos are watched on YouTube every day. This impacts the semiconductor industry directly, since it is only with SoC-based systems that this volume of data can be created, transmitted, and processed. In fact, there is so much data that it is simply impossible to transmit it all—there is not enough network bandwidth, and there is not enough cloud compute and storage to cope with it. This raises the semiconductor stakes even more, since increasingly computation must be done at the edge of the internet and not in the big data centers that we also read about daily. A security camera on your house simply cannot transmit all its video for processing elsewhere. The camera itself has to decide who is the thief and who is your child. Or at the very least, who is a person, and what is just the shadow of the leaves rustling in the wind.

Even with a lot of computation done at the edge, the need for increased connectivity speeds and bandwidths is insatiable. At the smartphone level, we are in the middle of the transition from 4G to 5G. In perfect conditions, 4G delivers 1Gb/s, whereas 5G will deliver 100 times that. In the data center, 10Gb/s and 25Gb/s Ethernet are giving way to 400Gb/s Ethernet. Of course, all of this would be impossible without modern semiconductors to handle all the signaling.
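As a back-of-the-envelope illustration of what these link rates mean, here is a minimal Python sketch that computes how long it would take to move that half-terabyte of 787 flight data over each link. The rates are the nominal peak figures quoted above, not the real-world throughput you would actually see:

```python
# How long to move a Boeing 787's half-terabyte of per-flight data
# over various links. Rates are nominal peak figures, not the
# real-world throughput you would actually see.

DATA_BITS = 0.5e12 * 8  # half a terabyte, in bits

links_gbps = {
    "4G (peak)": 1,
    "5G (peak, ~100x 4G)": 100,
    "400G Ethernet": 400,
}

for name, gbps in links_gbps.items():
    seconds = DATA_BITS / (gbps * 1e9)
    print(f"{name:20s} {seconds:8.0f} s ({seconds / 3600:.2f} h)")
```

At 4G's peak rate that is over an hour per flight, which is one way to see why the data cannot simply all be shipped off the device.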

The next thing that it is impossible to avoid hearing about almost daily is artificial intelligence: home assistants with voice recognition, increasingly powerful advanced driver assistance systems (ADAS), computational processing of photographs on phones that delivers images as good as those from professional cameras. Medical X-rays are now apparently interpreted better by AI algorithms than by experienced radiologists. Even traffic signs are recognized better by modern cars than by their drivers.

All of this is nothing without security. That’s another thing it is hard to avoid hearing about. Another million accounts were stolen from some company that you’d think would know better. It is generally accepted in the security world today that security has to be built up in layers starting from a hardware root of trust. Once again, the onus is on the semiconductor industry to provide the foundation of security, and usually also to provide efficient processing of the cryptographic algorithms.
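To make the layering idea concrete, here is a toy Python sketch of a chain of trust, in which each boot stage is verified against a digest anchored at the bottom layer before control is handed to it. This is purely illustrative: the stage names and images are made up, and a real hardware root of trust would verify cryptographic signatures against keys fused into the silicon, not compare bare hashes in software.

```python
# Toy sketch of a layered chain of trust: each boot stage is verified
# against a trusted digest before control is handed to it. A real
# hardware root of trust would check signatures against keys fused
# into the silicon; this only illustrates the layering idea.
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical boot images; in reality these are firmware binaries.
stages = {
    "bootloader": b"bootloader image",
    "os_kernel": b"kernel image",
    "application": b"application image",
}

# Expected digests, assumed to be anchored in the hardware root of
# trust (e.g., held in on-chip ROM or OTP fuses).
trusted = {name: digest(image) for name, image in stages.items()}

def boot(images: dict) -> bool:
    for name in ["bootloader", "os_kernel", "application"]:
        if digest(images[name]) != trusted[name]:
            print(f"verification failed at {name}; halting boot")
            return False
        print(f"{name}: verified, handing off")
    return True

boot(stages)                                 # all layers verify
tampered = dict(stages, os_kernel=b"evil")   # simulate an attack
boot(tampered)                               # halts at os_kernel
```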

Obviously, the precise balance of computation and communication depends on the application you examine. But increasingly the model is to have edge devices, such as smartphones and IoT “things”, that combine artificial intelligence, for detecting what information needs to be communicated up into the cloud, with high-performance communication links. Between the edge devices and the cloud are advanced communication networks such as 5G and high-speed fiber optics. Up in the cloud, where all the data is aggregated, massive computation can be brought to bear on demand. And all of this has to take place in an environment with nation-state-level security, from the edge devices, through the network, to the datacenters.
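A toy sketch of that edge model, assuming a hypothetical on-device detector (here just a random stand-in for a real neural network): the device scores every frame locally, and only frames that cross a confidence threshold are sent upstream, which is what keeps the bandwidth demand manageable.

```python
# Toy sketch of the edge-filtering model: an edge device runs a local
# detector and only sends events of interest up to the cloud, rather
# than streaming everything. The detector and its threshold are
# stand-ins for a real on-device neural network.
import random

random.seed(1)

def local_detector(frame: int) -> float:
    """Stand-in for an on-device inference call; returns a
    'person present' confidence score for the frame."""
    return random.random()

THRESHOLD = 0.99  # only high-confidence events go to the cloud

TOTAL_FRAMES = 10_000
uploaded = 0
for frame in range(TOTAL_FRAMES):
    if local_detector(frame) >= THRESHOLD:
        uploaded += 1  # in a real system: send frame + metadata upstream

print(f"uploaded {uploaded} of {TOTAL_FRAMES} frames "
      f"({100 * uploaded / TOTAL_FRAMES:.1f}% of the raw data)")
```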

This delivers a combination of intelligence, high-performance compute, high-performance networks, and security. That gives us an environment that keeps us safe and also delivers wonderful experiences, many of which have not yet been imagined.

Those of us involved in semiconductors are tasked with making this vision a reality.

Optimization

All system design is about optimization against some measure of performance. If you didn’t care about performance (which might be speed, power, cost, or some other metric), then you would just write some software, which is its own form of optimization, directed instead towards ease and speed of creation.

Moore’s Law is running out of steam in the sense that new process nodes are getting more expensive per transistor, not cheaper as in the past. However, the nodes still keep coming, enabling larger and larger systems to be constructed. At the same time, processor architects have run out of ways to improve general-purpose processor performance. These two trends are combined in a well-known graph in Hennessy and Patterson's book on computer architecture.

These two trends together mean that silicon is increasingly optimized for a special purpose, even when it contains processors. The processors have to be optimized for the purpose, too; otherwise, they waste too many resources. This is especially true in edge devices such as IoT things and smartphones, since they have limited resources and don’t need to be so general purpose. It is less true in cloud data centers, since servers are by nature much more general purpose. But even there, provision for accelerators, be they GPUs or FPGAs, is widespread, since the main processor is not well suited to highly specialized tasks such as training neural networks.

Another result of the slowing of Moore’s Law is the trend towards using advanced packaging of multiple die, rather than integrating everything onto a single die. This is often called More than Moore. Since each successive node is more expensive per transistor, and since some transistors don’t need the performance of the most advanced node, it makes sense to hold back and put only the most critical portion of the design into the more advanced and expensive node. This applies especially to analog, RF, and silicon photonics, which don’t benefit from a more advanced node (they don’t shrink) and, in many cases, are much harder to design there, which is a further drawback to moving. A good example is AMD's Zen 2, which consists of nine chiplets in the server configuration or three in the desktop configuration.
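The economics are easy to see with a toy yield model. The Python sketch below uses a simple Poisson defect model, in which yield falls off exponentially with die area, so one large die can cost far more per good part than several small chiplets covering the same total silicon. All of the numbers (defect density, wafer cost, die sizes) are illustrative assumptions, not actual foundry data, and packaging cost is ignored:

```python
# Why split a big design into chiplets? One reason is yield: under a
# Poisson defect model, yield falls exponentially with die area, so
# several small die can be far cheaper than one large one. All the
# numbers here are illustrative assumptions, not real foundry data.
import math

DEFECTS_PER_CM2 = 0.5                  # assumed defect density
WAFER_COST = 10_000                    # assumed wafer cost, dollars
WAFER_AREA_CM2 = math.pi * 15 ** 2     # 300mm wafer (15cm radius)

def cost_per_good_die(die_area_cm2: float) -> float:
    yield_fraction = math.exp(-DEFECTS_PER_CM2 * die_area_cm2)  # Poisson
    dies_per_wafer = WAFER_AREA_CM2 / die_area_cm2   # ignores edge loss
    return WAFER_COST / (dies_per_wafer * yield_fraction)

monolithic = cost_per_good_die(8.0)      # one large 800 mm^2 die
chiplets = 8 * cost_per_good_die(1.0)    # eight 100 mm^2 chiplets

print(f"monolithic die: ${monolithic:8.2f} per good part")
print(f"eight chiplets: ${chiplets:8.2f} per good part (before packaging)")
```

With these assumed numbers the eight small chiplets come out more than an order of magnitude cheaper per good part, which is the basic force behind the chiplet trend (real designs must then subtract packaging and inter-die interface costs).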

Building a high-performance system in an advanced package out of multiple die is not simple, since the communication between the parts of the system usually runs at very high speed and needs to be analyzed correctly. This puts a premium on full-system analysis. It is no longer adequate to analyze each component separately and then somehow stitch those analyses together; that is simply too inaccurate, or requires too much guard-banding.

Designing a system increasingly starts at a higher level than the silicon itself. This is most obvious in machine learning—nobody creates a neural network by writing RTL. There are systems for designing and training the networks, the two most widely used today being TensorFlow and PyTorch. These environments are used to train the networks, and the trained network can then be further optimized and reduced to a form that can be used for inference in edge devices. Typically, all training is done in 32-bit floating point, but inference is often done using 8-bit fixed point or a special 16-bit floating-point format known as bfloat16. Implementation is usually done using specialized reconfigurable processors that can be adapted to precisely what the design requires, and then configured automatically by a tool chain that starts from the trained network at the TensorFlow/PyTorch level.
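The core of that reduction step is quantization. Here is a minimal NumPy sketch of the simplest version, post-training symmetric quantization of 32-bit float weights to 8-bit integers; production flows in TensorFlow and PyTorch add calibration data, per-channel scales, and quantization-aware training on top of this basic round trip:

```python
# Minimal sketch of post-training quantization: map 32-bit float
# weights to 8-bit integers with a single symmetric scale factor,
# then reconstruct floats to measure the quantization error.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.1, size=1000).astype(np.float32)

scale = np.abs(weights).max() / 127        # map the largest |w| to 127
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale     # reconstruct to compare

err = np.abs(weights - dequant)
print(f"scale: {scale:.6f}")
print(f"max error:  {err.max():.6f}")
print(f"mean error: {err.mean():.6f}")
```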

Signal processing and other advanced numerical algorithms usually start from MathWorks' MATLAB. Again, there are automated processes to take a design, typically in floating point, and optimize it for silicon implementation. For the most highly optimized solution of all, although it gives up flexibility, the output can be run through high-level synthesis (HLS) to produce RTL and, eventually, an optimized block on the chip. Usually, however, software flexibility is too important to give up, and a specialized reconfigurable DSP processor is used as the implementation fabric of choice.
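The essence of that floating-point-to-fixed-point step can be shown in a few lines. The Python sketch below converts the coefficients of a small FIR filter to Q15 fixed point (16 bits, 15 of them fractional) and compares the fixed-point filter output against the floating-point reference. The filter and input signal are made up for illustration; a real MATLAB or HLS flow would also automate wordlength selection, rounding modes, and overflow handling:

```python
# Convert FIR filter coefficients to Q15 fixed point and compare the
# fixed-point result against the floating-point reference. The filter
# and input signal are illustrative, not from a real design.
import numpy as np

coeffs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])       # example FIR taps
signal = np.sin(2 * np.pi * 0.05 * np.arange(64))  # test input

Q = 15                                   # Q15: 15 fractional bits
coeffs_q = np.round(coeffs * (1 << Q)).astype(np.int64)
signal_q = np.round(signal * (1 << Q)).astype(np.int64)

ref = np.convolve(signal, coeffs, mode="same")     # float reference

# Integer multiply-accumulates produce Q30 values; interpret them
# back as real numbers by dividing by 2^30.
acc = np.convolve(signal_q, coeffs_q, mode="same")
fixed = acc / float(1 << (2 * Q))

print(f"max error vs floating point: {np.abs(ref - fixed).max():.2e}")
```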

Security goes in the other direction, starting from specialized hardware such as physically unclonable functions or isolated safety processors, and building up to secure operating systems and then to certified application code for automobile control or avionics. This is a tight mixture of software, semiconductor structures (usually called IP), and EDA processes (such as keeping one part of the design physically isolated from another, so that a problem in one part cannot contaminate another).

More

This is the first part of a series of three posts about computational software. Part 2 will appear tomorrow.


Sign up for Sunday Brunch, the weekly Breakfast Bytes email.