CDNLive EMEA Eins

17 May 2017 • 7 minute read

Every CDNLive has a slightly different structure. At CDNLive EMEA in Munich, we start with Monday lunch, giving people the opportunity to fly in from other parts of Europe. The first day has presentations, some by Cadence but mostly by customers. The big day is Tuesday, which opens with two keynotes, then more presentations in parallel tracks, and finally a big event in the evening. Wednesday is mostly Cadence presentations of a more tutorial nature. The event even has its own flags, with both a Cadence flag and a CDNLive flag flying in front of the hotel.

This being Europe, one of the big themes is automotive. I have remarked before that La La Land set the theme for 2017 keynotes. It is a love story, but it famously opens with a number starring lots of cars. It doesn't matter what semiconductor event you go to, the keynotes rapidly end up being about cars. Cadence's Tom Beckley talked about lots of things, but he really wanted his flying car. NXP's Davide Santo's keynote was entirely about automotive. In fact, there is so much automotive at CDNLive that I will group a lot of it into another post. Today I'll cover some non-car stuff.

Tom Beckley Wants a Flying Car

"Alexa, take me to Frankfurt," was something that Tom Beckley can feel is coming close, although he's not that happy about it, since he likes to drive. However, "Alexa, take me to San Francisco," is a lot further off. Almost everything he watched on the Jetsons has come true: smart watches, facetime, robots in the home, news at any time, online medical records. But every episode included a flying car. He's still waiting.

But while he waits, autonomous vehicles are expected to take automotive semiconductors from $31B to $110B between 2015 and 2025. Every semiconductor company sees opportunity in driverless cars.

Tom gave an overview of a lot of the Cadence product portfolio. I won't reiterate it here. He did make two announcements of new products:

JasperGold Apps for RTL signoff, which I wrote about yesterday in JasperGold: Stepping up to RTL Signoff
New integration between MATLAB and Virtuoso

He also reminded us all of two recent, although not brand new, announcements: Xcelium for parallel simulation, and Pegasus for next-generation cloud-ready physical verification.

But more interesting were the hints he dropped about bringing together Virtuoso, Allegro, and Sigrity. He said there would be more announcements soon. "While my chip is still open, I can pull in package and board parasitics. It's not just about the chip, it's about the product."

With Moore's Law running out of steam ("We're at 7nm, soon 5nm, we'll go to 3, probably 2, maybe 1, but that's it"), wafer-level packaging becomes more important. Tom dropped another hint, this time of a new solution across package-chip-board for thermal analysis, including computational fluid dynamics at the system level. "Think about the thermal requirements for that camera behind the rear-view mirror. Extreme heat in summer. Cold in winter. No fans."

Tom said he started his career at General Motors for a couple of years before he moved into EDA. Back then, new model car development took 10-12 years. But now the automotive people are challenged by the two-year development cycle of the new entrants. They need ECUs to be smaller, lower power, cheaper, more reliable, more highly integrated. The data rates and the performance are a lot higher than they are used to.

Wireless is going to be a big thing in the vehicles, replacing the wires. Drive-by-wireless will reduce weight, increase reliability of connections, and reduce cost, especially installation cost. But it is another technology that neither the OEMs (car manufacturers) nor their suppliers currently have mastered.

Tom wrapped up with Robert Frost's poem about the road less traveled, encouraging the CDNLive attendees not to settle for ordinary.

High-Level Synthesis

Andries Hekstra of NXP Semiconductors talked about 25 Best HLS Coding Practices and his experience with Stratus. He admitted at the end that he couldn't count and he had no idea how many practices he actually covered. I will not even attempt to cover everything he did, just hit what seem to me to be the most important points. He started off by saying that there is a lot of skepticism about HLS. In fact one person tried to tell him that you certainly couldn't use it for anything latency dependent like a Viterbi decoder—but Andries's first successful use of HLS was...yup...a Viterbi decoder.

Number 1: Don't start without a clear specification—avoid shooting at a moving target. This is true even though an HLS design can be adapted more quickly to changing requirements since the tool fills in all the details.

Number 2: Use blocking handshake communication throughout the IP and testbench. This is not just for scalars: arrays can be passed and transferred in a single cycle. Tuples, too.

Number 3: Write the testbench as "unsynthesized design" in SystemC and avoid scripting languages. Make the IP plus the testbench a monolithic SystemC program.

Number 4: Use store and forward. Just having long delays and feedback is complicated, since you need to add FIFOs on the feedback paths to handle the delay.

Number 5: Prefer one thread per submodule so that "per submodule" is the same as "per thread".
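Practices 2, 3, and 5 can be sketched together. The following is only an illustration of the pattern in Python, not SystemC and not code from Andries's talk: the names scaler and run_testbench, the gain parameter, and the None end-of-stream marker are all hypothetical. A bounded queue of depth one stands in for a blocking handshake channel, the submodule runs as exactly one thread, and the testbench lives in the same program as the design.

```python
import queue
import threading


def scaler(inp, out, gain):
    """Hypothetical submodule implemented as a single thread."""
    while True:
        block = inp.get()          # blocks until the producer has data
        if block is None:          # assumed end-of-stream marker
            out.put(None)
            return
        # A whole array (a list here) moves in one transfer, not per element.
        out.put([gain * x for x in block])


def run_testbench(stimulus, gain=2):
    """Testbench and design in one monolithic program."""
    to_dut = queue.Queue(maxsize=1)     # depth 1: put() blocks while full
    from_dut = queue.Queue(maxsize=1)
    dut = threading.Thread(target=scaler, args=(to_dut, from_dut, gain))
    dut.start()

    results = []
    for block in stimulus:
        to_dut.put(block)               # waits until the design can accept it
        results.append(from_dut.get())  # waits until the design has answered

    to_dut.put(None)                    # shut the design thread down cleanly
    from_dut.get()
    dut.join()
    return results


print(run_testbench([[1, 2, 3], [4, 5]]))
```

Because every transfer blocks until the other side is ready, neither side needs to know how many cycles the other takes, which is what lets the implementation details change without rewriting the testbench.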

Number 6: With HLS, you still need to think about architecture. The best approach is to create the IP in small steps with some testing after each step. Never be far from a working state. Avoid having to correct several things at once. Don't waste time in the middle optimizing QoR since the design will probably change too much.

Number 7: Define macros that you can switch from floating point to various kinds of fixed point. Get the algorithm working with floating point first, and debug without concern for overflow or underflow. Then use those reference values for ideal behavior to make the fixed-point version.
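The workflow in practice 7 can be illustrated with a small sketch. In a real Stratus flow the switch would be a C++ macro or typedef over SystemC data types; the Python below only mimics the idea, and the names to_fixed, NUMBER_TYPES, and weighted_sum, as well as the Q8.8 and Q4.4 formats, are assumptions chosen for the example.

```python
def to_fixed(x, frac_bits=8, total_bits=16):
    """Quantize x to signed fixed point, saturating instead of wrapping."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    raw = max(lo, min(hi, round(x * scale)))
    return raw / scale


# The single switch point: every arithmetic result passes through one of these.
NUMBER_TYPES = {
    "float": float,
    "q8.8": lambda x: to_fixed(x, 8, 16),
    "q4.4": lambda x: to_fixed(x, 4, 8),
}


def weighted_sum(samples, coeffs, mode="float"):
    """The same algorithm text runs in floating or fixed point."""
    num = NUMBER_TYPES[mode]
    acc = num(0.0)
    for s, c in zip(samples, coeffs):
        acc = num(acc + num(num(s) * num(c)))
    return acc


samples = [0.1, 0.2, 0.3]
coeffs = [0.3, 0.3, 0.4]

# Step 1: floating point gives the reference values for ideal behavior.
reference = weighted_sum(samples, coeffs, "float")

# Step 2: compare each candidate fixed-point format against the reference.
for mode in ("q8.8", "q4.4"):
    error = abs(weighted_sum(samples, coeffs, mode) - reference)
    print(f"{mode}: error vs float reference = {error:.5f}")
```

Keeping the number format behind one switch means the algorithm is debugged once, and choosing word lengths becomes a separate, measurable step.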

Number 8: “Hurrying is less effective than it feels”. Prefer simple solutions and readable code.

There is a pragmatic side to being effective with HLS that is not covered in public literature or the documentation.

Reducing Area When You Can't Just Shrink

Luca Mattii has been working with imec and various people at Cadence while doing his PhD to look at how to get design done effectively at what imec calls iN7. The process has CPP at 45nm and MM at 32nm, so it is not exactly the same as any other process but should be close to what industry will select for 5nm. The big challenge is that scaling the poly pitch too much loses too much current drive, so you can't achieve high frequency. Scaling the metal too much pushes interconnect resistance up, and since the resistance increases superlinearly, that is really bad.

Also the litho for the black region in the graph above is really expensive to achieve, requiring advanced litho like SAQP for both poly and metal. The solution is not to push for pitch scaling alone.

The big win is to reduce the number of tracks in the standard cells from 9 to 7.5 to 6 tracks. But the intracell routing gets harder and pin accessibility becomes more challenging. However, for small cells you get the theoretical reduction in area (the cells don't get longer).
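The arithmetic behind the track reduction can be sketched as follows. This assumes that cell height is the track count times the metal pitch, that the 32nm figure quoted earlier is that pitch, and that cell width is unchanged; the function names are made up for the example.

```python
METAL_PITCH_NM = 32  # assumption: the 32nm figure quoted above is the metal pitch


def cell_height_nm(tracks, pitch_nm=METAL_PITCH_NM):
    """Cell height if each routing track occupies one metal pitch."""
    return tracks * pitch_nm


def area_saving(tracks_from, tracks_to):
    """Fraction of cell area saved when only the height changes."""
    return 1 - tracks_to / tracks_from


for tracks in (9, 7.5, 6):
    print(
        f"{tracks} tracks: height {cell_height_nm(tracks):.0f}nm, "
        f"saving vs 9 tracks {area_saving(9, tracks):.1%}"
    )
```

This is the theoretical cell-level gain only. Whether it carries through to the block level depends on pin access and routing.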

The above chart shows other scaling boosters:

Single-diffusion break
Self-aligned gate contact (SAGC)
m1 and m0 open for routing
Vertical power distribution (instead of wide horizontal power buses that consume too much space leaving only a couple of tracks open for routing)
Porous cells (add two empty tracks to long cells so that the router can get through, even though the cell is bigger)

This basic idea, of not just relying on process scaling, is known as design technology co-optimization (DTCO), taking a holistic approach and optimizing all three of technology, design, and EDA.

If the cells are made smaller, pin access gets harder, so the area gains at the cell level don't necessarily result in chip area gains. The graph above shows the improvements from reducing the tracks to 6, then using a vertical power delivery network. Porous cells don't result in an increase in scaling at this point. Next, tighten the power delivery network and it is a big hit on density, but this time porous cells recover it. With single-diffusion break and SAGC, there is a further 35% area gain. So putting it all together gives a full node of gain compared to the initial scenario. They were worried that performance would suffer with 3 fins per device going down to just 2 for the 6-track libraries, but it also reduces the pin capacitance, so it turns out not to be a big performance loss (Dennard scaling lite). Note that this full node of improvement is over and above anything done with the actual process. It purely takes the basic lithography of the process as given, optimizes the cell architecture, and adds some of the other process boosters that add some process steps but do not assume any heroic changes to manufacturing.

Part 2

The second part of coverage of the non-automotive parts of CDNLive Europe will appear tomorrow. All the presentations will eventually, typically after a couple of weeks, appear on the CDNLive EMEA website.