Qualcomm: Bring Me My Snapdragons

15 Apr 2019 • 5 minute read

I'm writing this before last night's Game of Thrones season opener. But since last season ended with an undead dragon demolishing the Wall, I'm pretty sure dragons will play a big part.

More relevant to semiconductors is Snapdragon, Qualcomm's name for its mobile application processor series. At the recent CDNLive Silicon Valley, Pavan Patibanda, the SoC implementation lead for Snapdragon designs, gave a presentation titled FE-PD Correlation—Auto Floorplan and Innovus Implementation System-Based Placement in Genus Synthesis Solution. It covered their experience using Genus on the latest generation of designs, and a couple of methodologies they worked out jointly with Cadence to improve the correlation between front-end synthesis and physical design.

Then, in the following session, Cadence's Genus R&D lead, Chuck Alpert, talked about the upcoming 19.1 release, which incorporates similar ideas more directly (he has the source code, so he can do things that are impossible for a user of the tool). I'll cover what he said tomorrow.

Synthesis

Until the mid-1990s, synthesis was done with wire-load models, which estimated the delay due to interconnect from the fanout of the net and the size of the block. Resistance was a second-order effect, and the overall capacitance could be estimated fairly accurately. Placement was never done at the synthesis stage. There was a clear separation between front-end designers, who knew all about RTL but not much about physical design, and back-end designers, who knew lots about physical design but not much about RTL, since the synthesized netlist was used unaltered during physical design (except perhaps for buffer re-sizing). Cells were placed and routed only once physical design began.
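To make the wire-load idea concrete, here is a minimal sketch in the spirit of a Liberty wire_load group: a table maps fanout to an estimated net length, a slope extrapolates beyond the table, and the block's area selects which table to use. The class, function names, and numbers are all hypothetical and for illustration only, and resistance is carried along but ignored, matching the era when it was second-order.

```python
from dataclasses import dataclass


@dataclass
class WireLoadModel:
    """Hypothetical wire-load model, loosely modeled on a Liberty wire_load group."""
    name: str
    capacitance: float   # wire capacitance per unit length
    resistance: float    # wire resistance per unit length (unused: treated as second-order)
    slope: float         # extra length per fanout beyond the last table entry
    fanout_length: dict  # fanout -> estimated length; assumed to cover 1..max fanout

    def net_length(self, fanout: int) -> float:
        """Estimate net length from fanout alone; placement plays no part."""
        if fanout <= 0:
            return 0.0
        max_fo = max(self.fanout_length)
        if fanout <= max_fo:
            return self.fanout_length[fanout]
        # Extrapolate linearly past the last table entry using the slope.
        return self.fanout_length[max_fo] + self.slope * (fanout - max_fo)

    def wire_cap(self, fanout: int) -> float:
        """Lumped wire capacitance estimate for a net with this fanout."""
        return self.net_length(fanout) * self.capacitance


def select_model(block_area: float, selection: list) -> WireLoadModel:
    """Pick the first model whose area limit covers the block (bigger block, longer wires)."""
    for max_area, model in selection:
        if block_area <= max_area:
            return model
    raise ValueError("no wire-load model covers this block area")


# Hypothetical values for illustration only.
SMALL = WireLoadModel("small", 0.2, 0.1, 1.5, {1: 1.0, 2: 2.0, 3: 3.5})
LARGE = WireLoadModel("large", 0.2, 0.1, 4.0, {1: 3.0, 2: 6.0, 3: 10.0})
SELECTION = [(1000.0, SMALL), (float("inf"), LARGE)]

if __name__ == "__main__":
    model = select_model(250.0, SELECTION)
    print(model.name, model.net_length(5), model.wire_cap(5))
```

Note that every net with the same fanout in the same block gets exactly the same capacitance, no matter where its pins eventually land. That blind spot is what placement-aware synthesis set out to remove.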

When I was at Ambit, we worked on PKS, the first synthesis product that took placement into account. Place and route has always been a lot slower than synthesis, so the big challenge for a product like PKS is to produce timing that is close enough to what physical design will eventually deliver, while running a lot faster. It's a difficult tradeoff. If the results are too approximate to be useful, design groups have to run physical design anyway to get accuracy. If the runtime is too close to that of physical design, then why bother? Another non-obvious question is how much to seed the physical design placement with the earlier placement from synthesis, versus letting it "rediscover" much of the placement. At Ambit, we had neither an industrial-strength placer nor our own physical design tools, so it was impossible to do what Genus and Innovus do today and share a common placement engine. But even with a common placement engine, it is unclear what the methodology should be.
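The difference between estimating from fanout and estimating from a placement can be sketched with half-perimeter wirelength (HPWL), a standard proxy for net length once pin locations are known. This is a minimal sketch assuming pins are (x, y) points and a single hypothetical capacitance per unit length; the function names are made up, and it is not how PKS or Genus actually compute wire parasitics.

```python
def hpwl(pins):
    """Half-perimeter wirelength: half the perimeter of the pins' bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def placed_wire_cap(pins, cap_per_unit):
    """Placement-based wire capacitance estimate (cap_per_unit is hypothetical)."""
    return hpwl(pins) * cap_per_unit


if __name__ == "__main__":
    # Two 3-pin nets (one driver, two loads). A wire-load model would give them
    # the same capacitance, but their placed spans differ a lot.
    compact = [(0, 0), (2, 1), (1, 3)]
    spread = [(0, 0), (40, 5), (10, 60)]
    print(hpwl(compact), hpwl(spread))
    print(placed_wire_cap(compact, 0.2), placed_wire_cap(spread, 0.2))
```

The catch, as described above, is that this estimate is only as good as the placement behind it, which is why the quality and runtime of that placement matter so much.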

As resistance became increasingly important, and SoC designs grew hugely in size and included lots of IP blocks rather than just standard cells, this correlation problem became harder to handle. As Pavan said in his presentation:

SoC FE development is very much interlaced with floorplan, it takes multiple iterations to reach quality. Synthesis netlist drives the floorplan activity and so DEF used for synthesis is not final in most cases. Synthesis netlist quality has major impact on macro and port placement.

He didn't say how big Snapdragon SoCs are; I suspect that is a closely guarded secret. But they have:

Very large instance counts
Large numbers of clocks
Complex power domains
Multiple hard-macros (HMs)
Shifting floorplan

Initial Stage and Genus Early Prediction Flow

In the initial phase, a lot of things are incomplete, missing, or wrong. The RTL will be dirty, there may be no initial floorplan, and physical data may be missing. The constraints may cover only 80% of the functionality. If it is a new technology, there may be new challenges to address, so even a pre-existing netlist can't simply be reused. The challenge is to get a quality netlist without too many iterations.

Qualcomm uses the Genus Early Prediction flow in this first phase. It handles missing LEFs (physical library data), missing LIBs (timing data), LEF/LIB mismatches, and unplaced ports and macros. In their experience, once the inputs were clean, the initial auto floorplan correlated well with the final P&R floorplan.

Final Stage and SHEP

In the final stage, you need to get to a final netlist, and that has to correlate with P&R even after DFT insertion and other disruptions. For this, Qualcomm and Cadence came up with what is known as the Genus SHEP flow, which stands for Spatial High Effort Placement.

Although Genus and Innovus use the same placement engine, synthesis runs it more approximately, at lower effort. In particular, the placement is not legalized, so there can be overlapping cells and other issues that will eventually need to be sorted out. The SHEP flow uses Innovus placement so that the result correlates even better with the final physical design:

Legalized placement using Innovus
Genus-based optimization and buffer insertion after Innovus placement
Fully placed new design elements related to DFT
DFT port placement resolving DEF vs. design mismatches
Ended up with tight correlation with P&R timing and congestion
Feeding the placement forward when physical design is done can reduce the number of trials
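To make "not legalized" concrete: a global placement may put cells on top of each other, and legalization snaps every cell into a row and onto the site grid with no overlaps while moving it as little as possible. Below is a simplified, textbook Tetris-style greedy legalizer, assuming single-height cells, uniform rows of equal width, and a unit site grid. The Cell class and function names are hypothetical, and this is not the algorithm Innovus uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    x: float      # global-placement x (may overlap other cells)
    y: float      # global-placement y
    width: float


def legalize(cells, row_ys, row_width, site=1.0):
    """Greedy Tetris-style legalization: cells sorted by x, each placed in the
    row that minimizes its displacement, never left of that row's filled region."""
    frontier = {ry: 0.0 for ry in row_ys}  # next free x in each row
    placed = {}
    for cell in sorted(cells, key=lambda c: c.x):
        best = None
        for ry in row_ys:
            snapped = math.floor(cell.x / site + 0.5) * site  # nearest site
            lx = max(frontier[ry], snapped)
            if lx + cell.width > row_width:
                continue  # cell does not fit in the rest of this row
            cost = abs(lx - cell.x) + abs(ry - cell.y)
            if best is None or cost < best[0]:
                best = (cost, lx, ry)
        if best is None:
            raise ValueError(f"no room for cell {cell.name}")
        _, lx, ry = best
        placed[cell.name] = (lx, ry)
        frontier[ry] = lx + cell.width
    return placed


def overlapping_pairs(positions, widths):
    """List pairs of cells in the same row whose x-intervals intersect."""
    names = sorted(positions)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (xa, ya), (xb, yb) = positions[a], positions[b]
            if ya == yb and xa < xb + widths[b] and xb < xa + widths[a]:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    cells = [Cell("a", 0.0, 0.0, 4.0), Cell("b", 1.0, 0.0, 4.0), Cell("c", 2.0, 0.0, 2.0)]
    widths = {c.name: c.width for c in cells}
    before = {c.name: (c.x, c.y) for c in cells}
    after = legalize(cells, row_ys=[0.0, 10.0], row_width=20.0)
    print("overlaps before:", overlapping_pairs(before, widths))
    print("overlaps after:", overlapping_pairs(after, widths))
    print(after)
```

Because cells are processed left to right and each row's frontier only moves right, the result can never overlap, but the greedy choice can leave more displacement than a legalizer that optimizes total movement. That displacement is one source of the timing gap between an unlegalized synthesis placement and the final P&R result.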

For example, in the images below, the Genus placement after SHEP is on the left and the final Innovus placement is on the right. These were done without bounds, or regions, so the similarity is not simply because both designs are over-constrained in the same way.

The routing congestion hotspots correlate well too, as do the critical paths and their slack values, as you can see in the tables below (the absolute timing values have been redacted, but the slack values are visible).
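The presentation doesn't say how Qualcomm quantified the timing correlation. One common approach, sketched here under the assumption that each tool reports per-path slack keyed by a path name, is a Pearson correlation over the slacks of matched paths, plus the overlap between each tool's k worst paths. The function names and slack values are hypothetical, not Qualcomm data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def slack_correlation(genus, innovus):
    """Correlate per-path slack from two tools, matching paths by name."""
    common = sorted(genus.keys() & innovus.keys())
    return pearson([genus[p] for p in common], [innovus[p] for p in common])


def critical_overlap(genus, innovus, k):
    """Fraction of the k worst-slack paths that both tools agree on."""
    def worst(slacks):
        return set(sorted(slacks, key=slacks.get)[:k])
    return len(worst(genus) & worst(innovus)) / k


if __name__ == "__main__":
    # Hypothetical slack values in ns.
    genus = {"p1": -0.05, "p2": -0.02, "p3": 0.01, "p4": 0.03}
    innovus = {"p1": -0.04, "p2": 0.00, "p3": -0.01, "p4": 0.02}
    print(round(slack_correlation(genus, innovus), 3))
    print(critical_overlap(genus, innovus, k=2))
```

The two measures answer different questions: a high slack correlation says the tools agree on overall trends, while the critical-path overlap says whether synthesis is working on the same paths that will actually fail in P&R.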

Conclusions

The auto floorplan flow helped in crossing the hurdles related to incomplete data at the initial phases of the project
The SHEP flow feature improved the correlation with PD runs
Both flows helped in identifying design issues upfront
Dependency on P&R feedback is significantly reduced in the initial phases
Improved overall turnaround time by reducing feedback cycles between front-end and physical design

More Dragons

Qualcomm's commercial from just last week for Snapdragon, featuring...yes...dragons (1 minute).
