DAC Wednesday: Verification Lunch, Books, and Bagpipes

6 Jun 2019 • 10 minute read

For my coverage of the first two days of DAC, see my posts DAC Monday: Gaming, IoT Security, State of EDA Industry, Mixed-Signal Lunch, Cooley's Troublemakers, and DAC Tuesday: Thomas Dolby, the View from Wall Street, AI Lunch, Denali.

Verification Lunch

Normally, the Cadence lunch on Wednesday is the analog/mixed-signal one, but with the announcement of Spectre X on Monday (see my post DAC Tuesday: Thomas Dolby, the View from Wall Street, AI Lunch, Denali for details), we did that on Monday, too. So Wednesday was the verification lunch. Brian Bailey of Semiconductor Engineering was the moderator and the panel was:

Tran Nguyen of Arm
Raju Kothandaraman of Intel
Dale Chang of Samsung
Paul Cunningham, who runs Cadence's verification organization

I don't think any areas of verification were off-limits, but the title of the panel this year was Optimizing Verification Throughput for Advanced Designs in a Connected World.

Brian started by admitting that he'd been in verification since 1981. Back then we had one language, Verilog, and one tool, simulation. Today we have a plethora of techniques such as emulation, formal, portable stimulus, but still RTL using Verilog. It's a nice point, but actually, Verilog was invented a couple of years later in 1983/4.

Brian started off by asking how we define verification throughput, since there is "No point in running fast to the wrong goal."

Paul said he liked to be ROI-based, and look at "how much value am I getting from this time and this resource."

Tran said that verification is an infinite challenge, but one thing that never changes is that we do a design in 9-10 months. So there are more machines and more technologies, but never more time. "You can run a lot of cycles, but how do you know which are effective and help you find the bugs, and which ones actually give you some confidence that your design is correct."

Raju said that they have a holistic picture of verification from the start of the design to post-silicon. He said Intel uses four metrics: build, resource utilization, runtime, and time to root-cause. "As verification engineers, we pay attention to all four."

Dale agreed and said that in his world the key is how many jobs you can submit to emulation so that the debug team have enough to work on. "The high-level goal is how many bugs we remove." There's a need to auto-triage the failures, and rerun failing jobs with debug methods. "In emulation, everything takes time, so you want to make sure your engineers are not sitting there waiting for output."

Next, Brian asked Paul how to measure throughput.

Paul said he's a big believer that you can't optimize what you can't measure. But there's not just one way to measure. Raw throughput is a very basic measure, the performance of the car rather than the driver. Higher-order measures are things like time to root-cause a bug, or how much compute resource it takes to root-cause a bug. You can't just say one of these is more important than the other; it's the car and the driver.

Tran agreed, and pointed out that each technology has strengths and weaknesses and we want to reuse the testbench as much as possible across them, and use automation to gain throughput efficiency.

Paul echoed this, saying that he had talked about raw performance and the smarts, the car and the driver. His team has three legs to the stool. The smarts, and the raw power, for sure. But also different levels of abstraction from gates to software. There are different tools for different jobs. Going back to his car analogy, "if you want to go offroad, use a Land Rover, on a racetrack, a Ferrari."

Brian wondered how people keep up with overall methodology since the tools and their capabilities are constantly evolving, so which tool is most appropriate changes, too.

Raju said that it's important to keep ahead of this and collaborate with EDA vendors to have an idea where the roadmap is going and what is coming, then work closely to optimize the tool stack. "I'm exposed to all verification levels, so collaborating with EDA is one of the best ways to gain throughput."

Tran said that in the past they would use coverage but that's no longer enough and they are using finer-grained analysis. "We have a lot of data scientists working on how many seconds for a run, which areas of the SoC, from one design to another…this all needs to be analyzed with machine learning. The tools just hand you the data. Then you can be efficient."

Paul said that formal is a great example. "A lot of stuff we do today with dynamic verification can be done better with formal. 18 of top 20 semiconductor companies use JasperGold but it’s still early days for formal. You can’t optimize what you can’t measure and we need to understand when to change the driver and when to change the car."

Brian wondered if debug is getting worse. In surveys, the trend seems to be going in the wrong direction. But that might be that the verification task has expanded so much beyond just debug, and we're worrying about power, and safety, and security.

Raju agreed that the verification scope has become bigger. "Paul keeps talking about the car, and one day we'll have autonomous cars." There is a lot of scope for debug improvements, such as having a converged debug methodology from designers to emulation.

Tran said that at Arm it's no longer just debugging a block of IP, it is up to debugging systems of systems. "We can predict some scenarios, and use some random...but we are talking about verifying unknown stuff."

Paul said the opportunity for AI in the debug space is very significant, driven by the analytics we're collecting. "We don't just want to look at one test, we want to look at a lot. And don't just look at today's runs but yesterday's, too." There are also different aspects, not just how fast a tool runs but also how fast it runs when you turn on waveforms. "They need to be fast in the debug context not just the regression context."

Brian said he had enough questions to last until DAC 2021 but he let the audience ask some.

The first question was about how to keep everyone on the same page when the spec is changing. Sometimes there is a gap where the design engineer and the verification engineer have different takes on it, creating false faults, and wasting time creating false stimulus.

Raju agreed that this was a great problem statement. Documentation becomes stale in no time, and different teams are interpreting it differently. Frequent signoff criteria at least enable users to keep specs up to date as the design matures. Another concept that is evolving is portable stimulus. "However, there is no single recipe to stop the documentation going out of date."

Dale said that one thing that helps is different levels of verification vehicles. They had had a problem where the device driver level was covering up a bug at the low level, and it was only when they looked at another level that they saw it.

Paul said there are lots of tools, like smart linting, that help catch common mistakes. Anything standardized, like protocols, has verification IP (VIP) and whole testbenches can be available for the whole block.

The next question was whether the panel used coverage-driven verification.

Raju said they use quite a bit of functional coverage and code coverage in all their verification flows, and it's a key signoff on tape-ins. "A challenge we face is that in the graphics domain our legacy coverage has increased exponentially. We have redundant tests where we don’t really know what they do. Investment in smarter coverage to optimize the tests is what we are working on."

Paul agreed and said that coverage-driven is very central to the methodology Cadence recommends. "What are you covering in simulation, what in formal, combine everything into a single dashboard. Coverage-driven is global now, everyone is doing it."

The next question was on using AI and machine learning. Is there a standard way for the industry to share this? How do we evolve?

Tran said that AI is something new and "we're still learning and there's no standardization. We put a lot of data into play but some is unique to the project: lines of code, cycles, bugs. We then use different algorithms to predict and guide, but it is still very specific to each project. At some point we need to move to something more standard and have proper tools."

Paul said that more and more data gathering is happening. In Cadence, this is pulled together in vManager, but it's still a bit too early to standardize across the whole industry. "We can gather data but what can we do with it? What are the killer apps to improve our verification throughput? I think within the next couple of years we’ll see more standardization."

Brian wondered if the cloud was a place to collect all this data.

Paul's first thought was that they are independent. Yes, you can put stuff in the cloud, but you can do this not in the cloud, too. Tran agreed that the cloud is great for effectively unlimited storage, but it's just a storage technology. What's important is how to extract knowledge from the data.

Brian went back to Paul's analogy of car and driver and pointed out that there's a third level, the software. A lot of algorithms for things like power or caching are captured in the software in the middle.

Raju said software is huge in his business of graphics. "Having the graphics drivers ready at the time of tape-in is part of our methodology."

Dale agreed. "Mobile has a yearly cadence. Android and Linux boot running in an emulator is a requirement for tapeout. In a GPU the kernel and driver play a critical role and a lot of bugs happen in the communication between firmware, RTL, driver, and kernel. It's hard for top management to make a decision on tapeout as a result."

Tran said they can address small areas of applications but can't yet run a complete application, even on FPGA. "Software verification has a lot of challenges too."

Dale said that software is usually developed on a model, and then jumps to an emulator with the RTL. "But that's very late in the cycle. The software team won't jump with unstable RTL."

Paul pointed out that the software spend in terms of manpower is huge and growing faster than the hardware spend. "You want to shift-left and do as much software bringup as you can pre-silicon. So that is now intersecting with the hardware validation."

And with that, Brian sent us back out into the bright sunlight to find our way back to the conference center.

A Year of Breakfasts

On both Tuesday and Wednesday afternoons, I was giving away A Year of Breakfasts. I took over the desk from where you had to pick up your wristband for the Denali party by noon. Well, lots of people completely ignored that advice and so I was doing wristband duty too. By the end of the last afternoon, I got tired of sitting in one place, so I took a pile of books around the show floor, along with my sharpie and lead-scanner. If too few people are coming to have books signed, I'll take the books to them.

A Year of Breakfasts 2018 has been successful enough that you can look for the 2019 edition at next year's CDNLive conferences and at DAC next year (co-located with SEMICON West).

Bagpipes

The DAC tradeshow traditionally ends with bagpipes. This tradition started with HP when they came to DAC to sell workstations, then, when they stopped coming, Forte took up the mantle. HP had used a big sound system but Forte got real bagpipers. Cadence acquired Forte and we've continued the tradition. I had a plane to catch and I decided not to stay to take photos of the bagpipers. But I left the convention center, and on my way to the monorail station, I came across this year's bagpipers warming up outside the deserted south hall of the convention center. So I got to see and hear the Forte bagpipes anyway. I introduced myself and told them the history of why we close the show with a couple of Scottish bagpipers.

And so, like DAC itself, this series of blog posts ends with bagpipes!
