What Sort of Bugs Does Portable Stimulus Find?

17 Feb 2017 • 3 minute read

In a recent blog post, we discussed some general concepts of bugs, problems, issues, and features. We gave examples of different types of bugs typically found during the functional verification of chip designs, and made the claim that “portable stimulus and software-driven verification are excellent at finding all the categories of functional bugs.” In today’s post we’ll back up this claim by presenting some examples of bugs found by our customers with our Perspec System Verifier product.

Many of these bugs are due to scenarios that seem fairly simple once they’re known, but which were never anticipated or written explicitly by the verification team. The automatic generation of realistic use-case tests by Perspec often includes legal scenarios that no one had ever considered before. As an example, one SoC design had a bug in which two processors could be granted “exclusive” access to the same memory region. The code running on each processor assumed that it had a memory region that would not be changed by anyone else.

Of course, this was a problem. If a processor wrote a new value to a location already in use by the other processor, the result would be data corruption. In the well-documented test generated by Perspec from the portable stimulus model, the corruption was detected as soon as the first processor read back an unexpected value. Our visualization showed the sequence of operations preceding the error, so it was easy for the user to find the source of the bug. Had data corruption occurred while running production software, the effects might not have been observed for thousands of cycles and tracking down the root cause would have been very hard.
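To make the double-grant scenario concrete, here is a minimal Python sketch. It is a toy model only, not Perspec output and not the customer's design: the ToyMemory class, its deliberately buggy grant_exclusive method, and the region, address, and data values are all hypothetical. It shows why an immediate read-back catches the corruption at the point where it happens.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical shared memory with a deliberately buggy exclusive-grant arbiter."""
    cells: dict = field(default_factory=dict)   # (region, addr) -> value
    owners: dict = field(default_factory=dict)  # region -> set of CPU names

    def grant_exclusive(self, region, cpu):
        # Injected bug: never checks whether the region already has an owner.
        self.owners.setdefault(region, set()).add(cpu)

    def write(self, region, addr, value):
        self.cells[(region, addr)] = value

    def read(self, region, addr):
        return self.cells.get((region, addr))


def run_scenario(mem):
    """Two CPUs each take 'exclusive' access, write, then cpu0 reads back."""
    errors = []
    mem.grant_exclusive("R0", "cpu0")
    mem.write("R0", 0x10, 0xAAAA)      # cpu0 stores the value it expects to keep
    mem.grant_exclusive("R0", "cpu1")  # a correct arbiter would refuse this
    mem.write("R0", 0x10, 0x5555)      # cpu1 overwrites cpu0's location
    observed = mem.read("R0", 0x10)    # cpu0 checks its data right away
    if observed != 0xAAAA:
        errors.append(("cpu0", 0x10, 0xAAAA, observed))
    return errors
```

Calling run_scenario(ToyMemory()) returns one error record naming the CPU, the address, the expected value, and the observed value, which is the kind of early, localized failure described above.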

Portable stimulus is especially effective at finding design bugs related to processors, memories, caches, resource locks, and coherency. Generation of multiple programs and multiple threads that cross all processors and all memories is often a good starting point. The verification problem becomes even harder when the design includes low-power features such as turning regions of the chip on and off. For example, critical data must be retained so that it can be recovered after the memory or registers in which it was stored are powered back up.
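The idea of crossing all processors with all memories can be illustrated with a few lines of Python. This is only a sketch of the cross itself, under the assumption of made-up processor and memory names; it says nothing about how Perspec builds or schedules its tests.

```python
import itertools
import random


def cross_scenarios(processors, memories, ops=("write", "read"), seed=0):
    """Return every (processor, memory, operation) combination in a shuffled order."""
    combos = list(itertools.product(processors, memories, ops))
    random.Random(seed).shuffle(combos)  # seeded so a failing order can be replayed
    return combos


# Hypothetical names, for illustration only.
scenarios = cross_scenarios(["cpu0", "cpu1"], ["sram", "ddr", "cache"])
```

With two processors, three memories, and two operations, the list holds all twelve combinations, so no processor/memory pairing is left out simply because nobody thought to write a test for it.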

One customer found that data was not retained when one of the CPU subsystems had its power turned off and then resumed. Again, the bug was detected as soon as the read yielded an unexpected value, so the debug process was quick. The same user also found a firmware bug because the portable stimulus model called some routines that were used both for functional verification tests and for writing production drivers. As we mentioned in another recent blog post, this is an example of hardware-software co-verification facilitated by portable stimulus.
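A retention check of this kind boils down to write, power-cycle, read back, compare. The Python sketch below uses a hypothetical ToyPowerDomain class with a retention_works switch standing in for the presence or absence of the bug; the register name and value are invented for illustration.

```python
class ToyPowerDomain:
    """Hypothetical power domain; retention_works=False models the retention bug."""

    def __init__(self, retention_works):
        self.retention_works = retention_works
        self.regs = {}
        self._saved = {}

    def write(self, name, value):
        self.regs[name] = value

    def read(self, name):
        return self.regs.get(name)

    def power_off(self):
        # Working retention saves state before the registers lose power.
        self._saved = dict(self.regs) if self.retention_works else {}
        self.regs = {}

    def power_on(self):
        self.regs = dict(self._saved)


def retention_mismatches(domain, critical):
    """Write critical data, power-cycle, and report name -> (expected, observed)."""
    for name, value in critical.items():
        domain.write(name, value)
    domain.power_off()
    domain.power_on()
    return {n: (v, domain.read(n)) for n, v in critical.items() if domain.read(n) != v}
```

A domain with working retention yields an empty dictionary, while the buggy one reports each lost register on the first read after power-up.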

Verification complexity increases even more when interrupts are in the mix. By their nature, interrupts occur at random times relative to other events, so bugs are hard to trigger and even harder to diagnose. Since Perspec-generated tests run on bare metal, they have fine-grained control over interrupts and other hardware features shielded from the users by operating systems and other levels of production software.

In a third design, using the ARM big.LITTLE architecture, a bug occurred when the big cluster was going through a power-down or power-up sequence and the little cluster received an interrupt. The interrupt caused the little cluster to go into an abnormal state due to an RTL design bug. The portable stimulus test that found this bug was emulating one scenario of the Global Task Scheduling (GTS) software used to control the chip in production, but triggering the bug and diagnosing the root cause were much easier with a focused portable stimulus test than by running GTS itself.

Our final bug example for today also involves the interaction of interrupts and power. In this SoC, if a processor changed from normal to standby (low-power) mode while its wake-up interrupt was active, it would never wake up again. Releasing and then activating the wake-up interrupt had no effect. This bug was triggered by a test that changed the interrupt release time randomly while powering the processors up and down.
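The interaction behind this last bug can be sketched as a small state machine. Everything in the Python model below is an assumption made for illustration: the ToyCpu class, the ten-cycle window, the standby cycle, and the way the bug is injected are hypothetical and are not taken from the actual SoC or from a Perspec test.

```python
import random


class ToyCpu:
    """Hypothetical power-state model with the wake-up bug injected on purpose."""

    def __init__(self):
        self.state = "normal"
        self.irq_active = False
        self.stuck = False

    def set_irq(self, active):
        rising = active and not self.irq_active
        self.irq_active = active
        # A newly asserted wake-up interrupt should bring the core out of standby.
        if rising and self.state == "standby" and not self.stuck:
            self.state = "normal"

    def enter_standby(self):
        # Injected bug: entering standby with the interrupt active latches the core asleep.
        if self.irq_active:
            self.stuck = True
        self.state = "standby"


def wakes_up(release_at, standby_at=5, cycles=10):
    """Release the interrupt at release_at, enter standby at standby_at, then try to wake."""
    cpu = ToyCpu()
    cpu.set_irq(True)
    for cycle in range(cycles):
        if cycle == release_at:
            cpu.set_irq(False)
        if cycle == standby_at:
            cpu.enter_standby()
    cpu.set_irq(False)  # release and then
    cpu.set_irq(True)   # re-activate the wake-up interrupt
    return cpu.state == "normal"


def failing_release_times(trials=50, seed=1):
    """Randomize the release time and collect the values that leave the core asleep."""
    rng = random.Random(seed)
    times = [rng.randint(0, 9) for _ in range(trials)]
    return sorted({t for t in times if not wakes_up(t)})
```

In this toy model the core only gets stuck when the interrupt is released after the standby cycle, which is why randomizing the release time relative to the power transition is what exposes the problem.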

There are common themes to all these bugs. They involve unpredictable interactions among different parts of the design, they would take a long time to show up with production software, and they would be tough to diagnose and fix without the focused test and visualization capabilities of Perspec System Verifier. We invite you to check out our industry-leading portable stimulus solution to see how it can help verify your most complex designs.

Tom A.

The truth is out there...sometimes it's in a blog.
