This is my last post about DAC 2020. During DAC Accellera had a workshop about functional safety. In case you don't know, Accellera has a relatively new working group (WG) on Functional Safety. The chair is Cadence's Alessandra Nardi, who coincidentally also received the Marie Pistilli Award for Women in EDA during DAC (you can read more about that in my post Alessandra Nardi Receives Marie Pistilli Award for Women in EDA). But functional safety is, for sure, not all about Cadence.
As you can see from the above timeline, the provisional working group (PWG) was created in October last year. There was a face-to-face kickoff in December (with 36 companies and 90 members). The working group was formally formed in February with 26 companies and 89 members. There were presentations scheduled at DVCon that I planned on reporting on—indeed, I was already in the room—but Cadence pulled all employees out that morning. There was a virtual kickoff later that month. The plan is to produce a white paper in Q3, ready for public review in Q4.
The mission is summarized in the above statement. There are really two parts to improving automation for functional safety development: standardizing what safety data means, and defining a language/format for exchanging that data. Alessandra put it to me that "We want to do for a functional-safety-aware flow what CPF/UPF did for a power-aware flow".
This working group does not operate in isolation. Work has gone on in various areas of functional safety for decades. If you want to read the interesting history, then start with my post "The Safest Train Is One that Never Leaves the Station" from IRPS a couple of years ago. Other relevant standards related to functional safety are:
The plan is that eventually, the work of the Accellera WG will flow into the provisional IEEE P2851 Exchange/Interoperability Format for Safety Analysis and Safety Verification of IP, SoC, and Mixed Signal ICs standard.
The diagram above shows the concept in more detail. In the center is the functional safety (FS) data. This is required for the system level, silicon level, IP (within silicon level), and the EDA flows. Moving data up and down that stack is known as "inter-layer". At the same time, there are aspects such as FS architecture or FMEDA (failure modes, effects, and diagnostic analysis) that cut across all the layers. Moving data between these domains is known as "intra-layer". The initial focus is on automotive and industrial, but the goal is to expand to medical and aerospace, too.
Following that brief overview, there was a panel with Arm, Renesas, Intel, NXP, and AMD. You can see the names of the panelists above. Each panelist started with a few prepared slides, and that was followed by a Q&A. I'm going to skip the prepared slides and jump straight to the less scripted part of the morning.
One thing that several panelists pointed out was that electronic systems are getting more complex, with more complex processors involved. This is obviously the case in automotive, with both ADAS and autonomous driving requiring large amounts of processing power. But it is also the case in avionics, which is switching both to multi-core processors and from specialized processors to COTS (commercial off-the-shelf) processors. So like everywhere else in electronics, systems are getting more complex—and not just the electronics, but the software and the interaction between hardware and software.
One piece of terminology, just in case you don't know it, is FIT. This stands for "failures in time" and is the number of failures per billion hours of operation. With semiconductors, there are two big FIT challenges. Firstly, the FITs allowed for a chip (say, 10 FITs) are much lower than the FITs of the basic semiconductor process (maybe 500 FITs). Secondly, when the FITs for the top level (say, the whole vehicle) are flowed down to all the components, the FITs allowed for each component might be very low, since chips are pretty much the lowest item on the totem pole.
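To make those two challenges concrete, here is a minimal arithmetic sketch. All the numbers are hypothetical (the 10-FIT and 500-FIT figures come from the examples above; the vehicle budget and component count are made up for illustration):

```python
# Illustrative sketch of the two FIT challenges described above.
# FIT = failures in time = failures per billion hours of operation.
FIT_HOURS = 1e9

# Challenge 1: the FIT budget for a chip is far below the raw process FIT,
# so safety mechanisms have to close the gap.
process_fit = 500.0   # raw FIT of the basic semiconductor process (example)
chip_budget = 10.0    # FIT allowed for the whole chip (example)
required_reduction = process_fit / chip_budget
print(f"Safety mechanisms must cover a {required_reduction:.0f}x gap")

# Challenge 2: a top-level (vehicle) budget is flowed down across many
# components, so each chip gets only a tiny slice.
vehicle_budget_fit = 100.0   # hypothetical vehicle-level budget
num_components = 50          # hypothetical number of components
per_component = vehicle_budget_fit / num_components
print(f"Each component is allowed ~{per_component:.1f} FIT "
      f"(~{per_component / FIT_HOURS:.1e} failures per hour)")
```

With these example numbers, the process is 50x worse than the chip budget allows, and each component in the flow-down gets a budget of only about 2 FIT.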
Ghani Kanawati of Arm looked at things from the IP point of view. Future SoC designs that are used in several FS products are very complex and need a different approach from that used for non-safety-critical products. Safety analysis for advanced many-core SoCs is at least an order of magnitude more complex than for an 8-bit MCU (common in automotive even today). We need to guarantee that safety analysis information produced at the lower level of integration is sufficient to complete the safety analysis at the higher level of integration (so IP to SoC to system).
Riccardo Vincelli of Renesas looked a lot at integration and customization. He used another bit of specialized ISO 26262 terminology: SEooC, which stands for Safety Element out of Context. This basically means a component that will eventually end up in a safety-critical product. Mostly, just think IP. So there are two levels: the SEooC supplier's job is to provide accurate results based on an assumed safety concept. The integrator's job is to adapt the results to reflect the actual safety concept. The raw FIT number used by the component supplier must be adjusted based on the specific strategy adopted by the overall system and the expected mission profile (duty cycle, lifetime, etc). The FIT provided by the component supplier will most likely not be enough. Faults can be classified into three categories: directly violating the safety goal, violating only in combination with other faults, or never violating (so good).
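As a rough sketch of what that integrator-side adjustment might look like, here is an illustrative calculation. The function name, the simple multiplicative formula, and all the numbers are my own assumptions, not from ISO 26262 or the panel:

```python
# Hedged sketch: how an integrator might scale a SEooC supplier's raw FIT
# for the actual mission profile. Names and formula are illustrative only.

def adjust_fit(raw_fit: float, duty_cycle: float, derating: float = 1.0) -> float:
    """Scale a supplier's raw FIT by the fraction of the mission during
    which the component actually operates, plus an optional environmental
    derating factor (both assumed, for illustration)."""
    return raw_fit * duty_cycle * derating

# Example: the supplier quotes 20 FIT assuming 100% operation, but the
# system only powers the block 30% of the time.
supplier_fit = 20.0
mission_fit = adjust_fit(supplier_fit, duty_cycle=0.3)
print(mission_fit)  # 6.0 FIT under this assumed mission profile
```

The point of the sketch is just that the supplier's raw number and the number that matters at system level are different, and the difference depends on system-level decisions the supplier cannot know.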
Next up was Jyotika Athavale of Intel. She took a look at functional safety from the perspective of aviation. She led off with the above table that compares the various standards for industrial, automotive, and aviation, along with the names used for the safety levels in each (learn them well — there will be a test!). The most severe one is DO-254 DAL A certification. One big problem area for aerospace is what are called single-event effects. These are caused by energetic particles that disturb the chip, either flipping a bit temporarily or possibly causing something destructive such as latchup. This is much worse in avionics since you are in the air (and worse again in space, since you have zero protection from the atmosphere). A big challenge in complex systems is working out the worst-case execution time in the face of shared memories, bus contention, multi-core, and so on.
Q: How useful are virtual platforms?
Riccardo: They are mainly useful for concept and starting integration. There are accuracy limitations. For fault injection, they can be used to check interactions, but I wouldn't use them for coverage.
Q: What will be in the paper at the end of Q3?
The Q&A turned into more of a discussion among the panelists, but I didn't really note down who said what.
It is a big opportunity to bring all the EDA technology into a safety-aware flow. But we need a common vocabulary, too, for talking to each other, not just file formats. It is work on methodologies as well, not just a format. Some methodologies don't use all the technology we already have.
A hot topic is configurability and how it affects inter-layer and intra-layer communication. For example, when you do your design and it is built on one safety concept, but then it is used in a context where the safety concept is different. The standard says you have to do the analysis for that use case. In practice, the simplest is to redo all the analysis over again, but obviously that involves a lot of wasted effort. Think of a modern case where there may be 100 to 200 IP blocks and we may have to repeat all the analysis of each one. It is unclear who is responsible for what, and that is one of the things this working group can help clarify.
Q: FIT and SPFM (single-point fault metric) can vary a lot based on usage. So we need to combine information. But does that require too much confidential data?
Bala: This is where we need to standardize the inter-layer exchange (from one IP to another). When we calculate the data, we have everything, but we need to consider what is internal and what is external.
Franck: ISO 26262 is mostly hardware but we need hardware and software together and the capability to transfer between them. Recovery on the fly, for example, is hardware and software together.
Ghani: Machine learning adds another layer since nobody can prove the DNN is correct by construction. We can only do it with respect to a particular training set. You can't just claim you have a machine learning network that is safe since you don't know what training will be done on it.
Q: How does traceability start and end?
Bala: Traceability sounds simple, but even with just IP-level traceability, we have safety goal at the top, and requirements come from the goal, and it gets broken down. We must verify that we have implemented what we wanted to implement. Then make sure results are captured so that if some auditor says “how did you develop the test case”, we need to be able to find it.
With that, we ran out of time.
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.