Paul McLellan
Breakfast Bytes

CDNLive EMEA Zwei

18 May 2017 • 9 minute read

This is the second post about CDNLive EMEA in Munich. Here is the first. If the topics I selected to write about seem to cover a big range, that is because CDNLive covers a big range. I can only write about sessions I attended, and with eight or nine parallel tracks, I can only see a fraction of what is available.

So today's topics are virtual platforms, teenagers using Cadence's tools for mixed signal design (I bet you didn't know about transistors at that age), using dual-DFF cells to reduce power in ARM processors, and PCB design in the cloud. I hope everyone can find at least one of those topics interesting.

IP and Driver Development

This was presented by Bosch, so technically perhaps it is automotive. But it is actually about their experiences with virtual platforms and Palladium for IP development. The contents of the IP blocks they described were confidential, although they did say the blocks were part of a video processing pipeline, so it is not that hard to guess what type of IP they probably were. The presenters are from the Bosch semiconductor group in Sophia Antipolis. The IP blocks were destined for an SoC that would be developed by another team, but they had to be tested together with something approximating the system on the final SoC.

[Image: Bosch SoC]

The challenge that they had is that they were bringing up IP and exercising it using software. By using a hybrid platform with the processor subsystem as a virtual platform and their RTL blocks on an emulator, they could run tests in hours versus weeks for pure RTL. Another advantage of the virtual platform is that it reduced the load on the emulator, meaning that the emulator could be shared with more groups or run more tests simultaneously.

[Image: Bosch platform]

The diagram above shows the IP blocks under development in red. To test the IP blocks, they need to be instantiated together and run along with their drivers, or at least reference drivers if not the ones that will eventually be used. The bandwidth and latency of the whole system are important, and they are among the things to be measured. The virtual platform processor is not cycle accurate, but the IP is RTL and so is cycle accurate by definition.

The initial version of the platform was created by Cadence. A subsequent version was optimized to have memory in both worlds so that the processor did not have to run out of simulated memory being accessed through a simulated memory controller. The memory was duplicated for the IP to work on in the accurate world, and the code running on the processor ran out of a local copy. Round the back was a means to keep them synchronized (and presumably the IP blocks never wrote code to memory for the processor to run immediately, so that case could be ignored).

Another issue was that the debugger needed access to RTL registers. When the virtual part is stopped (at a breakpoint, say) there is no clock for the RTL part, so it is not possible to access the registers through the RTL itself. Instead, an XML description was used to create a backdoor so that the debugger could see the registers as if there were a clock. You can actually write the registers, too, but the rest of the system does not notice until the clock comes back on.
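
To make the backdoor idea concrete, here is a minimal sketch in Python. The XML layout, the register names, and the `BackdoorRegisters` class are all hypothetical, invented for illustration; the real description format used in the project was not shown. The sketch only models the behavior described above: reads and writes work without a clock, and a write is seen by the rest of the design only once the clock resumes.

```python
import xml.etree.ElementTree as ET

# Hypothetical register description; the real XML schema was not shown in the talk.
REGISTER_XML = """
<registers block="video_ip">
  <register name="CTRL" offset="0x00" reset="0x0"/>
  <register name="STATUS" offset="0x04" reset="0x1"/>
</registers>
"""


class BackdoorRegisters:
    """Toy model of debugger access to RTL registers while the clock is stopped."""

    def __init__(self, xml_text):
        root = ET.fromstring(xml_text)
        self.offsets = {}
        self.visible = {}   # values the rest of the design currently sees
        self.pending = {}   # backdoor writes waiting for the clock to resume
        for reg in root.findall("register"):
            name = reg.get("name")
            self.offsets[name] = int(reg.get("offset"), 16)
            self.visible[name] = int(reg.get("reset"), 16)

    def read(self, name):
        # The debugger sees the latest value, including its own pending writes.
        return self.pending.get(name, self.visible[name])

    def write(self, name, value):
        # No clock is needed to record the write...
        self.pending[name] = value

    def clock_tick(self):
        # ...but the design only notices once the clock is running again.
        self.visible.update(self.pending)
        self.pending.clear()


regs = BackdoorRegisters(REGISTER_XML)
regs.write("CTRL", 0x3)
print(regs.read("CTRL"), regs.visible["CTRL"])  # debugger view vs. design view
regs.clock_tick()
print(regs.visible["CTRL"])
```

Keeping pending writes separate from the visible values is just one simple way to mimic "the write lands, but nobody notices until the clock is back".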

The project was quite big with six users accessing the system in parallel, and two teams working in separate shifts.

The results were:

  • IPs fully validated within the hybrid environment
  • Software drivers validated within the environment
  • It now takes under two hours for a new IP delivery to be incorporated into the environment and testing can begin
  • Can access the registers live in debug mode
  • Flexibility about changing to a new NoC design
  • The hybrid platform is an excellent approach when the full SoC is not under control of the IP development team
  • Although this project was primarily hardware development, the setup can also be used to prototype and even validate the software drivers

Microelectronics for Teens

One of the more surprising presentations I attended was one on the academic track by HTL-Rankweil school in Austria. It was about an analog front-end with a Bluetooth gateway, a mixed-signal design involving op amps and DACs. The thing that made it surprising is that HTL is not a university; it is a technical school for pupils from 14 to 20 years old. This design was done by teenagers as part of an elective course on chip design.

There were two teams of pupils, one for the analog front-end and one for the digital back-end. They started with a design that measured temperature and then communicated it over Bluetooth, but it was quite large. A real IoT design would need to be small, and this one had too many components. The solution was to integrate it onto a chip. They used a 0.35um process from Austria Mikro Systeme (AMS). This process is relatively cheap, easy to use, and flexible.

[Image: Austrian teen chip]

Above is the layout of the chip. It will tape out about two weeks after CDNLive.

The goals were not just to design the chip, but also to learn how to work with industry and how to run a project. At this level, the pupils have only basic knowledge of how transistors work; anything deeper will wait until they get to university. At 17, they learn how a transistor works in an elective course that covers long-channel and short-channel behavior but not body effects. That was enough for them to design both the analog front-end and the digital back-end. Everything, including simulation, is done in the Cadence environment.

Reducing Power with Dual Flipflops

ARM's Stephane Zonza gave a presentation on optimizing an advanced ARM Cortex-A processor. He didn't give much detail on the processor since it has not been announced yet, although he said that we would have to wait two more weeks, so presumably the announcement is imminent. The design flow is iterative, not a full signoff tapeout flow. New design drops come at about two-week intervals. However, the flow has to be realistic enough to be useful, with modern static timing including AOCV, and synthesis through post-route optimization using GBA mode. The flow can highlight critical paths for optimization in the next drop. The two weeks thus end up being one week to do the trial layout and analysis and one week to modify the design. The overall quality of the design was assessed by watching total negative slack (TNS) come down, along with worst negative slack (WNS), which basically determines the maximum clock frequency.
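
As a quick illustration of the two metrics, the sketch below computes WNS and TNS from a list of endpoint slacks and derives the clock frequency that the worst path would allow. The slack values and the 500 ps target period are made-up numbers, not figures from the talk.

```python
def timing_summary(slacks_ps, period_ps):
    """Return (WNS, TNS, achievable frequency in GHz) for endpoint slacks in ps."""
    wns = min(min(slacks_ps), 0.0)              # worst negative slack (0 if timing is met)
    tns = sum(s for s in slacks_ps if s < 0)    # total negative slack
    # The worst path needs (period - WNS) to complete, which bounds the clock.
    achievable_period_ps = period_ps - wns
    return wns, tns, 1000.0 / achievable_period_ps


# Hypothetical slacks for five endpoints against a 500 ps (2 GHz) target.
wns, tns, fmax = timing_summary([35.0, -20.0, 4.0, -55.0, -5.0], period_ps=500.0)
print(f"WNS = {wns} ps, TNS = {tns} ps, fmax = {fmax:.3f} GHz")
```

WNS says how far the single worst path is from closing, which is why it tracks the achievable frequency. TNS says how much violating slack there is in total, so it gives a better feel for how much work remains.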

[Image: ARM v8 top flow]

The main focus of his talk was on power reduction by using a new methodology making use of multi-bit, or in this case just 2-bit, flops. They are smaller since the two bits share a single clock, but they are slightly slower. The power saving comes from the reduced clock tree, since there are only half as many leaf cells to clock (roughly), and so fewer buffers are required. There isn't any inherent power saving in the flops themselves.

The dual-bit DFF methodology uses a two-pass flow. In the first pass, they multi-bit as many flops as they can: everything except flops that are really timing critical or have special useful-skew requirements. Their experience with the version of the Genus solution that they started with was that this caused a big increase in runtime, but curiously the latest version is faster when using the dual flops.

The second pass of the flow uses the Innovus Implementation System. The grouped dual flops are split back into two based on critical timing, first looking at the data before clock tree synthesis (CTS) and then again afterwards. But post-route splitting messes up the clock tree topology.
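
The two passes can be pictured with the toy model below. It is only a sketch of the idea: the flop names, slack numbers, and the `dual_penalty_ps` parameter are hypothetical, pairing is done naively by list order, and none of this reflects how the Genus or Innovus tools actually choose which flops to merge or split.

```python
def group_pass(flops, critical_ps):
    """Pass 1: pair up every flop that is not timing critical."""
    candidates = [name for name, slack in flops.items() if slack > critical_ps]
    pairs = [tuple(candidates[i:i + 2]) for i in range(0, len(candidates) - 1, 2)]
    paired = {name for pair in pairs for name in pair}
    singles = [name for name in flops if name not in paired]
    return pairs, singles


def split_pass(pairs, singles, flops, dual_penalty_ps):
    """Pass 2: split any pair whose slack goes negative with the slower dual cell."""
    kept = []
    singles = list(singles)
    for pair in pairs:
        if min(flops[name] for name in pair) - dual_penalty_ps < 0:
            singles.extend(pair)      # back to two single-bit flops
        else:
            kept.append(pair)
    return kept, singles


# Hypothetical flop slacks in picoseconds.
flops = {"f0": 120, "f1": 8, "f2": 60, "f3": 3, "f4": 200, "f5": 90, "f6": 45}
pairs, singles = group_pass(flops, critical_ps=5)
pairs, singles = split_pass(pairs, singles, flops, dual_penalty_ps=10)
print(pairs, singles)
```

In this example `f3` is never grouped because it is already critical, and the `f0`/`f1` pair is split again because the slower dual cell would push `f1` into negative slack.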

The results are that around 90-92% of flops are initially grouped in the first pass, and a few are split again in the second pass, resulting in 88% of flops being mapped to dual DFFs. The resulting clock tree capacitance is down 17%. No critical paths run through dual flops (or they would have been split). The whole synthesis and place and route runs for about 100 hours.
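
A back-of-the-envelope calculation shows why the clock tree shrinks. The function below counts clock sinks when a given fraction of flops sits in dual cells; the one-million-flop design size is an arbitrary assumption, and the 88% figure is the one quoted above.

```python
def clock_sinks(total_flops, fraction_in_dual):
    """Count clock leaf cells when some fraction of flop bits share a dual cell."""
    dual_bits = round(total_flops * fraction_in_dual)
    dual_cells = dual_bits // 2           # two bits share one clock pin
    single_cells = total_flops - 2 * dual_cells
    return dual_cells + single_cells


total = 1_000_000                         # hypothetical design size
sinks = clock_sinks(total, 0.88)
print(sinks, f"{1 - sinks / total:.0%} fewer clock sinks")
```

In this simple model the number of sinks drops by 44%, much more than the 17% reduction in clock tree capacitance that was reported. Presumably that is because the wiring and the buffers higher up the tree do not shrink in proportion to the number of leaf cells.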

For what Stephane called "last mile" performance, they enabled a barely documented feature of the Innovus system known as the "extreme" flow, which allows more useful skew pre-CTS with only a minimal effect on hold violations.

Conclusion: They save about 5-6% on total dynamic power using this dual-flop methodology. 

Find. Place. Draw. All in the Browser

OrCAD has been working with Arrow to create a low-end PCB solution in the cloud that has limited capability but is free. It is targeted at startups getting off the ground, especially crowd-sourced ones (Arrow partners with IndieGoGo) and makers. IoT covers a huge range of solutions, of course, but at the low end, this is where this solution gets traction.

Luis Fischer of Arrow started by giving some statistics about Arrow: 19,000 employees, 460 locations in 58 countries. $24B sales. 125,000+ customers. Those are big numbers. Every month 50M engineers visit Arrow's websites and they reckon that 50% of all engineers visit regularly. (How do I get numbers like that for Breakfast Bytes?!)

[Image: OrCAD Arrow partnership goals]

The plan is to make the OrCAD/Arrow solution the easy choice for PCB design and component sourcing, with open reference designs (there are 120,000) that can be simply loaded into OrCAD Capture. Parts can be searched from the Arrow portfolio, and a bill-of-materials (BOM) can be automatically generated. Even before purchasing any components, users can be sent change notifications just because they have used the component in OrCAD.
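
For readers who have not built one by hand, a BOM is essentially the schematic's parts list grouped by part number. The sketch below shows that grouping in Python; the part numbers, reference designators, and data layout are invented for illustration and have nothing to do with the actual data formats of OrCAD Capture or Arrow.

```python
from collections import defaultdict


def build_bom(placed_parts):
    """Group placed components by part number into BOM lines."""
    refs_by_part = defaultdict(list)
    for ref, part_number in placed_parts:
        refs_by_part[part_number].append(ref)
    return [
        {"part": part, "qty": len(refs), "refs": sorted(refs)}
        for part, refs in sorted(refs_by_part.items())
    ]


# Hypothetical schematic contents: (reference designator, part number).
schematic = [("R1", "RES-10K"), ("R2", "RES-10K"), ("C1", "CAP-100N"), ("U1", "MCU-X")]
for line in build_bom(schematic):
    print(line["part"], line["qty"], ",".join(line["refs"]))
```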

They announced OrCAD Capture Cloud in December 2016 (basically for CES 2017 in January). Over 3,000 users have been on it, about 800 from Europe.

Of course, the motivation for the two companies is that people will source components from Arrow, which is their bread-and-butter. For OrCAD, the hope is that when designs get more complex and the free solution is too limited, users will move up. Coming soon is an intermediate step called OrCAD Entrepreneur, which will be a $99 solution for offline use. It will be easy to switch back and forth between online (running in the browser) and offline. There is more coming after that: complete BOM management, a library of 6M footprints and symbols, data from SiliconExperts database, cloud simulation, and cloud layout. Luis said that it is only on the Arrow website for now but you can expect to see it on the Cadence website sooner or later.

There are a lot of instructional videos. Here's one to give you a flavor of the solution in use:

 

Rocking Design Technology Since 2006

The first CDNLive in Europe was held in 2006, at the Acropolis in Nice (which I know well having lived near Nice for nearly six years in the early 1990s). Since then it has always been in Munich. We all got T-shirts for the rock concert listing the shows over the years. Then Rock Tools took the stage...