Paul McLellan

Running the Software for an Embedded System

24 Sep 2019 • 6 minute read

A big challenge with developing a modern SoC is developing the software. Not so much the development itself, which is not very different from any other software development, but running and debugging the software. Waiting for the silicon to be available has a few major problems. The first is that it serializes semiconductor and software development, rather than overlapping them. The second is that the chip often lacks the visibility into its internals that software debug requires. The third is that it can be difficult to test rare conditions and error conditions with real silicon (such as a bit error coming out of a SerDes receiver).

There are various approaches, including FPGA prototyping with the Protium S1 platform, emulation with the Palladium Z1 platform, and several hybrid approaches.

FPGA Prototyping

I had already planned to write this post before I went to Israel for CDNLive, where Eli Zyss, VP Silicon Design at Altair (which is now owned by Sony), went into the details of their decision on what to use in a presentation titled Protium Experience (rather giving away the answer to his question). As he put it:

The challenge is to create all the drivers and bring up the operating system before the IC arrives.

He takes the view that there needs to be a stable SoC design before software development can use it, so for them the start of software test and debug is RTL signoff. Many designs don't have the luxury of waiting that long: there is just too much software to get started that late in the design cycle. If software development starts at RTL signoff, then there are two to three months between RTL signoff and tapeout, plus about three months for wafer manufacturing, giving about six months for software development. With super-hot-lot silicon processing, that schedule can be pulled in by just a week or two (at a cost), which is good for the overall schedule but shortens the time for software development even more.
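The schedule arithmetic above can be written out as a rough sketch. The month figures are the ones quoted in this post; the function name, the weeks-per-month conversion, and the two-week pull-in used in the example are just for illustration.

```python
def software_window_months(signoff_to_tapeout, wafer_manufacturing, pull_in_weeks=0.0):
    """Months available for software work between RTL signoff and silicon arrival.

    pull_in_weeks models a hot-lot run that delivers wafers early,
    which shortens the pre-silicon software window by the same amount.
    """
    weeks_per_month = 52 / 12  # average month length in weeks
    return signoff_to_tapeout + wafer_manufacturing - pull_in_weeks / weeks_per_month


# Two to three months to tapeout, plus about three months of wafer manufacturing.
shortest = software_window_months(2, 3)
longest = software_window_months(3, 3)

# Illustrative hot-lot case: silicon arrives two weeks early.
hot_lot = software_window_months(3, 3, pull_in_weeks=2)

print(f"software window: {shortest:.1f} to {longest:.1f} months")
print(f"with a two-week hot-lot pull-in: {hot_lot:.1f} months")
```

The point of the sketch is only that anything that pulls in silicon arrival comes straight out of the software window unless software work starts before RTL signoff.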

The big issue is:

We need a platform for the software engineers and simulation is way too slow.

Altair has had a Palladium system since 2007 and was amongst the earliest to use Palladium for software bring-up. They have had a Protium since the start of 2019, also used for software bring-up. Eli's project was an Arm processor with various hardware accelerators, with 2.2M instances and a single 150MHz clock domain. He looked at three solutions:

  • Create their own FPGA: This is the cheapest solution, but debug is limited. The big downside is the investment in porting the design and getting it running on the FPGA. This is not just an investment in manpower; it uses up some of the valuable software development schedule, too.
  • Use Palladium Z1 emulation: As I said above, they have had Palladium emulators for over a decade. Palladium has very good debug and a quick compile, and so needs a much smaller investment in porting. The cons are that it is very expensive (if you don't already have one) and it is about 3-5X slower than an FPGA platform. It also requires special installation since Palladium Z1 is (normally) water-cooled.
  • Use Protium X1 FPGA prototyping: This is relatively cheap (compared to emulation), needs a smaller investment in porting than doing their own FPGA, and is faster than the Palladium platform. It also does not require any special cooling. The downside is that it has more limited debug capability. It also has a relatively long compile and P&R time: it took them 15 hours for a 2M-instance design.

They finally decided to use the Protium platform along with a CPU simulator. They had a two-FPGA system. This ran 25X slower than the actual silicon, which may sound like a lot, but it is fast enough for software work. They partitioned the design so that FPGA A had a utilization of 64% logic and 92% RAM, while FPGA B used 5% logic and 30% RAM. The only changes they made to the RTL were to add a Debug Access Port (DAP) to access the Arm via JTAG, and to replace the foundry SRAM model with a very simple RTL SRAM model.
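To put the 25X figure in perspective, here is a back-of-the-envelope sketch using the numbers quoted above. Applying the "3-5X slower than an FPGA platform" estimate for Palladium to this particular design is my assumption, not something measured in the presentation, and the helper names are made up for the example.

```python
SILICON_CLOCK_MHZ = 150.0  # single clock domain quoted for the design
PROTIUM_SLOWDOWN = 25.0    # the prototype ran 25X slower than silicon


def effective_clock_mhz(silicon_mhz, slowdown):
    """Equivalent clock rate of a platform running `slowdown` times slower than silicon."""
    return silicon_mhz / slowdown


def wall_clock_seconds(silicon_seconds, slowdown):
    """Wall-clock time for a job that would take `silicon_seconds` on real silicon."""
    return silicon_seconds * slowdown


protium_mhz = effective_clock_mhz(SILICON_CLOCK_MHZ, PROTIUM_SLOWDOWN)

# Assumption: reuse the general 3-5X emulation-vs-FPGA estimate for this design.
palladium_mhz_low = protium_mhz / 5
palladium_mhz_high = protium_mhz / 3

print(f"Protium equivalent clock: {protium_mhz:.1f} MHz")
print(f"Palladium estimate: {palladium_mhz_low:.1f} to {palladium_mhz_high:.1f} MHz")
print(f"10 s of silicon time takes {wall_clock_seconds(10, PROTIUM_SLOWDOWN):.0f} s on the prototype")
```

A few megahertz is slow by silicon standards, but a job that takes ten seconds on the chip still finishes in minutes, which is why this is usable for interactive software work.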

Installing the Protium S1 platform is:

as simple as installing a server, it's just a 19" rack mount.

They did also compile the design for their Palladium system, and this table gives the comparison:

Note that Cake2 runs at a substantially different (~2x slower) clock rate than Cake 1, so the speed comparison is qualitatively useful, but not quantitatively accurate.

Eli's conclusion: The Protium X1 platform is a good candidate for use before the IC arrives. It has a really good cost-debug tradeoff. The flow works out of the box, especially when the design has already run on Palladium, with no need for a dedicated support engineer. It takes just a few weeks to be fully ramped up.

For full details on the Protium X1 platform, see my post Protium X1: FPGA Prototyping for the Enterprise from when we announced it back in May.

Emulation

The Protium X1 platform is not the only answer to the question of what to use for embedded software test and debug. As Eli said, a Palladium platform is another solution. This is especially attractive if you already own a Palladium system since the issues of cost and installation have already been addressed. Also, Cadence offers Palladium Cloud to make emulation available to customers who don't have a big and constant enough need to justify investing in their own Palladium system. For details on that, see my post Palladium Cloud. For details on Palladium, see my post Palladium Z1, an Enterprise Server Farm in a Rack.

Also at CDNLive Israel, Gil Peled of Intel's Mobileye presented Hybrid Virtual + Emulation SoC Platform for SW-Drivers Validation. His team used a combination of a virtual platform model for the CPU and memory along with emulation for everything else. Their chip has a DDR interface to the main memory, and to gain speed and not waste emulation capacity, they replaced this with a smart memory model that communicates with the SystemC virtual platform model, as shown in the diagram.

The pink boxes on the right are all running on the emulator, the blue boxes are SystemC TLM models, and the yellow box is the Arm processor model.

Gil had some caveats about switching to a processor model from an RTL model:

  • It is not cycle accurate, just data accurate; signoff for cycle-accurate behavior needs to be done on pure RTL.
  • CPU accesses to memory bypass the RTL interconnect, so bus utilization is lower than in reality.
  • The virtual platform uses blocking transactions that could hide RTL race conditions.
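The second caveat is easy to see in a toy model. The sketch below is not the Mobileye setup (theirs uses SystemC TLM models and an emulator); it is a few lines of Python with made-up class names, showing only why bus traffic drops when the CPU model talks to the memory model directly.

```python
class SmartMemory:
    """Toy stand-in for a memory model shared by the CPU model and the emulated RTL."""

    def __init__(self):
        self._words = {}

    def write(self, addr, value):
        self._words[addr] = value

    def read(self, addr):
        return self._words.get(addr, 0)


class Interconnect:
    """Toy stand-in for the RTL interconnect; counts the transactions it carries."""

    def __init__(self, memory):
        self.memory = memory
        self.transactions = 0

    def write(self, addr, value):
        self.transactions += 1
        self.memory.write(addr, value)

    def read(self, addr):
        self.transactions += 1
        return self.memory.read(addr)


memory = SmartMemory()
bus = Interconnect(memory)

# Pure-RTL style: every CPU access crosses the interconnect.
for addr in range(100):
    bus.write(addr, addr)
rtl_style_traffic = bus.transactions

# Hybrid style: the CPU model writes memory directly, so the interconnect
# only sees accesses from the emulated side (here, a single peripheral read).
bus.transactions = 0
for addr in range(100):
    memory.write(addr, addr + 1)
peripheral_value = bus.read(0)
hybrid_style_traffic = bus.transactions

print(rtl_style_traffic, hybrid_style_traffic)
```

The data seen by the peripheral is still correct, which matches the "data accurate, not cycle accurate" caveat, but any bug that only shows up under realistic interconnect load will not be provoked by the CPU's traffic.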

Their results:

  • Full Linux OS boot in 32 seconds instead of 2-3 hours on non-hybrid emulation
  • High level of software validation and readiness even before Si arrival
  • Validate complex software and complex scenarios
  • Validate SW drivers for peripherals in OS context: OSPI, eMMC and PCIe on RTL
  • Shorten SW bring up time for silicon
  • User-friendly environment for both HW and SW debug
  • Same SW image used as for pure emulation
  • Quick retest of new SW image
  • Emulator gate count of hybrid design is 37% lower
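To get a feel for the first and last of these results, the arithmetic can be sketched as follows. The inputs are the figures from the list above; the function and variable names are just for illustration.

```python
def speedup(baseline_seconds, improved_seconds):
    """How many times faster the improved run is than the baseline."""
    return baseline_seconds / improved_seconds


HYBRID_BOOT_SECONDS = 32
SECONDS_PER_HOUR = 3600

# Linux boot: 2-3 hours on non-hybrid emulation versus 32 seconds on the hybrid platform.
speedup_low = speedup(2 * SECONDS_PER_HOUR, HYBRID_BOOT_SECONDS)
speedup_high = speedup(3 * SECONDS_PER_HOUR, HYBRID_BOOT_SECONDS)

# Gate count: the hybrid design needs 37% fewer emulator gates.
remaining_gate_fraction = 1 - 0.37

print(f"boot speedup: {speedup_low:.0f}x to {speedup_high:.0f}x")
print(f"emulator gates needed: {remaining_gate_fraction:.0%} of the pure-emulation design")
```

That is a boot speedup of a couple of hundred times, which is the difference between a batch job and an interactive console session.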

His assessment of this hybrid approach compared to others:

  • Vs pure emulation: Dramatically reduced boot time, interactive operation via console
  • Vs pure virtual platform simulation: Much more precise model, and no need to develop any models for peripherals
  • Vs silicon: Start SW development before silicon arrival, enable error injection to test bad path scenarios, convenient waveform debug of hardware signals and register values to debug HW/SW issues

Both Emulation and Prototyping

Another attractive approach is to use both emulation and prototyping. Early in the design when the RTL is not completely stable, being able to compile fast is more important than run-time for the software tests, so it makes sense to use the Palladium platform. Later, once the RTL is getting more stable, it makes sense to invest the time to compile the design for a Protium platform and control how often new RTL drops get accepted. This was the approach used by Amlogic, which I wrote about in my post Pre-Silicon Software Development with Protium and Palladium Platforms.
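The compile-time versus run-time tradeoff can be sketched with a simple turnaround model. The 15-hour Protium compile and the 3-5X Palladium slowdown come from Eli's presentation earlier in this post; the one-hour Palladium compile time is purely a placeholder of mine (the post only says it is quick), and the function names are made up for the example.

```python
def turnaround_hours(compile_hours, run_hours_on_protium, slowdown_vs_protium=1.0):
    """Total time from an RTL drop to finished software tests on one platform."""
    return compile_hours + run_hours_on_protium * slowdown_vs_protium


def break_even_run_hours(protium_compile, palladium_compile, palladium_slowdown):
    """Protium-equivalent run hours per RTL drop above which Protium finishes first."""
    if palladium_slowdown <= 1.0:
        raise ValueError("model assumes Palladium runs slower than Protium")
    return (protium_compile - palladium_compile) / (palladium_slowdown - 1.0)


PROTIUM_COMPILE_HOURS = 15.0   # quoted for a 2M-instance design
PALLADIUM_COMPILE_HOURS = 1.0  # placeholder assumption, not from the post
PALLADIUM_SLOWDOWN = 4.0       # midpoint of the quoted 3-5X range

threshold = break_even_run_hours(
    PROTIUM_COMPILE_HOURS, PALLADIUM_COMPILE_HOURS, PALLADIUM_SLOWDOWN
)
print(f"Protium wins once each RTL drop gets more than {threshold:.1f} hours of testing")

# Early in the project: frequent RTL drops, short test runs per drop.
early_palladium = turnaround_hours(PALLADIUM_COMPILE_HOURS, 1.0, PALLADIUM_SLOWDOWN)
early_protium = turnaround_hours(PROTIUM_COMPILE_HOURS, 1.0)

# Later: stable RTL, long regression runs per drop.
late_palladium = turnaround_hours(PALLADIUM_COMPILE_HOURS, 40.0, PALLADIUM_SLOWDOWN)
late_protium = turnaround_hours(PROTIUM_COMPILE_HOURS, 40.0)
```

Under these assumptions the emulator wins while each RTL drop only gets a few hours of testing, and the prototype wins once the RTL is stable enough that a single compile is amortized over long runs, which is the reasoning behind controlling how often new RTL drops get accepted.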

 
