Before getting into the details of today's topic, I'm happy to report that a brand new baby girl was born on October 1 into the Andrews family of Ham Lake, MN. She is our sixth child, and the fourth girl to go along with two boys. I currently play a lot of golf with my oldest three kids, and with the new baby girl I'm assured that the three youngest will form my next foursome after the oldest three grow up and leave home.
My first job in EDA (back when I had only one child) was at Simulation Technologies in St. Paul, MN, working on a product called Virtual-CPU, or just V-CPU for short. The software was developed inside Cisco in San Jose by a consultant named Benny Schnaider for the purpose of early integration of software with hardware simulations of Cisco routers. Details of V-CPU were first published at the 1996 Design Automation Conference in a paper titled "Software Development in a Hardware Simulation Environment". I have continued to toil in EDA, but Benny has moved on to many other things, including working in another of my favorite areas, virtualization, at a company called Qumranet. There is even one line in his CEO profile hinting at this V-CPU work at Cisco.

At the time, Cisco used off-the-shelf MIPS processors on boards with custom ASICs to build routers (these were the days before every chip had multiple processors embedded in it). The ASICs were simulated in Verilog-XL (on Sun and HP workstations), and the software was the Cisco operating system, IOS. It was not feasible to obtain any kind of model of the MIPS processors that would run inside Verilog-XL, so Benny implemented a way to run the software on the host machine and use a network socket to connect to the Verilog simulator and drive a MIPS bus functional model. This technique of running software on the host machine at high speed, communicating with the simulator only when interesting data accesses occur, is called host-code execution.
One of the coolest features of V-CPU was something called "implicit access". Most companies using host-code execution today use "explicit access": they require every place in the code that touches the hardware to call read() and write() functions, so all hardware accesses go through a common set of functions, and then they use #ifdef to redirect those accesses to the simulator when doing verification with host-code execution. When running on the target system, plain pointer dereferences are used instead. The code below shows an example of explicit access. This works just fine if software engineers plan for host-code execution and structure the code to access the hardware from a central location.
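Since the original listing didn't survive here, the following is a minimal sketch of the explicit-access style. The register names, addresses, and the hw_read32()/hw_write32() wrappers are illustrative, not the actual V-CPU API; under real host-code execution the wrappers would make a socket call into the simulator, stubbed out here with a small array.

```c
/* Explicit access: every hardware touch goes through a common pair of
 * wrapper functions. An #ifdef selects whether the wrappers talk to
 * the simulator (host-code execution) or collapse to plain pointer
 * dereferences (target build). Names and addresses are made up for
 * illustration. */
#include <stdint.h>

#define UART_STATUS 0x10000000u  /* hypothetical status register */
#define UART_TXDATA 0x10000004u  /* hypothetical transmit register */

#ifndef TARGET_BUILD
/* Host-code execution: stand-in for a socket call to the Verilog BFM.
 * A real implementation would send the access over the network. */
static uint32_t fake_regs[2];
static uint32_t hw_read32(uint32_t addr)
{
    return fake_regs[(addr - UART_STATUS) / 4];
}
static void hw_write32(uint32_t addr, uint32_t v)
{
    fake_regs[(addr - UART_STATUS) / 4] = v;
}
#else
/* Target build: the same calls become real memory-mapped accesses. */
static uint32_t hw_read32(uint32_t addr)
{
    return *(volatile uint32_t *)(uintptr_t)addr;
}
static void hw_write32(uint32_t addr, uint32_t v)
{
    *(volatile uint32_t *)(uintptr_t)addr = v;
}
#endif

/* Driver code only ever uses the wrappers, never raw pointers. */
void uart_putc(char c)
{
    while (hw_read32(UART_STATUS) & 0x1)  /* wait while TX busy */
        ;
    hw_write32(UART_TXDATA, (uint32_t)c);
}
```

The key point is the discipline: because every access funnels through the two wrappers, a single #ifdef is enough to retarget the whole driver.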
If the planning was not that great, or if the code base is just large and uses scattered pointers everywhere, there is no way to go into the code and change every hardware access into a function call. This is where implicit access came in. It provided a way to automatically trap pointer dereferences that read or wrote hardware locations and convert the load or store instruction into a simulated read or write. For reads, it would put the result into the proper host CPU register, and the user had no idea that a line of C code would magically turn into a bus transaction on a Verilog BFM. The code below shows implicit access; of course, it's nothing but regular C code using pointers, but underneath was some nifty low-level programming involving the assembly language of the host machine. In the V-CPU days Cisco ran the software on Sun workstations, so the complexity of the load and store instructions on the SPARC RISC processor was much less than that of the x86 instruction set, the most common host CPU today. Implicit access made host-code execution feasible for projects that didn't really plan for it.
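The implicit-access listing is also missing here, but as described it is just ordinary driver code; a sketch along those lines (register layout and base address invented for illustration) might look like:

```c
/* Implicit access: the driver is ordinary C with raw pointer
 * dereferences; nothing in the source marks the hardware accesses.
 * Under V-CPU, the load/store instructions hitting the simulated
 * address range were trapped at runtime and turned into bus
 * transactions on the Verilog BFM. The register layout and base
 * address here are hypothetical. */
#include <stdint.h>

#define UART_BASE 0x10000000u

struct uart_regs {
    volatile uint32_t status;  /* bit 0: TX busy */
    volatile uint32_t txdata;
};

/* On the target this points at real hardware; under implicit access
 * the dereferences below are trapped transparently. */
static struct uart_regs *uart = (struct uart_regs *)(uintptr_t)UART_BASE;

void uart_putc(char c)
{
    while (uart->status & 0x1)   /* this load can become a simulated read */
        ;
    uart->txdata = (uint32_t)c;  /* this store can become a simulated write */
}
```

Nothing in uart_putc() needs to change for verification; the trapping happens beneath the compiled code, which is exactly what made the feature so convenient.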
I still run into host-code execution at companies today. Over time they have figured out how to plan for it or have created something similar to the clever implicit access feature in V-CPU, but I have to wonder if the days of host-code execution are coming to an end. Chris Tice, General Manager of the Cadence emulation group, tells of once telling a customer that host-code execution didn't seem that important and that they could just do everything with a Palladium emulator. He says it took some time for him to recover, because engineers really like the ease of use and performance of running software on a host machine. Chris is a sharp guy, so I'm sure he understands the benefits now also.
The main reason I'm pondering the end of host-code execution is the emergence of new high-performance CPU models that execute the instruction set of the embedded processor and run at nearly the speed of the host. Given the hassle of host-code execution, I would prefer to cross-compile the software and run the target instruction set. Beyond the implicit- or explicit-access issue, this also eliminates issues with differences in data type sizes, data structure layout, byte order (endianness), and other differences between the host and target processors. New techniques are now available that use code translation to dynamically translate target instructions into host instructions. These models provide the speed of host-code execution while running the target instruction set.
Recently I have been working with three tools that provide very high-performance models of embedded processors. Two commercial ones are the ARM System Generator and Simics from Virtutech. An open-source tool I also work with is QEMU. Another, which I don't have direct experience with, is OVP, sponsored by Imperas. All of these models provide excellent performance for popular embedded processors. Since details of how the proprietary tools work are harder to come by, here is a link to some info on how QEMU works; it's interesting stuff.
Yesterday I ran a test that booted and ran embedded Linux for 30 seconds of simulation in only 15 seconds of wall clock time. The equivalent speed is greater than 1 GHz (and this is on a not very impressive laptop).
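As a rough check on that "greater than 1 GHz" figure: equivalent speed is just simulated target cycles divided by wall-clock time. Only the 30 s / 15 s ratio comes from the measurement; the 500 MHz target clock in the example below is an assumption for illustration, and the equivalent speed is simply twice whatever the target clock rate is.

```c
/* Equivalent-speed arithmetic behind the "greater than 1 GHz" claim.
 * Only the 30 s / 15 s ratio is measured; the 500 MHz clock used in
 * the example comment is an assumed target clock rate. */

/* Equivalent speed in Hz: target clock scaled by the ratio of
 * simulated time to wall-clock time. */
static double equivalent_hz(double target_hz, double sim_s, double wall_s)
{
    return target_hz * sim_s / wall_s;
}

/* With an assumed 500 MHz target clock:
 *   equivalent_hz(500e6, 30.0, 15.0) gives 1e9, i.e. 1 GHz. */
```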
Anybody out there doing interesting things with either host-code execution or with fast CPU models?
Running fast is just the tip of the iceberg; there are lots of other interesting topics related to the Virtual System Prototype, or Virtual Platform. The parallels to the workstation and server virtualization industry are also very interesting, since I usually run the embedded-system Virtual Platform inside a VMware virtual machine, but these are all topics for another day. The virtual model is quickly becoming the logic simulator for software engineers, and holds great promise to improve embedded software development. Unfortunately, it's time for me to follow the #1 rule of parenting: sleep when the baby sleeps.
Shane, thanks for reading. I agree host-code execution will continue to be used, but the post does stimulate some thinking about how we are executing software. Another technique is to use a virtual machine for x86, something like VMware, VirtualBox, or QEMU to do the same host-code execution you were doing before. This can provide the exact same software environment, but can offer additional features like starting and stopping the entire virtual machine when the hardware simulation is running for better synchronization.
I agree with your points about HCE being less valuable with the availability of high-speed models. However, there are two reasons I can think of that may keep HCE around for some time. 1) Money: fast models may require licenses that are not required for users of HCE. 2) Project management: if you decide to use the fast-model approach, then your firmware will need to run on an OS that may or may not be ready for the target processor. HCE de-couples OS development from driver/low-level firmware development. I see HCE and fast models co-existing for some time (at least for early phases of development). Congrats on #6, Jason!!!
In developing quantum chemistry software for physiological systems simulations, I am confronted with the task of striking a balance between computational accuracy and efficiency. Parallel code is an absolute necessity. With the introduction of Microsoft Compute Cluster Server, I'm going to rely on virtualization for code testing, allowing me to simulate multiple machines from a single host prior to scaling up.
Jakob, all great points. Maintaining the code for an additional target is not free and takes extra work. Final performance depends on how much time is spent running on the host (or in the virtual platform) and how much time is spent running in the logic simulator. This is sometimes called the hardware density, as it represents how often the software accesses the hardware. More details on how to compute the hardware density are in my book at coverification.home.comcast.net. I also covered this at the Embedded Systems Conference 2002 in San Francisco.
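To make the tradeoff concrete, here is a rough model of co-simulation speed (my own illustration, not the formula from the book): total run time is host-execution time plus one simulator round-trip per hardware access, so even a tiny hardware density can let the RTL simulator dominate.

```c
/* Rough co-simulation performance model. All numbers are illustrative:
 * hardware density is the fraction of instructions that touch
 * hardware, and each such access costs one round-trip into the RTL
 * simulator. */

/* Effective instructions per second for a co-simulation run. */
static double effective_ips(double instructions,
                            double host_ips,            /* host speed */
                            double hw_density,          /* accesses per instruction */
                            double sim_secs_per_access) /* RTL cost per access */
{
    double t_host = instructions / host_ips;
    double t_sim  = instructions * hw_density * sim_secs_per_access;
    return instructions / (t_host + t_sim);
}

/* Example with made-up numbers: a 1000 MIPS host, a hardware density
 * of 1e-5, and 10 ms of RTL simulation per access comes out to
 * roughly 9.9 MIPS overall -- the RTL simulator dominates. */
```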
Yes, virtual platforms do tend to simplify things... you do not mention the expense of maintaining that host-compile as an additional target, possibly including an API-level simulator of your target operating system.
What I wonder about, though, is whether the speed of the target code really matters when it is coupled to something as slow as a VHDL/Verilog simulator. RTL-level simulation tends to be many orders of magnitude slower than the rest, so how much do you gain from using a fast simulator?
See also jakob.engbloms.se/.../308 for a lengthier reply/commentary