The month of November goes to the Brits, no question. Not only did the James Bond movie Skyfall open, but Santa Clara also experienced something of a "British Invasion" for ARM TechCon at the Santa Clara Convention Center. To be there properly I even brought out my favorite new pinstriped suit ;). With the suit now at the cleaners and ARM TechCon over, I am reflecting on what I heard. In my mind, ARM TechCon's focus was all about low power -- at all levels of abstraction.
Various presentations at ARM TechCon focused on low power, including one on system-level techniques by my colleagues Michele Petracca and Yosinori Watanabe, "Analysis of Software-Driven Power-Management Policies Using Functional Virtual Platforms." So what about the later stages, once we have passed the TLM level? I had recently blogged about emulation and how many cycles it takes to verify a big.LITTLE sub-system.
Earlier this year at CDNLive China, Peng Wang of Nufront -- a provider of chipsets for mobile computing -- won the best paper award for a presentation called "NS115 System Emulation Based on Cadence Palladium XP." His presentation was quite fascinating. He described how Nufront used emulation to verify its third-generation computing system chip, the NS115, which provides high-performance, low-power execution for Android applications.
Speaking first of the challenges, Peng Wang described the NS115 as a very large and complex design of about 12M gates, containing a dual-core ARM Cortex-A9 processor, a Mali400 multi-core 2D/3D graphics processor, a dedicated 2D block for hardware acceleration, and numerous interfaces and memory subsystems, including LPDDR2 and a DDR3 memory interface running at up to 800 Mbps. Given that the design runs Android, external storage, multiple screen displays, and the ability to accept data inputs from various sources all need to be considered. This combination of features leads to very long startup times in IC simulation.
For system-level verification, Nufront determined that software simulation and FPGA-based prototyping were not suitable. RTL simulation was too slow, and the frequent design iterations and the need for full debug visibility made FPGA-based prototyping unsuitable at the stage of the project Nufront was in. This nicely validates the points I made earlier about the advantages of processor-based emulation over FPGA-based approaches -- both have their place in system development.
Nufront chose Palladium XP as its solution for emulating the ARM-based NS115. They reported performance improvements of about 1,000x over pure software simulation, with up to 1.3 MHz real-time frequency for the NS115. They were able to synthesize and implement the whole 12-million-gate chip easily, and they could conveniently include external models for DDR and eMMC within the Palladium XP Verification Computing Platform. They also used real-world interfaces with SpeedBridges for VGA, UART, SD/MMC, JTAG, and USB.
During software bring-up, they compiled the Android kernel with default boot arguments, pre-loaded the kernel image into DDR using memory load functions, and modified the ROM boot code to jump directly to specific kernel positions.
Before booting full Android, Nufront used a BusyBox RAM file system, which is small and simple and supports basic Linux commands. Engineers loaded the kernel image without external storage. They ran test cases under the Linux console and eventually used the chroot command to switch over to Android. On Palladium XP it took about 15 minutes to boot the RAM file system.
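To make the BusyBox step concrete, here is a minimal sketch of how such a RAM file system is typically assembled and how the chroot switch into Android works. This is my reconstruction of the general technique, not Nufront's actual scripts; all paths and names are illustrative.

```shell
# Lay out a minimal root file system for the initramfs.
mkdir -p rootfs/bin rootfs/proc rootfs/sys rootfs/dev rootfs/android

# A statically linked BusyBox provides the basic Linux commands;
# its applets are exposed as symlinks (path is a placeholder):
# cp /path/to/static/busybox rootfs/bin/busybox
# for cmd in sh mount ls cat chroot; do ln -s busybox rootfs/bin/$cmd; done

# Pack the tree in the "newc" cpio format the kernel expects for initramfs.
(cd rootfs && find . | cpio -o -H newc 2>/dev/null | gzip > ../ramfs.cpio.gz)

# Later, from the BusyBox console, once the Android root is mounted
# under /android, switch over to Android's init:
# mount -t proc proc /android/proc
# chroot /android /init
```

The appeal of this approach in emulation is exactly what the presentation described: the tiny image boots in minutes, so most debug cycles never pay the full Android boot cost.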
Later, Nufront used a stripped-down Android system to save boot time. After removing unnecessary applications, disabling unneeded JNI functions and Java classes, and removing unneeded entries from init.rc, Nufront was able to boot an Android system in about 2 hours, compared to a projected time of 83 days in an RTL simulator.
Once up and running, Nufront captured LCD frames as output using Video SpeedBridges, injected input events to represent key presses, used the Android monkey tool to generate touch input, and issued special commands to start applications and services and to broadcast intents. To test applications, they eventually ran benchmark applications pre-installed in the Android file system.
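For readers less familiar with driving Android from a console, the stimulus described above maps onto standard on-device Android commands along these lines. The package and intent names are placeholders of my own, not the benchmarks Nufront actually ran.

```shell
# Inject a key input event (keycode 3 is the Android HOME key).
input keyevent 3

# Generate pseudo-random touch/gesture input with the monkey tool:
# 1000 events against one package, throttled to 500 ms apart.
monkey -p com.example.benchmark --throttle 500 1000

# Start an application activity or a service directly...
am start -n com.example.benchmark/.MainActivity
am startservice -n com.example.benchmark/.BenchService

# ...or broadcast an intent to its registered receivers.
am broadcast -a com.example.benchmark.RUN_TESTS
```

Because these run in the emulated system's own console, the same scripts carry over unchanged when the design later runs on silicon.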
For power -- one of their key issues -- they used Palladium XP Dynamic Power Analysis (DPA). This allowed Nufront to identify power-peak windows by collecting toggle and weighted toggle counts at low resolution. With that information they found the power peaks, zoomed into each peak window, and re-generated the data at higher resolution in the next iteration.
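The coarse-to-fine search just described can be sketched in a few lines. This is an illustrative model of the iteration, assuming a simple per-cycle toggle-count trace; it is not the actual DPA data format or algorithm.

```python
def find_peak_window(toggles, start, end, window):
    """Bin per-cycle (weighted) toggle counts into windows of `window`
    cycles and return the (start, end) bounds of the busiest window."""
    best_start, best_sum = start, -1
    for w in range(start, end, window):
        s = sum(toggles[w:min(w + window, end)])
        if s > best_sum:
            best_start, best_sum = w, s
    return best_start, min(best_start + window, end)

def zoom_to_peak(toggles, coarse=1000, refine=10):
    """First pass at low resolution over the whole run, then repeatedly
    re-scan inside the peak window at 10x finer resolution."""
    lo, hi = find_peak_window(toggles, 0, len(toggles), coarse)
    while coarse > refine:
        coarse //= 10
        lo, hi = find_peak_window(toggles, lo, hi, coarse)
    return lo, hi
```

The point of the coarse first pass is cost: collecting counts at low resolution over the full run is cheap, and the expensive fine-grained data is only regenerated inside the few windows that matter.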
For the actual power analysis they generated TCF (Toggle Count Format) files for each peak and calculated the dynamic power consumption from the TCF files, the gate-level netlist, and other libraries. The figure associated with this post shows the results for a video decoding function, for which Nufront was able to tune the hardware/software interactions for low power.
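Conceptually, this step combines per-net toggle counts (the kind of data a TCF file supplies) with per-net capacitance from the netlist and libraries. The sketch below uses the textbook simplification that each toggle dissipates E = 1/2 * C * V^2 and considers dynamic power only; the model and all numbers are illustrative assumptions, not the actual tool calculation.

```python
def dynamic_power(toggle_counts, cap_pf, vdd, window_s):
    """Average dynamic power (watts) over a window of `window_s` seconds.

    toggle_counts: net name -> number of transitions in the window
                   (as a TCF-style report would provide)
    cap_pf:        net name -> switched capacitance in picofarads
                   (as extracted from the gate-level netlist/libraries)
    vdd:           supply voltage in volts
    """
    energy_j = 0.0
    for net, toggles in toggle_counts.items():
        c_farads = cap_pf[net] * 1e-12
        energy_j += toggles * 0.5 * c_farads * vdd ** 2
    return energy_j / window_s
```

Even this toy model makes the optimization loop visible: reducing toggles in a peak window -- for example by reordering hardware/software interactions -- directly reduces the computed power for that window.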
In summing up Nufront's experience, Peng Wang cited four main results of their use of the Palladium XP Verification Computing Platform:
So how does one optimize ARM-based designs for low power? At all levels, from TLM through implementation! As this example from Nufront shows, emulation is becoming a crucial step for power optimization. Here is an example of a tablet device the NS115 enables -- the IPS Tablet by Xusit.
If you want to read more about power flows at TLM and in emulation, here is a recent article I wrote called "Optimizing for Low Power prior to Silicon Availability." And as I am writing this during a trip to New York, in a hotel lobby between visiting my daughter's newborn cousin and watching "Spider-Man: Turn Off the Dark" tonight on Broadway, I am reminded how necessary power optimization really is: imagine five guys around a table at "Link @ Sheraton," all of us sharing two wall outlets to charge our various devices and power the laptop I am writing this on. Oh well. Way to go for power optimization.