Every year, UBM, the parent company of EE Times and EDN (and the organizer of several shows such as ESC and ARM TechCon), has a competition known as the ACE Awards. ACE stands for Annual Creativity in Electronics. They have been going for a long time; I remember collecting one in my Compass Design Automation days for some long-forgotten product.
Cadence produces EDA software and IP, so while we enable a lot of world-class design teams, we are not normally eligible to enter the category for Design Team of the Year. We don't do design (OK, some test chips, but those are hardly Design Team of the Year material). But every two or three years, we produce a new version of the Palladium emulator family. Now that is Design Team of the Year material, and indeed Cadence won the 2016 award for the team that developed the Palladium Z1 Enterprise Emulation Platform.
I am not going to describe the details of the emulation platform itself in this post. If you want to know them then you can read my post Palladium Z1: An Enterprise Server Farm in a Rack. But the headline numbers are a capacity of up to 9.2 billion gates, 92 percent smaller footprint, 8X higher gate density, and up to 44 percent better power density than previous generations. Plus, it is made for sharing, with the ability to run 2304 parallel jobs with 4M gate granularity. It looks pretty neat too (especially inside the door):
The team "ate their own dogfood" and applied all the system development techniques we recommend to our users to develop chips and systems, including virtual prototyping, simulation, formal techniques, and extensive emulation of Palladium Z1 on the earlier generation Palladium XP II. All this led to first-time-right silicon as well as significantly reduced time to market for the hardware/software system.
The technical hurdles to increase user productivity were severe. Virtualization had to happen at several levels, and form factors changed completely to allow usage in datacenters, posing significant integration and thermal challenges. Several “industry-first” breakthrough features were introduced. Full virtualization of the external interfaces is achieved using a unique virtual target relocation capability, enabling remote access of fully accurate real-world devices up to 30m away from the emulation racks.
The hardware combines custom Cadence bit-level processor IP and Cadence peripheral IP into a 150+ million gate SoC with significant memory content. Several SoCs are then integrated on boards into logic drawers yielding emulation capacity of 32 million gates, which are then integrated into clusters of six drawers yielding 192 million gates emulation capacity. Three clusters can be integrated into racks of standard datacenter format yielding 576 million gate emulation capacity per rack. Up to 16 racks can be combined using optical interconnect to yield that headline number of 9.2 billion gates. Going in the other direction, in the prototype lab there is a test board that contains just a single SoC, but it is still capable of emulating a system...just not a very big one. It is probably the smallest emulator in the world.
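The capacity hierarchy above is simple multiplication, and it is easy to check. The short Python sketch below uses only the figures quoted in this post; the constant and function names are my own. Dividing the full capacity by the 4M gate granularity happens to give the 2304 parallel jobs mentioned earlier, although that derivation is my assumption rather than something stated in the product material.

```python
# Figures quoted in the post, in millions of gates (names are illustrative).
GATES_PER_DRAWER_M = 32    # one logic drawer
DRAWERS_PER_CLUSTER = 6
CLUSTERS_PER_RACK = 3
MAX_RACKS = 16
JOB_GRANULARITY_M = 4      # smallest allocatable job size


def capacity_summary():
    """Multiply up the hierarchy: drawer -> cluster -> rack -> full system."""
    cluster_m = GATES_PER_DRAWER_M * DRAWERS_PER_CLUSTER
    rack_m = cluster_m * CLUSTERS_PER_RACK
    system_m = rack_m * MAX_RACKS
    # Assumption: the parallel job count is capacity divided by granularity.
    max_jobs = system_m // JOB_GRANULARITY_M
    return {
        "cluster_m": cluster_m,
        "rack_m": rack_m,
        "system_m": system_m,
        "max_jobs": max_jobs,
    }


if __name__ == "__main__":
    for name, value in capacity_summary().items():
        print(f"{name}: {value}")
```

So the "9.2 billion" headline is 9,216 million gates, rounded.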
A software stack of 3+ million lines of code executes on, and is connected to, the Palladium Z1 hardware. Firmware enables bare-metal execution, connections, and interfaces, and a run-time operating system controls Palladium Z1 during execution, including complex job scheduling and resource management. An application software stack and GUI enable key functionality such as hardware/software debug, in-circuit emulation, verification acceleration, low-power verification, dynamic low-power analysis, virtualization of real-world interfaces, design for test, etc.
Key design targets included 5X increased emulation throughput, 8X better gate density, and 44% better power efficiency in a standard rack design with a smaller footprint. A key focus was on quality and reliability, reducing the number of discrete components by 12X using standard interfaces. Scalability and maintainability were achieved using optical interconnect (those are the turquoise "cables" that you can see when the front door is open). The verification workload efficiency was increased by up to 2.5X using relocatable external targets, advanced job reshaping, and virtualization of interfaces. Significantly higher system integration was achieved using the custom SoC, replacing multi-chip modules and FPGA memory controllers. Diagnostics include self-monitoring and smarter reporting, and specialized hosts were eliminated using connections based on standard InfiniBand protocols.
As a result, the team achieved about 20X improvement in footprint compared to previous generations, about 6X improvement in weight, about 14X reduction in the volume of liquid used to provide cooling, and more than 20X improvement in interconnect space/volume. In practice, at customer sites, the team achieved a 6X reduction in installation time using a highly modular system with improved availability and upgradability, and more than 2250 telemetry health/operating monitoring points in a single chassis using a distributed monitoring architecture. The system allows "push-plug" target interconnect and has about half a km of optical cabling inside a single cabinet.
The Palladium Z1 team was tasked to move emulation into the datacenter, which required levels of virtualization not previously available. Several key innovations were made to overcome these challenges. Previous generation emulators had external interfaces directly connected to the system. In a datacenter, the remote connections to interfaces like Ethernet, PCIe, USB, etc., need to be placed further away. For that, the “virtual target relocation” capabilities were introduced as an industry first. Target pods to connect speed adapters for external interfaces can now be placed up to 30m away from the system and can also be dynamically relocated between development jobs executing in the emulator.
Thermal and power consumption posed unique challenges of their own. To be able to operate in a datacenter environment and to achieve the appropriate scale of integration, air cooling of the individual boards was not possible, a classic challenge in supercomputing. Instead, water cooling mechanisms were implemented across the system, reaching every core processing chip.
As a highly parallel multiprocessor system with thousands of processors per chip, the interconnect structure poses a significant challenge of scale. Interconnect has to be managed between processors on chips, between chips on boards, between boards in a cluster, between clusters in a rack, and between several racks. Interconnect technologies range from on-chip structures through copper to optical technologies. The multicore fabric works in conjunction with advanced custom compiler technology that consumes the hardware description language (RTL and gate level) of a user design and maps it into the custom multicore fabric provided by the Palladium Z1 hardware.
To allow software development as early as possible, virtual prototyping was applied to represent the hardware for software development. The next-generation emulation system was emulated on the previous generation emulator, the Palladium XP II platform, including PCIe target interfaces, using both verification acceleration and in-circuit emulation. Hardware abstraction layers were developed so that the software couldn't tell in which environment it was running (emulation or real silicon). The exact software customers would use once full hardware was available was actually running over a year before first shipment. When silicon came back, the chip worked immediately and did not require any re-spins. Within hours it was emulating a design booting Linux.
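To give a feel for what a hardware abstraction layer buys you, here is a minimal Python sketch of the general pattern. It is not the Palladium Z1 software or any Cadence API: every class, function, and register address below is hypothetical, and both backends are plain in-memory stand-ins so that the example runs anywhere. The point is only that the bring-up code sees one interface and cannot tell which backend is behind it.

```python
from abc import ABC, abstractmethod


class RegisterBackend(ABC):
    """Hypothetical register-access interface (not a real Cadence API)."""

    @abstractmethod
    def read(self, addr: int) -> int: ...

    @abstractmethod
    def write(self, addr: int, value: int) -> None: ...


class EmulatedBackend(RegisterBackend):
    """Stand-in for the design running on an earlier-generation emulator."""

    def __init__(self) -> None:
        self._regs = {}  # sparse register map

    def read(self, addr: int) -> int:
        return self._regs.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self._regs[addr] = value & 0xFFFFFFFF


class SiliconBackend(RegisterBackend):
    """Stand-in for real silicon, modeled here as a small word array."""

    def __init__(self, words: int = 16) -> None:
        self._mem = [0] * words

    def read(self, addr: int) -> int:
        return self._mem[addr // 4]

    def write(self, addr: int, value: int) -> None:
        self._mem[addr // 4] = value & 0xFFFFFFFF


CTRL_REG = 0x00  # hypothetical control register address


def enable_device(backend: RegisterBackend) -> bool:
    """Bring-up code written once; it never asks which backend it has."""
    backend.write(CTRL_REG, 0x1)
    return (backend.read(CTRL_REG) & 0x1) == 1


if __name__ == "__main__":
    for backend in (EmulatedBackend(), SiliconBackend()):
        print(type(backend).__name__, enable_device(backend))
```

Because `enable_device` depends only on the abstract interface, the same code can be exercised long before silicon exists and then run unchanged when it arrives.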
All this was achieved as a true “one-team” effort combining hardware and software disciplines to achieve the best architecture and implementation. The team worked on the project for four years.
If you want to see a video taken in what NVIDIA thinks is the largest emulation lab in the world, then Narendra Konda explains how they use their Palladium emulators:
For more information, take a look at the product page for Palladium Z1, including the datasheet and more.