New Capabilities in the C-to-Silicon Compiler 2013 Releases

6 Jan 2014 • 4 minute read

2013 was a banner year for high-level synthesis, and for C-to-Silicon Compiler in particular. Our customers took on over 75 new projects using C-to-Silicon, with much of that growth coming from expanded adoption within existing customers. These designs spanned everything from algorithm-intensive designs like image processors and H.265 to control-dominated designs like high-speed cache controllers, along with various mixtures of the two in applications such as automotive and networking.

As part of this expansion effort, our new feature development has focused on delivering what these new groups need to adopt C-to-Silicon more easily. In almost all cases, the designers were new to high-level synthesis (HLS) and came from a register-transfer level (RTL)-centric background. In many cases, they also needed an easier way to adapt C/C++ code from their algorithm teams. Throughout, we spent a lot of time working directly with customers and listening to them so we could make HLS easier to adopt for new hardware design projects.

These capabilities were delivered over the course of 2013. If you upgrade to the latest version on downloads.cadence.com, you will be able to take advantage of these new capabilities and more. All new features in each release are detailed in the "Release Notes" in the first chapter of the C-to-Silicon Compiler User Guide. Here is a quick overview I put together with the help of Felice Balarin, who in his role as a Sr. Architect in Product Engineering spends a lot of time working with customers and then specifying what we need in each release.

Ability to pipeline functions. Older versions of C-to-Silicon required combinational C functions that needed more than one clock cycle to complete to be inlined in order to be pipelined. Now such functions can be pipelined directly. This allows you to take significantly more complex designs into C-to-Silicon without rewriting the source code and decomposing it into smaller pieces, and it makes it much easier to get the best quality of results for new designs. The command is available in the popup menu for Functions in the GUI, or as the "pipeline function" text command.

Memories with a read latency of three. Previously, the maximum read latency supported for memories was two for built-in RAMs, prototype memories, and vendor RAMs. Now if your design requires a read latency of three, the "-read latency" option for "allocate_memory", "allocate_prototype_memory", and "allocate_builtin_ram" supports values between one and three. This broadens the class of designs for which C-to-Silicon can be used.
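To make the read latency concrete, here is a minimal behavioral sketch in plain C++. It is not C-to-Silicon code and not how the tool models memories. The class name `LatencyRam` and its `write` and `tick` methods are hypothetical, and the sketch assumes that a latency of N means the data for an address sampled on one clock edge becomes visible after the Nth edge, counting the sampling edge itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <vector>

// Hypothetical behavioral model of a RAM with a configurable read latency.
// Assumption: latency N means read data is visible after the Nth clock edge,
// counting the edge on which the address was sampled.
class LatencyRam {
public:
    LatencyRam(std::size_t words, std::size_t read_latency)
        : mem_(words, 0), pipe_(read_latency, 0) {
        // Mirror the range mentioned in the text: one to three cycles.
        if (read_latency < 1 || read_latency > 3) {
            throw std::invalid_argument("read latency must be 1, 2, or 3");
        }
    }

    // Writes take effect immediately in this simplified model.
    void write(std::size_t addr, std::uint32_t value) { mem_.at(addr) = value; }

    // One clock edge: sample the address, advance the read pipeline,
    // and return the value now visible on the data output.
    std::uint32_t tick(std::size_t addr) {
        pipe_.push_back(mem_.at(addr));
        pipe_.pop_front();
        return pipe_.front();
    }

private:
    std::vector<std::uint32_t> mem_;
    std::deque<std::uint32_t> pipe_;  // one entry per cycle of latency
};
```

With a latency of three, a value read from an address only appears on the third call to `tick`, which is the extra delay the surrounding logic has to be scheduled around.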

RTL schematic viewer. We have always taken care to try to generate RTL that is as readable as possible for machine-generated code. But it's much easier to grasp the structure and flow when it's presented visually, so we have included a cross-linked RTL schematic viewer in the GUI, complete with searching and filtering. Additionally, the timing-critical paths can be isolated as schematics.

[Figure: RTL schematic viewer]

[Figure: Critical path viewer and the corresponding RTL schematic]

Pragma "ctos keep_signal" for white-box verification. "White box" verification requires that the testbench have read access to signals internal to a module (the opposite of "black box", which would only allow for reading the inputs and outputs). Sometimes those signals are optimized away or transformed during high-level synthesis. We have added a pragma that specifies that an sc_signal be maintained post-synthesis for verification purposes.

Support for separate memory clocks. The aforementioned memory allocation commands now have a "-clock" option in order to specify a separate clock signal for accessing the memory. The clock still needs to be at the same frequency as the module's main clock, but by having a separate clock signal you can now control the clock to the memory and memory interface logic in order to save switching power.

Sequential functions with memory accesses no longer need to be inlined. This one is largely self-explanatory: it was developed to allow more freedom in architectural exploration and to handle more complex designs that would otherwise have to be decomposed.

Packed structs. When converting composite data types into bit vectors, C-to-Silicon previously always aligned components to byte boundaries. This approach is usually beneficial because it simplifies the access logic, but sometimes it generates redundant logic. Now, by using a pragma, you can choose between the two object models in order to tune your quality of results.
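The trade-off can be illustrated with some simple arithmetic. The sketch below is plain C++ and only models the two layouts described above. It does not use any C-to-Silicon pragma or API, and the function names `aligned_width` and `packed_width` are hypothetical. It assumes that byte alignment rounds each component up to a multiple of 8 bits.

```cpp
#include <cstddef>
#include <vector>

// Total bit-vector width when every component is rounded up to a byte
// boundary (an assumed model of the byte-aligned layout).
std::size_t aligned_width(const std::vector<std::size_t>& field_bits) {
    std::size_t total = 0;
    for (std::size_t bits : field_bits) {
        total += (bits + 7) / 8 * 8;  // round up to the next multiple of 8
    }
    return total;
}

// Total bit-vector width when components are packed back to back.
std::size_t packed_width(const std::vector<std::size_t>& field_bits) {
    std::size_t total = 0;
    for (std::size_t bits : field_bits) {
        total += bits;
    }
    return total;
}
```

For a composite with 1-bit, 3-bit, and 12-bit components, `aligned_width` gives 32 bits while `packed_width` gives 16. The packed form is narrower, but components no longer start at byte offsets, so the logic that extracts them can be more involved.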

C-to-Silicon directives embedded in source code. The C-to-Silicon use model always mandated that commands be issued via the GUI or in Tcl. This was to separate the functionality from the implementation, one of the key benefits of high-level synthesis. However, there are certain cases where designers require control over the particular implementation of a chunk of code. A good example is using "#pragma ctos create_protocol_region" to specify that the C-to-Silicon Compiler scheduler not add or remove states in the communication protocol section of code.

Additionally, there were many minor usability enhancements added, including:

Additional RTL mux implementation option. C-to-Silicon has always had attributes to specify "reverse one-hot" and "case z" muxes. Now "one hot" is also supported.
Write every Verilog module to a separate .v file. This was developed to support existing methodologies that require each module to be in its own file. Now when you issue the "write_rtl" or "write_sim" commands, the "-dir" option specifies an output directory where each <module> will be written to <module>.v. The ability still exists to write all modules to a single .v file, using the "-file" option.
Multi-phase scheduler results display. We implemented the multi-phase scheduler in 12.2 in order to provide earlier feedback on the feasibility of scheduling a given design and its constraints. Now additional design, behavior, progress, and timing feasibility feedback is displayed so you can make a more informed decision as to whether to abandon a run or let it finish.
Transforming shared arrays. Previously, if an array was accessed from multiple processes, it could not be merged or restructured. Now this is possible without re-writing any code.
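
For readers less familiar with the mux styles named in the list above, here is a small behavioral sketch of a "one hot" mux in plain C++. It is an illustration only, not the RTL that C-to-Silicon generates. The function name `one_hot_mux` is hypothetical, and the sketch assumes the select word has exactly one bit set.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Behavioral model of a one-hot mux: bit i of `select` enables input i.
// Each input is ANDed with its replicated select bit and the results are
// ORed together, so no binary decode of the select is needed.
// Assumption: exactly one bit of `select` is set.
template <std::size_t N>
std::uint32_t one_hot_mux(const std::array<std::uint32_t, N>& inputs,
                          std::uint32_t select) {
    static_assert(N <= 32, "select word holds at most 32 one-hot bits");
    std::uint32_t out = 0;
    for (std::size_t i = 0; i < N; ++i) {
        // All ones if select bit i is set, all zeros otherwise.
        const std::uint32_t mask = ((select >> i) & 1u) ? 0xFFFFFFFFu : 0u;
        out |= inputs[i] & mask;
    }
    return out;
}
```

With four inputs, a select value of 4 (bit 2 set) passes the third input through.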

Enjoy!

Felice Balarin

Jack Erickson
