There's a general view that everything gets faster and better as technology advances, but when it comes to external memory latency, that's not the case. In a recent ARM TechCon paper, Marc Greenberg, director of product marketing at Cadence, showed why DRAM latency is increasing and discussed ways of improving the situation.
The paper was titled "DDR4, Higher Speeds and Larger SoCs: Why External Memory Latency is Getting Worse, and What to do About it." It was presented before a standing-room-only audience on Oct. 25. You can read an article by Marc Greenberg on the same topic in the Nov. 22 ChipEstimate.com newsletter.
Greenberg started the ARM TechCon presentation by showing a chart, based on publicly available data, that predicts a DDR4 read latency of 22 clock cycles for the highest DDR4 data rate. The chart assumes an average latency of around 13.5 ns and is essentially 13.5 ns divided by the clock period of each DRAM type. "Basically the DRAM cell array hasn't changed in the past 10 years," he explained. "At its core is a 100 MHz to 200 MHz array that has an access time of about 10 to 15 ns."
RL-tRCD-tRP (read latency, RAS-to-CAS delay, and row precharge time) of DDR3 DRAM by speed grade, with a curve-fit prediction for DDR4.
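The arithmetic behind the chart is simple enough to sketch. The Python below assumes, as the chart does, a fixed core latency of about 13.5 ns, and that a DDR interface runs its clock at half the data rate. The function name `latency_cycles` and the list of speed grades are illustrative choices, not taken from the paper. Rounding up reflects that a partial clock cycle still costs a whole one.

```python
import math

# Assumption (from the chart): the DRAM core's access latency stays near
# 13.5 ns regardless of how fast the interface runs.
CORE_LATENCY_NS = 13.5

def latency_cycles(data_rate_mts: int, core_latency_ns: float = CORE_LATENCY_NS) -> int:
    """Return the read latency in DRAM clock cycles for a DDR data rate.

    DDR transfers two beats per clock, so the clock frequency in MHz is half
    the data rate in MT/s, and the clock period tCK is 1000 / f_MHz ns.
    """
    clock_mhz = data_rate_mts / 2
    tck_ns = 1000.0 / clock_mhz
    # Round up: a partial cycle still costs a full cycle.
    return math.ceil(core_latency_ns / tck_ns)

for rate in (800, 1066, 1333, 1600, 1866, 2133, 2400, 3200):
    print(f"DDR-{rate}: {latency_cycles(rate)} cycles")
```

For DDR4-3200 (a 1600 MHz clock, 0.625 ns period) this gives 22 cycles, matching the chart's prediction, while DDR-800 needs only 6. The nanoseconds stay flat while the cycle count climbs.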
DRAM is getting faster, Greenberg noted, because successive DRAM technology generations are increasingly parallelizing the array. With DDR3, for example, you can send transactions to 8 arrays in parallel. So even though the DRAM data rate has increased by over 10X in the past ten years, and CPU clock frequency has increased by over 10X, "the latency really hasn't changed," Greenberg said. "In fact, if you measure it in clock cycles it's been getting worse."
How Can We Improve?
In discussing ways to improve the situation, Greenberg pointed to some options that are in many cases impractical. He first warned that while reducing minimum CPU-to-DRAM latency is important, it should not be done at the expense of average latency, or at the expense of DRAM bandwidth. It is possible to make a very-low-latency DRAM controller that doesn't reorder transactions at all, but that comes at the expense of DRAM bandwidth.
Other potential solutions include:
What if we just build a simple DRAM controller with the goal of reducing latency? This won't work, Greenberg said, because "a DRAM controller requires a queue of upcoming commands to optimize the performance of the DRAM. Almost every memory controller has the ability to look ahead. Without doing look-ahead optimization, you'll waste a bunch of clock cycles."
For the most common system configurations, Greenberg noted, DDR4-3200 speeds will require 5 to 6 cache line fills in the DRAM controller at any given time to have enough look-ahead to keep the data pipe full. Okay, you might conclude, we'll just have a simple controller that can look ahead but still executes in order. That works until you issue two transactions to different rows in the same bank. Now the tRC (activate-to-activate) delay of each bank in DRAM becomes a problem. tRC is another timing parameter that is not decreasing over time; at DDR4-3200, with a tRC of 45 ns, the tRC delay will be 72 clock cycles.
Things get even more complex. For most system configurations, DDR4 speeds will require 14-18 cache line fills in the DRAM controller to cover the tRC time of the DRAM. But if all those transactions are done in order, latency will suffer. Further, you don't always need to hold exactly 6 cache-line transactions in the queue for effective look-ahead. What if a more optimal command comes along? Some degree of flexibility is needed.
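The queue depths quoted above can be approximated with a Little's-law style estimate: enough cache-line requests must be in flight to cover the latency (or tRC) while the data bus stays busy with back-to-back bursts. The sketch below assumes a 64-bit data bus, 64-byte cache lines, and burst length 8, so each line occupies the bus for 4 clocks. Greenberg's exact method and his "most common system configurations" aren't spelled out, and `cycles` and `cycles_per_cache_line` are hypothetical helper names.

```python
import math

def cycles(time_ns: float, data_rate_mts: int) -> int:
    """Convert a timing parameter in ns to whole DRAM clock cycles (rounded up)."""
    tck_ns = 2000.0 / data_rate_mts  # DDR: clock frequency is half the data rate
    return math.ceil(time_ns / tck_ns)

def cycles_per_cache_line(line_bytes: int = 64, bus_bytes: int = 8) -> int:
    """Data-bus clocks occupied by one cache-line burst.

    A 64-byte line on an 8-byte (64-bit) bus takes 8 beats (BL8); DDR moves
    two beats per clock, so the burst occupies 4 clocks.
    """
    beats = line_bytes // bus_bytes
    return beats // 2

RATE = 3200                      # DDR4-3200
read_lat = cycles(13.5, RATE)    # ~13.5 ns read latency from the talk -> 22
t_rc = cycles(45.0, RATE)        # tRC = 45 ns -> 72
burst = cycles_per_cache_line()  # 4 clocks per 64-byte line

# Lines that must be in flight so back-to-back bursts cover each delay.
print("tRC in cycles:", t_rc)
print("lines to cover read latency:", read_lat / burst)
print("lines to cover tRC:", t_rc / burst)
```

Under these assumptions the 22-cycle read latency spans 5.5 cache-line bursts, consistent with the 5 to 6 fills quoted for DDR4-3200, and the 72-cycle tRC spans 18 bursts, the top of the 14-18 range.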
Another complication is that modern systems have three types of masters: latency-sensitive masters that need low latency, bandwidth-sensitive masters that need a lot of data, and maximum-latency masters that care only about a latency limit. Greenberg reviewed the requirements for each. He noted that memory controllers should reorder transactions by priority, making it possible to differentiate transactions based on their latency requirements.
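One way to differentiate the three classes is a priority arbiter over the controller's command queue. The sketch below is a hypothetical policy, not Cadence's: maximum-latency requests are promoted only when they come within an `urgency_window` of their deadline, latency-sensitive requests otherwise go first, and everything else is served oldest-first. The class names, the `Request` fields, and the 10-cycle window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels for the three master classes described above.
LATENCY = "latency"          # CPU-like: wants the lowest possible latency
BANDWIDTH = "bandwidth"      # DMA/GPU-like: wants throughput, tolerates delay
MAX_LATENCY = "max_latency"  # display-like: only needs a bound on latency

@dataclass
class Request:
    name: str
    kind: str
    arrival: int                    # cycle the request entered the queue
    deadline: Optional[int] = None  # absolute cycle; used by MAX_LATENCY requests

def pick_next(queue: List[Request], now: int, urgency_window: int = 10) -> Request:
    """Choose the next request to issue (illustrative policy only).

    1. A max-latency request within `urgency_window` cycles of its deadline
       is promoted above everything else (earliest deadline first).
    2. Otherwise, latency-sensitive requests go first (oldest first).
    3. Otherwise, the oldest remaining request is issued.
    The queue must not be empty.
    """
    urgent = [r for r in queue
              if r.kind == MAX_LATENCY and r.deadline is not None
              and r.deadline - now <= urgency_window]
    if urgent:
        return min(urgent, key=lambda r: r.deadline)
    latency = [r for r in queue if r.kind == LATENCY]
    if latency:
        return min(latency, key=lambda r: r.arrival)
    return min(queue, key=lambda r: r.arrival)

queue = [
    Request("dma0", BANDWIDTH, arrival=0),
    Request("disp0", MAX_LATENCY, arrival=1, deadline=100),
    Request("cpu0", LATENCY, arrival=5),
]
print(pick_next(queue, now=6).name)   # cpu0: latency-sensitive goes first
print(pick_next(queue, now=95).name)  # disp0: its deadline is close
```

A real controller would also need to keep bandwidth-sensitive masters from starving behind a steady stream of latency-sensitive traffic; that guard is left out to keep the sketch short.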
Greenberg concluded that a static allocation of a fixed number of commands to the DRAM controller cannot reliably meet latency and bandwidth demands. The best approach is to allow as much flexibility as possible in command ordering, and to make decisions on command ordering as close as possible to the memory.
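The cost of strict in-order execution can be shown with a toy timing model. The sketch below makes simplifying assumptions: it tracks only one data bus, the open row in each bank, and the 72-cycle tRC between activates to the same bank (the DDR4-3200 figure above), and it ignores tRCD, CL, tRP, refresh, and command-bus limits. The reordering policy is a greedy earliest-start-first rule, similar in spirit to first-ready scheduling; it is not a description of Cadence's controller, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Access = Tuple[int, int]  # hypothetical cache-line read: (bank, row)

T_RC = 72     # activate-to-activate, same bank (DDR4-3200, 45 ns)
T_BURST = 4   # data-bus clocks per 64-byte line (BL8 on a 64-bit bus)

class DramModel:
    """Toy model: one data bus, plus per-bank open row and tRC limit."""

    def __init__(self) -> None:
        self.bus_free = 0
        self.open_row: Dict[int, int] = {}
        self.last_act: Dict[int, int] = {}

    def earliest_start(self, req: Access) -> int:
        bank, row = req
        start = self.bus_free
        if self.open_row.get(bank) != row and bank in self.last_act:
            # Row miss: the new ACTIVATE must wait tRC after the previous one.
            start = max(start, self.last_act[bank] + T_RC)
        return start

    def issue(self, req: Access) -> None:
        bank, row = req
        start = self.earliest_start(req)
        if self.open_row.get(bank) != row:
            self.open_row[bank] = row
            self.last_act[bank] = start
        self.bus_free = start + T_BURST

def in_order(requests: List[Access]) -> int:
    """Finish cycle when requests are served strictly in arrival order."""
    dram = DramModel()
    for req in requests:
        dram.issue(req)
    return dram.bus_free

def reordered(requests: List[Access]) -> int:
    """Finish cycle when the controller may pick any queued request.

    Greedy policy: issue whichever request can start earliest; min() keeps
    the first (oldest) request on ties, so row hits and idle banks win.
    """
    dram = DramModel()
    pending = list(requests)
    while pending:
        best = min(pending, key=dram.earliest_start)
        pending.remove(best)
        dram.issue(best)
    return dram.bus_free

# Hypothetical queue in arrival order; bank 0 alternates between two rows.
queue = [(0, 1), (0, 2), (1, 5), (0, 1), (2, 7), (1, 5)]
print("in-order finish cycle: ", in_order(queue))
print("reordered finish cycle:", reordered(queue))
```

For this six-request queue, the in-order controller finishes at cycle 156 because two row misses to bank 0 each wait out tRC, while the reordering controller finishes at cycle 76. Reordering does not make the bank-0 row miss any faster (it still starts at cycle 72); it hides that wait behind row hits and requests to other banks, which is the kind of flexibility Greenberg argues for.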
Note: In April 2011, Cadence announced the industry's first DDR4 IP solution. The solution includes hard and soft PHY IP, controller IP, memory models, verification IP, tools and methodologies, and signal integrity reference designs for the package and board.
Other blog posts about ARM TechCon papers:
ARM TechCon Paper: New Methodology Eases Challenges of 32/28nm Designs
ARM TechCon Paper: "Tips and Tricks" for ARM Cortex-A15 Designs