
rgoering | 17 Nov 2011
ARM TechCon Paper: Why DRAM Latency is Getting Worse


There's a general view that everything gets faster and better as technology advances, but when it comes to external memory latency, that's not the case. In a recent ARM TechCon paper, Marc Greenberg, director of product marketing at Cadence, showed why DRAM latency is increasing and discussed ways of improving the situation.

The paper was titled "DDR4, Higher Speeds and Larger SoCs: Why External Memory Latency is Getting Worse, and What to do About it." It was presented to a standing-room-only audience on Oct. 25. You can read an article by Marc Greenberg on the same topic in the Nov. 22 ChipEstimate.com newsletter. A video of the presentation is embedded below, and you can also click here to view it.

Greenberg started the ARM TechCon presentation with a chart, based on publicly available data, that predicts a DDR4 read latency of 22 clock cycles at the highest DDR4 data rate. The chart assumes an average array access latency of around 13.5 ns and essentially plots that fixed latency against the clock periods of the various DRAM types. "Basically the DRAM cell array hasn't changed in the past 10 years," he explained. "At its core is a 100 MHz to 200 MHz array that has an access time of about 10 to 15 ns."

Chart: RL (read latency), tRCD (RAS-to-CAS delay), and tRP (row precharge time) of DDR3 DRAM by speed grade, with a curve-fit prediction for DDR4.
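The arithmetic behind that chart can be sketched in a few lines. This is my own back-of-envelope reconstruction, not Greenberg's data: take the roughly 13.5 ns array access latency cited in the talk and divide it by each speed grade's clock period (the I/O clock runs at half the MT/s data rate).

```python
import math

ACCESS_NS = 13.5  # assumed average DRAM array access latency, per the talk

# Data rates in MT/s for a few representative speed grades (JEDEC names);
# the controller/DRAM clock runs at half the data rate.
speed_grades = {"DDR3-1066": 1066, "DDR3-1600": 1600,
                "DDR3-2133": 2133, "DDR4-3200": 3200}

latency_cycles = {}
for name, mts in speed_grades.items():
    tck_ns = 2000.0 / mts                          # clock period in ns
    latency_cycles[name] = math.ceil(ACCESS_NS / tck_ns)
    print(f"{name}: tCK = {tck_ns:.3f} ns -> ~{latency_cycles[name]} cycles")
```

The same 13.5 ns works out to roughly 8 clock cycles at DDR3-1066 but 22 cycles at DDR4-3200, matching the 22-cycle prediction in the chart.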

DRAM is getting faster, Greenberg noted, because successive DRAM technology generations increasingly parallelize the array. With DDR3, for example, you can send transactions to eight arrays in parallel. So even though the DRAM data rate has increased by over 10X in the past ten years, and CPU clock frequency has increased by over 10X, "the latency really hasn't changed," Greenberg said. "In fact, if you measure it in clock cycles it's been getting worse."
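A toy timeline makes the bandwidth-versus-latency distinction concrete. This is my illustration, not the talk's: assume accesses to different banks overlap fully, accesses within one bank serialize, and each array access still takes the full ~13.5 ns.

```python
ACCESS_NS = 13.5  # assumed per-access array latency, unchanged by parallelism

def total_time_ns(n_requests, banks):
    """Time to finish n requests striped round-robin across `banks` arrays,
    assuming perfect overlap between banks and none within a bank."""
    per_bank = -(-n_requests // banks)   # ceiling division
    return per_bank * ACCESS_NS

# 64 requests: eight banks finish 8x sooner (bandwidth scales with the
# bank count), but the first result still takes one full array access.
print(total_time_ns(64, banks=1))   # 864.0 ns
print(total_time_ns(64, banks=8))   # 108.0 ns
first_result_ns = ACCESS_NS
```

Parallelism multiplies throughput while leaving the latency of any single access untouched, which is exactly why latency measured in (ever-shorter) clock cycles keeps growing.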

How Can We Improve?

In discussing ways to improve the situation, Greenberg first pointed to some options that are in many cases impractical. He warned that while reducing minimum CPU-to-DRAM latency is important, it should not come at the expense of average latency or of DRAM bandwidth. It is possible to build a very low-latency DRAM controller that does no reordering of transactions, but only by sacrificing DRAM bandwidth.

Other potential solutions include:

  • Adding more on-chip memory reduces latency, but it's expensive.
  • Specialty DRAM with lower latency is available, but it comes at a high cost.
  • Off-chip SRAM is fast but very expensive.
  • Out-of-order CPU execution lets the CPU work on other instructions while waiting for data from the DRAM, but there's a practical limit to the number of outstanding transactions, and a cost in area and power.

What if we just build a simple DRAM controller with the goal of reducing latency? This won't work, Greenberg said, because "a DRAM controller requires a queue of upcoming commands to optimize the performance of the DRAM. Almost every memory controller has the ability to look ahead. Without doing look-ahead optimization, you'll waste a bunch of clock cycles."
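A toy queue model shows what look-ahead buys. This sketch is my own, with made-up cycle costs, and models no real controller: a command to a bank's currently open row costs ROW_HIT cycles, anything else forces a precharge/activate at ROW_MISS cycles, and the scheduler may pick any command within its look-ahead window.

```python
ROW_HIT, ROW_MISS = 4, 40   # illustrative cycle costs, not real DRAM timing

def run(commands, lookahead=1):
    """Issue (bank, row) commands, choosing among the next `lookahead`
    queued commands the first one that hits an open row; lookahead=1
    degenerates to plain FIFO."""
    open_row, cycles, queue = {}, 0, list(commands)
    while queue:
        window = queue[:lookahead]
        pick = next((c for c in window if open_row.get(c[0]) == c[1]),
                    window[0])           # fall back to oldest command
        queue.remove(pick)
        bank, row = pick
        cycles += ROW_HIT if open_row.get(bank) == row else ROW_MISS
        open_row[bank] = row
    return cycles

# Alternating rows in one bank: FIFO thrashes, look-ahead groups the hits.
cmds = [(0, 0), (0, 1), (0, 0), (0, 1), (0, 0), (0, 1)]
fifo = run(cmds, lookahead=1)
opt = run(cmds, lookahead=6)
print(fifo, opt)
```

With strict FIFO every command pays the miss penalty; a six-deep window groups the row hits and cuts total cycles by well over half, which is the "wasted clock cycles" Greenberg describes.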

For the most common system configurations, Greenberg noted, DDR4-3200 speeds will require 5 to 6 cache line fills in the DRAM controller at any given time to have enough look-ahead to keep the data pipe full. Okay, you might conclude, we'll just build a simple controller that can look ahead but still executes in order. That works until you issue two transactions to different rows in the same bank. Now the tRC (activate-to-activate) delay of each DRAM bank becomes a problem. tRC is another timing parameter that is not decreasing over time; at DDR4-3200 with a tRC of 45 ns, the tRC delay will be 72 clock cycles.
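The 72-cycle figure is simple arithmetic; here is my sanity check using the numbers from the talk:

```python
import math

data_rate_mts = 3200               # DDR4-3200: 3200 million transfers/s
tck_ns = 2000.0 / data_rate_mts    # clock period: 0.625 ns at DDR4-3200
trc_ns = 45.0                      # activate-to-activate time, per the talk
trc_cycles = math.ceil(trc_ns / tck_ns)
print(trc_cycles)                  # 72 clock cycles
```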

More Complexity

Things get even more complex. For most system configurations, DDR4 speeds will require 14-18 cache line fills in the DRAM controller to cover the tRC time of the DRAM. But if all those transactions are done in order, latency will suffer. Further, you don't always need to hold exactly 6 cache line transactions in queue for effective look-ahead. What if a more optimal command comes along? Some degree of flexibility is needed.
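The top of that 14-18 range can be rationalized with one further assumption of mine (not stated in the talk): a 64-byte cache line filled by a single BL8 burst on a 64-bit bus, which occupies four clock cycles (eight beats, two per clock). Dividing tRC by that burst time gives:

```python
import math

tck_ns = 2000.0 / 3200                 # DDR4-3200 clock period, 0.625 ns
trc_cycles = math.ceil(45.0 / tck_ns)  # 72 cycles, as in the talk
cycles_per_fill = 8 // 2               # BL8 burst at double data rate
fills_to_cover_trc = math.ceil(trc_cycles / cycles_per_fill)
print(fills_to_cover_trc)              # 18
```

Eighteen back-to-back fills fit in one tRC window, so that many transactions must be in flight before an in-order controller can hide a same-bank row conflict.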

Another complication is that modern systems have three types of masters -- latency-sensitive masters that need low latency, bandwidth-sensitive masters that need a lot of data, and maximum-latency masters that care only about a latency limit. Greenberg reviewed the requirements for each. He noted that memory controllers should re-order transactions for priority, making it possible to differentiate transactions based on their latency requirements.
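One way such prioritized reordering could look in miniature (my sketch, not Cadence's controller): latency-sensitive requests always win, bandwidth requests fill the remaining slots, and a maximum-latency request jumps the queue only once its deadline gets close.

```python
LATENCY, BANDWIDTH, MAXLAT = 0, 1, 2   # the three master classes

def schedule(requests):
    """Return (class, deadline) requests in service order, one slot each."""
    order, pending, t = [], list(requests), 0
    while pending:
        def key(req):
            cls, deadline = req
            if cls == MAXLAT and deadline - t <= 2:
                return (-1, deadline)  # deadline close: highest priority
            return (cls, deadline)     # else latency < bandwidth < max-lat
        pick = min(pending, key=key)
        pending.remove(pick)
        order.append(pick)
        t += 1                         # one service slot per request
    return order

reqs = [(BANDWIDTH, 99), (MAXLAT, 3), (LATENCY, 0), (BANDWIDTH, 99)]
print(schedule(reqs))
```

The latency-sensitive request is served immediately, the max-latency request is promoted just in time to meet its limit, and the bandwidth masters absorb whatever slots remain.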

Greenberg concluded that a static allocation of a fixed number of commands to the DRAM controller cannot reliably meet latency and bandwidth demands. The best approach is to allow as much flexibility as possible in command ordering, and to make decisions on command ordering as close as possible to the memory.

Note: In April 2011 Cadence announced the industry's first DDR4 IP solution. The solution includes hard and soft PHY IP, controller IP, memory models, verification IP, tools and methodologies, and signal integrity reference designs for the package and board. For more information on Cadence DDR memory controller IP and the optimizations it offers, click here. To view the video of the presentation, open the video image below or click here.

 

Richard Goering

Other blog posts about ARM TechCon papers:

ARM TechCon Paper: New Methodology Eases Challenges of 32/28nm Designs

ARM TechCon Paper: "Tips and Tricks" for ARM Cortex-A15 Designs

