Imagine this scenario: Your chip is a low power design. You’ve used everything in the book – clock gating, multiple threshold optimization, power shutoff, multiple supply voltages etc. What else can you do to reduce power in your design?
Or, maybe you can’t do power shutoff – the entire device is always on. Maybe you can’t use multiple supply voltages (face it – if you’re already running at 0.8V, how much lower can you go?) But you know you have plenty of random logic, and you know you have to reduce power in your design.
A dual flop methodology could help to further reduce power in your design. What is a dual flop? It’s basically two flops physically merged into one. Kind of like a multi-bit flop, but in parallel instead of in series. The merged flop shares a single clock pin, but besides that, it has two separate data inputs and outputs.
This setup saves dynamic power in two ways. First, there is some savings from the efficiency of using a common clock pin. In the worst case, the merged clock pin has double the capacitance of a single flop’s clock pin, resulting in no significant savings. Usually, though, the merged pin’s capacitance is somewhat less than the sum of the two individual clock pin capacitances, so some dynamic power is saved there.
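To make the clock pin argument concrete, here is a minimal sketch using the standard dynamic power relation P = a * C * V^2 * f. All numbers (the capacitances, the 0.8V supply, the 1 GHz clock) and the function name are assumptions chosen for illustration, not data from any real cell library.

```python
def clock_pin_power(c_farads, vdd_volts, freq_hz, activity=1.0):
    """Dynamic power of a switching node: P = activity * C * V^2 * f."""
    return activity * c_farads * vdd_volts ** 2 * freq_hz


# Hypothetical values, for illustration only.
C_SINGLE = 1.0e-15   # clock pin capacitance of one single flop (farads)
C_DUAL = 1.6e-15     # assumed clock pin capacitance of one dual flop (farads)
VDD = 0.8            # supply voltage (volts)
FREQ = 1.0e9         # clock frequency (hertz)

# Two separate flops: two clock pins to charge every cycle.
p_two_single = 2 * clock_pin_power(C_SINGLE, VDD, FREQ)

# One dual flop: a single shared clock pin.
p_one_dual = clock_pin_power(C_DUAL, VDD, FREQ)

# Fraction of clock pin power saved by merging.
saving_fraction = 1 - p_one_dual / p_two_single

# Worst case from the text: merged pin has exactly double the capacitance.
p_worst_case = clock_pin_power(2 * C_SINGLE, VDD, FREQ)
worst_case_saving = 1 - p_worst_case / p_two_single

print(f"clock pin power saved: {saving_fraction:.0%}")
print(f"worst case saved:      {worst_case_saving:.0%}")
```

Since voltage and frequency are the same in both cases, the saving depends only on the ratio of the merged pin capacitance to the sum of the two original pin capacitances.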
The second way this setup saves dynamic power is in the clock network distribution. For every two flops, the clock network now has to reach only one location instead of two. Dual flops are therefore more tightly clustered than individual flops, which saves clock distribution net length and, more importantly, the buffers needed to drive the clock distribution network.
So, how much power does a dual flop methodology save? It really depends on what kind of design you’re dealing with. For designs with a large portion of random logic, and especially designs where clock power is a significant contributor to total power (e.g. designs with large clock networks or a low signal-to-clock switching ratio), using a dual flop methodology will yield better results. Used in the right design category, a dual flop methodology has the potential to save roughly 10%-20% of total power in the design.
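As a rough back-of-the-envelope sketch of why the clock's share of total power matters so much: if only clock power is reduced, the total saving is the clock share multiplied by the fractional reduction in clock power. The function name and the input numbers below are hypothetical, picked only to show the arithmetic.

```python
def total_power_saving(clock_share, clock_power_reduction):
    """Fraction of total power saved when only clock power is reduced.

    clock_share: fraction of total power consumed by the clock (0 to 1).
    clock_power_reduction: fraction of clock power removed (0 to 1).
    """
    for name, value in (("clock_share", clock_share),
                        ("clock_power_reduction", clock_power_reduction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return clock_share * clock_power_reduction


# Hypothetical designs: same clock power reduction, different clock share.
clock_heavy = total_power_saving(clock_share=0.4, clock_power_reduction=0.3)
clock_light = total_power_saving(clock_share=0.1, clock_power_reduction=0.3)

print(f"clock-heavy design saves {clock_heavy:.0%} of total power")
print(f"clock-light design saves {clock_light:.0%} of total power")
```

With the same assumed reduction in clock power, the design where the clock dominates sees a total saving several times larger than the one where it does not.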
What do you think?
Wei Lii Tan
Is the dual flop methodology different from a dual-edge-triggered flip-flop, or are they the same?
Instead of all this, cluster the flops around the gate.
Is this a miniature version of the memory concept, where all the memory elements share a common clock point? Isn't this an overhead for the router, and how does the congestion map behave for this?
It will definitely ease congestion. Apart from the clock routes, I'm assuming the flops would share a common set/reset as well. Even better if they are internally scan connected.
Yes, it really depends on the design. High random logic (meaning the design is not dominated by hard macros), low switching ratio (meaning the clock actually uses a good proportion of the total power), and definitely the way the dual flop cells are designed are all critical factors in its effectiveness.
10 percent? Interesting. But is it achievable in a high-power setup?