I am curious to know how you handle the paths to clock gating enable pins.
I have a design containing multiple levels of clock gating on the clock
network. When synthesizing with ideal clocks, all my paths to the enable
pins have a full cycle, but once CTS is done, the latency of the
network will reduce the time available to reach those pins.
I understand I could perform post-CTS optimization, but I would prefer a
more robust method to constrain those paths in ideal clock mode.
I am thinking of the following two approaches, and would like to hear whether you have used them or any others.
- max_delay to the enable pins, equal to (clock period - expected network latency after the clock gating element)
- defining a generated clock after each clock gating element, each with a different latency
Both methods have the drawback of requiring a lot of data management :(
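A minimal sketch of the first approach, written as a small Python helper that turns a table of gating cells into `set_max_delay` lines. The pin names, the latency numbers and the helper names (`GatingCell`, `emit_max_delay_constraints`) are all hypothetical; the expected post-gate latencies would have to come from a previous layout or an estimate, which is exactly the data management burden mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GatingCell:
    enable_pin: str              # hierarchical enable pin name (hypothetical)
    post_gate_latency_ns: float  # expected clock latency downstream of the gate


def enable_max_delay(clock_period_ns: float, post_gate_latency_ns: float) -> float:
    """Budget for the enable path: clock period minus post-gate latency."""
    budget = clock_period_ns - post_gate_latency_ns
    if budget <= 0:
        raise ValueError("post-gate latency leaves no time for the enable path")
    return budget


def emit_max_delay_constraints(clock_period_ns: float,
                               cells: List[GatingCell]) -> List[str]:
    """Return one set_max_delay line per clock gating enable pin."""
    lines = []
    for cell in cells:
        budget = enable_max_delay(clock_period_ns, cell.post_gate_latency_ns)
        lines.append(
            f"set_max_delay {budget:.3f} -to [get_pins {cell.enable_pin}]"
        )
    return lines


if __name__ == "__main__":
    # Two gating levels: the gate nearer the root has more tree below it,
    # so its enable path gets a smaller budget.
    cells = [
        GatingCell("u_core/cg_root/EN", 0.8),
        GatingCell("u_core/u_blk/cg_leaf/EN", 0.2),
    ]
    for line in emit_max_delay_constraints(2.0, cells):
        print(line)
```

Keeping the latency table in one place at least means the constraints can be regenerated when the estimates change, and dropped as a group once the clocks are propagated.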
Thanks for your help,
I think the solution can be in the design and not in the implementation. Examples of solutions are:
1. The enable generation logic must be simple, so timing can be met.
2. The enable logic to the clock gater at the base of the clock tree should be multi-cycle.
During synthesis, there is no way (at least with current technology)
to predict the clock latency of the clock gater. So if you happen to
have timing closure issues on some clock gaters, you could
re-synthesize with additional constraints for those clock gaters, or
specify that the FFs driven by those clock gaters are not to be clock gated.
RC has a better command than set_max_delay: use the "path_adjust" command.
set_clock_latency at the clock gating cell might be better than
set_max_delay. Note that set_max_delay (as well as set_clock_latency)
is affected by the clock skew. With set_max_delay, the constraint
has to be removed after CTS. With set_clock_latency, it is ignored
after CTS (once the clocks are propagated). I remember reading somewhere that different tools
interpret "set_max_delay" differently, as the definition of this command
is not clear. Not sure if this is still the case now.
I like path_adjust because it does not depend on clock skew.
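To make the latency effect concrete, here is a simplified setup check for an enable path, sketched in Python. It assumes a single-cycle path from a launching FF to the gater's enable pin and ignores clock uncertainty and derating; the function name and all numbers are hypothetical.

```python
def enable_setup_slack_ps(period_ps: int,
                          launch_latency_ps: int,
                          gater_latency_ps: int,
                          path_delay_ps: int,
                          setup_ps: int = 0) -> int:
    """Setup slack at a clock gater enable pin (simplified model).

    The enable is launched by a FF whose clock arrives at launch_latency_ps
    and captured by a gater whose clock arrives at gater_latency_ps.
    """
    required = period_ps + gater_latency_ps - setup_ps
    arrival = launch_latency_ps + path_delay_ps
    return required - arrival


if __name__ == "__main__":
    # Ideal clocks: both latencies are zero, the path sees a full cycle.
    print(enable_setup_slack_ps(2000, 0, 0, 1500, 50))      # 450
    # After CTS: the FF sits at the leaves (900 ps), the gater higher up
    # the tree (200 ps), so 700 ps of the cycle is lost.
    print(enable_setup_slack_ps(2000, 900, 200, 1500, 50))  # -250
```

The difference between the two latencies is the "network latency post clock gating element" from the original question; any constraint applied in ideal clock mode is an attempt to model that 700 ps before the tree exists.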
Eng Han,
You're right about removing the timing constraints after CTS if you follow my suggestion. It's a good point, and I should have said that in the first place (our flow uses tweaked SDCs between pre and post CTS, I just forgot about that "minor" detail!!).
"path_adjust" sounds good, but how many tools other than RC support it? I'd not come across it before, might have to have a play in the morning (well, morning for us Europeans!!).
I also note your comment about the timing of the gating (either have simple logic or false-path it). That sort of thing is something (in my opinion) you need to design in from the architecture level. For example, with a false-path or multicycle constraint, you make the gating non-immediate (the switch may or may not pass the clock this cycle). This gating is then only suitable for on/off switching (e.g. turning a block on for use, then off again after some period of time). Even the s/w guys may need to account for this behaviour.
By contrast, the gating done for localised power control is immediate (the clock must pass on the next cycle). The only way this enable can meet timing is to have simple logic gating a small register bank (so the tree size after the gate is minimised).
Or perhaps we could just persuade the world to stick to mains-powered devices and stop using these damn batteries ;-)
CD
Hi Crispy Duck,
Before the technical discussion, I am surprised that you are in Europe,
as "Crispy Duck" sounds Asian. A surprise for you too: I am in Paris,
but will be in Asia next month.
You bring up a good point that explains why, depending on who you talk to,
some designers want the clock tree to be after the clock gater
(obviously to save power), and some designers want the clock tree to be
before the clock gater (obviously to meet timing).
In the design that I have here, the clock gaters at the base of the
clock tree are hand instantiated, and there is always a pair of FFs
(like a synchroniser) driving the enable of each clock gater. In
this way, the logic from the FF to the clock gater is just a wire, and
meeting timing becomes easy.
It is tricky to decide what the latency should be for the pair of FFs
(in front of the clock gater). Ideally, they should have a shorter
latency. However, if you want to include them in the scan chain, then
it is better to balance them. It also depends on where you place the clock
gater. If it is placed next to the PLL, miles away from the core
(and somehow you decide to place the pair of FFs next to the clock
gater), then it is better not to balance the latency, and also to exclude
them from the scan chain (too many ifs here...).
Now, back to the original question. If the designer does not know the
impact of clock gaters on timing closure, the backend engineer will
suffer, and the quality of the layout will be bad. The new RC 6.1 has
a feature that can merge/split the enable conditions of clock gaters.
This might help, or make things worse. Also, if timing closure on
clock gaters inserted by the tool is a problem, then use a smaller
fan-out for the clock gaters (and don't do declone after that; "declone"
actually merges clock gaters together...). This will move each clock
gater "near" to its FFs, so that it has a similar clock latency.
PS: CD, could you send me a mail at email@example.com? I would
like to introduce some of the work I am doing to an experienced backend
engineer like you.
Thanks everybody for your participation.
A lot of good things have been said here, and I would agree that good
planning, i.e. a clock gating aware architecture, is the best path to
success. However, we don't always have that, and even when you do have a
FF driving the enable, there is no guarantee that the path will be as fast
as it could be: it will meet timing without problem in synthesis,
and things like area recovery might actually slow it down because there
is a large positive slack. Granted, post-route optimization should be
able to fix this fairly easily, but it would be better to have the path as
fast as possible to start with.
In addition, in many physical implementations the clock gating cells are
duplicated based on where they are on the clock tree. Having
a single instance of a clock gating cell near the root of the clock is
good for something controlled by a signal like IP_ON_OFF, but not
too practical if the enable is generated deep in the block. So
depending on the physical distribution of the flops you want to gate, the
cloning strategy for your clock gating cells might differ. Tools can
now handle that in the backend. The part they don't handle very well is
the enable logic, mostly because there are no "universal constraints"
to define those signals and constrain them.
We all agree that if you can have a stable backend estimate of your clock
latency and strategy, it is not too hard to solve this
problem. The problem is: who ever gets that, and has a chance to go
all the way back to synthesis? In general, at that point my manager
is pushing to tape it out and screw it if it is not the best/most
So from what I have read so far I take:
Well, I have not yet seen a chip trying to do cycle-accurate power
up/down, but I have seen that done (and done it myself) for clocks for
more than 10 years. As you say, the latency through power switches
is far too great.
However, I do agree that good planning is the key to all this.
Unfortunately, it is often quite difficult to achieve down to the lower
levels when re-using soft IPs coming from internal teams or from
outside. In general we end up with a well-defined chip-level clock
architecture going to the various IPs, with a limited number of clock
gating cells, isolation cells and level shifters. Where we start having
problems is when we reach the soft IP blocks, which have all been
written following different guidelines (today's guidelines are not the
ones used 1 or 2 years ago). Unfortunately, as management sees
those IPs as done, there are no resources to re-open and fix them.
But yes, good architecture is the key to a happy implementation engineer
(especially if it comes with a detailed clock network diagram and balancing
How many nights did I dream about this? ....