In my design I am getting many clock gating setup violations due to negative skew, i.e., my launch clock delay is larger than my capture clock delay. Placement in the design is clock-gate aware.
Can someone suggest some techniques for this kind of violation?
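To make the negative-skew effect concrete, here is a minimal Python sketch of the standard setup check, slack = (period + capture latency - setup time) - (launch latency + data delay). The function name and every number are hypothetical and chosen only for illustration; none of them come from this design.

```python
def setup_slack_ps(period, launch_latency, capture_latency, data_delay, setup_time):
    """Setup slack in picoseconds for a single launch/capture pair."""
    required = period + capture_latency - setup_time
    arrival = launch_latency + data_delay
    return required - arrival

# Flop-to-flop path: both clock pins see the same (hypothetical) insertion delay.
flop_to_flop = setup_slack_ps(period=2000, launch_latency=1500,
                              capture_latency=1500, data_delay=1200, setup_time=100)

# Flop-to-ICG-enable path: the ICG sits higher in the tree, so its clock pin
# sees the clock 1000 ps earlier than the launching flop does.
flop_to_icg = setup_slack_ps(period=2000, launch_latency=1500,
                             capture_latency=500, data_delay=1200, setup_time=100)

print(flop_to_flop)  # 700
print(flop_to_icg)   # -300
```

With the same data path, the early clock at the ICG turns a comfortable positive slack into a violation, which is the pattern described above.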
This is a classic design challenge and a common source of difficulty. I'm glad you raised the question, even if it is a tough subject to tackle in a forum like this. It's tough because it cuts across many areas of functionality in the tool: placement, CTS, and optimization.
Two things I'd mention for your consideration:
"ckCloneGate" will attempt to push the integrated clock gating cell (ICG) down as close as possible to the flops. If successful this will reduce the amount of skew you're seeing for paths ending on enable pins of ICGs.
If you're using 8.1, you could try the "setOptMode -clkGateAware true" option (Note: this option serves a different purpose than setPlaceMode -clkGateAware!). It automatically models the earlier arrival time of the clock at the ICG clock pin, which lets preCTS optimization see these violations and fix them where possible. If you're using a release older than 8.1, you could achieve something similar with a script in the gifts directory (<install>/share/fe/gift/scripts/tcl/): "userSetClkLatToIcgCkPins.tcl".
Maybe you could consider these options and share some additional information on the nature of the problem you're seeing. Specifically, do you think the problem is a matter of preCTS optimization not seeing these violations? Or whether clock gate cloning is needed to push the ICGs farther down the tree?
Hope this helps,
Bob
In reply to Robert Dwyer:
Actually, I am using SOC71, so I don't have the setOptMode -clkGateAware option.
In my design I marked the nets from the ICG output up to the point where they branch to the registers as dontTouchNet, so the buffers that had been on those nets were placed ahead of the ICG clock pin, in the capture path. This gave my logic extra time and the slack was met.
But this did not work for all paths.
I feel that while building the clock tree, the tool should also balance skew with respect to the ICG clock pin, which would reduce violations on clock gating paths.
In reply to maven7783:
I think I know how you feel (wanting CTS to balance skew with respect to the clock pin of the ICG). I felt the same way last time I was working through a scenario similar to this. But I found that if I balanced to the clock pin of the ICG that the downstream registers driven by the ICG would have increased insertion delay relative to the rest of the registers in the design and as a result large skew. It would just be pushing the problem to other paths in the design.
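A minimal Python sketch of that trade-off follows. The delays are hypothetical and the model is deliberately simple: clock arrival at a gated flop is the latency at the ICG clock pin plus the ICG cell delay plus whatever buffering sits between the ICG and the flops.

```python
def gated_flop_latency_ps(icg_ck_latency_ps, icg_delay_ps, downstream_delay_ps):
    """Clock arrival at flops behind an ICG, in picoseconds."""
    return icg_ck_latency_ps + icg_delay_ps + downstream_delay_ps

TARGET_PS = 1500      # hypothetical insertion delay at the ungated flops
ICG_DELAY_PS = 200    # hypothetical ICG cell delay
DOWNSTREAM_PS = 800   # hypothetical buffering between the ICG and its flops

# Usual balancing: the leaf flops are the sinks, so the ICG CK pin must see
# the clock early enough for the gated flops to land on the target.
early_ck_ps = TARGET_PS - ICG_DELAY_PS - DOWNSTREAM_PS
usual = gated_flop_latency_ps(early_ck_ps, ICG_DELAY_PS, DOWNSTREAM_PS)

# Balancing to the ICG CK pin instead: the pin lands on the target and
# everything downstream is added on top of it.
pin_balanced = gated_flop_latency_ps(TARGET_PS, ICG_DELAY_PS, DOWNSTREAM_PS)

print(early_ck_ps)               # 500
print(usual - TARGET_PS)         # 0
print(pin_balanced - TARGET_PS)  # 1000
```

In this toy model the 1000 ps that used to appear as skew at the ICG enable pin reappears as skew between the gated flops and the rest of the design.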
If you wanted to test the theory that balancing to the clock pin of the ICG would be the right thing to do, you could specify the clock pins of the ICG lib cell(s) as LeafPorts in your clock tree spec file (but be aware that anything downstream from these points will not be buffered, which I consider a problem that makes this approach unsuitable):
LeafPort
+ ICGX1/CK rising
+ ICGX2/CK rising
Similarly, I think marking the output nets of the ICGs dont_touch isn't suitable because it prevents the tool from optimizing in cases where buffering is required to reduce delay.
If I were looking at your design, I'd ask these questions:
What is the fanout of the ICGs in the cases where you're seeing a setup violation on the enable pin? If it is greater than 10 or so, I would think this is a case where you'd need to do cloning. If it's less than 10, it seems like a case where the tool did a poor job of placing the ICG at the center of gravity of the flops it drives (i.e., "setPlaceMode -clkGateAware true" didn't work very well).
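As a rough way to apply that rule of thumb across many violating ICGs, here is a small Python sketch. The threshold of 10 is the approximate figure from the post; the function name, the ICG names, and the fanouts are hypothetical, and the fanout numbers would have to come from your own reports. Treating a fanout of exactly 10 as a placement case is an arbitrary choice in this sketch.

```python
def suggest_next_step(fanout, threshold=10):
    """Rule-of-thumb triage for an ICG with a setup violation on its enable pin."""
    if fanout > threshold:
        return "clone the ICG"
    return "review ICG placement"

# Hypothetical ICG instances and the number of flops each one drives.
violating_icgs = {"icg_a": 32, "icg_b": 6}

plan = {name: suggest_next_step(fanout) for name, fanout in violating_icgs.items()}
for name, step in plan.items():
    print(name, "->", step)
```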
I have attached a doc which shows a rough picture of the scenario in my design.
While building the clock tree, I marked the net shown in brown in the doc as dontTouch (in the .ctstch file), so that CTS moved the buffer off the brown line and added it before the CK pin of the ICG (indicated by the blue line in the doc), satisfying the latency.
This is not the scenario for all ICGs; there were some ICGs whose fanout is more than 10.
Obviously many paths will be violated if I specify the ICG CK pin as a leaf port. The tool should not stop at the ICG CK pin; it should treat the ICG CK pin as an intermediate balancing point and avoid putting many buffers after it just to meet the max delay constraint.
I will try cloning and come back to you after using it.
I downloaded and viewed your attachment. Thanks for sending that. I think we're aligned on the fundamental challenge presented here.
I think the tool is inserting buffers downstream from the ICG for a reason. Either it needs to satisfy max cap/max tran -or- it is seeking to minimize overall insertion delay. One thing that occurred to me is that perhaps your ICGs are marked dont_touch in the .lib? It's fairly common that the ICGs would be marked by the library provider as such. The reason I think this could be relevant is that if the ICGs were marked dont_touch, it would disallow the tool from upsizing the ICG to meet max cap/tran and the tool would be forced to insert buffers downstream from the ICG as a result.
To check whether this might be the case, check for dont_touch (and dont_use while you're at it):
encounter 11> get_property [get_lib_cell <name_of_your_icg_cell>] is_dont_touch
encounter 12> get_property [get_lib_cell <name_of_your_icg_cell>] is_dont_use
If this is the case, you'll want to relieve the dont_touch/use markings within the tool to enable CTS to resize the ICG to avoid making downstream buffer insertion necessary.
Just something else to consider in conjunction with icg cloning.
The ICGs in my design don't have dontTouch/dontUse properties.
As you said, buffers were placed after the ICG in some cases only to minimize insertion delay, and in other cases to meet max tran or max cap constraints.
I have used the script userSetClkLatToIcgCkPins.tcl in my design but could not understand it completely.
From my understanding, the script is given the WNS (the value after CTS) as input.
It produced an SDC output with set_clock_latency values equal to the WNS (I gave -1). This means the clock is made to reach the CK pins of the ICGs at the earliest time in the preCTS stage (this can be confirmed from the reports, where the other end arrival time is -1). I ran optDesign -preCTS and there was no significant improvement.
Is my understanding correct?
This script is meant to optimize the data path for the earliest arrival of the clock.
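For readers who want to picture the kind of constraint being described, here is a minimal Python sketch that writes one SDC set_clock_latency line per ICG clock pin. It is not the gift script and does not reproduce its actual output format; the function name and pin names are hypothetical, and the -1.0 value simply mirrors the WNS mentioned above.

```python
def icg_latency_constraints(icg_ck_pins, latency_ns):
    """Build one SDC set_clock_latency line per ICG clock pin."""
    return [
        f"set_clock_latency {latency_ns} [get_pins {{{pin}}}]"
        for pin in icg_ck_pins
    ]

# Hypothetical ICG clock pins; in practice these would come from the netlist.
pins = ["u_core/icg_0/CK", "u_core/icg_1/CK"]

lines = icg_latency_constraints(pins, -1.0)
print("\n".join(lines))
# set_clock_latency -1.0 [get_pins {u_core/icg_0/CK}]
# set_clock_latency -1.0 [get_pins {u_core/icg_1/CK}]
```

With the clock modeled as arriving 1 ns early at those pins, paths ending on the ICG enable pins look that much tighter before CTS, which is what lets preCTS optimization work on them.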
Thanks for giving the script a try. I think you are understanding how it works correctly.
It sounds like you're missing timing by 1ns postCTS. It would be useful to know, for your worst paths, whether this is caused by significant clock skew on a path ending on an ICG enable pin. If it is, then this script will give you insight into whether the tool could do a better job optimizing these paths preCTS than it can postCTS. You indicated there is no significant improvement, which implies that the problem can't really be solved by enabling the tool to see the violations preCTS.
Are these paths meeting timing preCTS? If so, it would indicate that there is approximately 1ns of skew on paths ending on ICGs, which would also imply that the clock is arriving at certain ICGs 1ns earlier than at the downstream flops they drive. That is a very large delay, and it sounds like we need to find a way to fundamentally resolve the situation in which so much delay occurs downstream of the ICG. What is the fanout from the output of the ICG to the downstream flops it drives in cases where there is such bad skew? If it is large (greater than 10), then I'd recommend pursuing cloning. If it is small (less than 10), then I'd recommend seeking to understand why "setPlaceMode -clkGateAware true" didn't place the ICG at the center of gravity of the downstream flops it drives.
Thanks for your replies.
My design had long run times, which is why I could not respond to you quickly.
I used useful skew in preCTS, together with cloning, to close my design. Due to useful skew the skew in my design was bad (400ps), but timing was met, and the ICGs have a max fanout of 32.
I did some ECOs in the postRoute stage and closed my design.
Your script worked well on a small design.
Thanks for posting this follow-up! I'm glad to hear you've achieved timing closure.
Your success was due in large part to your diligence in troubleshooting and addressing the problem. Thanks for posing the question and engaging in this dialog in the forums. I hope others might benefit from our discussion. I'd like to highlight this topic in a future blog entry.