RC synthesis flows

Dear all,

This question may be trivial for many on this forum, but I would like to discuss a methodology question about using RC (RTL Compiler) for synthesis.

What synthesis flows are recommended for RTL Compiler?

Not all designs are computationally intensive datapaths. Some are datapath intensive, some are clock intensive (dominated by clock-path generation), and some are memory intensive.

What is the best methodology for each type of design? How do we ensure that we have optimized in the best possible way? How do we verify improvements in design QoR (timing, area, and power)?

Apart from clock gating, what other techniques do we employ for low-power synthesis?

I also found that multi-Vt optimization in synthesis does not seem to be a good idea from the tool's perspective, as it does not give good results.

DC usually provides template scripts targeting timing, area, and power separately. Are any such tricks available in RTL Compiler as well?

Please share your views!

thanks

suresh

grasshopper over 13 years ago

Hi Suresh,

Nothing trivial about this question :) As you know, there are multiple ways to skin the cat. The RC write_template command provides a nice mechanism for generating various templates for running RC, but most of these are feature driven, not goal driven. A template shows you, for example, how to do retiming, but not necessarily the different things you can do when retiming. The goal of RC is to provide the best QoR, or close to the best, with the template flows provided. That being said, sometimes a user needs to make changes to squeeze the very last drop out of a particular metric.

I hate to give you a non-answer, but I will. The issue is that there is no one-size-fits-all answer when pursuing every last drop of QoR. There is certainly a trade-off between time to good results and best results. As you know, time is the worst enemy of engineers. I will try to provide some quick suggestions that may be helpful, and I encourage you to enlist your local support to help you tailor things further:

(1) Targets, targets, targets
Targets are a key element of RC optimization. Make sure you understand them and adjust your flow accordingly.

(2) Constraints
Previous generations of MIS/SIS-based synthesis tools relied on over-constraining. This is definitely not the right approach with RC. Only over-constrain if you have a valid reason, or evidence that it yields better results on your design using RC. Otherwise you are just giving up area and power for no good reason.

(3) Best QoR
This means different things to different users. If you have the time and resources, I always encourage users to do multiple runs to identify the performance bounds of their design. For example, you can run without constraints to obtain an area bound. You can run without timing constraints but with a power constraint to obtain a power bound (in smaller technologies, area and power do not necessarily correlate). The problem is that users will often demand improvements on a given metric without knowing whether they are even feasible.

(4) Low-power design
Today, CPF can be used to enable several LP techniques beyond clock gating (MSMV, PSO, DVFS, etc.). RC also provides what-if analysis, where you can retarget a block on the fly for a different voltage and see the impact without having to re-run a full synthesis. This will not give you the best QoR, but it will give you a good idea of what can be accomplished. One key thing when doing LP design is to ensure you have representative activity vectors. There is limited value in spending lots of time optimizing a design against vectors that may not be representative of the design's activity. You know what they say, GIGO... For large, complex designs, DPA (RC + Palladium emulation) provides the best solution for quickly and accurately extracting the desired activity vectors.

(5) Physical-aware synthesis
Especially in smaller geometries, physically aware synthesis has become extremely important for reaching the best QoR and avoiding surprises when handing your design over to P&R.

(6) Multi-VT
A lot of effort has gone into the RC multi-VT flow, and our current recommendation is to enable multi-VT in synthesis with an understanding of your operating envelope. However, this depends on your backend flow as well, since some tools/flows end up undoing the work done here and potentially cause more harm than good. It is therefore important that your P&R flow is complementary to your synthesis solution.

Hope my non-answer helped,
gh-
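
As an illustration of the multiple-runs idea in point (3), here is a minimal Python sketch that compares a fully constrained run against the area and power bounds from the exploratory runs. Everything in it is an assumption made for illustration: the `RunResult` fields, the run names, and all of the numbers are made up, and nothing here is read from an actual RC report.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """QoR summary for one synthesis run (hypothetical fields)."""
    name: str
    area_um2: float
    power_mw: float
    wns_ps: float  # worst slack; >= 0 means timing is met


def gap_to_bound(value: float, bound: float) -> float:
    """Return how far `value` sits above `bound`, as a percentage of the bound."""
    return 100.0 * (value - bound) / bound


# Made-up numbers, for illustration only.
runs = [
    RunResult("no_constraints", area_um2=10000.0, power_mw=5.5, wns_ps=-400.0),
    RunResult("power_only", area_um2=10800.0, power_mw=5.0, wns_ps=-350.0),
    RunResult("full_constraints", area_um2=12000.0, power_mw=6.0, wns_ps=0.0),
]

# The bounds are the best values seen across the exploratory runs.
area_bound = min(r.area_um2 for r in runs)
power_bound = min(r.power_mw for r in runs)

# Compare the fully constrained run against each bound.
final = runs[-1]
area_gap = gap_to_bound(final.area_um2, area_bound)
power_gap = gap_to_bound(final.power_mw, power_bound)

print(f"{final.name}: area is {area_gap:.1f}% above the area bound")
print(f"{final.name}: power is {power_gap:.1f}% above the power bound")
```

A small gap suggests there is little left to gain on that metric; a large gap only shows what meeting timing costs, not that the cost can be avoided.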
sureshm over 13 years ago

Hi gh,
Thanks for the answer!
You are right, RC template scripts are mainly feature driven, whereas DC provides goal-driven templates (a best-fit template for the given target, accepted by most designers).
I am currently using a 20% higher frequency with no wireload model and a blanket uncertainty of 250 ps for all clocks. Do you think this is the right approach, or is there another approach that you find more appropriate?
Also, I am not sure whether RC picks the right architectures for most of the DW components. Do you know whether RC optimizes CW components better than DW components?
Analyzing the design for different performance bounds, one target at a time, is a very good idea. However, how do we carry those results into the final design run, where all the targets are in place, independent of each other? RC claims that the tool tries to fix the targets orthogonally, which means it tries to employ the best algorithms to meet each target without affecting the others.
Thanks
suresh
grasshopper over 13 years ago

Hi Suresh,

I am assuming that when you say no wireload, you are in fact reading all the appropriate physical collateral to enable PLE synthesis. A 20% higher frequency does sound excessive, as does 250 ps of uncertainty, but I do not know your clock period or technology, so it is hard to say. As for the 20% number, I would ask: what is the basis for it? Is it a rule of thumb, or does it come from specific experience with this design and technology? In general I would recommend a number that accounts for SI impact, OCV impact, and effects known not to be captured in the synthesis portion of the flow. Usually this number is around 5-10%, plus uncertainty to model skew, PLL jitter, etc.

As for optimizing DW vs. CW, there should not be a significant difference either way. Using the high-level constructs is always preferred if possible, but depending on what functions you are trying to model with xxWare, this may or may not be desirable. If you suspect RC is picking the wrong xxWare, I would first take a look at the 'targets' in your log. The architectures chosen are largely driven by those, so make sure they are a true reflection of what you expect.

I think your last paragraph is not accurate. RC does not try to target cost functions orthogonally. On the contrary, it tries to address them concurrently so that the best trade-off is made. The comment about not affecting the others is somewhat misleading. We all know 'you do not get something for nothing'. When you pick a really fast adder, it is usually bigger than the slow adders, so there certainly is an impact, but the goal of concurrent optimization is to make decisions that yield the best QoR. If I tried to go only for, say, area first and then do the rest, I would spend considerably more time fixing timing, or might not even be able to do so.

Hope this helps clarify,
gh-
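
To put the margins discussed above side by side, the following Python sketch works out how much of the nominal clock period remains after a frequency over-constraint and a clock uncertainty are both applied. It is only arithmetic under stated assumptions: the 500 MHz target is a made-up example (the thread never gives the clock period), the 7.5% figure is simply the midpoint of the 5-10% range, and the function names are hypothetical rather than part of any tool.

```python
def effective_period_ps(target_mhz: float, freq_margin: float,
                        uncertainty_ps: float) -> float:
    """Period left for logic after over-constraining and uncertainty.

    target_mhz     -- the real target frequency (assumed example value)
    freq_margin    -- fractional frequency over-constraint, e.g. 0.20 for 20%
    uncertainty_ps -- blanket clock uncertainty subtracted from the period
    """
    nominal_ps = 1e6 / target_mhz                   # 1 MHz <-> 1e6 ps period
    constrained_ps = nominal_ps / (1.0 + freq_margin)
    return constrained_ps - uncertainty_ps


def total_margin_pct(target_mhz: float, freq_margin: float,
                     uncertainty_ps: float) -> float:
    """Total margin as a percentage of the nominal period."""
    nominal_ps = 1e6 / target_mhz
    remaining = effective_period_ps(target_mhz, freq_margin, uncertainty_ps)
    return 100.0 * (1.0 - remaining / nominal_ps)


if __name__ == "__main__":
    target = 500.0  # MHz, hypothetical
    for label, margin in (("20% over-constraint", 0.20),
                          ("7.5% over-constraint", 0.075)):
        period = effective_period_ps(target, margin, 250.0)
        pct = total_margin_pct(target, margin, 250.0)
        print(f"{label}: {period:.1f} ps left, {pct:.1f}% total margin")
```

The point of the comparison is that a frequency margin and an uncertainty value stack, so the total margin given up can be noticeably larger than either number suggests on its own.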
sureshm over 13 years ago

Hi gh,
When I use zero WL for nets, I am not enabling PLE synthesis. In fact, we did a couple of experiments with the block sizes and the nature of the clocks, and finally arrived at an experimental number that worked for most of the designs/blocks on our chip. PLE with 5-10% of the clock frequency gives almost the same QoR. I wonder whether we gain any additional advantage from PLE synthesis with 5-10% of the clock frequency as opposed to 20% (zero WL).
You are right, nothing comes for free! In order to achieve something, it costs something else :) I was mistaken in my previous message.
Thanks
suresh