Sim Err

SteveDobbs over 5 years ago

cadence_ic                      6.1.7.500.23.EHF6258

I am frequently experiencing a problem running simulations via ADEXL.

Simulations fail to run and the results pane in ADEXL displays "sim err" - no output log or job log or any sort of explanation.

Looks as if the job has not even reached the (LSF) queue or is maybe unable to grab a slot.

Mostly this occurs with large corner or MC runs, although I have some failures with only a few corners.

I am then forced to Rerun unfinished/error points which usually succeeds.

The problem is intermittent.

I wonder if this is somehow related to my job policy options (see below):

Start Timeout = 300; Configure Timeout = <not set>; Simulation Run Timeout = <not set>; Linger Time = 10

  • ShawnLogan over 5 years ago

    Dear SteveDobbs,

    > Simulations fail to run and the results pane in ADEXL displays "sim err" - no output log or job log or any sort of explanation.

    I have also experienced this type of error when running ADE-XL simulations. I have never found the issue related to an LSF submission problem. Thinking back, the two issues I recall being responsible on two different occasions were a PDK model issue where one of the relevant model files was no longer available and a disk permission/access issue where ADE-XL was not able to write to the disk specified.  
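Both causes mentioned above (a model file that is no longer available, and a results disk that cannot be written) can be checked before submitting a run. The following is only a minimal sketch in Python, run outside Virtuoso; the function name `preflight` and the paths you pass to it are hypothetical, and it is not part of ADE-XL.

```python
import os
import tempfile


def preflight(model_files, results_dir):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for path in model_files:
        # A model file that was moved or had its permissions changed fails here.
        if not os.path.isfile(path):
            problems.append(f"model file missing: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"model file not readable: {path}")
    if not os.path.isdir(results_dir):
        problems.append(f"results directory missing: {results_dir}")
    else:
        # Create a real temporary file instead of trusting os.access(),
        # which can be misleading on some network file systems.
        try:
            with tempfile.NamedTemporaryFile(dir=results_dir):
                pass
        except OSError as exc:
            problems.append(f"cannot write to {results_dir}: {exc}")
    return problems
```

Running it on the same host that executes the simulations matters, since mounts and permissions can differ between the submission machine and the farm machines.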

    > I wonder if this is somehow related to my job policy options (see below):

    > Start Timeout = 300; Configure Timeout = <not set>; Simulation Run Timeout = <not set>; Linger Time = 10

    If there are no warnings in your CIW window after submitting your job, then I do not think your job submittal settings are responsible. With the exception of your Start Timeout setting, they are identical to what I use in ADE-XL and now Assembler.

    There are some documents detailing means to further debug this type of issue on Cadence Online Support if you have not seen them. For example,

    support.cadence.com/.../ArticleAttachmentPortal

    A few other possible experiments:

    1. When this occurs, can you separately generate a netlist? If so, can you run spectre from the command line (or submit it to an LSF machine) using the runSimulation file in the netlist directory?

    2. Have you tried using a Reference netlist?

    3. Have you tried porting your ADE-XL session to Assembler and submitting as an Assembler job? Since ADE-XL is no longer actively supported, you might consider this.

    Shawn

  • Andrew Beckett over 5 years ago in reply to ShawnLogan

    Steve,

    You would definitely be best placed contacting us via customer support. Also your internal support team should be able to help as well.

    One thing that might help debug this is to set:

    envSetVal("adexl.icrpStartup" "showJobStdout" 'boolean t)
    envSetVal("adexl.icrpStartup" "showJobStderr" 'boolean t)

    Then as the ICRPs try to start, it will report any issues in the CIW and CDS.log. I wouldn't have these on all the time, because if submitting a lot of jobs it gets pretty noisy, but it can often reveal if there is some reason why the jobs never start.

    Regards,

    Andrew.

  • SteveDobbs over 5 years ago in reply to Andrew Beckett

    Firstly, thank you Shawn and Andrew for your replies.

    I tried the envSetVals Andrew suggested but nothing was reported in the CDS.log.

    In response to Shawn's suggestions, the fact that re-running unfinished error points is successful and also that most other corners run successfully implies to me that there is nothing fundamentally wrong with netlist creation. I still suspect some sort of comms problem with the remote servers (I have seen communication failures in the ADEXL diagnostics pane in the past although this is usually resolved by ADEXL automatic re-submission of the offending jobs).

    Based on this I think it makes sense for me to contact local support in the first instance and then Cadence support if this does not yield a solution.

    Steve

  • Andrew Beckett over 5 years ago in reply to SteveDobbs

    Steve,

    I'm rather surprised nothing was reported in the CDS.log - you should see lines such as:

    JOB 1 Stdout:

    or maybe Stderr if there is anything. (Note that they won't appear if ICRPs were already running when you set the cdsenv values.)
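Searching a long CDS.log for these lines by eye is error-prone, so a small script can help. This is a sketch only: the pattern is inferred from the single `JOB 1 Stdout:` example above, the optional leading `\o ` prefix is a guess at how the line appears in the log, and `find_job_output` is a hypothetical helper name.

```python
import re

# Assumed shape, based on the "JOB 1 Stdout:" example; the optional leading
# "\o " log prefix is a guess and may need adjusting for a real CDS.log.
JOB_LINE = re.compile(r"^(?:\\o\s+)?JOB\s+(\d+)\s+(Stdout|Stderr):")


def find_job_output(log_lines):
    """Return (line_number, job_id, stream) for each matching log line."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = JOB_LINE.match(line.strip())
        if match:
            hits.append((number, int(match.group(1)), match.group(2)))
    return hits
```

An empty result after a fresh submission would suggest either that the options were not in effect when the ICRPs started or that the jobs never got far enough to produce output.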

    Anyway, let's see what your local support (and my team if needed) says.

    Cheers,

    Andrew

  • SteveDobbs over 5 years ago in reply to Andrew Beckett

    Andrew,

    Excerpt from my CDS.log …

    \i envSetVal("adexl.icrpStartup" "showJobStdout" 'boolean t)
    \t t
    \p >
    \i envSetVal("adexl.icrpStartup" "showJobStderr" 'boolean t)
    \t t
    \p >
    \a (axlRunSimulation)
    \p >
    \a hiiSetCurrentForm('HistoryNameForm)
    \r t
    \a HistoryNameForm->histName->value="diffstb_noClk.g0123.3ch.PVTRC"
    \r "diffstb_noClk.g0123.3ch.PVTRC"
    \a hiFormDone(HistoryNameForm)
    \r t
    \o WARNING (ADEXL-1704): The following global variable(s) are enabled, but are not used by any test in the active setup:
    \o gainT
    \o It is recommended to delete or disable these variable(s) because they are not being considered by the simulation run.
    \a _axlSelectResultsTab(axlOutputsForm0->axlOutputsWidget0 2)
    \r t
    \r 18
    \o \i envSetVal("adexl.icrpStartup" "showJobStdout" 'boolean nil)
    \t t
    \p >
    \i envSetVal("adexl.icrpStartup" "showJobStderr" 'boolean nil)
    \t t
    \p >

    So nothing (of relevance) reported between setting and unsetting the env vars. And there were no other simulations running at the time of simulation submission.
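The manual check described here (reading what the log printed between switching the options on and off) can be sketched in Python as follows. The `\o` prefix for output lines is visible in the excerpt; treating `\w` and `\e` as warning and error prefixes is an assumption, and `output_between` is a hypothetical helper name.

```python
def output_between(log_lines, start_marker, stop_marker,
                   prefixes=("\\o", "\\w", "\\e")):
    """Collect lines with the given prefixes that fall between two markers.

    Markers are plain substrings. Collection starts after the first line
    containing start_marker and stops at the first later line containing
    stop_marker.
    """
    collected = []
    inside = False
    for line in log_lines:
        text = line.strip()
        if not inside:
            inside = start_marker in text
            continue
        if stop_marker in text:
            break
        if text.startswith(prefixes):
            collected.append(text)
    return collected
```

For the excerpt above, the markers would be `"showJobStdout" 'boolean t)` and `"showJobStdout" 'boolean nil)`, and the only lines returned would be the three belonging to the ADEXL-1704 warning.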

    Anyway, I have talked to local support and demonstrated the problem. One thing I saw was a large number of "communication broken" errors reported in the diagnostic pane. There is no output from a failed simulation in either netlist or psf directories. The support guy will try to reproduce the problem and we will go on from there.

    Cheers

    Steve

  • SteveDobbs over 5 years ago in reply to SteveDobbs

    Andrew,

    One more thing I discovered yesterday - some entries in my .cdsenv file which affect distributed job submission. These were added some time ago during debug sessions of similar problems with our "on-site" Cadence FAE. Could any of these have a detrimental effect on job control for large corner runs?

    ; generateJobFileOnlyOnError - if set to nil all Job log files will be saved
    adexl.distribute generateJobFileOnlyOnError boolean t

    ; Set max job limit for ADEXL (note: still constrained by LSF limits, usually max=80)
    adexl.distribute maxIPCJobsLimit int 200

    ; ADE XL allows the original ICRP to re-establish the connection and to run the same point again.
    adexl.distribute enableICRPReconnect boolean t

    ; Enables continuation and completion of in-process simulations after the ADE XL GUI exits abruptly.
    adexl.distribute continueICRPRunOnAbruptGUIExit boolean t

    Cheers.

    Steve

  • Andrew Beckett over 5 years ago in reply to SteveDobbs

    Hi Steve,

    The first is the default value, so that won't make a difference. The second is lower than the default of 1000; I see no reason to set that nor see why you'd ever want to - it's possible that this might limit the maximum number of jobs you can run, so I would remove that. The third is rarely used, and I'd be surprised if you need it (I don't think I've ever come across anyone using that) - it's specifically there to support DRMS (queueing systems) that can re-target a job onto a different machine after it has started, which is a pretty rare thing to have. The last is fairly widely used - it's possible it could have an influence, although I think we'd have heard about it. Of course, you are using an old version, so that might be relevant too.
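A review like this can be partly automated by comparing `.cdsenv` entries against known defaults. The sketch below is in Python and uses only the two defaults quoted in this reply (`t` and `1000`), which may differ between releases. The function names are hypothetical, and the parser is deliberately simplistic: it treats everything after a `;` as a comment, so it would mishandle string values that contain a semicolon.

```python
# Defaults as quoted in this thread; treat them as version-dependent.
THREAD_DEFAULTS = {
    ("adexl.distribute", "generateJobFileOnlyOnError"): "t",
    ("adexl.distribute", "maxIPCJobsLimit"): "1000",
}


def parse_cdsenv(text):
    """Parse 'tool.partition name type value' lines, skipping ';' comments."""
    entries = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 3)
        if len(parts) == 4:
            partition, name, _type, value = parts
            entries[(partition, name)] = value
    return entries


def non_default(entries):
    """Return entries whose value differs from a default quoted in the thread."""
    return {
        key: value
        for key, value in entries.items()
        if key in THREAD_DEFAULTS and value != THREAD_DEFAULTS[key]
    }
```

Applied to the four entries posted above, this would flag only `maxIPCJobsLimit` (200 against the quoted default of 1000); the other two settings are not in the table, so they still need a manual look.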

    In general I would recommend moving to the latest IC618/ICADVM181 hotfix, and probably using LSCS instead of ICRP (this is a new job submission mechanism), for which you'd need to be using ADE Assembler. This is a lot more robust (well, it's designed to be) in terms of handling lots of parallel simulations. However, I recognise you can't do that in isolation - you'd need to do that in consultation with your local support.

    Sorry for a quick answer - I'm on vacation today.

    Regards,

    Andrew.

  • SteveDobbs over 5 years ago in reply to Andrew Beckett

    Hi Andrew,

    Update:

    Our IT support engineer (thinks he) has tracked down the source of the problem. I had linger time = 10s in my job policy instead of the default 300s (I can't remember why I changed this; it was some time in the distant past).

    Our LSF control allows a maximum number of CPU slots (sim. runs) per user at any one time. A multiple corner sim. which needs more than this number necessarily means that some runs are pending until slots become free. My understanding is that when a run finishes, the slot is held by the user for the linger time. During this time a new run should take the slot and start executing. However, if the time taken for the new run to start exceeds the linger time, the slot is lost and the job disappears into the ether. The result is "Sim Err" with no error reporting.

    So I have increased linger time to 300s. So far, no more errors.
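The hypothesis can be written down as a toy model in Python. It encodes only the understanding described in this post, not the documented behaviour of ICRP or LSF: a freed slot is assumed to be held for the linger time, and a pending run that takes longer than that to start is assumed to be lost. The function name and the start delays are hypothetical.

```python
def lost_jobs(start_delays_s, linger_time_s):
    """Indices of pending jobs whose start delay exceeds the linger time.

    Toy model only: a freed slot is assumed to be held for linger_time_s
    seconds, and a pending job that needs longer than that to start is
    assumed to be lost rather than resubmitted.
    """
    return [i for i, delay in enumerate(start_delays_s) if delay > linger_time_s]


# Hypothetical start delays (seconds) for five pending corner runs.
delays = [2, 8, 15, 40, 120]
print(lost_jobs(delays, 10))   # [2, 3, 4]
print(lost_jobs(delays, 300))  # []
```

Under this model, the intermittent failures would correspond to runs whose start-up happened to be slow, which fits the observation that rerunning the failed points usually succeeds.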

    I would be interested if you have any comments on this diagnosis. Is my understanding correct?

    Cheers

    Steve

  • Andrew Beckett over 5 years ago in reply to SteveDobbs

    Hi Steve,

    Having a very short linger time can certainly lead to problems - although I am slightly surprised that it would just cause the simulation to fail. Normally it would try again to submit it, but perhaps you have other env vars set to disable the resubmit of failed jobs.

    Andrew.

