We all know the importance of good immunity and how a good immune system makes you strong.
Pegasus is a robust tool that is immune to the kinds of failures that can occur in a typical compute environment, such as network failures, disk issues, unresponsive servers, and servers running out of memory. That is why we call Pegasus Fault Tolerant.
A Pegasus run does not exit when such issues occur. It continues even if multiple hosts crash or stop responding because of network issues. At the same time, Pegasus keeps track of all these failures, skips the workers that have problems, and flags warning messages like the following:
WARNING: Encountered problems with Worker 4, attempting recovery...
The rules that were scheduled on the problematic hosts are skipped. Pegasus reports the rules whose results were lost, and the run continues:
WARNING: Results for rule DRC.1 are lost
WARNING: Results for rule DRC.2 are lost
At the end of the run, Pegasus prints all rules that are incomplete in the user log:
ERROR: Incomplete rules: DRC.1 DRC.2
You can copy these rules and rerun them using the select_check -drc command. This is known as Manual Recovery.
The user log also states that the run finished abnormally:
Pegasus finished abnormally with exit code 1 -- see above for earlier errors.
Pegasus prints the following messages to the summary file as well:
Results for rule DRC.1 are lost
Results for rule DRC.2 are lost
RULECHECK DRC.1 ...................... Not Completed
RULECHECK DRC.2 ...................... Not Completed
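If you would rather not copy the incomplete rule names by hand, a small script can collect them from the user log. The following is only a sketch: it assumes the log lines look exactly like the excerpts above, and the function name incomplete_rules is made up for illustration and is not part of Pegasus. How the names are then supplied to select_check -drc is not covered here.

```python
import re


def incomplete_rules(log_text):
    """Collect rule names from 'ERROR: Incomplete rules:' lines.

    Hypothetical helper, not a Pegasus API. Assumes the user log
    format matches the excerpts shown in this post.
    """
    rules = []
    for line in log_text.splitlines():
        match = re.search(r"ERROR: Incomplete rules:\s*(.*)", line)
        if match:
            # Rule names are assumed to be whitespace-separated.
            for name in match.group(1).split():
                if name not in rules:
                    rules.append(name)
    return rules


# Sample text assembled from the messages quoted in this post.
sample_log = """\
WARNING: Encountered problems with Worker 4, attempting recovery...
WARNING: Results for rule DRC.1 are lost
WARNING: Results for rule DRC.2 are lost
ERROR: Incomplete rules: DRC.1 DRC.2
Pegasus finished abnormally with exit code 1 -- see above for earlier errors.
"""

# Print the names on one line so they are easy to copy for a rerun.
print(" ".join(incomplete_rules(sample_log)))
```

The script only gathers the names. The rerun itself is still the Manual Recovery step described above.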
In short, Pegasus continues the run despite typical failures in a compute environment and publishes detailed reports highlighting the issues encountered during the run.
Special thanks to Dibyendu Goswami (Digo), the subject matter expert for this blog.
Pegasus: Get your Wings is a blog series to showcase the capabilities of Pegasus and to familiarize you with its notable features.