pcreCompile and use of OR in search pattern

FuadBadrieh
FuadBadrieh over 2 years ago

Hello:

This is my very first post on the Cadence community, so please be gentle :-). I am trying to replace "^x" OR ".x" with " " (space).

So if my input is "xi1.xi2.xx3" I would like the output to be " i1 i2 x3". Having failed, I broke the problem into two pieces, and while each piece works on its own, the combined case does not. Here are the details:

First sample works as expected

pat=pcreCompile("^x") pcreReplace(pat "xi1.xi2.xx3" " " 0)

" i1.xi2.xx3"

 

Second sample also works as expected

pat=pcreCompile("\\.x") pcreReplace(pat "xi1.xi2.xx3" " " 0)

"xi1 i2 x3"

 

Now do first OR second and get UNEXPECTED result!!!

pat=pcreCompile("^x|\\.x") pcreReplace(pat "xi1.xi2.xx3" " " 0)

" i1 i2  3"

I would have expected, based on common sense and on the first two searches, to get

" i1 i2 x3"

but instead I am getting "3" and not "x3" at the end of the result. What am I missing?

Thanks

  • Andrew Beckett
    Andrew Beckett over 2 years ago

    This is a quirk of how pcreReplace is implemented when the fourth argument is 0. It does the first substitution, then tries again on the remainder of the string, and so on. The trouble is that if your pattern includes the start anchor ^, the remaining substrings may match as if they were the start of the string. You can see this also happening with:

    pat=pcreCompile("^x")
    pcreReplace(pat "xxxx" "" 0)
    => ""

    which is probably not what you'd expect. It might just be simpler to do the two matches separately. I'm not sure how easy it would be to fix this, although conceptually I would expect using 0 to behave like s/pat/sub/g (i.e. a global replacement), and that's not what happens.

    I'll probably file a CCR on this.

    Andrew

  • FuadBadrieh
    FuadBadrieh over 2 years ago in reply to Andrew Beckett

    Hi Andrew:

    BTW, I've read your posts in the past and I am a big fan! You have been a constant source of assistance and inspiration for those new to the SKILL world. It is a pleasure to finally speak to you personally!

    I understand the issue at work in the example you provided.  But I am having some difficulty seeing how that is working on my example. 

    pat=pcreCompile("^x|\\.x") pcreReplace(pat "xi1.xi2.xx3" " " 0)

    I would have thought that the tool would first find the "^x" and replace with space:

    " i1.xi2.xx3"

    Then it would find the ".x" and replace that with a space too:

    " i1 i2 x3"

    But, again, I am getting " i1 i2  3" instead of the above. So the ".xx3" seems to have been processed more than once, but I'm not sure exactly how.

    I am not seeing the exact symptoms of the example you provided, but perhaps I have not looked deep enough.

    What is the exact root of the problem?  Is it the OR "|", the anchor "^", the DOT ".", some or all of the above?

    Also, what is a sub string in your context?  If not much trouble can you walk through the starting string and show how it transforms one step at a time as the regular expression is parsed through?

    Thanks again

    Fuad

  • Andrew Beckett
    Andrew Beckett over 2 years ago in reply to FuadBadrieh

    Fuad,

    To be honest, when I said "quirk" in my earlier reply, that was just my polite British way of saying "bug". I think this behaviour is wrong. In essence, when the last argument to pcreReplace is 0, it repeatedly finds the pattern and then moves the starting point for the next pattern check to just after the last replacement. That means that having changed the ".x" after i2 to " ", it starts with the remaining string "x3" and (erroneously) sees that x as anchored to the beginning and replaces it. This is clearly wrong.

    So the problem is when an anchored expression is used and the pcreReplace count is 0 or anything other than 1, in fact. The same problem occurs if I do this:

    pat=pcreCompile("^[xy]")
    pcreReplace(pat "xyz" "@" 1) ; => "@yz"  - good
    pcreReplace(pat "xyz" "@" 2) ; => "x@z" - bad
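
The restart-on-the-remainder behaviour described above can be modelled outside SKILL. The following Python sketch is only a guess at the loop, not Cadence's actual implementation: restart_replace and its index argument are hypothetical names, and Python's re module stands in for PCRE (the anchor, alternation and character class used here behave the same way in both). It reproduces the three results reported in this thread and contrasts them with an ordinary global substitution.

```python
import re


def restart_replace(pattern, text, repl, index=0):
    """Hypothetical model of the reported behaviour.

    After each match the search restarts on the remaining text as if it
    were a brand-new string, so ^ can match again.
    index=0 replaces every match; index=N replaces only the Nth match.
    """
    regex = re.compile(pattern)
    out = []
    rest = text
    found = 0
    while rest:
        m = regex.search(rest)
        if m is None or m.end() == 0:
            # No match, or an empty match at the start: stop to avoid looping forever.
            break
        found += 1
        out.append(rest[:m.start()])
        # Substitute only if this is a match we were asked to replace.
        out.append(repl if index in (0, found) else m.group(0))
        rest = rest[m.end():]  # the remainder is treated as a fresh string
        if index and found == index:
            break
    out.append(rest)
    return "".join(out)


print(repr(restart_replace(r"^x|\.x", "xi1.xi2.xx3", " ")))  # ' i1 i2  3'
print(repr(restart_replace(r"^x", "xxxx", "")))              # ''
print(repr(restart_replace(r"^[xy]", "xyz", "@", 1)))        # '@yz'
print(repr(restart_replace(r"^[xy]", "xyz", "@", 2)))        # 'x@z'

# An ordinary global substitution anchors ^ to the real start only.
print(repr(re.sub(r"^x|\.x", " ", "xi1.xi2.xx3")))           # ' i1 i2 x3'
```

Tracing the first call, the remainder shrinks from "xi1.xi2.xx3" to "i1.xi2.xx3", then "i2.xx3", then "x3", then "3". On the remainder "x3" the ^x alternative matches because that remainder is treated as a fresh string, which is where the extra replacement comes from.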

    I'm filing this as a bug with R&D. Haven't quite done that yet - will do so later today.

    Regards,

    Andrew

  • FuadBadrieh
    FuadBadrieh over 2 years ago in reply to Andrew Beckett

    Hi Andrew:

    I think I understand the problem now. This brings up a related question on regular expressions in general. When in doubt, is there a "golden" reference, a POSIX-compatible tool that one can use to check questionable results such as the one in this case? I've had this issue on and off, and when talking with peers the response I got was tantamount to "every tool has its own interpretation of regular expressions", which quite frankly is concerning. Maybe I am old-fashioned, but the two tools that I've used over decades for regular expressions are sed and gawk. Both seem to give the correct answer (as per our mutual agreement in this post), as follows:

    bash#> echo "xyz" | gawk '{r=gensub("^[xy]", "@","g"); print r}'
    @yz
    bash#> echo "xyz" | gawk '{r=gensub("^[xy]", "@",1); print r}'
    @yz
    bash#> echo "xyz" | gawk '{r=gensub("^[xy]", "@",2); print r}'
    xyz

    bash#> echo "xyz" | sed -e 's/^[xy]/@/g'
    @yz
    bash#> echo "xyz" | sed -e 's/^[xy]/@/1'
    @yz
    bash#> echo "xyz" | sed -e 's/^[xy]/@/2'
    xyz

    So, when in doubt, and when using either rexCompile (which, as per an older response of yours, is outdated) or pcreCompile (as is the case here), can I still use those tools (sed and gawk) as a reference to test whether a rex/pcre result is correct or a bug? And if not, any insight into why the "big players" in the CAD tools industry and folks from the GNU/open-source world can't work out a consistent set of regular expression standards that everyone can work with? Philosophical, and perhaps out of this user group's scope? Perhaps yes, but it seems like someone ought to bring this up!
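
The sed and gawk results above can also be cross-checked in Python. This is a minimal sketch, assuming Python's re module is an acceptable stand-in for the simple anchor and character-class patterns used here; replace_nth is a hypothetical helper written for this illustration, not part of any library. It scans the whole string once, so ^ is only true at the real start, and it treats the replacement as literal text.

```python
import re


def replace_nth(pattern, text, repl, n=0):
    """Replace every match (n=0) or only the nth match (n>=1).

    Matches are found by scanning the original string once, so anchors
    are evaluated against the whole string, not against a remainder.
    """
    out = []
    last = 0
    for count, m in enumerate(re.finditer(pattern, text), start=1):
        if n in (0, count):
            out.append(text[last:m.start()])
            out.append(repl)
            last = m.end()
    out.append(text[last:])
    return "".join(out)


print(replace_nth(r"^[xy]", "xyz", "@"))     # @yz  (like the "g" flag)
print(replace_nth(r"^[xy]", "xyz", "@", 1))  # @yz
print(replace_nth(r"^[xy]", "xyz", "@", 2))  # xyz  (there is no second match)
print(repr(replace_nth(r"^x|\.x", "xi1.xi2.xx3", " ")))  # ' i1 i2 x3'
```

All four results agree with the sed and gawk output shown above and with the outcome expected in the original question.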

    Thanks again

    Fuad

  • Andrew Beckett
    Andrew Beckett over 2 years ago in reply to FuadBadrieh

    Fuad,

    Actually, the regular expression support with the pcre functions is standardised - it's using the Perl-Compatible Regular Expression support  as described at https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions and https://pcre.org/ . Also covered as part of https://en.wikipedia.org/wiki/Regular_expression which gives the history. So using PCRE is a pretty solid basis of building rich and powerful regular expression support.

    The challenge here is not with the regular expression support itself, but the implementation of pcreReplace in how it calls the underlying PCRE functions; that's what's wrong, not the regular expression support itself. That's just a plain bug (IMHO). Bugs happen, of course - somehow this one slipped through (I didn't find any reports of it so far, despite the PCRE functions within SKILL being very widely used).

    The ancient rexCompile functions were very limited and only supported basic regular expressions, which is why the PCRE APIs were introduced around 17 years or so back.

    Regards,

    Andrew

  • Andrew Beckett
    Andrew Beckett over 2 years ago in reply to Andrew Beckett

    Fuad,

    I filed change request (CCR) 2845977. Of course, greater priority is given to requests coming from customers, so you might want to contact customer support and request that a duplicate is filed on your behalf.

    Kind Regards,

    Andrew

  • FuadBadrieh
    FuadBadrieh over 2 years ago in reply to Andrew Beckett

    Hi Andrew:

    Thank you for following up on this. I think I am starting to see the bigger context. PCRE is a standard library which different tools can use (and seems like a successor to POSIX?). In this case it is the Cadence SKILL tool, but other tools, such as R, use PCRE too. For our beloved example here, I observed that R actually gives the correct outcome, as shown below:

    gsub("^x|\\.x", " ", "xi1.xi2.xx3", perl=TRUE)
    [1] " i1 i2 x3"

    So we have confirmed (again) that it is Cadence's implementation that is at odds with PCRE (for this very specific case), and not PCRE itself. Sorry to state the obvious; I am just trying to understand the different components in the game. If all is good, then it looks like, as you stressed, PCRE is a (or the) solid platform for regular expressions, and it seems to me the logical path to invest in. Hopefully this minor bug is fixed soon. Please add or correct if needed.

    Regards,

    Fuad
