Cadence Community Forums / Custom IC SKILL

Regular Expression: detecting 2 or more consecutive spaces oddity

Sheppy over 9 years ago

Hi,

For a particular script I have to check whether a string contains two or more consecutive spaces between lower-case characters. Zero or one space is allowed; two or more are not. I am using rexMatchp() for the check. Please have a look at the following code and output:

rexMatchp("[a-z][ ]{2,}[a-z]" "test test")

nil

rexMatchp("[a-z][ ]{2,}[a-z]" "test  test")

nil

rexMatchp("[a-z][ ]{2,}[a-z]" "test   test")

nil

rexMatchp("[a-z][ ][ ]+[a-z]" "test test")

nil

rexMatchp("[a-z][ ][ ]+[a-z]" "test  test")

t

rexMatchp("[a-z][ ][ ]+[a-z]" "test   test")

t

The last three commands do exactly what I want. The first three, however, always return nil, although {2,} is perfectly legal regular-expression syntax. I found in the Virtuoso documentation that {} is supported for specifying the number of occurrences, but it clearly does not work in my example.
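For reference, here is the same check written in Python, whose re module uses the same syntax as PCRE (presumably the library behind SKILL's pcre* functions, as their names suggest) for this pattern: a character class, a literal space, and the {m,} quantifier meaning "m or more". This is only a sketch of the results one would expect; it does not explain why rexMatchp returns nil. has_multiple_spaces is a hypothetical helper name.

```python
import re

# A lower-case letter, two or more spaces, then another lower-case letter.
# "{2,}" means "at least two" of the preceding item.
MULTI_SPACE = re.compile(r"[a-z][ ]{2,}[a-z]")

def has_multiple_spaces(text):
    """True if text has two or more consecutive spaces between lower-case letters."""
    # search() looks anywhere in the string, like rexMatchp does in the examples.
    return MULTI_SPACE.search(text) is not None

for sample in ["test test", "test  test", "test   test"]:
    print(repr(sample), has_multiple_spaces(sample))
```

With one space the result is False; with two or three spaces it is True, matching the [ ][ ]+ variant above.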

Am I doing something wrong?

With kind regards,

Sjoerd

    Sheppy over 9 years ago

    Hi Andrew,

    I keep having problems with the regular expression syntax in SKILL.

    This is the pattern with the special characters I want to match:

    regExprMatchPattern = "[a-zA-Z0-9`-=;'./~!@$%^&*()_+{}|:\"<>?]"

    When I use rexMatchp, it works:

    rexMatchp( regExprMatchPattern previousChar )

    And it works like this as well:

    validLine = !rexMatchp( strcat( regExprMatchPattern "[ ][ ]+" regExprMatchPattern ) nextLine )

    However, when I replace rexMatchp with pcreMatchp or with pcreCompile/pcreExecute, it does not work at all. The problem seems to be caused by some of the characters in regExprMatchPattern. This is what I tried, none of which works:

    validLine = !pcreMatchp( strcat( regExprMatchPattern "[ ]{2,}" regExprMatchPattern ) nextLine )

    OR

    regExprMatchPattern = "[a-zA-Z0-9`-=;'./~!@$%^&*()_+{}|:\"<>?]"
    rexPatternSingleSpace = pcreCompile( regExprMatchPattern )
    rexPatternMultipleSpace = pcreCompile( strcat( regExprMatchPattern "[ ]{2,}" regExprMatchPattern ) )
    printf( "%L - %L\n" rexPatternSingleSpace rexPatternMultipleSpace )
    validLine = !pcreExecute( rexPatternMultipleSpace nextLine )
    pcreExecute( rexPatternSingleSpace previousChar )

    To find out which characters were causing the issue, I wrote this bit of code:

    testString = "[`-=;'./~!@$%^&*()_+{}|:\"<>?]"
    for( i 1 strlen( testString )
        printf( "%L - %L\n" substring( testString i 1 ) pcreCompile( substring( testString i 1 ) ) )
    )

    This showed that the characters *()+? cause problems. After I removed them from testString, every pcreCompile() call returned a pcreobjId.
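The single-character test above probably measures something other than what was intended: a lone *, + or ? has nothing to repeat, and a lone ( or ) is unbalanced, so each fails to compile as a complete pattern. Inside [...], however, the same characters are ordinary literals. A quick check with Python's re, used here as a stand-in for PCRE (SKILL's pcreCompile may report the errors differently), illustrates this; compiles is a hypothetical helper.

```python
import re

def compiles(pattern):
    """Return True if pattern is a valid regular expression, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# On their own these are incomplete patterns:
# *, + and ? have nothing to repeat; ( and ) are unbalanced.
for ch in "*()+?":
    print(repr(ch), compiles(ch))                 # each prints False

# Inside a character class the same characters are plain literals.
print(compiles("[*()+?]"))                        # True
print(re.fullmatch("[*()+?]", "+") is not None)   # True
```

So these characters are most likely not what breaks the full character class.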

    However, when I removed those characters from my actual pattern, it still did not work, neither with pcreMatchp nor with pcreCompile/pcreExecute.

    It turns out that various combinations of characters cause problems as well. That is where I stopped testing: there are thousands of combinations to try, and a Google search did not turn up anything useful.
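One combination that stands out is the backtick, dash, equals sequence near the start of the class. Inside [...] an unescaped dash between two characters denotes a range, and this one runs from backtick (ASCII 96) down to = (ASCII 61). PCRE-style engines reject such a reversed range, so this is a likely (unverified) reason why pcreCompile fails even though rexMatchp accepts the pattern. The sketch below checks the idea with Python's re as a stand-in for PCRE; ORIGINAL_CLASS, FIXED_CLASS and compiles are hypothetical names. Moving the dash to the end of the class makes it a literal.

```python
import re

# Class as written in the post: "`-=" is parsed as the range ` .. = (96..61).
ORIGINAL_CLASS = "[a-zA-Z0-9`-=;'./~!@$%^&*()_+{}|:\"<>?]"
# Assumed fix: same characters, with "-" moved to the end so it is literal.
FIXED_CLASS = "[a-zA-Z0-9`=;'./~!@$%^&*()_+{}|:\"<>?-]"

def compiles(pattern):
    """Return True if pattern is a valid regular expression, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(ord("`"), ord("="))          # 96 61: the range is reversed
print(compiles(ORIGINAL_CLASS))    # False (bad character range)
print(compiles(FIXED_CLASS))       # True

# The "two or more spaces between value characters" check with the fixed class.
multi_space = re.compile(FIXED_CLASS + "[ ]{2,}" + FIXED_CLASS)
print(bool(multi_space.search("012,34   56,789,abc")))   # True: three spaces
print(bool(multi_space.search("012,34$ ^56,789")))       # False: single space
```

SKILL escapes the double quote in a string literal the same way, so the corrected pattern would presumably be written identically there.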

    Can you help me solve this regular expression problem (matching all the characters in regExprMatchPattern)?
    Bonus question: how do I match [ and ] in a regular expression? (\[ and \] do not work.)
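On the bonus question, a likely cause is that the backslash is consumed by the string literal before the regex engine sees it. Assuming SKILL string literals handle backslashes the way C and ordinary (non-raw) Python strings do, the engine must receive \[ and \], which in the literal is spelled "\\[" and "\\]". The Python sketch below shows the principle; BRACKETS and EITHER_BRACKET are hypothetical names, and Python's string rules stand in for SKILL's.

```python
import re

# In an ordinary string literal "\\" is one backslash, so this literal holds
# the four characters \ [ \ ] which the regex engine reads as literal [ and ].
BRACKETS = "\\[\\]"
assert BRACKETS == r"\[\]"                     # same text, written as a raw string

print(bool(re.search(BRACKETS, "cell[0]")))    # False: "[0]" is not "[]"
print(bool(re.search(BRACKETS, "cell[]")))     # True

# Inside a character class, escaped brackets match either bracket.
EITHER_BRACKET = "[\\[\\]]"
print(re.findall(EITHER_BRACKET, "a[1]b"))     # ['[', ']']
```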

    I am also having trouble detecting EOF (End Of File). The input to my code is a CSV file, but some tools write the CSV file without an EOL/LF/NL/CR (End Of Line/Line Feed/New Line/Carriage Return) at the end. The values on that last line must be processed as well.
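In the procedure further down, a cell is appended to cellList only when a "," or "\n" is read, so when the last line has no newline its final cell is never stored. One way around this, sketched here in Python with a hypothetical read_cells helper, is to flush the pending cell after the character loop whenever the line does not end in a newline. This assumes that SKILL's gets() returns the unterminated last line (as Python's file iteration does) rather than nil. In the SKILL code the flush would go after the inner while loop, and the "#" branches would need to clear cell first so a cell is not stored twice.

```python
import io

def read_cells(stream):
    """Yield one list of cells per line, splitting on commas.

    A cell is only closed by a comma or a newline, so a final line without a
    trailing newline would lose its last cell unless it is flushed explicitly.
    """
    for line in stream:                    # the last line may lack a newline
        cells, cell = [], ""
        for ch in line:
            if ch in ",\n":
                cells.append(cell.strip())
                cell = ""
            else:
                cell += ch
        if not line.endswith("\n"):
            cells.append(cell.strip())     # end of file reached mid-line: flush
        yield cells

sample = io.StringIO("a,b\nc, d\nnoNewLineAtEndOfFile")
print(list(read_cells(sample)))
# [['a', 'b'], ['c', 'd'], ['noNewLineAtEndOfFile']]
```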

    Below you will find the actual code and an example input file (I could not find a button to attach files).
    With the given example, the code works as intended. At two points (location 1 and location 2) some lines are commented out; move the ; to try the other variants. Remember: if variant 1 is used at location 1, variant 1 must also be used at location 2, and likewise 2 with 2 and 3 with 3.

    The code (stripped down; the processing of listCellList has been removed):

    procedure( testLineRead( inFileName "S" )
        let( (    ( inPort nil )
            ( lineNr 1 ) ( nextLine "" ) ( lineLength 0 ) ( validLine nil )
            ( previousChar "" ) ( currentChar "" ) ( nextChar "" )
            ( cell "" ) ( cellList nil ) ( listCellList nil )
            ( regExprMatchPattern "") ( rexPatternSingleSpace nil ) ( rexPatternMultipleSpace nil )
            ( i 1 )
            )

            regExprMatchPattern = "[a-zA-Z0-9`-=;'./~!@$%^&*()_+{}|:\"<>?]"
            rexPatternSingleSpace = pcreCompile( regExprMatchPattern )
            rexPatternMultipleSpace = pcreCompile( strcat( regExprMatchPattern "[ ]{2,}" regExprMatchPattern ) )
            printf( "debug: %L - %L\n" rexPatternSingleSpace rexPatternMultipleSpace )

            when( isFile( inFileName )
                inPort = infile( inFileName )
                when( inPort
                    while( gets( nextLine inPort )
                        lineLength = strlen( nextLine )
                        previousChar = ""
                        currentChar = ""
                        nextChar = ""
                        cell = ""
                        cellList = nil

                        ;;; location 1: reg-expr. test lines (3):
                        validLine = !rexMatchp( strcat( regExprMatchPattern "[ ][ ]+" regExprMatchPattern ) nextLine )
                        ;validLine = !pcreMatchp( strcat( regExprMatchPattern "[ ]{2,}" regExprMatchPattern ) nextLine )
                        ;validLine = !pcreExecute( rexPatternMultipleSpace nextLine )

                        unless( validLine
                            warn( "Illegal number of spaces ( >1 ) in cell value detected - (%s - line %d), please fix!" inFileName lineNr )
                        ) ;;; end of unless

                        while( validLine && i <= lineLength
                            currentChar = substring( nextLine i 1 )
                            nextChar = substring( nextLine i + 1 1 )
                            cond(
                                ;;; Filter out all white spaces or tabs...
                                ;;; But a bit more complicated than that: a valid value may contain spaces, but NEVER at the start or end.
                                ;;; A valid value may be "M1 path", the space between "M1" and "path" MUST NOT be removed.

                                ;;; location 2: reg-expr. test lines (3):
                                ( ( currentChar == " " && rexMatchp( regExprMatchPattern previousChar ) && rexMatchp( regExprMatchPattern nextChar ) )
                                ;( ( currentChar == " " && pcreMatchp( regExprMatchPattern previousChar ) && pcreMatchp( regExprMatchPattern nextChar ) )
                                ;( ( currentChar == " " && pcreExecute( rexPatternSingleSpace previousChar ) && pcreExecute( rexPatternSingleSpace nextChar ) )
                                    cell = strcat( cell currentChar )
                                    previousChar = currentChar
                                )
                                ;;; All other whitespaces and tabs must be filtered out...
                                ( ( currentChar == " " || currentChar == "\t" )
                                    previousChar = currentChar
                                )
                                ;;; Filter out empty lines...
                                ;;; An empty line may contain whitespaces, but no valid characters.
                                ( ( currentChar == "\n" && cell == "" && cellList == nil )
                                    warn( "Empty line (%s - line %d) detected, please remove..." inFileName lineNr )
                                    previousChar = currentChar
                                    i = lineLength
                                )
                                ;;; Filter out all comments (a comment is all after a #)...
                                ;;; When the line only contains a comment, with or without whitespaces left of the #...
                                ( ( currentChar == "#" && cell == "" && cellList == nil )
                                    previousChar = currentChar
                                    i = lineLength
                                )
                                ;;; When the line contains a comment after one or more valid cells (with or without whitespaces in-between)...
                                ( ( currentChar == "#" && cell != "" )
                                    cellList = append( cellList list( cell ) )
                                    previousChar = currentChar
                                    i = lineLength
                                )
                                ;;; When the line contains a comment AND the last non-whitespace character was a "," (with or without whitespaces in-between)...
                                ;;; If the last non-whitespace character was NOT a ",", "cell" would have had a value and thus the previous condition would have been true.
                                ;;; The current value of "cell" is therefore "" (empty string), so it is safe to add it to "cellList".
                                ;;; If there is a ",", there is at least 1 cell in cellList, either a valid cell value or ""
                                ( ( currentChar == "#" && cellList != nil )
                                    cellList = append( cellList list( cell ) )
                                    previousChar = currentChar
                                    i = lineLength
                                )
                                ;;; Detect the commas that separate the values (that's why it is a CSV file...)
                                ;;; If the first valid character of the line is a ",", so previousChar = "", with or without whitespaces
                                ;;; If empty in-between two ","'s (so no valid character(s))...
                                ( ( currentChar == "," && ( previousChar == "" || previousChar == "," ) )
                                    cell = ""
                                    cellList = append( cellList list( cell ) )
                                    previousChar = currentChar
                                )
                                ;;; Empty in-between a comma and new-line, so add "" to the list...
                                ( ( currentChar == "\n" && previousChar == "," )
                                    cell = ""
                                    cellList = append( cellList list( cell ) )
                                    previousChar = currentChar
                                )
                                ;;; A comma, and none of the special things in front of it, so construct the new cell...
                                ;;; A new-line, and none of the special things in front of it, so construct the new cell...
                                ( ( currentChar == "," || currentChar == "\n" )
                                    cellList = append( cellList list( cell ) )
                                    cell = ""
                                    previousChar = currentChar
                                )
                                ;;; Everything else must be part of a valid value...
                                ( t
                                    cell = strcat( cell currentChar )
                                    previousChar = currentChar
                                )
                            ) ;;; end of cond
                            i++
                        ) ;;; end of while validLine && i
                        when( cellList
                            println( cellList )
                            listCellList = append( listCellList list( cellList ) )
                        )
                        validLine = nil
                        i = 1
                        lineNr++
                    ) ;;; end of while
                    close( inPort )
                ) ;;; end of when inPort
            ) ;;; end of when isFile
            println( listCellList )
        ) ;;; end of let
    ) ;;; end of procedure testLineRead

     

    The example input file (make sure there is no EOL/LF/NL/CR after the last line):

    #This is a comment
     #This is a comment after a single whitespace
    abc#This is a comment after a valid cell value
    ,def#This is a comment after a valid cell value
    ghi,jkl #This is a comment after two valid cells and a whitespace
    ghi , jkl #This is a comment after two valid cells and a whitespace
     mno,pqr #This is a comment after a whitespace, two valid cells and a whitespace
    ,#This is a comment after a single ",", thus two "NA"'s
    ,,#This is a comment after two ","'s, thus three "NA"'s
     , , #This is a comment after two ","'s, thus three "NA"'s
    ,stu,#This is a comment after two ","'s, thus three "NA"'s
     ,vwx, #This is a comment after two ","'s, thus three "NA"'s
         

    #empty lines above, with and without spaces...
    ,
    ,,
    ,,,
     ,
     , ,
     , , ,
    yza,bcd,efg
     hij, klm , nop ,
    qrs,tu vw,xyz
    012,34 56,789
    012,34   56,789,abc # This should give a warning that there are too many spaces
    012,34$ ^56,789 # this is valid
    012,34" '56,789 # this is valid
    # The following line should be added to the output, but I don't know how to check for End of File without New Line...
    noNewLineAtEndOfFile
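As an alternative to the character-by-character state machine above, here is a minimal sketch of a different technique: plain string splitting, written in Python for illustration. parse_line, VALUE_CHAR and MULTI_SPACE are hypothetical names, and the value class is assumed to be the one from the question with the dash moved to the end. The sketch drops everything after "#", skips lines with no data, rejects lines with two or more spaces between value characters, then splits on "," and trims each cell, so inner single spaces survive. Because splitting does not depend on a trailing newline, the unterminated last line is handled like any other.

```python
import re

# Assumed value-character class: the one from the question, with "-" moved
# to the end so it is a literal dash instead of a (reversed) range.
VALUE_CHAR = "[a-zA-Z0-9`=;'./~!@$%^&*()_+{}|:\"<>?-]"
MULTI_SPACE = re.compile(VALUE_CHAR + "[ ]{2,}" + VALUE_CHAR)

def parse_line(line):
    """Return the trimmed cells of one CSV line, or None if it holds no data.

    Raises ValueError when two or more spaces separate value characters.
    """
    data = line.split("#", 1)[0]           # drop the comment, if any
    if data.strip() == "":
        return None                        # empty, blank or comment-only line
    if MULTI_SPACE.search(data):
        raise ValueError("illegal number of spaces (>1) in cell value")
    # strip() removes leading/trailing blanks, tabs and the newline per cell;
    # a single space inside a value such as "tu vw" is kept.
    return [cell.strip() for cell in data.split(",")]

example = [
    "abc#This is a comment after a valid cell value\n",
    ",stu,#comment\n",
    " hij, klm , nop ,\n",
    "qrs,tu vw,xyz\n",
    "noNewLineAtEndOfFile",                # last line, no trailing newline
]
for line in example:
    print(parse_line(line))
```

One design difference from the posted code: the multiple-space check here only looks at the part of the line before "#", so spacing inside comments is not flagged.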

    Many thanks in advance.

    With kind regards,

    Sjoerd

© 2025 Cadence Design Systems, Inc. All Rights Reserved.
