views:

249

answers:

2

Hi,

How can I convert a table column definition into an array of columns using regex's without taking formatting into account? I'm confused how to split on "," as they can also appear in the datatype definition part (between matching parenthesis).

Sample input:

CREATE table test ( DISTRICT VARCHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC, CUSTOMER_ACCOUNT DECIMAL(8,0), CUSTOMER_SUB_ACCOUNT DECIMAL(3,0), SERVICE_SEQ_NUM DECIMAL(7,0), EFFECTIVE_DATE TIMESTAMP(0), SUBSCRIBER_SEQ_NUM DECIMAL(7,0) )

Thanks!

Frederik

+1  A: 

Traditionally regular expressions are not capable of testing for/matching balanced parentheses. But several libraries have made extensions that allow some recursion. E.g. pcre with its ?R "quantifier". And Microsoft has added "balancing groups". You can find an example of that feature for matching (...) at http://oreilly.com/catalog/regex2/chapter/ch09.pdf

VolkerK
A: 

I think you could use negative look-behind for this scenario since the commas that are not of interest to you seem to be preceeded by a number:

$str = @'
CREATE table test ( DISTRICT VARCHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
  CUSTOMER_ACCOUNT DECIMAL(8,0), 
  CUSTOMER_SUB_ACCOUNT DECIMAL(3,0), 
  SERVICE_SEQ_NUM DECIMAL(7,0), 
  EFFECTIVE_DATE TIMESTAMP(0), 
  SUBSCRIBER_SEQ_NUM DECIMAL(7,0) )
'@

$str -split '(?<!\d),'
CREATE table test ( DISTRICT VARCHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC
 CUSTOMER_ACCOUNT DECIMAL(8,0)
 CUSTOMER_SUB_ACCOUNT DECIMAL(3,0)
 SERVICE_SEQ_NUM DECIMAL(7,0)
 EFFECTIVE_DATE TIMESTAMP(0)
 SUBSCRIBER_SEQ_NUM DECIMAL(7,0) )

Note this is using PowerShell 2.0's -split operator.

For paren matching, you might try something like this:

$re = [regex]@'
(?x)
\(
   (?>
       [^()]+
     |
       \( (?<DEPTH>)
     |
       \) (?<-DEPTH>)
   )*
   (?(DEPTH)(?!))
\)
'@

if ($str -match $re) {
    $matches[0]
}

Outputs:

( 
    DISTRICT VARCHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,  
    CUSTOMER_ACCOUNT DECIMAL(8,0),   
    CUSTOMER_SUB_ACCOUNT DECIMAL(3,0),   
    SERVICE_SEQ_NUM DECIMAL(7,0),   
    EFFECTIVE_DATE TIMESTAMP(0),   
    SUBSCRIBER_SEQ_NUM DECIMAL(7,0) 
)

See this blog post for more help on paren matching.

Keith Hill
Code below works fine on Powershell V1, so I'm now able to split the columns correctly:$txt = @"DISTRICT VARCHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC, CUSTOMER_ACCOUNT DECIMAL(8,0), CUSTOMER_SUB_ACCOUNT DECIMAL(3,0)"@ [regex]::Split($txt,'(?<!\d),')A related question is this:How can I find the matching closing parenthesis for the opening parenthesis after the table name so I can identify the columnlist which is then splitted with the above regex?
frederik