I've got an IronPython script that executes a bunch of SQL statements against a SQL Server database. The statements are large strings that actually contain multiple statements, separated by the "GO" keyword. That works when they're run from SQL Server Management Studio and some other tools, but not in ADO. So I split up the strings using the 2.5 "re" module like so:

import re

splitter = re.compile(r'\bGO\b', re.IGNORECASE)
for script in splitter.split(scriptBlob):
    if script.strip():  # skip empty or whitespace-only pieces
        [... execute the query ...]

This breaks in the rare case that the word "go" appears in a comment or a string. How in the heck would I work around that? That is, how do I correctly parse this string into two scripts:

-- this is a great database script!  go team go!
INSERT INTO myTable(stringColumn) VALUES ('go away!')
/*
  here are some comments that go with this script.
*/
GO
INSERT INTO myTable(stringColumn) VALUES ('this is the next script')


EDIT:

I searched more and found this SQL documentation: http://msdn.microsoft.com/en-us/library/ms188037(SQL.90).aspx

As it turns out, GO must be on its own line, as some answers suggested. However, it can be followed by a "count" integer, which will actually execute the statement batch that many times (has anybody actually used that before??), and it can be followed by a single-line comment on the same line (but not a multi-line one; I tested this). So the magic regex would look something like:

"(?m)^\s*GO\s*\d*\s*$"

Except this doesn't account for:

  • a possible single-line comment ("--" followed by any character except a line break) at the end.
  • the whole line being inside a larger multi-line comment.

I'm not concerned about capturing the "count" argument and using it. Now that I have some technical documentation, I'm tantalizingly close to writing this "to spec" and never having to worry about it again.

+5  A: 

Since you can have comments inside comments, nested comments, comments inside queries, etc., there is no sane way to do it with regexes.

Just imagine the following script:

INSERT INTO table (name) VALUES (
-- GO NOW GO
'GO to GO /* GO */ GO' +
/* some comment 'go go go'
-- */ 'GO GO' /*
GO */
)

And that is without mentioning:

INSERT INTO table (go) values ('xxx') GO

The only way would be to build a stateful parser instead. One that reads a char at a time, and has a flag that will be set when it is inside a comment/quote-delimited string/etc and reset when it ends, so the code can ignore "GO" instances when inside those.
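
The following is a minimal sketch of that idea, not a full T-SQL parser. It walks the script line by line, tracks whether it is inside a single-quoted string or a block comment, and only treats a GO line as a separator when neither flag is set. The names GO_LINE, scan_line and split_batches are hypothetical. It assumes that block comments may nest, that quotes inside strings are escaped by doubling them, and that GO takes an optional count and an optional trailing "--" comment. Double-quoted and [bracketed] identifiers are not handled.

```python
import re

# A GO separator line: optional repeat count, optional trailing "--" comment.
GO_LINE = re.compile(r'^\s*GO(?:\s+\d+)?\s*(?:--.*)?$', re.IGNORECASE)


def scan_line(line, in_string, depth):
    """Advance the (in_string, comment_depth) state across one line."""
    i = 0
    while i < len(line):
        two = line[i:i + 2]
        if in_string:
            # A doubled '' closes and immediately reopens the string,
            # which leaves the state correct without special handling.
            if line[i] == "'":
                in_string = False
            i += 1
        elif depth > 0:
            if two == "/*":
                depth += 1
                i += 2
            elif two == "*/":
                depth -= 1
                i += 2
            else:
                i += 1
        elif two == "--":
            break  # the rest of the line is a comment
        elif two == "/*":
            depth = 1
            i += 2
        else:
            if line[i] == "'":
                in_string = True
            i += 1
    return in_string, depth


def split_batches(script):
    """Split on GO lines that are not inside a string or block comment."""
    batches, current = [], []
    in_string, depth = False, 0
    for line in script.splitlines():
        if not in_string and depth == 0 and GO_LINE.match(line):
            batches.append("\n".join(current))
            current = []
            continue
        current.append(line)
        in_string, depth = scan_line(line, in_string, depth)
    batches.append("\n".join(current))
    # Drop batches that are empty or whitespace-only.
    return [b for b in batches if b.strip()]
```

Calling split_batches on the script above should return it as a single batch, because the only line that starts with GO sits inside a block comment.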

nosklo
A parser would be the best solution, but if you can guarantee that GO is always on a line by itself you are pretty safe, especially since SQL92 doesn't have multiline comments.
Chas. Owens
you just blew my mind.
Barry Fandango
+4  A: 

If GO is always on a line by itself you can use split like this:

#!/usr/bin/python

import re

sql = """-- this is a great database script!  go team go!
INSERT INTO myTable(stringColumn) VALUES ('go away!')
/*
  here are some comments that go with this script.
*/
GO 5 --this is a test
INSERT INTO myTable(stringColumn) VALUES ('this is the next script')"""

statements = re.split(r"(?m)^\s*GO\s*(?:[0-9]+)?\s*(?:--.*)?$", sql)

for statement in statements:
    print("the statement is\n%s\n" % statement)

  • (?m) turns on multiline matching, that is, ^ and $ will match at the start and end of each line (instead of only at the start and end of the string).
  • ^ matches at the start of a line
  • \s* matches zero or more whitespace characters (space, tab, newline, etc.)
  • GO matches a literal GO
  • \s* matches as before
  • (?:[0-9]+)? matches an optional integer number (with possible leading zeros)
  • \s* matches as before
  • (?:--.*)? matches an optional end-of-line comment
  • $ matches at the end of a line

The split will consume the GO line, so you won't have to worry about it. This will leave you with a list of statements.

This modified split has a problem: it will not give you back the number after the GO. If that is important, I would say it is time to move to a parser of some form.
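
Short of a real parser, one possible way to recover the number is to iterate over the GO lines with finditer and capture the count in a named group. This is only a sketch: the names GO_LINE and split_with_counts are hypothetical, the case-insensitive flag is an assumption, and it uses [^\S\n]* (whitespace other than a newline) instead of \s* so that a match cannot spill over onto the following line. Like the split above, it does not protect against GO inside comments or strings.

```python
import re

# GO line with the optional repeat count captured as "count".
GO_LINE = re.compile(
    r'^[^\S\n]*GO[^\S\n]*(?P<count>[0-9]+)?[^\S\n]*(?:--.*)?$',
    re.IGNORECASE | re.MULTILINE,
)


def split_with_counts(sql):
    """Yield (batch_text, repeat_count) pairs; the count defaults to 1."""
    last_end = 0
    for match in GO_LINE.finditer(sql):
        # The count on a GO line applies to the batch that precedes it.
        yield sql[last_end:match.start()], int(match.group('count') or 1)
        last_end = match.end()
    # Whatever follows the final GO (or the whole script if there is
    # no GO at all) is a batch that runs once.
    yield sql[last_end:], 1
```

Each batch can then be executed repeat_count times by the caller; skipping whitespace-only batches is left to the caller as well.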

Chas. Owens
This is a good suggestion, and thanks for the detailed breakdown.
Barry Fandango
Chas, see edits above - how would I check for a single line comment at the end of the GO line? Please excuse my regex inexperience.
Barry Fandango
+6  A: 

Is "GO" always on a line by itself? You could just split on "^GO$".
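
As a small illustration (a sketch with a made-up two-batch script), note that "^GO$" only matches a line in the middle of the string when multiline matching is turned on:

```python
import re

script = "SELECT 1\nGO\nSELECT 2\n"

# Without re.MULTILINE, ^ and $ anchor only at the ends of the whole
# string, so nothing matches and the script comes back unsplit.
unsplit = re.compile(r'^GO$').split(script)

# With re.MULTILINE, ^ and $ anchor at every line boundary.
batches = re.compile(r'^GO$', re.MULTILINE).split(script)
```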

mcassano
After turning on multiline matching, I threw in optional whitespace as well, just in case.
Chas. Owens
I think it is usually on its own line, and that might be a good enough solution for this script. Although strictly, that wouldn't protect against a GO on its own line inside a multiline comment or a multiline string (also very rare though.)
Barry Fandango
My now updated script remedies this flaw.
MizardX
+2  A: 

This won't detect if GO is ever used as a variable name inside some statement, but should take care of those inside comments or strings.

EDIT: This now works if GO is part of the statement, as long as it is not on its own line.

import re

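# Note: '#' comments and backslash escapes are MySQL-style; in T-SQL, '#'
# prefixes temp-table names and quotes are escaped by doubling them.
line_comment = r'(?:--|#).*$'
block_comment = r'/\*[\S\s]*?\*/'
single_quote_string = r"'(?:\\.|[^'\\])*'"
double_quote_string = r'"(?:\\.|[^"\\])*"'
# A GO line with an optional count and trailing line comment; [^\S\n] is
# "whitespace other than newline", so the match stays on one line.
go_word = r'^[^\S\n]*(?P<GO>GO)[^\S\n]*\d*[^\S\n]*(?:(?:--|#).*)?$'

# finditer returns non-overlapping matches, so a GO inside a comment or
# string is consumed by that match and never seen as a separator.
full_pattern = re.compile(r'|'.join((
    line_comment,
    block_comment,
    single_quote_string,
    double_quote_string,
    go_word,
)), re.IGNORECASE | re.MULTILINE)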

def split_sql_statements(statement_string):
    last_end = 0
    for match in full_pattern.finditer(statement_string):
        if match.group('GO'):
            yield statement_string[last_end:match.start()]
            last_end = match.end()
    yield statement_string[last_end:]

Example usage:

statement_string = r"""
-- this is a great database script!  go team go!
INSERT INTO go(go) VALUES ('go away!')
go 7 -- foo
INSERT INTO go(go) VALUES (
    'I have to GO " with a /* comment to GO inside a /* GO string /*'
)
/*
  here are some comments that go with this script.
  */
  GO
  INSERT INTO go(go) VALUES ('this is the next script')
"""

for statement in split_sql_statements(statement_string):
    print('=======')
    print(statement)

Output:

=======

-- this is a great database script!  go team go!
INSERT INTO go(go) VALUES ('go away!')

=======

INSERT INTO go(go) VALUES (
    'I have to GO " with a /* comment to GO inside a /* GO string /*'
)
/*
  here are some comments that go with this script.
  */

=======

  INSERT INTO go(go) VALUES ('this is the next script')
MizardX
fails on 'I have to GO " with a /* comment to GO inside a /* GO string /*' GO
nosklo
No, it doesn't. It will find all non-overlapping matches, so the whole single-quote string will be detected as one match.
MizardX
No fail when I run it ...
John Fouhy
Very impressive! Thanks for taking the time to write this.
Barry Fandango
This is awesome! I have one scenario for you: -- GO BOB3 GO doesn't catch the second go.
ferventcoder
Well... that didn't work out very well. -- GO is on a separate line from BOB3 GO.
ferventcoder