Hello, I am writing a backend application that needs to be able to send multiple SQL commands to a MySQL server. MySQL >= 5.x supports multiple statements, but unfortunately we are interfacing with MySQL 4.x.

I am trying to find a way (hint: regex) to split SQL statements at their semicolons, while ignoring semicolons inside single-quoted and double-quoted strings.
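To make the problem concrete, here is a minimal sketch of what goes wrong with a plain split on `;`. The class name `NaiveSplitDemo`, the method `naiveSplit`, and the sample script are made up for illustration only.

```java
public class NaiveSplitDemo {

    // Splits on every semicolon, with no awareness of quoted strings.
    public static String[] naiveSplit(String script) {
        return script.split(";");
    }

    public static void main(String[] args) {
        // Two statements, each containing a semicolon inside a string literal.
        String script = "INSERT INTO t VALUES ('a;b'); SELECT \"x;y\" FROM t;";

        // The two statements are cut into four fragments, none of them valid SQL.
        for (String part : naiveSplit(script)) {
            System.out.println("[" + part.trim() + "]");
        }
    }
}
```

The quoted semicolons are treated exactly like statement terminators, which is why the split has to know whether it is currently inside a string.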

http://www.dev-explorer.com/articles/multiple-mysql-queries has a very nice regex to do that, but doesn't support double quotes.

I'd be happy to hear your suggestions.

+1  A: 

Duplicate:

http://stackoverflow.com/questions/139926/regular-expression-to-match-common-sql-syntax
http://stackoverflow.com/questions/633014/split-multiple-sql-statements-into-individual-sql-statements

cdonner
No, that topic is about validating SQL statements with regexes. What I need is to split multiple SQL statements at their semicolons.
OK, it is related nevertheless. How about this one, then: http://stackoverflow.com/questions/633014/split-multiple-sql-statements-into-individual-sql-statements
cdonner
A: 

Can't be done with regex; it's insufficiently powerful to parse SQL. There may be an SQL parser available for your language (which is it?), but parsing SQL is quite hard, especially given the range of different syntaxes available. Even in MySQL alone there are many SQL_MODE flags at the server and connection level that can affect how basic strings and comments are parsed, making statements behave quite differently.

The example at dev-explorer goes to amusing lengths to try to cope with escaped apostrophes and trailing strings, but will still fail for many valid combinations of them, not to mention the double quotes, backticks, the various comment syntaxes, or ANSI SQL_MODE.

bobince
A: 

As bobince said, regular expressions are probably not going to be powerful enough to do this. They're certainly not going to be powerful enough to do it in any halfway elegant manner. The second link cdonner provided also does not address this; most answers there were trying to talk the questioner out of doing this without semicolons; if he had taken the general advice, then he'd have ended up where you are.

I think the quickest path to solving this is going to be with a string scanner function, that examines every character of the string in sequence, and reacts based on a bit of stored state. Rough pseudocode:

  1. Read in a character
  2. If the character is not special, CONTINUE
  3. If the character is escaped (checking this probably requires examining the previous character), CONTINUE
  4. If the character would start a new string or end an existing one, toggle a flag IN_STRING (you might need multiple flags for different string types... I've honestly tried and succeeded at remaining ignorant of the minutiae of SQL quoting/escaping) and CONTINUE
  5. If the character is a semicolon AND we are not currently in a string, we have found a query! OUTPUT it and CONTINUE scanning until the end of the string.

Language parsing is not one of my areas of expertise, so you'll want to consider that approach carefully. Nonetheless, it's going to be fast (with C-style strings, none of those steps is at all expensive, save possibly for the OUTPUT, depending on what "outputting" means in your context), and I think it should get the job done.
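The steps above could be sketched in Java roughly as follows. This is only a sketch under stated assumptions, not a full MySQL lexer: the class `SqlSplitter` and its `split` method are hypothetical names, backslash is assumed to escape the next character inside single-quoted and double-quoted strings, backtick-quoted identifiers are assumed to have no backslash escapes, and comments (`--`, `#`, `/* */`) and SQL_MODE variations are not handled at all.

```java
import java.util.ArrayList;
import java.util.List;

public class SqlSplitter {

    /** Splits a script on semicolons that lie outside quoted sections. */
    public static List<String> split(String sql) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0; // 0 = not inside quotes; otherwise the opening quote character

        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);

            if (quote != 0) {
                // Inside a quoted section: copy everything, semicolons included.
                current.append(c);
                if (c == '\\' && quote != '`' && i + 1 < sql.length()) {
                    // Assumption: backslash escapes the next character in ' and " strings.
                    current.append(sql.charAt(++i));
                } else if (c == quote) {
                    // Closing quote. A doubled quote ('') closes and immediately
                    // reopens the section, so no special case is needed for it.
                    quote = 0;
                }
            } else if (c == '\'' || c == '"' || c == '`') {
                quote = c; // opening quote
                current.append(c);
            } else if (c == ';') {
                // Statement terminator outside any quotes: emit the statement.
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Emit a trailing statement that has no terminating semicolon.
        addIfNotBlank(statements, current);
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }
}
```

For example, `SqlSplitter.split("SELECT 'a;b' FROM t; SELECT \"c;d\";")` returns two statements, `SELECT 'a;b' FROM t` and `SELECT "c;d"`. A single `quote` variable stands in for the "multiple flags" mentioned in step 4, since only one kind of quote can be open at a time. A semicolon inside a comment would still be treated as a terminator here, so scripts with comments need an extra state.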

kquinn
A: 

I would suggest seeing whether you can redefine the problem so that you no longer need to send multiple queries separated only by their terminator.

staticsan
A: 

Try this. I just replaced the first ' with \" and it seems to work for both ' and ":

;+(?=([^\"|^\\']['|\\'][^'|^\\']['|\\'])[^'|^\\'][^'|^\\']$)

+1  A: 

Maybe the following Java regexp would do? Check the test:

@Test
public void testRegexp() {
    String s = //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n" + //
        "SELECT 'hello;world' \n" + //
        "FROM DUAL; \n" + //
        "\n";

    String regexp = "([^;]*?('.*?')?)*?;\\s*";

    assertEquals("<statement><statement>", s.replaceAll(regexp, "<statement>"));
}
mhoms