tags:

views:

70

answers:

4

Suppose I have the following string:

insert into table values ('text1;'); insert into table values ('text2')

How do I break those queries (get each individual query) using regular expressions?

I've found a very similar problem: http://stackoverflow.com/questions/958095/useregex-to-find-specific-string-not-in-html-tag ...but it uses a solution that is specific to .NET: behind lookup (in php it complains that is not fixed length).

I would be very grateful if someone could give me some hints on how to deal with this problem.

A: 

How you want to break?

You can use explode( ' ', $query ) to transform into an array.

Or if you want to get text1 and text2 values with regexp you can use preg_match( '/(\'([\w]+)\')/', $query, $matches ) where $matches[1] is your value.

preg_match_all( '/([\w ]+([\w \';]+))/', $queries, $matches ) will give to you all matches with this pattern of query.

reinaldons
I need to break on ; but there is the problem of having ; inside quotes. It's possible to write a small code to do that but I was wondering if it was possible using regular expressions.
Rafael
preg_match_all solution can help you.
reinaldons
+1  A: 

The trick is to count how many unescaped quote characters you've passed. Assuming that the SQL is syntactically correct, semicolons after an even number of unescaped quote characters will be the ones you want, and semicolons after an odd number of unescaped quote characters will be part of a string literal. (Remember, string literals can contain properly escaped quote characters.)

If you want 100% reliability, you'll need a real SQL parser, like this. (I just Googled "SQL parser in PHP". I don't know if it works or not.)


EDIT:

I don't think it's possible to find pairs of unescaped quote characters using nothing but regex. Maybe a regex guru will prove me wrong, but it just seems too damn difficult to distinguish between escaped and unescaped quote characters in so many possible combinations. I tried look-behind assertions and backrefs with no success.

The following is not a pure-regex solution, but I think it works:

preg_match_all("/(?:([^']*'){2})*[^']*;/U", str_replace("\\'", "\0\1\2", $input), $matches);
$output = array_map(function($str){ return str_replace("\0\1\2", "\\'", $str); }, $matches[0]);

Basically, we temporarily replace escaped quote characters with a string of bytes that is extremely unlikely to occur, in this case \0\1\2. After that, all the quote characters that remain are the unescaped ones. The regex picks out semicolons preceded by an even number of quote characters. Then we restore the escaped quote characters. (I used a closure there, so it's PHP 5.3 only.)

If you don't need to deal with quote characters inside string literals, yes, you can easily do it with pure regex.

kijin
Right, but the point of the question is how to do that using regular expressions (if it is possible, of course). Thanks anyway.
Rafael
+1  A: 

Assuming proper SQL syntax it would probably be best to split on the semicolon.

The following regexp will work but only if all quotes come in pairs.

/.+?\'.+?\'.*?;|.+?;/

To avoid escaped single quotes:

/.+?[^\\\\]\'.+?[^\\\\]\'.*?;|.+?;/

To handle Multiple pairs of single quotes.

/.+?(?:[^\\]\'.+?[^\\]\')+.*?;|.+?;/

Tested against the following data set:

insert into table values ('text1;\' ','2'); insert into table values ('text2'); insert into test3 value ('cookie\'','fly');

Returns:

insert into table values ('text1;\' ','2');

insert into table values ('text2');

insert into test3 value ('cookie\'','fly');

I have to admit this is a pretty dirty regexp. It would not handle any sort of SQL syntax errors at all. I enjoyed the challenge of coming up with a pure regex though.

CtRanger
+1 Nice work with this!
mellowsoon
A: 

Regex's aren't always good at this type of thing. The following function should work though:

function splitQuery($query) {
    $open = false;
    $buffer = null;
    $parts = array();
    for($i = 0, $l = strlen($query); $i < $l; $i++) {
        if ($query[$i] == ';' && !$open) {
            $parts[] = trim($buffer);
            $buffer = null;
            continue;
        }
        if ($query[$i] == "'") {
            $open = ($open) ? false: true;
        }

        $buffer .= $query[$i];
    }

    if ($buffer) $parts[] = trim($buffer);
    return $parts;
}

Usage:

$str = "insert into table values ('text1;'); insert into table values ('text2')";
$str = splitQuery($str);
print_r($str);

Outputs:

Array
(
    [0] => insert into table values ('text1;')
    [1] => insert into table values ('text2')
)
mellowsoon