I have this string here:

CREATE UNIQUE INDEX index555 ON
SOME_TABLE
(
    SOME_PK          ASC
);

I want to match across multiple lines and capture each SQL statement (all of them; there will be many in one large string). I tried something like the pattern below, but I only get a match on CREATE UNIQUE INDEX index555 ON:

(CREATE\s.+;)

Note: I am trying to do this in Java, if it matters.

+2  A: 

Check out the various flags that can be passed to Pattern.compile. I think DOTALL is the one you need.

Don Kirkby
+2  A: 

You'll want to use the Pattern.DOTALL flag to match across lines.

Kibbee
+3  A: 

Check this

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.

So you need to do something like this:

Pattern p = Pattern.compile("your pattern", Pattern.DOTALL);
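
To make the effect of the flag concrete, here is a minimal sketch that runs the pattern from the question against the sample statement, once with default flags and once with DOTALL. The class name DotallDemo and the helper finds are made up for illustration, and the input is assumed to use \n line terminators.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallDemo {
    // Sample statement from the question, assumed to be joined with \n.
    static final String SQL =
        "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n    SOME_PK          ASC\n);";

    // Hypothetical helper: true if the regex matches anywhere in SQL
    // when compiled with the given flags (0 means default flags).
    static boolean finds(String regex, int flags) {
        Matcher m = Pattern.compile(regex, flags).matcher(SQL);
        return m.find();
    }

    public static void main(String[] args) {
        String regex = "(CREATE\\s.+;)";
        // Without DOTALL, .+ stops at the first line terminator, so it
        // never reaches the ';' that sits on the last line.
        System.out.println("default flags: " + finds(regex, 0));
        // With DOTALL, . also matches line terminators.
        System.out.println("DOTALL:        " + finds(regex, Pattern.DOTALL));
    }
}
```

On this input the first call reports no match and the second reports a match, since the only ; is several lines below CREATE.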
lowercase
+2  A: 

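You need to use the DOTALL flag when compiling the regular expression so that . can match line terminators. The example below also sets MULTILINE and CASE_INSENSITIVE, although this particular pattern has no ^ or $ anchors and no letters for them to affect. Here is a Java code example: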

import java.util.regex.*;

public class test
{
    public static void main(String[] args)
    {
        // Two statements in one string, separated by newlines.
        String s =
            "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n    SOME_PK          ASC\n);\nCREATE UNIQUE INDEX index666 ON\nOTHER_TABLE\n(\n    OTHER_PK          ASC\n);\n";

        // Each match runs through a terminating ';' and any whitespace after it.
        Pattern p = Pattern.compile("([^;]*?('.*?')?)*?;\\s*", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);

        Matcher m = p.matcher(s);

        // find() advances through the input one statement at a time.
        while (m.find())
        {
            System.out.println("--- Statement ---");
            System.out.println(m.group());
        }
    }
}

The output will be:

--- Statement ---
CREATE UNIQUE INDEX index555 ON
SOME_TABLE
(
    SOME_PK          ASC
);

--- Statement ---
CREATE UNIQUE INDEX index666 ON
OTHER_TABLE
(
    OTHER_PK          ASC
);
Vlad Lazarenko
+1  A: 

The DOTALL flag lets the . match newlines, but if you simply apply it to your existing regex, you'll end up matching everything from the first CREATE to the last ; in one go. If you want to match the statements individually, you'll need to do more. One option is to use a non-greedy quantifier:

Pattern p = Pattern.compile("^CREATE\\b.+?;",
    Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

I also used the MULTILINE flag to let the ^ anchor match after newlines, and CASE_INSENSITIVE because SQL is--at least, every flavor I've heard of. Note that all three flags have "inline" forms that you can use in the regex itself:

Pattern p = Pattern.compile("(?smi)^CREATE\\b.+?;");

(The inline form of DOTALL is s for historical reasons; it was called "single-line" mode in Perl, where it originated.) Another option is to use a negated character class:

Pattern p = Pattern.compile("(?mi)^CREATE\\b[^;]+;");

[^;]+ matches one or more of any character except ;--that includes newlines, so the s flag isn't needed.
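
To compare the options discussed so far, here is a small sketch that counts the matches each pattern produces on a two-statement input. The class name StatementMatchDemo and the helper findAll are invented for this example, and the input is assumed to use \n line terminators with each statement starting at the beginning of a line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatementMatchDemo {
    // Two statements in one string, each starting at the beginning of a line.
    static final String SQL =
        "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n    SOME_PK          ASC\n);\n"
      + "CREATE UNIQUE INDEX index666 ON\nOTHER_TABLE\n(\n    OTHER_PK          ASC\n);\n";

    // Hypothetical helper: collects every match of the regex, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy .+ with DOTALL runs from the first CREATE to the last ';'.
        System.out.println("greedy:  " + findAll("(?smi)^CREATE\\b.+;", SQL).size());
        // Lazy .+? stops at the first ';' it can reach.
        System.out.println("lazy:    " + findAll("(?smi)^CREATE\\b.+?;", SQL).size());
        // [^;]+ cannot cross a ';', so no s flag is needed.
        System.out.println("negated: " + findAll("(?mi)^CREATE\\b[^;]+;", SQL).size());
    }
}
```

On this input the greedy pattern yields a single match spanning both statements, while the lazy and negated-class patterns each yield two matches with identical text.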

So far, I've assumed that every statement starts at the beginning of a line and ends with a semicolon, as in your example. I don't think either of those things is required by the SQL standard, but I expect you'll know if you can count on them in this instance. You might want to start matching at a word boundary instead of a line boundary:

Pattern p = Pattern.compile("(?i)\\bCREATE\\b[^;]+;");
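
As a rough illustration of the difference, the sketch below puts two statements on one line and counts matches for the line-anchored and word-boundary patterns. The class name WordBoundaryDemo, the helper count, and the sample input are all made up for this example; a bare \bCREATE\b would also match the word inside a comment or string literal, which this sketch does not try to handle.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical input: two statements on the same line, the second in lowercase.
    static final String SQL =
        "CREATE INDEX a ON t1 (c1); create index b ON t2 (c2);";

    // Hypothetical helper: number of matches of the regex in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // ^ only matches at the start of a line, so the second statement is missed.
        System.out.println("line anchor:   " + count("(?mi)^CREATE\\b[^;]+;", SQL));
        // \b matches before any standalone CREATE, wherever it sits on the line.
        System.out.println("word boundary: " + count("(?i)\\bCREATE\\b[^;]+;", SQL));
    }
}
```

Here the line-anchored pattern finds one statement and the word-boundary pattern finds both.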

Finally, if you're thinking about doing anything more complicated with regexes and SQL, don't. Parsing SQL with regexes is a fool's game--it's an even worse fit than HTML and regexes.

Alan Moore