ansaurus

Question

Have regex ignore new lines and just match on a whole large string?

Answer 1

+2 A:

Check out the various flags that can be passed to Pattern.compile. I think DOTALL is the one you need.

Don Kirkby 2010-08-25 20:56:22

Answer 2

+2 A:

You'll want to use the Pattern.DOTALL flag to match across lines.

Kibbee 2010-08-25 20:56:49

Answer 3

+3 A:

From the documentation for java.util.regex.Pattern:

The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.

So you need to do something like this:

Pattern p = Pattern.compile("your pattern", Pattern.DOTALL);
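To see the difference the flag makes, here is a minimal sketch. The class name DotAllDemo, the helper firstMatch, and the sample string are made up for illustration; only Pattern, Matcher, and Pattern.DOTALL come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Illustrative input: a single statement split across two lines.
        String input = "CREATE INDEX idx ON\nSOME_TABLE;";

        // Without DOTALL the dot cannot cross the '\n', so the ';' is never reached.
        System.out.println(firstMatch("CREATE.*;", 0, input));

        // With DOTALL the dot also matches '\n', so the whole statement matches.
        System.out.println(firstMatch("CREATE.*;", Pattern.DOTALL, input));
    }
}
```

The first call finds nothing because the only ; sits on the second line; the second call returns the full two-line statement.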

lowercase 2010-08-25 21:13:46

Answer 4

+2 A:

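You need the DOTALL flag when compiling the regular expression so that . can match line terminators. MULTILINE only changes where ^ and $ match, so it matters only for patterns that use those anchors. Here is a Java code example: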

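import java.util.regex.*;

public class test
{
    public static void main(String[] args)
    {
        // Two SQL statements, each spread over several lines.
        String s =
        "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n    SOME_PK          ASC\n);\nCREATE UNIQUE INDEX index666 ON\nOTHER_TABLE\n(\n    OTHER_PK          ASC\n);\n";

        // Match everything up to and including the next ';' plus trailing whitespace.
        // DOTALL lets the '.' inside the quoted-string group span newlines.
        Pattern p = Pattern.compile("([^;]*?('.*?')?)*?;\\s*", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);

        Matcher m = p.matcher(s);

        // Print each statement found in the input.
        while (m.find())
        {
            System.out.println("--- Statement ---");
            System.out.println(m.group());
        }
    }
}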

The output will be:

--- Statement ---
CREATE UNIQUE INDEX index555 ON
SOME_TABLE
(
    SOME_PK          ASC
);

--- Statement ---
CREATE UNIQUE INDEX index666 ON
OTHER_TABLE
(
    OTHER_PK          ASC
);

Vlad Lazarenko 2010-08-25 21:30:06

Answer 5

+1 A:

The DOTALL flag lets the . match newlines, but if you simply apply it to your existing regex, you'll end up matching everything from the first CREATE to the last ; in one go. If you want to match the statements individually, you'll need to do more. One option is to use a non-greedy quantifier:

Pattern p = Pattern.compile("^CREATE\\b.+?;",
    Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

I also used the MULTILINE flag to let the ^ anchor match after newlines, and CASE_INSENSITIVE because SQL keywords are case-insensitive, at least in every flavor I've heard of. Note that all three flags have "inline" forms that you can use in the regex itself:

Pattern p = Pattern.compile("(?smi)^CREATE\\b.+?;");

(The inline form of DOTALL is s for historical reasons; it was called "single-line" mode in Perl, where it originated.) Another option is to use a negated character class:

Pattern p = Pattern.compile("(?mi)^CREATE\\b[^;]+;");

[^;]+ matches one or more of any character except ;--that includes newlines, so the s flag isn't needed.
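A quick way to compare the three variants is to count what each one matches. This is only a sketch: the class StatementSplitDemo and the helper findAll are hypothetical names, and the input is a trimmed version of the sample from the earlier answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatementSplitDemo {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String sql = "CREATE UNIQUE INDEX index555 ON\nSOME_TABLE\n(\n    SOME_PK ASC\n);\n"
                   + "CREATE UNIQUE INDEX index666 ON\nOTHER_TABLE\n(\n    OTHER_PK ASC\n);\n";

        // Greedy .+ with the s flag: a single match from the first CREATE to the last ';'.
        System.out.println(findAll("(?smi)^CREATE\\b.+;", sql).size());

        // Non-greedy .+? stops at the first ';', so each statement is matched separately.
        System.out.println(findAll("(?smi)^CREATE\\b.+?;", sql).size());

        // Negated character class: no s flag, also one match per statement.
        System.out.println(findAll("(?mi)^CREATE\\b[^;]+;", sql).size());
    }
}
```

On this input the greedy version yields one match and the other two yield two matches each, with identical text.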

So far, I've assumed that every statement starts at the beginning of a line and ends with a semicolon, as in your example. I don't think either of those things is required by the SQL standard, but I expect you'll know if you can count on them in this instance. You might want to start matching at a word boundary instead of a line boundary:

Pattern p = Pattern.compile("(?i)\\bCREATE\\b[^;]+;");
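The difference between the two anchors shows up when statements share a line. The sketch below uses a made-up input with two statements on one line; WordBoundaryDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Illustrative input: two statements on the same line.
        String sql = "create index a on t1 (x); create index b on t2 (y);";

        // ^ only matches at a line start, so the second statement is missed.
        System.out.println(findAll("(?mi)^CREATE\\b[^;]+;", sql));

        // \b matches before any standalone CREATE, so both statements are found.
        System.out.println(findAll("(?i)\\bCREATE\\b[^;]+;", sql));
    }
}
```

Keep in mind that the word-boundary version will also start a match at the word CREATE inside a string literal or comment, which is one reason the warning below applies.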

Finally, if you're thinking about doing anything more complicated with regexes and SQL, don't. Parsing SQL with regexes is a fool's game--it's an even worse fit than HTML and regexes.

Alan Moore 2010-08-26 08:00:59
