The DOTALL flag lets the dot (.) match newlines, but if you simply apply it to your existing regex, you'll end up matching everything from the first CREATE to the last semicolon in one go. If you want to match the statements individually, you'll need to do more. One option is to use a non-greedy (reluctant) quantifier:
Pattern p = Pattern.compile("^CREATE\\b.+?;",
Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
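To see the difference, here is a small sketch that runs a greedy .+ and the reluctant .+? over the same two-statement script. The class name ReluctantCreateDemo, the findAll helper and the sample SQL are made up for illustration; only java.util.regex is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantCreateDemo {
    // Greedy: .+ runs to the end of the input, then backs off to the LAST semicolon
    static final Pattern GREEDY = Pattern.compile("^CREATE\\b.+;",
            Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
    // Reluctant: .+? grows one character at a time, so it stops at the FIRST semicolon
    static final Pattern RELUCTANT = Pattern.compile("^CREATE\\b.+?;",
            Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collect every non-overlapping match of p in input
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String sql = "CREATE TABLE a (\n  id INT\n);\n"
                   + "create table b (\n  name VARCHAR(20)\n);\n";
        System.out.println(findAll(GREEDY, sql).size());    // 1: both statements in one match
        System.out.println(findAll(RELUCTANT, sql).size()); // 2: one match per statement
    }
}
```

The greedy version swallows both statements as a single match; the reluctant one returns each statement separately.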
I also used the MULTILINE flag to let the ^ anchor match at the start of every line, not just at the start of the input, and CASE_INSENSITIVE because SQL keywords are case-insensitive (at least in every flavor I've heard of). Note that all three flags have "inline" forms that you can use in the regex itself:
Pattern p = Pattern.compile("(?smi)^CREATE\\b.+?;");
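As a quick sanity check, this sketch compiles the pattern both ways and confirms they find the same statements. InlineFlagsDemo, its findAll helper and the sample script are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineFlagsDemo {
    // Flags passed as the second argument to compile()
    static final Pattern WITH_CONSTANTS = Pattern.compile("^CREATE\\b.+?;",
            Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
    // Same flags embedded in the regex: s = DOTALL, m = MULTILINE, i = CASE_INSENSITIVE
    static final Pattern WITH_INLINE = Pattern.compile("(?smi)^CREATE\\b.+?;");

    // Hypothetical helper: collect every non-overlapping match of p in input
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String sql = "create table a (\n  id INT\n);\nCREATE VIEW v AS\n  SELECT 1;\n";
        List<String> a = findAll(WITH_CONSTANTS, sql);
        List<String> b = findAll(WITH_INLINE, sql);
        System.out.println(a.size());      // 2
        System.out.println(a.equals(b));   // true
    }
}
```

The inline form is handy when the pattern comes from a config file or some other place where you can't pass flag constants.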
(The inline form of DOTALL is s for historical reasons: it was called "single-line" mode in Perl, where it originated.) Another option is to use a negated character class:
Pattern p = Pattern.compile("(?mi)^CREATE\\b[^;]+;");
[^;]+ matches one or more of any character except a semicolon, and that includes newlines, so the s flag isn't needed.
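Here is a sketch of the negated-class version in action, run against a made-up script that also contains a non-CREATE statement. NegatedClassDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // No DOTALL needed: [^;] already matches line terminators,
    // and it can never run past a semicolon, so no reluctant quantifier either
    static final Pattern CREATE_STMT = Pattern.compile("(?mi)^CREATE\\b[^;]+;");

    // Hypothetical helper: collect every non-overlapping match of p in input
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String sql = "CREATE TABLE a (\n  id INT\n);\n"
                   + "INSERT INTO a VALUES (1);\n"
                   + "CREATE INDEX i ON a (id);\n";
        // Prints the two CREATE statements; the INSERT line is skipped
        for (String stmt : findAll(CREATE_STMT, sql)) {
            System.out.println(stmt);
        }
    }
}
```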
So far, I've assumed that every statement starts at the beginning of a line and ends with a semicolon, as in your example. I don't think either of those things is required by the SQL standard, but I expect you'll know if you can count on them in this instance. You might want to start matching at a word boundary instead of a line boundary:
Pattern p = Pattern.compile("(?i)\\bCREATE\\b[^;]+;");
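The difference shows up when a statement shares a line with something else. In this sketch (WordBoundaryDemo, findAll and the sample script are hypothetical), the line-anchored pattern misses the CREATE that follows a DROP on the same line, while the word-boundary pattern finds it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // CREATE must be the first thing on a line
    static final Pattern LINE_START = Pattern.compile("(?mi)^CREATE\\b[^;]+;");
    // CREATE may appear anywhere, as long as it is a whole word
    static final Pattern WORD_START = Pattern.compile("(?i)\\bCREATE\\b[^;]+;");

    // Hypothetical helper: collect every non-overlapping match of p in input
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String sql = "DROP TABLE a; CREATE TABLE a (id INT);\n"
                   + "CREATE INDEX i ON a (id);\n";
        System.out.println(findAll(LINE_START, sql).size()); // 1
        System.out.println(findAll(WORD_START, sql).size()); // 2
    }
}
```

The trade-off is that \bCREATE will also fire on the word CREATE wherever it appears, for example inside a comment, so it is looser than the line anchor.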
Finally, if you're thinking about doing anything more complicated with regexes and SQL, don't. Parsing SQL with regexes is a fool's game; SQL is an even worse fit for regexes than HTML is.
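For one concrete example of how these patterns break, consider a semicolon inside a string literal. The sketch below (SemicolonInLiteralDemo, firstMatch and the table definition are hypothetical) shows the negated-class pattern cutting the statement off in the middle of the literal; the reluctant .+? version stops at the same place.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SemicolonInLiteralDemo {
    static final Pattern CREATE_STMT = Pattern.compile("(?mi)^CREATE\\b[^;]+;");

    // Hypothetical helper: return the first match, or null if there is none
    static String firstMatch(String input) {
        Matcher m = CREATE_STMT.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The ; inside 'a;b' is data, not a statement terminator,
        // but the regex has no way to know that
        String sql = "CREATE TABLE t (\n  note VARCHAR(10) DEFAULT 'a;b'\n);\n";
        System.out.println(firstMatch(sql)); // ends at DEFAULT 'a;
    }
}
```

Handling quotes, escaped quotes, comments and vendor-specific delimiters correctly is the job of a real SQL parser or tokenizer.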