




I want to match the following: a zero length line, with the match continuing across lines of non-zero length until a particular string is matched in a line. E.g: the match starts with a zero length line and continues until STOP is reached:

Some random text I don't care about

The match starts at the beginning of this line
The match continues across this line
The match stops here STOP more
text I don't care about

Any suggestions?


+3  A: 

This should do it:

(?ms)^[ \t]*+$\s*+((?:(?!STOP).)*+)

A little demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main { 
    public static void main(String[] args) {
        String text = "Some random text I don't care about"      + "\n" +
                ""                                               + "\n" +
                "The match starts at the beginning of this line" + "\n" +
                "The match continues across this line"           + "\n" +
                "The match stops here STOP more"                 + "\n" +
                "don't care about"                               + "\n" +
                ""                                               + "\n" +
                ""                                               + "\n" +
                ""                                               + "\n" +
                "foo"                                            + "\n" +
                "barSTOP"                                        + "\n" +
                "text I don't care about";
        Matcher m = Pattern.compile("(?ms)^[ \t]*+$\\s*+(?:(?!STOP).)*+").matcher(text);
        while(m.find()) {
            // group() is the whole match, including the blank line(s) that start it
            System.out.println("match ->" + m.group() + "<-");
        }
    }
}

which will output:

match ->
The match starts at the beginning of this line
The match continues across this line
The match stops here <-
match ->


foo
bar<-

A small explanation:

(?ms)               # enable multi-line and dot-all
^[ \t]*+$           # match an empty line (spaces and tabs allowed)
\s*+                # match the line break(s) and any whitespace that follows
(                   # start group 1
  (?:(?!STOP).)     #   if the string 'STOP' cannot be seen, match any character
  *+                #   match the previous zero or more times (possessively)
)                   # stop group 1
Bart Kiers
Hi Bart - that's fantastic - thanks for your help. Dissecting it now.
Hi Bart, should have mentioned, should match from the first blank line before non-zero lines. If there are other blank lines earlier in the doc then the match starts from the first one.
My solution should do that, you may need to do a `group()` instead of `group(1)` to include the empty line(s) in your match.
Bart Kiers
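
To illustrate the `group()` versus `group(1)` remark above, here is a minimal sketch using the grouped pattern from the answer. The class name `GroupDemo`, the helper `firstMatch`, and the sample text are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Pattern from the answer, with capture group 1 around the text after the blank line.
    static final Pattern P = Pattern.compile("(?ms)^[ \\t]*+$\\s*+((?:(?!STOP).)*+)");

    // Hypothetical helper: returns {whole match, group 1} for the first match, or null.
    static String[] firstMatch(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? new String[] { m.group(), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String[] r = firstMatch("skip\n\nfirst\nsecond STOP rest");
        System.out.println("group()  ->" + r[0] + "<-");
        System.out.println("group(1) ->" + r[1] + "<-");
    }
}
```

With this sample text, `group()` begins with the line break of the blank line, while `group(1)` begins at "first"; both end just before STOP.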
See the edited version.
Bart Kiers
Hi Bart - in combination with your answer for my other question, I am using this: (?s)\n\n(?:(?!\n\n).)*STOP Thanks again.
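
A rough sketch of how the pattern in the comment above behaves, assuming Unix line breaks. The class name `ParagraphStopDemo`, the helper `firstMatch`, and the sample text are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParagraphStopDemo {
    // Pattern from the comment: two line breaks, then anything that does not
    // cross another pair of line breaks, up to and including STOP.
    static final Pattern P = Pattern.compile("(?s)\\n\\n(?:(?!\\n\\n).)*STOP");

    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "noise" sits between two blank lines and contains no STOP, so it is skipped.
        String text = "intro\n\nnoise\n\nfirst\nsecond STOP rest";
        System.out.println("match ->" + firstMatch(text) + "<-");
    }
}
```

Because the inner part may not cross a pair of line breaks, the match starts at the blank line directly before the block that contains STOP, and it includes STOP itself.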
You're welcome again!
Bart Kiers

Make sure the "match multi-line" flag is set so that "." can match line breaks (in Java that is Pattern.DOTALL, or (?s) inline); the expression is then "\\n(\\n.*STOP)". The first (and only) capture group yields your result. On DOS and Windows systems use "\\r\\n" in place of "\\n".
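
A small sketch of this expression in Java, assuming `Pattern.DOTALL` is the flag meant here. The class name `SecondAnswerDemo`, the helper `group1`, and the reluctant variant `.*?` are my own additions for comparison, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SecondAnswerDemo {
    // Hypothetical helper: compiles the regex with DOTALL so that '.' also
    // matches line breaks, and returns capture group 1 of the first match.
    static String group1(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "skip\n\nfirst STOP more\nlater STOP end";
        // Expression from the answer: greedy, so it runs to the last STOP.
        System.out.println("greedy    ->" + group1("\\n(\\n.*STOP)", text) + "<-");
        // Assumed variation: reluctant, so it ends at the first STOP.
        System.out.println("reluctant ->" + group1("\\n(\\n.*?STOP)", text) + "<-");
    }
}
```

If the text can contain more than one STOP, the greedy form extends to the last one, so the reluctant form may be closer to what the question asks for.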
