Hi,

I want to match the following: a zero-length line, with the match continuing across lines of non-zero length until a particular string is matched in a line. E.g., the match starts with a zero-length line and continues until STOP is reached:

Some random text I don't care about

The match starts at the beginning of this line
The match continues across this line
The match stops here STOP more
text I don't care about

Any suggestions?

Thanks

+3  A: 

This should do it:

(?ms)^[ \t]*+$\s*+((?:(?!STOP).)*+)

A little demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main { 
    public static void main(String[] args) {
        String text = "Some random text I don't care about"      + "\n" +
                ""                                               + "\n" +
                "The match starts at the beginning of this line" + "\n" +
                "The match continues across this line"           + "\n" +
                "The match stops here STOP more"                 + "\n" +
                "don't care about"                               + "\n" +
                ""                                               + "\n" +
                ""                                               + "\n" +
                ""                                               + "\n" +
                "foo"                                            + "\n" +
                "barSTOP"                                        + "\n" +
                "text I don't care about";
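        // Same pattern without the capturing group; group() returns the whole
        // match, including the empty line(s) that start it.
        Matcher m = Pattern.compile("(?ms)^[ \t]*+$\\s*+(?:(?!STOP).)*+").matcher(text);
        while (m.find()) {
            System.out.println("match ->" + m.group() + "<-");
        }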
    }
}

which will output:

match ->
The match starts at the beginning of this line
The match continues across this line
The match stops here <-
match ->


foo
bar<-

A small explanation:

(?ms)               # enable multi-line and dot-all
^[ \t]*+$           # match an empty line (or one holding only spaces/tabs)
\s*+                # match the line break(s) and any whitespace that follows
(                   # start group 1
  (?:(?!STOP).)     #   if the string 'STOP' cannot be seen, match any character
  *+                #   match the previous zero or more times (possessively)
)                   # stop group 1
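
To see what the capturing group changes, here is a minimal sketch using the grouped pattern from the top of this answer. The class and method names (GroupDemo, firstWhole, firstBody) and the sample text are made up for illustration. group() returns the whole match, including the line break of the empty line, while group(1) returns only the text after it:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class GroupDemo {
    // The grouped pattern from this answer, written as a Java string literal.
    static final Pattern P =
            Pattern.compile("(?ms)^[ \\t]*+$\\s*+((?:(?!STOP).)*+)");

    // Whole first match (group 0), or null if nothing matches.
    static String firstWhole(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group() : null;
    }

    // Only the captured part (group 1) of the first match, or null.
    static String firstBody(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "skip me\n\nline one\nline two STOP rest";
        System.out.println("[" + firstWhole(text) + "]"); // starts with the line break
        System.out.println("[" + firstBody(text) + "]");  // starts at "line one"
    }
}
```

For this sample text both results end just before STOP; the only difference is the leading line break that group() keeps.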
Bart Kiers
Hi Bart - that's fantastic - thanks for your help. Dissecting it now.
Richard
Hi Bart, should have mentioned, should match from the first blank line before non-zero lines. If there are other blank lines earlier in the doc then the match starts from the first one.
Richard
My solution should do that, you may need to do a `group()` instead of `group(1)` to include the empty line(s) in your match.
Bart Kiers
See the edited version.
Bart Kiers
Hi Bart - in combination with your answer for my other question, I am using this: (?s)\n\n(?:(?!\n\n).)*STOP Thanks again.
Richard
You're welcome again!
Bart Kiers
A: 

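Make sure you set the dot-all flag (in Java, Pattern.DOTALL) so that "." also matches line breaks; the multi-line flag only changes how ^ and $ behave. The expression is "\\n(\\n.*STOP)". The first (and only) match group yields your result. On DOS and Windows systems use "\\r\\n" in place of "\\n".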
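
A quick way to check which flag this expression needs in Java, as a sketch (the class name DotAllDemo, the helper firstGroup and the sample text are made up for illustration). In java.util.regex, "." only crosses line breaks under Pattern.DOTALL, while Pattern.MULTILINE affects only ^ and $:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class DotAllDemo {
    // Returns group(1) of the first match of \n(\n.*STOP) compiled with
    // the given flags, or null if there is no match.
    static String firstGroup(String text, int flags) {
        Matcher m = Pattern.compile("\\n(\\n.*STOP)", flags).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "skip\n\nline one\nline two STOP rest";
        // MULTILINE alone: '.' stops at the end of "line one", so no match.
        System.out.println(firstGroup(text, Pattern.MULTILINE));
        // DOTALL: '.' crosses the line break and the group reaches STOP.
        System.out.println(firstGroup(text, Pattern.DOTALL));
    }
}
```

With the sample text, the first call finds nothing and the second returns the block from the blank line through STOP.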

rsp
A: 
(?ms)^$(.+)STOP
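
One thing to watch with this pattern: .+ is greedy, so when the text contains several STOPs the capture runs up to the last one. A small sketch comparing it with a reluctant .+? variant (the variant, the class name GreedyDemo, the helper capture and the sample text are my own additions for illustration, not part of the answer above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class GreedyDemo {
    // Returns group(1) of the first match of the given regex, or null.
    static String capture(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "skip\n\nfirst STOP middle\n\nsecond STOP end";
        // Greedy: the group extends to just before the LAST "STOP".
        System.out.println("[" + capture("(?ms)^$(.+)STOP", text) + "]");
        // Reluctant: the group ends just before the FIRST "STOP".
        System.out.println("[" + capture("(?ms)^$(.+?)STOP", text) + "]");
    }
}
```

If only the nearest STOP should end the match, the reluctant form is probably the one to use.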
highlycaffeinated