Hi guys, I have a simple regexp question. I have the following multiline string:

description: line1\r\nline2\r\n...

I am trying to capture all the lines that come after description:. I used the following regexp (and a few more):

description: ((.*\r\n){1,})

...without any success. Then I found that there is a 'Regexp StackOverflow' bug (marked as won't fix) in Sun's Java implementation, see Bug #5050507. Can anyone please provide me with the magic formula to work around this annoying bug? Please note that the total length of the lines must exceed 818 bytes!

A: 

I can reproduce the error:

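import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Build a subject string with 1000 short lines after "description: ".
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 1000; ++i)
{
    sb.append("j\r\n");
}
String s = "description: " + sb.toString();
// Original pattern: this is the one that reproduces the stack overflow.
Pattern pattern = Pattern.compile("description: ((.*\r\n){1,})");
// Possessive variant: swap it in to avoid the overflow.
//Pattern pattern = Pattern.compile("description: ((?:.*\r\n)++)");

Matcher matcher = pattern.matcher(s);
boolean b = matcher.find();
if (b) {
    System.out.println(matcher.group(1));
}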

The quantifier {1,} is the same as +, so you should use + instead, but this still fails. To fix it you can (as Bat K. points out) change the + to ++, making it possessive. That disables backtracking into the group, which prevents the stack overflow.
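As a minimal sketch of the possessive variant, the fixed pattern can be wrapped in a small helper. The class name PossessiveDescription and the method extractLines are made up for illustration, and the sketch assumes, as in the question, that every line ends in \r\n:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class PossessiveDescription {
    // Non-capturing inner group with a possessive ++ quantifier:
    // once a line has been consumed, the engine never backtracks into it.
    private static final Pattern LINES =
            Pattern.compile("description: ((?:.*\\r\\n)++)");

    // Returns everything after "description: " up to and including the
    // last \r\n, or null if no complete line follows.
    static String extractLines(String input) {
        Matcher m = LINES.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Same test input as above: 1000 lines of "j\r\n".
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; ++i) {
            sb.append("j\r\n");
        }
        String result = extractLines("description: " + sb);
        System.out.println(result.length());
    }
}
```

Note that this version, like the original regex, only captures lines that are terminated by \r\n; a trailing line without a line ending is left out of group 1.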

Mark Byers
A: 

Since you are matching everything after the text description:, you can simply allow the dot to match newlines with Pattern.DOTALL:

description:\s(.*)

So, in Java:

Pattern regex = Pattern.compile("description:\\s(.*)", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = null;
if (regexMatcher.find()) {
    // group(1) holds everything after "description: ", newlines included
    resultString = regexMatcher.group(1);
}

The only semantic difference from your regex (apart from the fact that it won't blow your stack) is that it will also match if whatever follows description: does not contain a newline. Also, your regex will not match the last line of the file unless it ends in a newline; mine will. Which behaviour is preferable is your decision.

Of course, the behaviour of your regex could be emulated like this (still compiled with Pattern.DOTALL):

description:\s(.*\r\n)

but I doubt that that's really what you want. Or is it?
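
To make the differences described above concrete, here is a small comparison sketch. The class DescriptionPatterns and its capture helper are hypothetical names chosen for this example, and the line-based pattern is written in its possessive form so that it is safe on long inputs:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class, for illustration only.
class DescriptionPatterns {
    // Line-based: every captured line must end in \r\n.
    static final Pattern LINE_BASED =
            Pattern.compile("description: ((?:.*\\r\\n)++)");
    // DOTALL: the dot also matches \r and \n, so (.*) runs to the end of input.
    static final Pattern DOT_ALL =
            Pattern.compile("description:\\s(.*)", Pattern.DOTALL);
    // DOTALL, but the capture must end at a \r\n (the last one, as .* is greedy).
    static final Pattern DOT_ALL_TO_NEWLINE =
            Pattern.compile("description:\\s(.*\\r\\n)", Pattern.DOTALL);

    // Returns group 1 of the first match, or null if the pattern does not match.
    static String capture(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Makes line endings visible when printing.
    static String show(String s) {
        return s == null ? "null" : s.replace("\r\n", "\\r\\n");
    }

    public static void main(String[] args) {
        // Note: the final line has no trailing \r\n.
        String input = "description: line1\r\nline2\r\nlast line";
        System.out.println(show(capture(LINE_BASED, input)));
        System.out.println(show(capture(DOT_ALL, input)));
        System.out.println(show(capture(DOT_ALL_TO_NEWLINE, input)));
    }
}
```

On this input, only the plain DOTALL pattern includes the unterminated last line; the other two stop after the final \r\n. For an input such as "description: single" with no line ending at all, only the plain DOTALL pattern matches.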

Tim Pietzcker
@Tim: I think the OP took that "818" number from the bug report he cited--i.e., he's just saying the strings he's working with will always be long enough to trigger this behavior.
Alan Moore
@Alan Moore: Oh, OK. That wasn't clear at all from his question, though :)
Tim Pietzcker