In this Java code:

public class Main {
  public static void main(String[] args) {
    "".matches("(?<!((.{0,1}){0,1}))");
  }
}

the regex engine (I'm using JVM 1.6.0_17-b04) throws at runtime: "Exception ... Look-behind group does not have an obvious maximum length". I saw here that:

Java takes things a step further by allowing finite repetition. You still cannot use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. Java recognizes the fact that finite repetition can be rewritten as an alternation of strings with different, but fixed lengths.

But in the code above there is a very obvious finite maximum length: 1 (a simple product of the two upper bounds).

The real problem is, of course, in more complex patterns, like:

(?<!bad(\s{1,99}(\S{1,99}\s{1,99}){0,6}))good

(the word "good" with no "bad" anywhere in the 7 words before it).

How can I fix it?

A:

If you remove the capture groups from the negative look-behind then it seems to compile. I'm not even sure what the intent was or what the capture groups should be doing in a negative look-behind. Is that intentional?

Edit to clarify:

You wrote the regex:

"(?<!((.{0,1}){0,1}))"

The "(?<!" part opens a negative look-behind, meaning you want matches where this pattern does not occur immediately before. Yet it is full of capturing groups, i.e. all of those bare (). That doesn't make much sense: a negative look-behind only succeeds when its contents fail to match, so those groups can never capture anything. (In case you aren't fluent in regex, capturing groups are used to pull specific sub-ranges out of the match after the match has happened.)

Take all of those parentheses out and you will no longer get the error... not to mention that they are unnecessary:

"(?<!.{0,1}{0,1})"

The above part will work without error, for example. If you really need parentheses in negative look behind then you should use non-capturing groups like "(?:mypattern)". In this simple example they don't really do anything for you either way and the double {0,1} is a bit redundant.
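Since which of these variants the engine accepts seems to depend on the details (and possibly on the JVM version), a small probe can check them on your own setup. This is only a sketch: the class name LookBehindProbe and the helper compileError are made up for illustration, and the candidate list is just the patterns discussed in this thread.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of any library.
final class LookBehindProbe {

    /** Returns null if the regex compiles, otherwise the engine's error description. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "(?<!((.{0,1}){0,1}))",     // original: capturing groups
            "(?<!(?:(?:.{0,1}){0,1}))", // same shape with non-capturing groups
            "(?<!.{0,1}{0,1})",         // parentheses removed
            "(?<!.{0,1})",              // single bounded quantifier
        };
        // Results are printed rather than assumed, because they may vary by JVM.
        for (String regex : candidates) {
            String error = compileError(regex);
            System.out.println(regex + " -> " + (error == null ? "compiles" : error));
        }
    }
}
```

Note that the exception comes from Pattern.compile at runtime (String.matches calls it internally), not from javac.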

Edit 2:

So I tried to get your more complicated example to work, and even switching to non-capturing groups doesn't clear up Java regex's confusion. The only way to work around it seems to be to get rid of the {0,6}, as suggested in the comments.

For example, this will compile:

"(?<!bad(?:\\s{1,99}(?:\\S{1,99}\\s{1,99})?(?:\\S{1,99}\\s{1,99})?(?:\\S{1,99}\\s{1,99})?(?:\\S{1,99}\\s{1,99})?(?:\\S{1,99}\\s{1,99})?(?:\\S{1,99}\\s{1,99})?))good"

...and does the same thing, but it's a lot uglier.
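If the ugliness is the main objection, the repeated optional groups can be generated instead of typed out. The following is a rough sketch, not a definitive implementation: ExpandedLookBehind and build are hypothetical names, and it assumes the two words contain no regex metacharacters.

```java
import java.util.regex.Pattern;

// Hypothetical helper: unrolls "(...){0,n}" into n optional groups "(...)?".
final class ExpandedLookBehind {

    /**
     * Builds (?<!bad(?:\s{1,99}(?:\S{1,99}\s{1,99})?...))good with
     * maxWordsBetween copies of the optional word group.
     * Assumes bad and good are plain words (no regex metacharacters).
     */
    static Pattern build(String bad, String good, int maxWordsBetween) {
        StringBuilder sb = new StringBuilder();
        sb.append("(?<!").append(bad).append("(?:\\s{1,99}");
        for (int i = 0; i < maxWordsBetween; i++) {
            sb.append("(?:\\S{1,99}\\s{1,99})?");
        }
        sb.append("))").append(good);
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern p = build("bad", "good", 6);
        System.out.println(p.matcher("bad one two good").find());
        System.out.println(p.matcher("nice one two good").find());
    }
}
```

Calling build("bad", "good", 6) produces the same pattern string as the hand-written one above.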

This may be a case where regex is not the complete answer but just part of a larger solution that requires more than one pass.
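One way to read "more than one pass" is to split the text on whitespace first and then check the preceding tokens in plain Java. A minimal sketch, assuming whole-token equality is what you want (which is stricter than the regex, since the regex would also match "bad" and "good" as substrings of longer words); WordWindowCheck and its method name are invented for this example.

```java
// Hypothetical helper: token-based alternative to the look-behind regex.
final class WordWindowCheck {

    /**
     * Returns true if some token equal to good has no token equal to bad
     * among the `window` tokens immediately before it.
     */
    static boolean hasGoodWithoutBadBefore(String text, String bad, String good, int window) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(good)) {
                continue;
            }
            boolean badNearby = false;
            // Look back at most `window` tokens, stopping at the start of the text.
            for (int j = Math.max(0, i - window); j < i; j++) {
                if (tokens[j].equals(bad)) {
                    badNearby = true;
                    break;
                }
            }
            if (!badNearby) {
                return true;
            }
        }
        return false;
    }
}
```

With window set to 7 this corresponds to the "bad, then up to 6 words, then good" range of the regex, and there is no look-behind length limit to worry about.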

PSpeed
I'm not sure I understood you. If you meant "?<!((.{0,1}){0,1})", that's not a look-behind, just some characters. If you meant "(?<!(.{0,1}){0,1})", there is no difference: it still doesn't compile.
Y. Shoham
Okay, now it's clear. I knew that my first example was a silly one; I gave it just to demonstrate the problem of a curly quantifier on a group that itself contains a curly quantifier. Anyway, I took your advice about using non-capturing groups when capturing is unnecessary. Another thing: regexes, by definition, can handle it. As stated in the link in my question, the .NET regex engine and another one they mention can handle unbounded look-behinds.
Y. Shoham
True. That's why I called it "Java regex's confusion"... but still, at some point it's sometimes better to tokenize the whole string into \S tokens and operate on that.
PSpeed