Ruby 1.9 regex supports lookbehind assertions, but I run into trouble when I put anchors in the lookbehind pattern. The same anchors work fine in a lookahead assertion.

"well substring! "[/(?<=^|\A|\s|\b)substring!(?=$|\Z|\s|\b)/] #=> RegexpError: invalid pattern in look-behind: /(?<=^|\A|\s|\b)substring(?=$|\Z|\s|\b)/

Does anybody know how to make anchors work in lookbehind assertions the way they do in lookahead assertions?

Is there a special escape sequence or grouping that is required for lookbehind?

I have tested this behavior in 1.9.1-p243, p376 and 1.9.2-preview3 just in case it was patched.

A: 

Looks like the lookbehind is being interpreted as a character class [] and not as a group (), unlike lookahead assertions. That possibly means \b is read as a backspace character, which is invalid there, and not as a word boundary.

"well substring! "[/(?<=^|\A|\s|[^\B])substring!(?=$|\Z|\s|\b)/]  #=> substring!
"well substring! "[/(?<=^|\A|\s|[^\B])substring(?=$|\Z|\s|\b)/]   #=> substring
"well !substring! "[/(?<=^|\A|\s|[^\B])substring(?=$|\Z|\s|\b)/]  #=> substring
"well !substring! "[/(?<=^|\A|\s|[^\B])!substring(?=$|\Z|\s|\b)/] #=> !substring

When all else fails... use a double negative!

klappy
A: 

Looks like you're right: \b works as expected in a lookahead, but in a lookbehind it's treated as a syntax error.

It doesn't really matter in this case: if (?<=^|\A|\s|\b) had yielded the desired result, \b would have been all you needed anyway. The character following the assertion has to be s, which is a word character, so \b means either (1) the previous character is not a word character, or (2) there is no previous character. That being the case, ^, \A and \s are all redundant.
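As a quick check of that claim, here is a minimal Ruby sketch. The extra sample strings are made up for illustration. It uses only \b outside of any lookbehind, so it should not depend on how a given Ruby version treats anchors inside a lookbehind.

```ruby
# Sample input from the question.
text = "well substring! "

# \b alone covers "after whitespace" when the next character is a
# word character such as "s".
bare_word = text[/\bsubstring\b/]

# It also covers "start of string": there is no previous character.
at_start = "substring here"[/\bsubstring\b/]

# \b rejects a match glued to a preceding word character.
embedded = "wellsubstring"[/\bsubstring\b/]

p bare_word  # "substring"
p at_start   # "substring"
p embedded   # nil
```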

However, if the string starts with ! it's a different story. ^ and \A still match the beginning of the string, before the !, but \b matches after it. If you want to match !substring! as a complete string you have to use /\A!substring!\Z/, but if you only want to match the whole word substring you have to use /\bsubstring\b/.
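The difference is easy to see on a string that begins with !. This is only a sketch; the variable names are hypothetical and the input is the !substring! example from above.

```ruby
bang = "!substring!"

# \A and \Z sit outside the "!" characters, so the whole string matches.
whole_string = bang[/\A!substring!\Z/]

# \b sits between "!" and "s", so only the word itself matches.
whole_word = bang[/\bsubstring\b/]

# There is no word boundary before a leading "!", so this finds nothing.
no_match = bang[/\b!substring!/]

p whole_string  # "!substring!"
p whole_word    # "substring"
p no_match      # nil
```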

As for [^\B], that just matches any character except B. Like \b, \B is a zero-width assertion, and a character class has to match exactly one character. Some regex flavors would throw an exception for the invalid escape sequence \B, but Ruby (or Oniguruma, more likely) lets it slide.
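A small Ruby sketch of what that character class does in practice. The test strings are invented for illustration, and the last pattern assumes a Ruby whose regex engine accepts this lookbehind, as the first answer reports for 1.9.

```ruby
# Inside a character class, \B is read as a literal "B", not an assertion.
not_b = /[^\B]/
any_other = "a"[not_b]
capital_b = "B"[not_b]

# Likewise, [\b] inside a class is the backspace character (0x08).
backspace_pos = "\b" =~ /[\b]/

# So the "double negative" lookbehind accepts any preceding character
# except "B"; it does not behave like a word boundary.
workaround = /(?<=^|\A|\s|[^\B])substring/
after_letter = "wellsubstring"[workaround]
after_b = "Bsubstring"[workaround]

p any_other      # "a"
p capital_b      # nil
p backspace_pos  # 0
p after_letter   # "substring"
p after_b        # nil
```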

Alan Moore