views:

66

answers:

3

Most languages allow fixed-length or finite-length lookbehind. One notable exception is .NET, which allows the use of the * operator.

However, .NET regexs can already recognize balanced parentheses using named capture, which is not a regular language. Are regexs still regular with * in lookbehind? Extended answers for subexpressions other than * (for example, additional lookaround!) would also be appreciated.

tl;dr: Do regexs stay regular with * in lookbehind?

A: 

.NET's unbounded lookbehind is merely a refinement of an already non-regular feature: fixed, finite or infinite, lookbehinds have no place in a regular grammar. Nor do lookaheads, capturing groups, backreferences, reluctant quantifiers, possessive quantifiers, atomic groups, conditionals, word boundaries, anchors...

If we had to limit ourselves to theoretically-pure regular expressions, 99.9% of current regex users would have no use for them. Asking if a feature is "regular" is a waste of breath; is it useful? That's all that matters.

Alan Moore
-1 Many of the features you just listed (including lookaround) are perfectly regular. Also calling the question a waste of breath is a) rude and b) wrong as the regularness of a regular expression has practical implications: there are some things you can do with regular regular expressions, that you can't do with irregular regular expressions. For example: find their intersection.
sepp2k
These features are not in the traditional formulation of regular expressions, but they are still regular. For anyone that reads this question at some later time:Lookaheads (infinite-length), reluctant quantifiers, possessive quantifiers, atomic groups, word boundries, and anchors are regular.Capturing groups are essentially irrelevant.Backreferences, given capturing groups, are non-regular.
Zachary Vance
A: 

Regular expressions are closed under intersection. Add a new symbol & and rewrite the lookbehind: A(?<B)C as (?:AC&.*BC), and we get that lookbehind is regular.

B can include clearly use anything that doesn't go past the A/C boundry. That is, anything except lookahead. What happens if lookbehind may use lookahead, or vice-versa? Start work on .*BC . You're still fine.

So, regular expressions could really add in intersection and infinite-length lookaround (which can include more lookaround to any depth) and it would still be just as efficient.

Zachary Vance
+1  A: 

I believe the answer here: http://stackoverflow.com/questions/2974210/does-lookaround-affect-which-languages-can-be-matched-by-regular-expressions/2991587#2991587 can be extended to prove that adding * in lookbehind (or even nesting such lookbehinds and lookaheads) does not affect the 'regularness' of the expressions. I haven't put more thought into it though.

Hope that helps!

Moron