Over the years, "regex" pattern matching has been getting more and more powerful to the point where I wonder: is it really just context-sensitive-grammar matching? Is it a variation/extension of context-free-grammar matching? Where is it right now and why don't we just call it that instead of the old, restrictive "regular expression"?
Backreferences to capturing parentheses in particular take regular expressions beyond the regular languages: a pattern with a backreference can match languages that are neither regular nor context-free, although such languages still fall within the context-sensitive class. The name is simply a historical holdover, as with many terms. See also this section in Wikipedia and this explanation with an example from Perl.
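To make the backreference point concrete, here is a minimal sketch using Python's standard `re` module. The pattern and the helper name `is_square` are my own illustration, not something from the linked pages. It matches "squares", meaning some non-empty string immediately followed by an exact copy of itself, which is a standard example of a language that is not context-free over an alphabet of two or more symbols.

```python
import re

# \1 must repeat exactly the text that group 1 captured, so the whole
# pattern describes "w followed by w again" for some non-empty w.
SQUARE = re.compile(r"(.+)\1")

def is_square(text: str) -> bool:
    """Return True if text is some non-empty string written twice in a row."""
    return SQUARE.fullmatch(text) is not None

print(is_square("abcabc"))  # True
print(is_square("abcabd"))  # False
```

The engine gets there by backtracking over the possible lengths of the first group, which is part of why matching with backreferences can be expensive in the worst case.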
The way I see it:
- Regular languages:
  - Matched by state machines. A single variable (the current state) represents the "location" in the grammar being matched, so recursion cannot be implemented.
- Context-free languages:
  - Matched by a stack machine. The current "location" in the grammar is represented by a stack in one form or another. It cannot "remember" earlier input beyond what is on the stack.
- Context-sensitive languages:
  - Most programming languages
  - Most human languages
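The difference between the first two levels can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the example languages ((01)* for the state machine, nested round and square brackets for the stack machine) and the function names are made up for the purpose.

```python
# State machine for the regular language (01)*.
# The single variable `state` is all the memory the matcher has.
DFA = {
    ("start", "0"): "saw0",
    ("saw0", "1"): "start",
}

def matches_01_star(text: str) -> bool:
    state = "start"
    for ch in text:
        state = DFA.get((state, ch))
        if state is None:          # no transition defined: reject
            return False
    return state == "start"        # "start" doubles as the accepting state

# Stack machine for properly nested () and [] brackets.
# The stack records which closers are still owed, in order.
PAIRS = {")": "(", "]": "["}

def is_nested(text: str) -> bool:
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False       # closer without a matching opener
        else:
            return False           # character outside the alphabet
    return not stack               # every opener must have been closed

print(matches_01_star("0101"), matches_01_star("0011"))  # True False
print(is_nested("([])[]"), is_nested("([)]"))            # True False
```

No fixed set of states can stand in for the stack in the second function, because the nesting depth is unbounded.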
I do know of regular expression parsers that allow you to match against something the parser has already encountered, achieving something like a context-sensitive grammar.
Still, most regular expression engines, however sophisticated they may be, don't allow for recursive application of rules, which is a definite requirement for context-free grammars. (Engines with recursive patterns, such as Perl and PCRE with (?R), are the exception.)
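To show what "recursive application of rules" means in practice, here is a small hand-written recognizer for the grammar S → 0 S 1 | ε. The grammar and the names `parse_s` and `in_language` are my own choice of example, not taken from any regex engine.

```python
from typing import Optional

def parse_s(text: str, pos: int = 0) -> Optional[int]:
    """Try to match rule S starting at pos.

    Returns the position just after the matched S, or None on failure.
    """
    # Alternative 1: S -> 0 S 1
    if pos < len(text) and text[pos] == "0":
        end = parse_s(text, pos + 1)   # the rule applies itself here
        if end is not None and end < len(text) and text[end] == "1":
            return end + 1
        return None
    # Alternative 2: S -> ε (match nothing)
    return pos

def in_language(text: str) -> bool:
    # The whole input must be consumed by a single S.
    return parse_s(text) == len(text)

print(in_language("0011"))  # True
print(in_language("0101"))  # False
```

The call stack of `parse_s` plays the role of the stack machine's stack, so very long inputs would run into Python's recursion limit.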
The term regex, in my opinion, mostly refers to the syntax used to express those regular grammars (the stars and question marks).
There are features in modern regular expression implementations that break the rules of the classic regular expression definition.
For example, Microsoft's .NET balancing group (?<name1-name2> … ):
^(?:0(?<L>)|1(?<-L>))*(?(L)(?!))$
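This matches every string in the language L₀₁ = {ε, 01, 0011, 000111, … }. As written it also accepts other balanced strings such as 0101, because a 0 may follow a 1 inside the loop. Either way, the matched language is not regular, as the pumping lemma shows.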
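Python's `re` module has no balancing groups, so the following is only a hand-written approximation of what the .NET pattern above does, with an explicit list standing in for the capture stack of group L. The function name `balancing_match` is hypothetical, and the sketch ignores the detail that `$` in .NET can also match just before a trailing newline.

```python
def balancing_match(text: str) -> bool:
    stack = []                     # stands in for the captures of group L
    for ch in text:
        if ch == "0":
            stack.append(ch)       # 0(?<L>)  : push a capture onto L
        elif ch == "1":
            if not stack:          # 1(?<-L>) : the pop fails if L is empty
                return False
            stack.pop()
        else:
            return False           # neither alternative can match
    return not stack               # (?(L)(?!)) : fail if L is non-empty

for sample in ["", "01", "0011", "0101", "001", "10"]:
    print(repr(sample), balancing_match(sample))
```

The loop accepts "0011" and also "0101", and rejects "001" and "10". Since the stack can grow without bound, no finite state machine can replace it.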