ansaurus

Question

Answer 1

+10 A:

“Strictly” regular expressions describe regular languages. But many features, such as the usage of backreferences in the expression itself or recursion for example, can be used to write regular expressions that accept non-regular languages.

For example, the language described by

(a+)b+\1

isn't regular, as you can't force that a appears the same number of times before and after the bs. At least not in a regular language. With context-free or even context-sensitive languages, that's a completely different matter.

However, regular expressions that only use elementary things such as the various quantifiers, character classes, etc. usually still describe regular languages.

Joey 2010-02-12 22:02:50

Answer 2

+7 A:

A regular expression can be recognized by a finite automaton. A finite automaton has finite memory (hence the name). A recursive regex requires a stack to do the recursion. An stack space can be infinite.

Mehrdad Afshari 2010-02-12 22:03:52

Answer 3

+4 A:

The basis of the other answers requires an understanding of the involved computation theory. If your only exposure to regular expressions is in a programming environment, you may not realize that regular expressions are mathematical constructs, as well. The wikipedia article on regular expressions may provide some background into the theoretical aspects of regular expressions.

Jonathon 2010-02-12 22:09:31

I was just contemplating to post such an answer to complement the other two (excellent!) answers... :) +1

Bart Kiers 2010-02-12 22:13:28

Answer 4

+4 A:

The strict definition of regular language from theoretical computer-science may seem abstract with little practical benefit, but if you're ever faced with the need to implement a state machine to recognize certain inputs, you can save yourself a lot of useless effort and hairpulling if you can prove up front that it can't be done.

An informal way to express it is recognizing regular languages cannot require memory. The pumping lemma for regular languages is useful for proving that a particular language (i.e., a set of strings) cannot be recognized by a deterministic finite automaton.

From An Introduction to Formal Languages and Automata by Peter Linz (pg. 115, 3rd ed.):

Theorem 4.8

Let L be an infinite regular language. Then there exists some positive integer m such that any w ∈ L with |w| ≥ m can be decomposed as

w = xyz,

with

|xy| ≤ m,

and

|y| ≥ 1,

such that

w_i = xyⁱz — Eq. (4.2)

is also in L for all i = 0, 1, 2, …

To recognize an infinite language, a finite automaton must “pump” or repeat some portion of its states, and that's the function of yⁱ (notation for some string y repeated i times).

Very nearly all proofs involving the pumping lemma involve proof by contradiction. On page 117, the author proves that the language L = { aⁿbⁿ : n ≥ 0 }—i.e., strings of the form aaa…bbb… where the all-a and all-b substrings are equal in length—is not regular:

Assume that L is regular, so that the pumping lemma must hold. We do not know the value of m, but whatever it is, we can always choose n = m. Therefore, the substring y must consist entirely of a's. Suppose |y| = k. Then the string obtained by using i = 0 in Equation (4.2) is

w₀ = a^m-kb^m

and is clearly not in L. This contradicts the pumping lemma and thereby indicates that the assumption that L is regular must be false.

Other examples of languages that are not regular:

L = { ww^R : w ∈ Σ^* } (palindromes)
L = { w ∈ Σ^* : n_a(*w*) < n_b(*w*) } (a's fewer than b's)
L = { a^n! : n ≥ 0 }
L = { aⁿb^l : n ≠ l }
L = { aⁿb^l : n + l is a prime number }

It turns out that what we loosely call regular expressions are considerably more powerful: matching regular expressions with backreferences is NP-hard!

Greg Bacon 2010-02-13 02:44:41

+1. Nice :-). I hated those proofs, actually; we had to do way too many of them ;-)

Joey 2010-02-13 10:34:17

ansaurus

tags:

views:

answers:

Why is recursive regex not regex?

Theorem 4.8

related questions