views:

206

answers:

4

In one of the StackOverflow Podcasts (the one where guys were discussing data generation for testing DBs -- either #11 or #12), Jeff mentioned something like "reverse regular expressions", which are used exactly for that purpose: given a regex, produce a string which will eventually match said regex.

What is the correct term for this whole concept? Is this a well-known concept?

+1  A: 

There is no such thing as a "reverse regular expression". A regular expression is just that: an expression. It expresses a language, which consists of a (possibly infinite) set of strings.

The reversal is in the use of the expression: where it is commonly used to report whether a string is in the language, it is now used to produce strings in the language.

Thomas
+3  A: 

Abstract: Recursive transition network (with the postmodernism generator as an interesting example)

One specialization would be your "reverse regex".


As to the terminology: A regular expression is a form of grammar that describes all the words belonging to a specific regular language (namely all the inputs matched by the expression).

Therefore one could call your question: "How can create a random word that matches a given regex" or "How can I obtain a random word belonging to a specified regular language".

Dario
A context free grammar (CFG) is not regular, as can be proved using the pumping lemma. However, the idea of an RTN can still be applied, using a nondeterministic finite-state automaton (NFA) instead.
Thomas
@Thomas: Yes, but a regular one is context free, so the same concept can be applied (being nothing but a specialization).
Dario
Ah, obviously, you are right.
Thomas
+1  A: 

The Perl module String::Random (in the CPAN) does this. Takes a subset of regular expressions, and does a random walk through it.

Randal Schwartz
A: 

A regular expression is a shorthand notation for a generative grammar that describes a regular language.

As the name says, a generative grammar can be used to generate all words in that language. Since a “regular expression” is just a different notation for such a grammar, I think it would be correct terminology to say “generate a random string for a given regular expression”.

And as Thomas has noted, it makes therefore no sense to speak of “reverse” regular expressions.

Konrad Rudolph