ansaurus

Question

Create a program that takes a regular expression as input and outputs one or more strings that satisfy that regular expression.

Answer 1

A:

The easiest approach to implement, but definitely the most CPU-intensive, would be to simply brute-force it. Set up a character table with the characters that your string may contain, then generate strings sequentially and call Regex.IsMatch on each one.
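
As a rough illustration of the brute-force idea, here is a minimal Python sketch. The function name `brute_force_matches`, the `max_len` and `limit` parameters, and the use of `re.fullmatch` in place of .NET's Regex.IsMatch are assumptions made for this example, not part of the answer. It tries every string over the given alphabet in order of increasing length and keeps the ones that match the whole pattern.

```python
import itertools
import re

def brute_force_matches(pattern, alphabet, max_len, limit=5):
    """Return up to `limit` strings over `alphabet`, of length <= max_len,
    that match `pattern` in full. Names and parameters are illustrative."""
    compiled = re.compile(pattern)
    found = []
    if limit <= 0:
        return found
    for length in range(max_len + 1):
        # itertools.product enumerates every string of this exact length
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                found.append(candidate)
                if len(found) >= limit:
                    return found
    return found

print(brute_force_matches(r"[A-C]{2}\d", "ABC012", 3, limit=3))
# prints ['AA0', 'AA1', 'AA2']
```

The number of candidates grows as len(alphabet) ** length, so this is only usable for small alphabets and short strings.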

Tom Frey 2009-10-05 22:42:21

For a regex of any significant complexity this would probably take an unreasonable amount of processing power.

Rex M 2009-10-05 22:47:00

That could easily finish after the end of time.

Jonas Elfström 2009-10-05 22:47:09

Please don't do this.

Byron Whitlock 2009-10-05 22:54:31

According to the back of my napkin, if you can do 100,000 regex tests per second, you will be able to test all 12-character strings possible on a standard US keyboard by roughly the same time the Sun will burn out.

Rex M 2009-10-05 22:57:42

Well, I didn't say that it would be fast or practical, but it would be easy to implement, and for short strings it would be a feasible approach :P

Tom Frey 2009-10-05 23:05:55

@Rex M, you might upgrade your CPU before the sun burns out, so make sure the program can be halted and resumed.

gnibbler 2009-10-05 23:34:25

Also, you can easily parallelize it, so e.g. on a dual quad-core machine it would take only 1/8th of ~5.5 billion years, not accounting for quantum computing in the near future.

Tom Frey 2009-10-06 00:09:24

Answer 2

+9 A:

Well, a regex is convertible to a DFA, which can be thought of as a graph. To generate a string from this DFA graph, you'd just find a path from the start state to an accepting state. You'd have to decide how you want to handle cycles (maybe traverse every cycle at least once to get a sampling? n times?), but I don't see why it wouldn't work.
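
A minimal Python sketch of the path-finding idea follows. It assumes the DFA has already been built; the regex-to-DFA conversion is not shown. The hand-written `DFA` table (meant to correspond to `ab*c`) and the function names are hypothetical, chosen only for this example. A breadth-first search yields the shortest accepted string, and a length bound is one simple way to deal with cycles.

```python
from collections import deque

# Hypothetical hand-built DFA for ab*c: state -> {symbol: next_state}
DFA = {0: {"a": 1}, 1: {"b": 1, "c": 2}, 2: {}}
START, ACCEPTING = 0, {2}

def shortest_match(dfa, start, accepting):
    """Breadth-first search for the shortest path to an accepting state."""
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        state, text = queue.popleft()
        if state in accepting:
            return text
        for symbol, nxt in sorted(dfa[state].items()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, text + symbol))
    return None  # no accepting state is reachable

def matches_up_to(dfa, start, accepting, max_len):
    """List every accepted string of length <= max_len; the bound tames cycles."""
    results = []
    queue = deque([(start, "")])
    while queue:
        state, text = queue.popleft()
        if state in accepting:
            results.append(text)
        if len(text) < max_len:
            for symbol, nxt in sorted(dfa[state].items()):
                queue.append((nxt, text + symbol))
    return results

print(shortest_match(DFA, START, ACCEPTING))    # prints ac
print(matches_up_to(DFA, START, ACCEPTING, 4))  # prints ['ac', 'abc', 'abbc']
```

The bounded enumeration does not prune states that can never reach an accepting state, so on larger automata it can do a lot of wasted work.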

Falaina 2009-10-05 22:47:46

+1 great answer. [ DFA = Deterministic Finite Automaton ]

Byron Whitlock 2009-10-05 22:56:23

Most modern regex engines support backreferences, and it seems that those could make it really hard to produce a match. http://perl.plover.com/NPC/

Jonas Elfström 2009-10-05 23:20:17

http://osteele.com/tools/reanimator/ - May help you along the path of realizing the conversion to a DFA

gnarf 2009-10-05 23:49:19

Another way to look at this would be to convert the DFA to a Markov model. That way, taking a random walk through the DFA is easy. At each point you randomly choose a transition, and you stop only when you reach a terminating state or you have a string of the desired length. A Markov chain is very simple to represent as a square reachability matrix from states to states. If a state is reachable from some other state, then there will be a non-zero value at their intersection in the matrix. To walk the DFA, just choose randomly from among the non-zero entries in the row of the current state.
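
The random walk can be sketched in Python roughly as follows. This is an illustration under assumptions: the matrix entries hold the transition symbols (an empty string standing in for the zero entries), the `TRANSITIONS` table is a hypothetical hand-built DFA for `ab*c`, and a walk that reaches the length limit in a non-accepting state is discarded by returning `None`, since such a string would not match.

```python
import random

# TRANSITIONS[i][j] holds the symbols that move state i to state j;
# an empty string means there is no edge (the "zero" entries).
TRANSITIONS = [
    ["", "a", ""],
    ["", "b", "c"],
    ["", "", ""],
]
ACCEPTING = {2}

def random_walk(matrix, accepting, max_len, rng, start=0):
    """Walk randomly until an accepting state is reached.
    Returns None if the walk gets stuck or exceeds max_len."""
    state, text = start, ""
    while state not in accepting:
        # Collect every (symbol, target) pair available from the current row
        edges = [(symbol, target)
                 for target, symbols in enumerate(matrix[state])
                 for symbol in symbols]
        if not edges or len(text) >= max_len:
            return None
        symbol, state = rng.choice(edges)
        text += symbol
    return text

rng = random.Random(0)  # fixed seed so repeated runs give the same samples
samples = [random_walk(TRANSITIONS, ACCEPTING, 6, rng) for _ in range(20)]
print([s for s in samples if s is not None])
```

Every edge is chosen with equal probability here; weighting the entries would bias the walk toward longer or shorter strings.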

Andrew Matthews 2009-10-05 23:59:02

Answer 3

A:

I, personally, believe that this is the holy grail of reg-ex. If you could implement this -- even only 3/4 working -- I have no doubt that you'd be rich in about 5 minutes.

All joking aside, I'm not sure that what you are truly going after is feasible. Reg-Ex is a very open, flexible language and giving the computer enough sample input to truly and accurately find what you need, is probably not feasible.

If I'm proven wrong, kudos to that developer.

To look at this from a different perspective, this is almost (not quite) like giving a computer its output and having it -- based on that -- write a program for you. This is a little overboard, but it kind of illustrates my point.

Frank V 2009-10-05 22:53:02

Answer 4

+2 A:

This can be done by traversing the DFA (includes pseudocode) or else by walking the regex's abstract-syntax tree directly or converting to NFA first, as explained by Doug McIlroy: paper and Haskell code. (He finds the NFA approach to go faster, but he didn't compare it to the DFA.)

These all work on regular expressions without back-references -- that is, 'real' regular expressions rather than Perl regular expressions. To handle the extra Perl features it'd be easiest to add on a post-filter.
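
To make the AST walk and the post-filter concrete, here is a small Python sketch. It is not McIlroy's algorithm; the tuple-based AST encoding, the `enumerate_ast` name, and the `max_repeat` bound on star are assumptions for illustration. The backreference pattern `(a|b)\1` is relaxed by hand to `(a|b)(a|b)`, candidates are enumerated from the relaxed AST, and the real engine then filters them.

```python
import itertools
import re

def enumerate_ast(node, max_repeat=2):
    """Enumerate strings for a tiny regex AST made of nested tuples:
    ("lit", s), ("alt", n1, n2, ...), ("cat", n1, n2, ...), ("star", n)."""
    kind = node[0]
    if kind == "lit":
        return [node[1]]
    if kind == "alt":
        return [s for child in node[1:] for s in enumerate_ast(child, max_repeat)]
    if kind == "cat":
        parts = [enumerate_ast(child, max_repeat) for child in node[1:]]
        return ["".join(combo) for combo in itertools.product(*parts)]
    if kind == "star":
        inner = enumerate_ast(node[1], max_repeat)
        out = []
        for count in range(max_repeat + 1):  # star is cut off at max_repeat
            out.extend("".join(combo)
                       for combo in itertools.product(inner, repeat=count))
        return out
    raise ValueError(f"unknown node kind: {kind}")

a_or_b = ("alt", ("lit", "a"), ("lit", "b"))
relaxed = ("cat", a_or_b, a_or_b)  # stands in for (a|b)\1 with \1 relaxed

candidates = enumerate_ast(relaxed)
# Post-filter: keep only candidates the full pattern accepts
matches = [s for s in candidates if re.fullmatch(r"(a|b)\1", s)]
print(candidates)  # prints ['aa', 'ab', 'ba', 'bb']
print(matches)     # prints ['aa', 'bb']
```

The relaxed pattern must accept a superset of the original language, otherwise the filter has nothing valid to keep.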

Darius Bacon 2009-10-05 23:40:15

Answer 5

+1 A:

This utility on UtilityMill will invert some simple regexen. It is based on this example from the pyparsing wiki. The test cases for this program are:

[A-EA]
[A-D]*
[A-D]{3}
X[A-C]{3}Y
X[A-C]{3}\(
X\d
foobar\d\d
foobar{2}
foobar{2,9}
fooba[rz]{2}
(foobar){2}
([01]\d)|(2[0-5])
([01]\d\d)|(2[0-4]\d)|(25[0-5])
[A-C]{1,2}
[A-C]{0,3}
[A-C]\s[A-C]\s[A-C]
[A-C]\s?[A-C][A-C]
[A-C]\s([A-C][A-C])
[A-C]\s([A-C][A-C])?
[A-C]{2}\d{2}
@|TH[12]
@(@|TH[12])?
@(@|TH[12]|AL[12]|SP[123]|TB(1[0-9]?|20?|[3-9]))?
@(@|TH[12]|AL[12]|SP[123]|TB(1[0-9]?|20?|[3-9])|OH(1[0-9]?|2[0-9]?|30?|[4-9]))?
(([ECMP]|HA|AK)[SD]|HS)T
[A-CV]{2}
A[cglmrstu]|B[aehikr]?|C[adeflmorsu]?|D[bsy]|E[rsu]|F[emr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airu]|M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|S[bcegimnr]?|T[abcehilm]|Uu[bhopqst]|U|V|W|Xe|Yb?|Z[nr]
(a|b)|(x|y)
(a|b) (x|y)

Paul McGuire 2010-09-11 03:35:29

Answer 6

+1 A:

Since it is trivially possible to write a regular expression that matches no possible strings, and I believe it is also possible to write a regular expression for which calculating a matching string requires an exhaustive search of possible strings of all lengths, you'll probably need an upper bound on requesting an answer.

Randal Schwartz 2010-09-11 03:39:48
