ansaurus

Question

regular expression "contains" another regular expression

Answer 1

A:

Surely matching RegEX1 against the text of RegEx2 would give a match (although that approach would only work in trivial cases like those given).

Rowland Shaw 2009-01-06 13:17:08

That would not work in general: RegEX1 would treat RegEx2's metacharacters as ordinary characters.

Avi 2009-01-06 13:30:22

Hence the comment saying it only works in trivial cases, such as the one given.

Rowland Shaw 2009-06-10 21:07:44
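To make the objection concrete, here is a minimal sketch of the "match one pattern against the other's source text" idea, using Python's `re` module. The helper name `naive_contains` and all three pattern pairs are illustrative assumptions, not taken from the thread.

```python
import re

def naive_contains(outer: str, inner: str) -> bool:
    """Treat the source text of `inner` as a plain string and match `outer` against it."""
    return re.fullmatch(outer, inner) is not None

# A trivial pair where the trick happens to give the right answer.
print(naive_contains(r"a.*b", r"a1.*b"))   # True

# False negative: every string matching \d\d also matches [0-9]+,
# but the text "\d\d" itself contains no digits.
print(naive_contains(r"[0-9]+", r"\d\d"))  # False

# False positive: the three-character text "a|b" matches a.b, yet the
# strings that a|b accepts are "a" and "b", and neither matches a.b.
print(naive_contains(r"a.b", r"a|b"))      # True
```

The second and third calls show the approach failing in both directions as soon as the inner pattern uses metacharacters.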

Answer 2

+5 A:

Yes.

This paper contains a detailed discussion of the topic (see section 4.4).

joel.neely 2009-01-06 13:20:29

Can you clarify your "yes"? I think you are saying "Yes, you are wrong" and citing the paper that shows how it can be done (from a quick look at the paper). But it would be worth spelling that out explicitly.

Jonathan Leffler 2009-01-06 13:28:23

The paper mentioned only says "It is a well known result that for two regular expressions B and R, it is readily decidable whether B subsumes R" and then goes on to describe "content models." Also, the paper's method appears to be simply enumerating all strings with length < n (calculated somehow?) and checking whether they are in the second expression but not the first. Decidable, perhaps, but not exactly feasible with 26^n options even without considering case and punctuation.

Clueless 2010-02-24 06:25:55
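The enumeration idea described in the comment above can be sketched as follows. This is not the paper's algorithm: the bound `max_len` and the alphabet are simply passed in by the caller as assumptions, and the function name `find_counterexample` is hypothetical. It uses Python's `re` and `itertools.product`.

```python
import re
from itertools import product

def find_counterexample(r: str, b: str, alphabet: str, max_len: int):
    """Return the first string of length <= max_len that matches r but not b, else None."""
    re_r, re_b = re.compile(r), re.compile(b)
    for n in range(max_len + 1):
        # len(alphabet) ** n candidates at each length: this is the blow-up
        # the comment complains about.
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if re_r.fullmatch(s) and not re_b.fullmatch(s):
                return s
    return None

# Illustrative patterns over the small alphabet "ab1".
print(find_counterexample(r"a1.*b", r"a.*b", "ab1", 4))  # None
print(find_counterexample(r"a.*b", r"a1.*b", "ab1", 4))  # ab
```

A returned string proves that b does not subsume r. A `None` result only says that no counterexample exists up to the chosen length, so it is a proof of containment only if the bound can be justified separately.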

Answer 3

A:

Converting the two expressions to equivalent state machines and checking that all paths in both machines allow the same matches should do the trick. The pumping lemma should obviously be kept in mind, so avoid revisiting old nodes.

It would only work for "simple" regular expressions (or "real" ones, if you prefer; Perl's recursive expressions are much more expressive).

While a graph of the state machine could have a large number of paths, it should still be limited (especially if the expressions were written by a human). So you'd find all the allowable paths of RegEX1 and check, each in turn, whether it's allowable in RegEX2. If all paths are valid, you'd know that the one is contained in the other.

Svend 2009-06-10 12:41:52

Is it possible (in a reasonable time) to run a test to get a hierarchy of regular expressions (several hundred of them)? Can you provide pointers to code that does that?

Dror 2009-06-10 13:56:57

I don't see why it would not be possible, and in decent time as well. You'd have to build this from scratch though, which is not trivial.

Svend 2009-06-10 15:00:55

Checking "all paths are valid" for all pairs would probably take a very long time. "Checking all paths in both machines", as you say, may be infinite, or am I missing something?

Dror 2009-06-11 05:30:11

1. You shouldn't need to revisit nodes, due to the pumping lemma.

2. Not all pairs need to be checked. You find the paths in one and see if they're allowable in the other. A large class of paths will obviously share parts in common; those need only be checked once, etc.

Svend 2009-06-11 09:34:53
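One standard way to make "walk both machines without revisiting nodes" precise is to explore pairs of states, one from each machine, in lockstep. The sketch below uses that product construction rather than the literal path-by-path comparison described in the answer. It assumes both expressions have already been converted to deterministic automata (that conversion is omitted here and can grow exponentially in the worst case). The tuple layout, the function name `first_uncovered`, and the two hand-built automata are hypothetical.

```python
from collections import deque

# A DFA is (start, accepting_states, transitions), where transitions maps
# (state, symbol) -> state. A missing entry means the input is rejected.
# None is reserved for the dead state and must not be used as a state name.
def first_uncovered(dfa_a, dfa_b, alphabet):
    """Return a shortest string accepted by dfa_a but not by dfa_b, or None."""
    start_a, acc_a, delta_a = dfa_a
    start_b, acc_b, delta_b = dfa_b
    start = (start_a, start_b)
    seen = {start}                      # each state pair is visited at most once
    queue = deque([(start, "")])
    while queue:
        (qa, qb), word = queue.popleft()
        if qa in acc_a and qb not in acc_b:
            return word                 # A accepts here, B does not
        for sym in alphabet:
            na = delta_a.get((qa, sym))
            if na is None:
                continue                # A is dead from here, nothing left to find
            nb = delta_b.get((qb, sym)) if qb is not None else None
            if (na, nb) not in seen:
                seen.add((na, nb))
                queue.append(((na, nb), word + sym))
    return None

# Hand-built DFAs over the alphabet "ab1" for two illustrative patterns.
WIDE = (0, {2}, {(0, "a"): 1,                              # a.*b
                 (1, "a"): 1, (1, "1"): 1, (1, "b"): 2,
                 (2, "a"): 1, (2, "1"): 1, (2, "b"): 2})
NARROW = (0, {3}, {(0, "a"): 1, (1, "1"): 2,               # a1.*b
                   (2, "a"): 2, (2, "1"): 2, (2, "b"): 3,
                   (3, "a"): 2, (3, "1"): 2, (3, "b"): 3})

print(first_uncovered(NARROW, WIDE, "ab1"))  # None
print(first_uncovered(WIDE, NARROW, "ab1"))  # ab
```

Because the `seen` set can hold at most one entry per pair of states, the search always terminates, even though the number of distinct paths through either machine may be infinite. As noted in the answer, this only applies to true regular expressions; features such as backreferences or recursion fall outside what a finite automaton can represent.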
