Does an algorithm exist which can determine whether one regular language matches any input another regular language matches?

views:

153

answers:

+7 Q:

Does an algorithm exist which can determine whether one regular language matches any input another regular language matches?

Let's say we have regular expressions:

Hello W.*rld
Hello World
.* World
.* W.*

I would like to minimize the number of regexes required to match arbitrary input.

To do that, I need to find if one regular expression matches any input matched by another expression. Is that possible?

Billy3

+1 A:

Yes.

The problem of equivalence of two regular languages is decidable.

Sketch of an algorithm:

minimize both DFAs
check if they are isomorph

maxschlepzig 2010-09-02 19:12:25

@maxschlepzig: Graph isomorphism is not solvable in polynomial time, so I don't see how that helps.

Billy ONeal 2010-09-02 19:15:28

@Billy: I guess your answer is that this is a theoretically solvable problem that isn't practical to solve.

szbalint 2010-09-02 19:23:30

@szbalint: Well "theoretically" I could pose every possible input string to the languages and see if they match the same thing. If it's not solvable on reasonable consumer hardware, then there's little point.

Billy ONeal 2010-09-02 19:27:36

@Billy ONeal: What have graph isomorphism to do with it? And don't bash theoretical cs. The decidable results are of practical interest. E.g. the problem of equivalence of two context free languages is not decidable.

maxschlepzig 2010-09-02 19:47:55

@maxschlepzig: I'm not "bashing theoretical CS" -- but I think it's obvious enough from my question that proving that the problem is merely decidable would not be all that useful of an answer.

Billy ONeal 2010-09-02 20:00:44

+2 A:

Sure!. A regular expression can be represented as an FSM (Finite State Machine) and there are technically infinite number of FSM that can recognize the same string.

Isomorphism is the name that describes if two FSM are equivalent. There are a couple of algorigthm to minimize an FSM. For example the Hopcroft minimization algorithm can minimize two FSM in O(n log n), on an n state automaton.

Dani Cricco 2010-09-02 19:18:07

@Dani: Same problem with maxschlepzig's answer. Isomorphism is in class NP.

Billy ONeal 2010-09-02 19:28:16

@Billy ONeal: First, (graph) isomorphism is in NP (that's true), but is believed not to be NP-complete, although not in P. However, we are talking about DFA isomorphism, which is a _completely_ different thing.

jpalecek 2010-09-02 19:50:08

@jpalecek: How is DFA isomorphism any different? Is a DFA nothing more than a digraph?

Billy ONeal 2010-09-02 20:23:01

@Billy: DFA isomorphism is in class P

Dani Cricco 2010-09-02 20:25:48

@Billy DFA isomorphism is simpler. You only have one starting point

Dani Cricco 2010-09-02 20:28:52

@Billy ONeal: Basically, the biggest difference between a DFA and a graph is that a DFA is labeled, so you know what to match in the first place (initial/terminal states, edges with the same letter). Also, there are DFAs that are isomorphic (we are talking about isomorphism defined by equivalence of suffixes starting from a state, there are other possible definitions of isomorphism) but their graph representation is not.

jpalecek 2010-09-02 21:00:12

Just to be fair: To do what you want (which is equivalent to testing regexp equivalence), you need to do a possibly exponential NFA-to-DFA conversion. This problem is maybe not even in NP. However, this doesn't make it impractical; exponential algorithms are used all around. You just have to assess the trade-offs and if this is not what you need, you have to look for other ways to solve your primary problem (which we know nothing about).

jpalecek 2010-09-02 21:05:46

+4 A:

Any regular expression can be linked to a DFA - you can minimize the DFA and since the minimal form is unique, you can decide whether two expressions are equivalent. Dani Cricco pointed out the Hopcroft O(n log n) algorithm. There is another improved algorithm by Hopcroft and Craft which tests the equivalence of two DFAs in O(n).

For a good survey on the matter and an interesting approach to this, I reccomend the paper Testing the Equivalence of Regular Languages, from arXiv.

Later edit: if you are interested in inclusion rather than equivalence for regular expressions, I have come across a paper that might be of interest: Inclusion Problem for Regular Expressions - I have only skimmed through it but it seems to contain a polynomial time algorithm to the problem.

Lawrence 2010-09-02 19:31:59

@Lawrence: Hmmm.. interesting. One issue though is that `.*` and `Hello World` are decidedly not equivalent, though `.*` can match anything `Hello World` can match.

Billy ONeal 2010-09-02 19:37:56

I'm not sure of the meaning of "match" for you - it seems as though you don't want to test equivalence but rather inclusion. Can you be more exact with your question?

Lawrence 2010-09-02 19:41:45

@Lawrence: My difficulty is that I don't know exactly how to describe what I'm looking for -- I'm sorry for the runaround here. I have modified the question slightly -- from Wikipedia's description on set theory inclusion does seem to what I need.

Billy ONeal 2010-09-02 19:58:19

@Billy: I think the linked paper I just added is what you're looking for.

Lawrence 2010-09-02 20:00:43

@Billy ONeal: If you're OK with an approximation (which you probably are), the last paper in this answer is probably good for you.

jpalecek 2010-09-02 21:19:10

ansaurus

tags:

views:

answers:

Does an algorithm exist which can determine whether one regular language matches any input another regular language matches?

related questions