views:

138

answers:

3

Hi,

I'm developing a software to generate a Turing Machine from a regular expression.

[ EDIT: To clarify, the OP wants to take a regular expression as input, and programmatically generate a Turing Machine to perform the same task. OP is seeking to perform the task of creating a TM from a regular expression, not using a regular expression. ]

First I'll explain a bit what I have done and then what is my specific problem:

I've modeled the regular expression as follows:

RegularExpression (interface): the classes below implements this interface

Simple (ie: "aaa","bbb","abcde"): this is a leaf class it does not have any subexpressions

ComplexWithoutOr (ie: "a(ab)*","(a(ab)c(b))*"): this class contains a list of RegularExpression.

ComplexWithOr (ie: "a(a|b)","(a((ab)|c(b))"): this class contains an Or operation, which contains a list of RegularExpression. It represents the "a|b" part of the first example and the "(ab)|c(b)" of the second one.

Variable (ie: "awcw", where w E {a,b}*): this is not yet implemented, but the idea is to model it as a leaf class with some different logic from Simple. It represents the "w" part of the examples.

It is important that you understand and agree with the model above. If you have questions make a comment, before continue reading...

When it comes to MT generation, I have different levels of complexity:

Simple: this type of expression is already working. Generates a new state for each letter and moves right. If in any state, the letter read is not the expected, it starts a "rollback circuit" that finishes with the MT head in the initial position and stops in a not final state.

ComplexWithoutOr: here it comes my problem. Here, the algorithm generates an MT for each subexpression and concat them. This work for some simple cases, but I have problems with the rollback mechanism.

Here is an example that does not work with my algorithm:

"(ab)abac": this is a ComplexWithoutOr expression that contains a ComplexWithOr expression "(ab)" (that has a Simple expression inside "ab") and a Simple expression "abac"

My algorithm generates first an MT1 for "ab". This MT1 is used by the MT2 for "(ab)*", so if MT1 succeed it enters again in MT1, otherwise MT1 rollbacks and MT2 finishes right. In other words, MT2 cannot fail.

Then, it generates an MT3 for "abac". The output of MT2 it is the input of MT3. The output of MT3 is the result of the execution

Now, let suppose this input string: "abac". As you can see it matches with the regular expression. So let see what happens when the MT is executed.

MT1 is executed right the first time "ab". MT1 fails the second time "ac" and rollback, putting the MT head in the 3rd position "a". MT2 finishes right and input is forwarded to MT3. MT3 fails, because "ac"!="abac". So MT does not recognize "abac".

Do you understand the problem? Do you know any solution for this?

I'm using Java to develop it, but the language it is not important, I'd like to discuss the algorithm.

A: 

in the chomsky hierarchy a regex is Level3, whereas a TM is Level1. this means, that a TM can produce any regex, but not vice versa.

guest
If I give you any regex, you are able to generate a TM that recognize that language. I want to automate that process.
Neuquino
@guest: "from" is not "using". OP wants to take a regular expression as input, and programmatically generate a Turing Machine to perform the same task. OP is seeking to perform the task of creating a TM from a regular expression, not using a regular expression.
CPerkins
+1  A: 

It is not entirely clear to me what exactly you are trying to implement. It looks like you want to make a Turing Machine (or any FSM in general) that accepts only those strings that are also accepted by the regular expression. In effect, you want to convert a regular expression to a FSM.

Actually that is exactly what a real regex matcher does under the hood. I think this series of articles by Russ Cox covers a lot of what you want to do.

MAK
Thank you @MAK, those series of articles were very helpful. Now I'm trying to rewrite the regex from the user to a simpler one (but equivalent). So then I can use my algorithm to generate the TM. I think that rewriting the regex, it should be possible to avoid this problem I'm stack right now.
Neuquino
@Neuquino: Notice that all proper regexes can be reduced to ones using only the symbols `*+?|.` and literal characters. So, if your algorithms can make a TM/FSM for regexes using that simpler set of symbols, it is basically as good as any regex engine.
MAK
@MAK: do you know if is there any library or source code that makes that reduction? Or at least, some theory on which the assertion of "all proper regexes can be reduced to ones using only the symbols *+?|. and literal characters" is based on, so I can build an algorithm to reduce the input regex.
Neuquino
@Neuquino: Basically you have to allow for character classes and the `^$` operators, as well as allowing escape sequences. A character class like `\w` or `[abcdefghijklmnopqrstuvwxyz]` or `[a-z]` or `[^a-m]` can be trivially converted into a string like `a|b|c|...|z`. You can augment algorithm to handle special cases like `$`,`^` and `\b` by adding extra states for them. Escape sequences like `\n` and `\t` or even `\\` are just a matter of substituting them for the proper ASCII character.
MAK
+1  A: 

Michael Sipser, in Introduction to the Theory of Computation, proves in chapter 1 that regular expressions are equivalent to finite automata in their descriptive power. Part of the proof involves constructing a nondeterministic finite automaton (NDFA) that recognizes the language described by a specific regular expression. I'm not about to copy half that chapter, which would be quite hard due to the notation used, so I suggest you borrow or purchase the book (or perhaps a Google search using these terms will turn up a similar proof) and use that proof as the basis for your algorithm.

As Turing machines can simulate an NDFA, I assume an algorithm to produce an NDFA is good enough.

Confusion
Thanks @Confusion. I downloaded a digital version of the book. In these days I'll start reading it.
Neuquino