views:

1612

answers:

8

Recently I've been thinking about finite state machines and how I would implement them in software (programming language doesn't matter).

My understanding is that deterministic state machines are in widespread use (parses/lexers, compilers and so on). Okay, that's great. But what's the matter with non-deterministic state machines?

I know that is possible to convert all non-deterministic state machines to deterministic ones (even programmatically). That's not my point. I also imagine that non-deterministic state machines are much more complicated to implement.

Anyway. Does it make any sense to implement a non-deterministic state machine? Are there any special applications I don't know about? What could be the reasons to do that? Maybe optimized and specialized non-deterministic state machines are faster?

+7  A: 

Most regular expression engines use non-deterministic automata since they offer much greater flexibility. DFAs are much more restricted. Have a look at some implementations and you'll see this. Microsoft even underlines this fact in their documentation of the .NET Regex class:

The .NET Framework regular expression engine is a backtracking regular expression matcher that incorporates a traditional Nondeterministic Finite Automaton (NFA) engine such as that used by Perl, Python, Emacs, and Tcl.

Matching behavior (first paragraph) – this article also offers a rationale for the employment of an NFA rather than the more efficient DFA.

Konrad Rudolph
A: 

Cayuga utilizes non-deterministic finite state machines under the hood for complex event processing. Well, it looks like they call it "Stateful Publish/Subscribe for Event Monitoring", but I believe it is CEP.

I believe some of their papers even discuss why they are using an automata model. You might want to poke around their site.

...Cayuga automata, extended from standard non-deterministic finite automata.

Thomas Owens
As a follow up, I still want to get a system like the one I pointed out here and make something cool that the common person could use...but I'm the lazy.
Thomas Owens
Cayuga automata != NFA. It is a modification which is meant to approach NFA behavior.
ceretullis
"Extended from standard non-deterministic finite automata" means they started with an NFA and modified it.
Thomas Owens
"Extended from standard non-deterministic" != NFA...
ceretullis
+4  A: 

As you know, NFAs and DFAs are computationally equivalent. It's one of the first theorems in automata theory. There are algorithms to convert one to another(unlike Pushdown or turing machines).

So. Why one over the other? Because representation of a given problem with a NFA is far easier than the equivalent DFA.

edit: in terms of actually computing the machine, DFAs are going to go faster because they don't have to backtrack. But they will take more memory to represent. (Mem vs CPU tradeoff)

Paul Nathan
Agreed. But try to implement the state machine yourself - in that respect the non-deterministic ones are much more complicated. What do you think?
Christoph Schiessl
DFAs are going to go faster because they don't have to backtrack. But they will take more memory to represent. (Mem vs CPU tradeoff)
Paul Nathan
With DFAs it's not possible to find the beginning and ending of subexpressions reliably because of possible ambiguities. Thus, backtracking is required. This is the reason why all these languages use NFAs.
Meinersbur
A: 

The viterbi algorithm operates on Hidden Markov Models by treating them much like an NFA. Not entirely identical, but certainly analogous.

They're useful in applications like speech and text recognition.

Nick Johnson
A: 

Very often it is much more easier to create a NFA and then work with it (the only difference is that you hold a set of states instead of one state). If you want to have it fast, you can make DFA, but don't forget, that the time to do it is exponential (because of the resultant automaton can be exponentially bigger!).

On the other hand, if you want to make a complement language, you have no choice, you need a det. variant.

It is the reason why the negation is in none of regular-expression-engine, only in classes ([^...]), where you can be sure that the automaton is deterministic.

A: 

Correct me if I'm wrong but from my compilers class I remember that sometimes you simply can't use DFA as it would lead to an "explosion" of states.

Blago
A: 

I think the main reason for choosing a non-deterministic finite automaton would be to actually get the chosen match back. It's likely a lot harder to do it with a deterministic version.

If all you want to know is IF they match or not, and no other details, I would think compiling down to a finite automaton would be better.

sharth
A: 

My advice = take a look at the manual for Adrian Thurstons Ragel.

There are simple, ways to generate a DFA directly, but I believe they only support a limited range of operators - basically the EBNF usual suspects. Ragel uses non-deterministic methods to compose complex automata from simpler ones, then uses epsilon elimination and minimisation to create efficient deterministic automata. No matter how many wierd operators you need, the conversion to a minimal deterministic automata is always the same, and each operator implementation is kept simple by using nondeterministic methods.

Steve314