How often does the state machine change? If it stays constant for a while, what I do is translate it into a hard-coded routine in whatever language I prefer.
Where does the transition diagram of the state machine come from? Does it come from a regular expression? You know you can translate that directly into structured code without first building an arc-set. In fact, it might just be easier to write that code to begin with.
Then I either just make that part of the app, or I dynamically compile and link it into a dll and dynamically load it. The latter takes just a couple seconds.
Finite state machines are so simple that they should basically run in a small fraction of the time it takes to load the I/O buffer. They shouldn't be doing any unnecessary memory allocation, data structure building, any of that OO hoo-haw.
Remember, a finite state machine represented as a set of states and arc tuples is a theoretical model. It exists, like Turing machines, in order to prove theorems about it. Just like a Turing machine, it is not necessarily a good implementation technique in code.
Just to show you what I mean, consider a regular expression for decimal integers:
dd*
where 'd' means digit. As a finite-state machine (tuples) it would be:
A d -> B
B d -> B
as code it would be:
char c = getc();
if (DIGIT(c)){
c = getc();
while(DIGIT(c)){
c = getc();
}
}
See the similarity of this code with the regular expression?
Don't think of c = getc()
as "get the next character c".
Think of it as "accept the current character c".
I hope you can see the code can't really get much faster than this.
If you're really starting from an arc-set, you could generate this:
c = getc();
A:
if (DIGIT(c)){c = getc(); goto B;}
goto END;
B:
if (DIGIT(c)){c = getc(); goto B;}
goto END;
END:
which is spaghetti code, but not any more than the arc-set is, and it will be just as fast as the structured code.
(In fact, that's more or less what the compiler converts your structured code into.)