views:

166

answers:

4

Hey all,
I am designing a program that will accept from input a series of tokens and feed them to a finite state machine which I have designed. I have designed a test finite state machine in object-oriented style, with structs for the machine itself and transitions, etc. but the application I am writing is one where speed is very important.

So far, using the machine, adding new states and the like, has proven to be easy and not very complicated. It is easy to comprehend and leaving for a month and coming back to the code will not be very disorienting. However, I am not sure what the speed trade off is with the current OO approach. Will the allocation of the objects, storage of data, etc. take away much of the speed that would be had by using a bunch of labels and goto statements?

+3  A: 

I would prefer well structured organized, maintainable, comprehendable (even after leaving for a month, thats quite a quality) code instead of a "bunch of labels and goto statements." Being said that, I would also run a profiler on the code to detect the bottlenecks when doing a speed analysis.

ArunSaha
Depends. If by "well structured" you mean "without goto's", I'd use goto's if appropriate. If you need a "JUMP" it cannot be replaced by a "CALL".
ruslik
@ruslik: Godo one :-), but no, I did not meaned that. I meant "well organized."
ArunSaha
+3  A: 

Rather than thinking about it in terms of OO being slower than functional or procedural programming think about it in terms of operations. Calling a function, branching, fetching, storing etc... all take some time and to get an idea of the performance of each solution you'd need to tally up how much of each of these you'll need to do.

The best way to do this is to use your OO test solution and see if it's fast enough. If not then profile it, work out which branches or stores are costing you the most and see if you can avoid or streamline them. Slowly re-architect your solution until it meets the performance goals you're after. It may be that in some cases you'll have to adopt a more functional or procedural style.

Lastly if you do code up a single function that consists of hundreds of stack variables and goto you've done it wrong. Code must always be readable and maintainable.

Extra thought: Can you just spend another $5,000 on better hardware to avoid $50,000 in development costs optimising?

Daniel
Although I do understand what you mean, this program will not run on one machine, it is for people to download and run on their home computers. So for most people, no, they will not spend any money to buy better hardware to help my program run faster. Also this program is in a market niche where it must compete with other similar products, and faster is better. So the speed goal is "as fast as we can bloody make it" and by definition, the program will never be "fast enough". That said, profiling will probably be the final word on it. Thanks.
SuperSoaper
+1  A: 

Simply put, the processor is faster at doing table lookups upon table lookups than branches. What you describe as the OO style will be faster, as long as the table is tractable and small enough to be fit into cache. (I wouldn't certainly say that either paradigm is associated with either implementation, though.)

To be very moderately more technical: The processor has 32-64 K of L1 cache and a few to a couple dozen megabytes of L2-L3. The branch caches amount to a few kilobytes at most.

Potatoswatter
This is actually the answer I am looking for. In other notes, a simple test program that runs the machine 1,000,000 times was profiled with gprof. gprof says all the functions together took 0.1 time. The function that took the most time out of all the child functions of the machine is the one that determines whether a state is acceptable as the next state by looping through an array of acceptable previous states and comparing the previous state with each one. This sounds like I could perhaps use bit packing to get rid of the loop.
SuperSoaper
@Super: Well, if you like the answer, you can click the up arrow at top left, and if it's the best one and resolves the question, click the checkmark. Welcome to SO!
Potatoswatter
@SuperSoaper: There are better ways to profile than gprof. http://stackoverflow.com/questions/1777556/alternatives-to-gprof/1779343#1779343
Mike Dunlavey
@Super: I didn't read that closely enough. If you get through a million runs of the program in 0.1 seconds, then you aren't outstripping any cache. Either you don't need to optimize at all, or you need a larger testcase to expose the actual behavior of the program.
Potatoswatter
+1  A: 

How often does the state machine change? If it stays constant for a while, what I do is translate it into a hard-coded routine in whatever language I prefer.

Where does the transition diagram of the state machine come from? Does it come from a regular expression? You know you can translate that directly into structured code without first building an arc-set. In fact, it might just be easier to write that code to begin with.

Then I either just make that part of the app, or I dynamically compile and link it into a dll and dynamically load it. The latter takes just a couple seconds.

Finite state machines are so simple that they should basically run in a small fraction of the time it takes to load the I/O buffer. They shouldn't be doing any unnecessary memory allocation, data structure building, any of that OO hoo-haw.

Remember, a finite state machine represented as a set of states and arc tuples is a theoretical model. It exists, like Turing machines, in order to prove theorems about it. Just like a Turing machine, it is not necessarily a good implementation technique in code.


Just to show you what I mean, consider a regular expression for decimal integers:

dd*

where 'd' means digit. As a finite-state machine (tuples) it would be:

A d -> B
B d -> B

as code it would be:

char c = getc();
if (DIGIT(c)){
  c = getc();
  while(DIGIT(c)){
    c = getc();
  }
}

See the similarity of this code with the regular expression? Don't think of c = getc() as "get the next character c". Think of it as "accept the current character c". I hope you can see the code can't really get much faster than this.

If you're really starting from an arc-set, you could generate this:

c = getc();

A:
if (DIGIT(c)){c = getc(); goto B;}
goto END;

B:
if (DIGIT(c)){c = getc(); goto B;}
goto END;

END:

which is spaghetti code, but not any more than the arc-set is, and it will be just as fast as the structured code. (In fact, that's more or less what the compiler converts your structured code into.)

Mike Dunlavey