views:

628

answers:

9

Hello everyone.

This question may sound cliched but I am in a situation here.

I am trying to implement a finite state automaton to parse a certain string in C. As I started writing the code, I realised the code may be more readable if I used labels to mark the different states and use goto to jump from one state to another as the case comes.

Using the standard breaks and flag variables is quite cumbersome in this case and hard to keep track of the state.

What approach is better ? More than anything else I am worried it may leave a bad impression on my boss, as I am on an internship.

+14  A: 

I'd use a FSM generator, like Ragel, if I wanted to leave a good impression on my boss.

The main benefit of this approach is that you are able to describe your state machine at a higher level of abstraction and don't need to concern yourself of whether to use goto or a switch. Not to mention in the particular case of Ragel that you can automatically get pretty diagrams of your FSM, insert actions at any point, automatically minimize the amount of states and various other benefits. Did I mention that the generated FSMs are also very fast?

The drawbacks are that they're harder to debug (automatic visualization helps a lot here) and that you need to learn a new tool (which is probably not worth it if you have a simple machine and you are not likely to write machines frequently.)

Vinko Vrsalovic
+1 - don't reinvent the wheel.
Eric Petroelje
The problems I've had in using non-trivial code generators are 1) the debugging, and 2) the need to learn a new tool... If the tool gives me something useful that would be difficult to achieve without it, or if I'll use the tool enough to make up for the time investment in learning it (note debuggability as an important factor here), great, I'll do it. I wouldn't bother for a trivial FSM, but might for a more complicated one.
Aidan Cully
I wouldn't advocate GOTO because the FSM generators do it. There is a difference between a code generator (which presumably has been tested) creating GOTOs and a programmer doing it. I am not sure which way to go here. But I wouldn't let the fact that the generators use GOTOs affect the decision either way (except of course if I was writing a generator to do it...).
Cervo
@Aidan: I mostly agree. Although some generators have debugging facilites that alleviate the debugging problem, they do not alleviate the 'learn a new tool' problem. For simple machines they are probably overkill, unless you already know the tool, which might be an incentive to learn it anyway :-)
Vinko Vrsalovic
@Cervo: Correct. My intention wasn't to advocate the use of goto for user generated code, although rereading the paragraph it does seem that way.
Vinko Vrsalovic
@Aidan, @Cervo: Edited reflecting your comments.
Vinko Vrsalovic
@Vinko Thanks, that's really a very useful tool. I will definitely explore it for. Right now I only need to build a small state machine consisting of not more than 6 states, so I might better use a state variable and switch cases approach.
Abhinav Upadhyay
+9  A: 

I would use a variable that tracks what state you are in and a switch to handle them:

fsm_ctx_t ctx = ...;
state_t state = INITIAL_STATE;

while (state != DONE)
{
    switch (state)
    {
    case INITIAL_STATE:
    case SOME_STATE:
        state = handle_some_state(ctx)
        break;

    case OTHER_STATE:
        state = handle_other_state(ctx);
        break;
    }
}
R Samuel Klatchko
+3  A: 

I don't know your specific code, but is there a reason something like this:

typedef enum {
    STATE1, STATE2, STATE3
} myState_e;

void myFsm(void)
{
    myState_e State = STATE1;

    while(1)
    {
        switch(State)
        {
            case STATE1:
                State = STATE2;
                break;
            case STATE2:
                State = STATE3;
                break;
            case STATE3:
                State = STATE1;
                break;
        }
    }
}

wouldn't work for you? It doesn't use goto, and is relatively easy to follow.

Edit: All those State = fragments violate DRY, so I might instead do something like:

typedef int (*myStateFn_t)(int OldState);

int myStateFn_Reset(int OldState, void *ObjP);
int myStateFn_Start(int OldState, void *ObjP);
int myStateFn_Process(int OldState, void *ObjP);

myStateFn_t myStateFns[] = {
#define MY_STATE_RESET 0
   myStateFn_Reset,
#define MY_STATE_START 1
   myStateFn_Start,
#define MY_STATE_PROCESS 2
   myStateFn_Process
}

int myStateFn_Reset(int OldState, void *ObjP)
{
    return shouldStart(ObjP) ? MY_STATE_START : MY_STATE_RESET;
}

int myStateFn_Start(int OldState, void *ObjP)
{
    resetState(ObjP);
    return MY_STATE_PROCESS;
}

int myStateFn_Process(int OldState, void *ObjP)
{
    return (process(ObjP) == DONE) ? MY_STATE_RESET : MY_STATE_PROCESS;
}

int stateValid(int StateFnSize, int State)
{
    return (State >= 0 && State < StateFnSize);
}

int stateFnRunOne(myStateFn_t StateFns, int StateFnSize, int State, void *ObjP)
{
    return StateFns[OldState])(State, ObjP);
}

void stateFnRun(myStateFn_t StateFns, int StateFnSize, int CurState, void *ObjP)
{
    int NextState;

    while(stateValid(CurState))
    {
        NextState = stateFnRunOne(StateFns, StateFnSize, CurState, ObjP);
        if(! stateValid(NextState))
            LOG_THIS(CurState, NextState);
        CurState = NextState;
    }
}

which is, of course, much longer than the first attempt (funny thing about DRY). But it's also more robust - failure to return the state from one of the state functions will result in a compiler warning, rather than silently ignore a missing State = in the earlier code.

Aidan Cully
...isn't that just goto in effect?
Vuntic
No, because the compiler handles the labels, the jumping, and the stack. It's like suggesting that a million lines of an expressive language like Lisp or Python could be assembler, in effect. Sure, they might do the same task after everything is said and done, but they're not remotely the same.
DeadMG
You could achieve the same effect with `goto`, but it's not the same. The control flow is much more rigidly defined than with goto - you have to get all the way out of the `switch` statement before you start evaluating the code for the new state. (For example, it's impossible for some undesired chunk of this function to branch directly to one of the case blocks.)
Aidan Cully
@Aidan +1 thanks. That code will really work for me.
Abhinav Upadhyay
+11  A: 

Using a goto for implementing a state machine often makes good sense. If you're really concerned about using a goto, a reasonable alternative is often to have a state variable that you modify, and a switch statement based on that:

typedef enum {s0,s1,s2,s3,s4,...,sn,sexit} state;

state nextstate;
int done = 0;

nextstate = s0;  /* set up to start with the first state */
while(!done)
   switch(nextstate)
      {
         case s0:
            nextstate = do_state_0();
            break;
         case s1:
            nextstate = do_state_1();
            break;
         case s2:
            nextstate = do_state_2();
            break;
         case s3:
             .
             .
             .
             .
         case sn:
            nextstate = do_state_n();
            break;
         case sexit:
            done = TRUE;
            break;
         default:
            /*  some sort of unknown state */
            break;
      }
Jerry Coffin
+2  A: 

I would recommend you the "Dragon book": Compilers, Principles-Techniques-Tools from Aho, Sethi and Ullman. (It is rather expensive to buy but you for sure will find it in a library). There you will find anything you will need to parse strings and build finite automatons. There is no place I could find with a goto. Usually the states are a data table and transitions are functions like accept_space()

jdehaan
+6  A: 

Goto isn't neccessary evil, and I have to strongly disagree with Denis, yes goto might be a bad idea in most cases, but there are uses. The biggest fear with goto is so called "spagetti-code", untraceable code paths. If you can avoid that and if it will always be clear how the code behaves and you don't jump out of the function with a goto, there is nothing against goto. Just use it with caution and if you are tempted to use it, really evaluate the situation and find a better solution. If you unable to do this, goto can be used.

Femaref
+5  A: 

Avoid goto unless the complexity added (to avoid) is more confusing.

In practical engineering problems, there's room for goto used very sparingly. Academics and non-engineers wring their fingers needlessly over using goto. That said, if you paint yourself into an implementation corner where a lot of goto is the only way out, rethink the solution.

A correctly working solution is usually the primary objective. Making it correct and maintainable (by minimizing complexity) has many life cycle benefits. Make it work first, and then clean it up gradually, preferably by simplifying and removing ugliness.

wallyk
+21  A: 

There is nothing inherently wrong with goto. The reason they are often considered "taboo" is because of the way that some programmers (often coming from the assembly world) use them to create "spaghetti" code that is nearly impossible to understand. If you can use goto statements while keeping your code clean, readable, and bug-free, then more power to you.

Using goto statements and a section of code for each state is definitely one way of writing a state machine. The other method is to create a variable that will hold the current state and to use a switch statement (or similar) to select which code block to execute based on the value of the state variable. See Aidan Cully's answer for a good template using this second method.

In reality, the two methods are very similar. If you write a state machine using the state variable method and compile it, the generated assembly may very well resemble code written using the goto method (depending on your compiler's level of optimization). The goto method can be seen as optimizing out the extra variable and loop from the state variable method. Which method you use is a matter of personal choice, and as long as you are producing working, readable code I would hope that your boss wouldn't think any different of you for using one method over the other.

If you are adding this code to an existing code base which already contains state machines, I would recommend that you follow whichever convention is already in use.

bta
there is wisdom in following whichever convention is already in use
Cervo
+1: Sanity and wisdom
Vinko Vrsalovic
@bta Thanks. That's a pretty useful advice . This state machine problem just came inside my own project, and the code provided by others is really clean, so I will take that approach.
Abhinav Upadhyay
+1  A: 

I can't see much of a difference between goto and switch. I might prefer switch/while because it gives you a place guaranteed to execute after the switch (where you could throw in logging and reason about your program). With GOTO you just keep jumping from label to label, so to throw in logging you'd have to put it at every label.

But aside from that there shouldn't be much difference. Either way, if you didn't break it up into functions and not every state uses/initializes all local variables you may end up with a mess of almost spaghetti code not knowing which states changed which variables and making it very difficult to debug/reason about.

As an aside, can you maybe parse the string using a regular expression? Most programming languages have libraries that allow using them. The regular expressions often create an FSM as part of their implementation. Generally regular expressions work for non arbitrarily nested items and for everything else there is a parser generator(ANTLR/YACC/LEX). It is generally much easier to maintain a grammar/regex than the underlying state machine. Also you said you were on an internship, and generally they might give you easier work than say a senior developer, so there is a strong chance that a regex may work on the string. Also regular expressions generally aren't emphasized in college so try using Google to read up on them.

Cervo