tags:

views:

99

answers:

3

Here was my original code:

#include <stdio.h>

#define IN  1   // inside a word
#define OUT 0   // outside a word

// program to print input one word per line

int main(void)
{
  int c, state;

  state = OUT;
  while ((c = getchar()) != EOF) {
    if (c == ' ' || c == '\n' || c == '\t') {
      state = OUT;
      printf("\n");
    }
    else if (state == OUT) {
      state = IN;
    }
    if (state == IN) {
      putchar(c);
    }
  }
  return 0;
}

But the problem was if there were multiple blanks (spaces) or multiple tabs next to each other a newline would be printed for both. So I used a variable (last) to keep track of where I was:

#include <stdio.h>

#define IN  1   // inside a word
#define OUT 0   // outside a word

// program to print input one word per line, corrected bug if there was
// more than one space between words to only print one \n

int main(void)
{
  int c, last, state;

  last = EOF;
  state = OUT;
  while ((c = getchar()) != EOF) {
    if (c == ' ' || c == '\n' || c == '\t') {
      if (last != c) {
        state = OUT;
        printf("\n");
      }
    }
    else if (state == OUT) {
      state = IN;
    }
    if (state == IN) {
      putchar(c);
    }
    last = c;
  }
  return 0;
}

That solved it, except now if there is [blank][tab] next to each other, a newline gets printed for both.

Could someone please help?

+5  A: 

Your problem with your original code is that you will output your newline for every whitespace character. You only want to do it when transitioning from word to non-word:

Change:

if (c == ' ' || c == '\n' || c == '\t') {
    state = OUT;
    printf("\n");
}

to:

if (c == ' ' || c == '\n' || c == '\t') {
  if (state == IN) printf("\n");
  state = OUT;
}

In fact, what I originally thought I'd suggest would be an enumeration for the states along the lines of:

enum eState {IN, OUT};
:
enum eState state = OUT;

but, for a simple finite state machine with only two states, you can just use an boolean:

#include <stdio.h>

#define FALSE (1==0)
#define TRUE  (1==1)
// Or: enum eBoolean {FALSE = 0, TRUE = 1};

int main (void) {
    int ch;
    int inWord = FALSE;     // Or: enum eBoolean inWord = FALSE;

    while ((ch = getchar()) != EOF) {
        if (ch == ' ' || ch == '\n' || ch == '\t') {
            if (inWord)
                printf("\n");
            inWord = FALSE;
        } else {
            inWord = TRUE;
        }
        if (inWord) {
            putchar(ch);
        }
    }
    return 0;
}
paxdiablo
+1, very simply put
Santiago Lezica
+1 the boolean macro `FALSE (1==0)` is simply awesome!
Andrew-Dufresne
@Andrew, that's just paranoia on my part. I think in my early years I wondered what would happen if I struck a platform where FALSE and TRUE weren't 0 and non-zero. Just checking C99, I see that `a==b` will either yield 0 if they're not equal or 1 otherwise so my little trick isn't needed but it at least saves me from having to remember which is which. At my advancing age, the less things I have to remember, the better :-)
paxdiablo
The result of all comparative operators is either 0 or 1. I'm quite certain that was standardised in C89 even, but I can't remember. Also, although `isspace` isn't the same as checking for space, linefeed and tab, surely such a program would benefit from its use, especially when looking for word boundaries.
dreamlax
@paxdiablo: C99 also has another nice thing that you could consider, namely a standard type `_Bool` and a header file `"stdbool.h"` that defines `bool`, `false` and `true`.
Jens Gustedt
schot
Matt2012
+1  A: 

See when you check in your second code

if (last != c) {

You are not checking for all conditions.last could be equal to space, tab or new line. In all such cases it should not print new line. Lets call the set of these three special characters as X.

Now when printing new line, you need to make sure that last character printed does not bring to set X. But you check that last!=current. Now current could be space, tab or new line. But it is only one value. It does not serve our need, our purpose.

So instead replace it with

 if (last != ' ' && last != '\n' && last != '\t' ) {

You can see the code here:

#include <stdio.h>

#define IN  1                   // inside a word
#define OUT 0                   // outside a word

// program to print input one word per line, corrected bug if there was
// more than one space between words to only print one \n

int main(void)
{
    int c, last, state;

    last = 0;  // We need it to make sure that a newline is not printed in case first
               // char is space, tab or new line.
    state = OUT;
    while ((c = getchar()) != EOF) {
        if (c == ' ' || c == '\n' || c == '\t') {
            // if (last != c)
            if (last != ' ' && last != '\n' && last != '\t' && last != 0 )
            {
                state = OUT;
                printf("\n");
            }
        } else if (state == OUT) {
            state = IN;
        }
        if (state == IN) {
            putchar(c);
        }
        last = c;
    }
    return 0;
}

Edit
Fixed the bug paxdiablo pointed out in comments.

Andrew-Dufresne
@Andrew, I bought the code over since I think SO should be as self-contained as possible. Hope you don't mind. You probably also want to set `last` initially to one of the whitespace characters - I think as it stands now, if the first character in the input is a space, you'll get an initial newline (it assumes it's a transition from word to non-word). If you do that (or tell me why I'm wrong, which is always a possibility), I'll give you a vote.
paxdiablo
Never mind, I'll do you a favour. If I'm wrong, you can just revert my changes. Cheers and +1.
paxdiablo
@paxdiablo: No problem, in fact its a great idea to keep SO `as self-contained as possible`. AS for your next point, I am working on it, didn't thought of it earlier, and thanks for +1.
Andrew-Dufresne
@paxdiablo, thanks for pointing out `if the first character in the input is a space, you'll get an initial newline`, really appreciated, Its members like you who have made SO so valuable.
Andrew-Dufresne
+2  A: 

As paxdiablo said, your program is a typical finite state automata (FSA). You have to print a new line in transitions from state OUT to state IN and only then.

Below is how I would write such code. In this particular case it can be made simpler, but the structure is interesting because typical and it applies to any FSA. You have a big external switch with a case for each state. Inside each case, you get another one that materialize transitions, here transition event are input characters. All is left to do is think about what should be done for each transition. Also this structure is quite efficient.

You should keep it in mind, it's really a very common one to have in your toolkit of pre-thought program structures. I certainly do it.

#include <stdio.h>

#define IN  1   // inside a word
#define OUT 0   // outside a word

// program to print input one word per line

int main(void)
{
  int c, state;

  state = OUT;
  while ((c = getchar()) != EOF) {
    switch (state){
    case OUT:
        switch (c){
        case ' ': case '\n': case '\t':
        break;
        default:
            putchar(c);
            state = IN;
        }
    break;
    case IN:
        switch (c){
        case ' ': case '\n': case '\t':
            putchar('\n');
            state = OUT;
        break;
        default:
            putchar(c);
        }
    break;
    }        
  }
  return 0;
}
kriss
+1 for a canonical FSM structure. Probably not needed in this case due to the simplicity but it _is_ a tool every developer should be aware of.
paxdiablo
I haven't learned 'switch' 'case' or 'break' yet, but when I do I'll reread this code and try to keep it in mind. Thank you!
Matt2012