tags:

views:

72

answers:

2

I had a general question earlier and got great responses, but now (even though it's the same exercise) I have a more specific question and thought it deserved it's own Q&A page.

Here's my program (not completed, but it compiles and runs ok) :

#include <stdio.h>

#define IN  1 // inside a word
#define OUT 0 // outside a word

// #define MAXWORDLENGTH 10

// print a histogram of the length of words in input. horizontal bar version

int main(void)
{
  int c, state, length;
  int one, two, three, more;

  length = 0;
  one = two = three = more = 0;

  state = OUT;
  while ((c = getchar()) != EOF) {
    if (c == ' ' || c == '\t' || c == '\n')
      state = OUT;
    else state = IN;
    if (state == IN) {
      ++length;
      while ((c = getchar()) != EOF && state != OUT) { // get next character in word
        if (c != ' ' && c != '\t' && c != '\n') // if character is not whitespace...
          ++length; // ...add one to length
        else state = OUT; // otherwise word is over
      }
    if (length != 0) {
      if (length == 1)
        ++one;
      if (length == 2)
        ++two;
      if (length == 3)
        ++three;
      if (length > 3)
        ++more;
    }
    length = 0;
    }
  }

  printf("----------------------------------------------------------------\n");
  // print histogram
  printf("ONE: %d\tTWO: %d\tTHREE: %d\tMORE: %d\n", one, two, three, more); // just making sure program collects data right, which it isn't...
  printf("----------------------------------------------------------------\n");

  return 0;
}

This should be self-explanatory with the comments. I'm using integers for the moment (will switch to an array) and only printing the tallies to make sure it's collecting data right.

Here's the actual program being compiled, run, and tested...

[matt@localhost 1.6]$ cc -Wall e-1-13.c
[matt@localhost 1.6]$ ./a.out 
1 22 333 4444 55555
----------------------------------------------------------------
ONE: 2  TWO: 1  THREE: 1    MORE: 1
----------------------------------------------------------------
[matt@localhost 1.6]$ ./a.out 
1 1 1 1 1 1
----------------------------------------------------------------
ONE: 3  TWO: 0  THREE: 0    MORE: 0
----------------------------------------------------------------
[matt@localhost 1.6]$ ./a.out 
22 22 22 22 22
----------------------------------------------------------------
ONE: 4  TWO: 1  THREE: 0    MORE: 0
----------------------------------------------------------------
[matt@localhost 1.6]$ 

... as you can see it's not. Sometimes the tally is correct, sometimes it's missing some, sometimes it's got too many. Any idea why?

+4  A: 

I think you need to break out of the loop immediately when you hit the else state = OUT; in the while loop. You're consuming an extra char.

Try changing else state = OUT; to

else {
    state = OUT;
    break;
}

EDIT To elaborate on why this works, let's evaluate this code:

  // 1. while loop conditional
  while ((c = getchar()) != EOF && state != OUT) { // get next character in word
    // 2. conditional succeeded
    if (c != ' ' && c != '\t' && c != '\n') // if character is not whitespace...
      ++length; // ...add one to length
    else state = OUT; // otherwise word is over
  }

The while loop condition (1) is determined by calling getchar and then evaluating state. If this condition succeeds, the code enters into the success block (2). The problem is that you assign state = OUT in the else clause, but require the conditional to be evaluated again, which means you're going to call getchar, and then see that state == OUT. In effect, you skip processing a character you consume.

Short circuiting deals with the order in which a conditional is evaluated. Take a simple example:

int foo = 0;
int bar == 1;

if (foo && bar) { ... }

Since foo is false (0), there's no need to evaluate bar because a false value causes an && to evaluate to false regardless of the rest of the conditions. Similarly if we wrote:

if (bar || foo) {...}

then there's no need to evaluate foo since bar is true, and a true value causes an || to evaluate to true regardless of the rest of the conditionals. The ability to skip evaluating the rest of a condition is called short circuiting.

In your case, if you had swapped the order of the while loop condition to

while (state != OUT && (c = getchar()) != EOF) {

then state != OUT would evaluate to false, which would prevent evaluation of c = getchar() due to short circuiting. Here's a tutorial on it if you want to know more. Hope that helps.

SB
Yep. Another option is to swap the order of the conditionals on the inner while and take advantage of the short-circuiting.
Dusty
agreed - exactly what JoshD is proposing in his answer. Might as well break immediately if you know the short circuit will fail anyway.
SB
This works great, thank you. I'm not sure I understand it (or why) it works though? I'm sorry I'm new, but also what does short circuiting mean anyway? [Edit] And what do you mean by I'm "consuming an extra char."
Matt2012
+3  A: 

Change the order of your condition in the while loops:

No:

while(c = getchar()) != EOF && state != OUT)

Yes:

while(state != OUT && c = getchar()) != EOF)
JoshD
I used the above `break` instead, but it's my understanding that both do the same thing. I'm curious as to why this (the order) matters, if **both** conditions must be met for the loop to begin?
Matt2012
JoshD