I'm writing a program which reads text from a file and determines the number of sentences, words, and syllables in that file. The trick is that it must read only one character at a time and work with that, which means it can't just store the whole file in an array.

So, with that in mind, here's how my program works:

while(character != EOF)
{
    check if the character is an end-of-sentence marker (?:;.!)
    check if the character is whitespace (' ' \t \n)
    (must be a letter now)
    check if the letter is a vowel
}
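
For reference, here is a minimal C sketch of that loop for the sentence and word counts only. The names `struct counts` and `count_text` are made up for illustration and are not from my actual program. It assumes anything that is neither an end-of-sentence marker nor whitespace is part of a word.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct counts {
    int sentences;
    int words;
};

/* Reads the stream one character at a time; nothing is buffered in an array. */
struct counts count_text(FILE *in)
{
    struct counts n = {0, 0};
    int in_word = 0; /* 1 while the previous character was part of a word */
    int c;           /* int, not char, so EOF can be told apart from data */

    while ((c = fgetc(in)) != EOF) {
        if (c != '\0' && strchr("?:;.!", c) != NULL) {
            /* end-of-sentence marker: also closes any word in progress */
            n.sentences++;
            n.words += in_word;
            in_word = 0;
        } else if (isspace(c)) {
            /* whitespace: closes any word in progress */
            n.words += in_word;
            in_word = 0;
        } else {
            /* assumed to be a letter */
            in_word = 1;
        }
    }
    n.words += in_word; /* a word may run right up to EOF */
    return n;
}
```

Declaring `c` as `int` matters here, because `fgetc` returns `EOF` as a value outside the range of `unsigned char`.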

Using a state-machine approach, each time the loop goes through, certain triggers are either 1 or 0, and this affects the count. I have had no trouble counting the sentences or the words, but the syllables are giving me trouble. The definition of a syllable that I am using is that any vowel or group of vowels counts as 1 syllable; however, a single e at the end of a word does not count as a syllable.

With that in mind, I've created code such that

if character = 'A' || 'E' ... || 'o' || 'u'
    if the last character wasn't a vowel then
    set the flag for the letter being a vowel
    (so that next time through, it doesn't get counted)
    and add one to the syllable count.
    if the last character was a vowel, then don't change the flag and don't
    add to the count.

Now the problem I have is that my syllable count for a given text file is very low. The expected count is 57 syllables, 36 words, and 3 sentences. I get the sentences and the words correct, but my syllable count is only 35.

I also have it set up so that when the program reads a !:;.? or whitespace it will look at the last character read, and if that is an e, it will take one off the syllable count. This takes care of the e at the end of a word not counting as a syllable.

So with this in mind, I know there must be something wrong with my methodology to get such a vast difference. I must be forgetting something.

Does anyone have some suggestions? I didn't want to include my entire program, but I can include certain blocks if necessary.

EDIT: Some code...

I have if (end-of-sentence marker), then else if (whitespace), then a final else, so only letters that can form words reach that last block. This is the only block of code which should have any effect on the counting of syllables...

if(chrctr == 'A' || chrctr == 'E' || chrctr == 'I' || chrctr == 'O' || chrctr == 'U' || chrctr == 'a' || chrctr == 'e' || chrctr == 'i' || chrctr == 'o' || chrctr == 'u')
{
    if(chrctr == 'E' || chrctr == 'e')
    {
        isE = 1;
    }
    else
    {
        isE = 0;
    }
    if(skipSylb != 1)
    {
        endSylb = 1;
        skipSylb = 1;
    }
    else
    {
        endSylb = 0;
        skipSylb = 1;
    }
}
else
{
    endSylb = 0;
    skipSylb = 0;
}

So to explain: if endSylb is 1, code later in the program adds one to the syllable count. skipSylb flags whether the last character was also a vowel. If skipSylb = 1, then this is a block of vowels and we only want to add one to the counter. The isE variable tells the program, next time around, that the last letter was an E. So the next time through the while loop, if the character is an end-of-sentence marker or whitespace and the last letter was E (isE = 1), then we have added one too many syllables.

Hopefully that helps.

Since the value is actually lower than it should be, I thought perhaps the statements where I subtract from the count are important too. I use this if statement to decide when to subtract from the count:

if(isE == 1)
{
    countSylb--;
}

This statement runs when the character is whitespace or an end-of-sentence character. I can't think of anything else relevant, but I still feel like I'm not including enough. Oh well, let me know if something is unclear.

+2  A: 

I also have it set up so that when the program reads a !:;.? or whitespace it will look at the last character read, and if that is an e, it will take one off the syllable count.

This sounds wrong. What about words like "die" and "see"? Obviously you can only decrement the count if the word counted for more than one syllable.

In your case decrementing if the 'e' at the end was not part of a vowel group might suffice.
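
A minimal sketch of that idea in C, assuming the assignment's definition of a syllable. The names `count_syllables`, `prev_vowel` and `lone_e` are made up here and are not from the asker's program. It walks a string instead of a file to keep the example short, but it still looks at only one character at a time.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if c is one of a, e, i, o, u in either case. */
static int is_vowel(int c)
{
    c = tolower(c);
    return c != '\0' && strchr("aeiou", c) != NULL;
}

/* Counts vowel groups, minus any single 'e' that ends a word. */
int count_syllables(const char *text)
{
    int count = 0;
    int prev_vowel = 0; /* previous character was a vowel */
    int lone_e = 0;     /* previous character was an 'e' that began its own vowel group */

    for (;; text++) {
        int c = (unsigned char)*text;

        if (is_vowel(c)) {
            if (!prev_vowel) {
                count++; /* first vowel of a new group */
                lone_e = (tolower(c) == 'e');
            } else {
                lone_e = 0; /* group of two or more vowels, as in "die" */
            }
            prev_vowel = 1;
        } else if (isalpha(c)) {
            /* consonant: clears both flags */
            prev_vowel = 0;
            lone_e = 0;
        } else {
            /* word boundary: whitespace, punctuation, or end of string */
            if (lone_e)
                count--; /* undo the single trailing 'e' */
            prev_vowel = 0;
            lone_e = 0;
            if (c == '\0')
                break;
        }
    }
    return count;
}
```

With this rule "die" and "see" keep their one syllable while "make" drops from two to one. A word like "the" ends up with zero, so the extra check that the word counted for more than one syllable may still be needed, depending on how the assignment wants such words treated.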

If that doesn't help: Maybe you don't clear the vowel flag after reading a consonant? I can't tell from your code.

What could really help you is debugging output. Let the program tell you what it is doing, like:

"Read a vowel: e"

"Not counting the vowel e because [...]"

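Such messages could come from a small helper along these lines. This is only a sketch: `trace_vowel` is a hypothetical name, and writing to a `FILE *` such as `stderr` is an assumption that keeps the trace separate from the program's normal output.

```c
#include <stdio.h>

/* Hypothetical helper: logs what happens to a vowel and returns
   1 if it should be counted as a new syllable, 0 otherwise. */
int trace_vowel(FILE *log, int c, int prev_was_vowel)
{
    if (prev_was_vowel) {
        fprintf(log, "Not counting the vowel %c because the previous character was a vowel\n", c);
        return 0;
    }
    fprintf(log, "Read a vowel: %c\n", c);
    return 1;
}
```

A call such as `countSylb += trace_vowel(stderr, chrctr, skipSylb);` would then both log and count, so the trace cannot drift out of sync with the logic.
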
zockman
Thanks for the input, I've added some of my code above. But I honestly think I'm just going to have to do some debugging and see if I can narrow it down to a block of code that isn't working.
Blackbinary
+1  A: 

You need a Finite State Machine


In a sense, every program is a state machine, but typically in the programming racket by "state machine" we mean a strictly organized loop that does something like:

while (1) {
  switch(current_state) {
    case STATE_IDLE:
      if (evaluate some condition)
        next_state = STATE_THIS;
      else
        next_state = STATE_THAT;
      break;
    case STATE_THIS:
      // some other logic here
      break;
    case STATE_THAT:
      // yet more
      break;
  }
  current_state = next_state;
}

Yes, you can solve this kind of problem with general spaghetti code. Although legacy spaghetti code with literal jumps isn't seen any more, there is a school of thought which resists grouping lots and lots of conditionals and nested conditionals in a single function, in order to minimize cyclomatic complexity. To mix metaphors, a big rat's-nest of conditionals is kind of the modern version of spaghetti code.

By at least organizing the control flow into a state machine you compress some of the logic into a single plane and it becomes much easier to visualize the operations and make individual changes. A structure is created that, while rarely the shortest possible expression, is at least easy to modify and incrementally alter.
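
As a rough sketch of how that could look for the syllable rule in the question (any vowel group counts once, a single trailing e does not count), here is one possible set of states. The state names and the function `count_syllables_fsm` are invented for this example, and it reads from a string rather than a file to stay short.

```c
#include <ctype.h>
#include <string.h>

enum state {
    S_OTHER,  /* between words, or last character was a consonant */
    S_LONE_E, /* last character was an 'e' that began a vowel group */
    S_VOWELS  /* inside any other vowel group */
};

int count_syllables_fsm(const char *text)
{
    enum state current_state = S_OTHER;
    enum state next_state = S_OTHER;
    int count = 0;
    const char *p = text;

    for (;;) {
        int c = tolower((unsigned char)*p);
        int vowel = (c != '\0' && strchr("aeiou", c) != NULL);

        switch (current_state) {
        case S_OTHER:
            if (vowel) {
                count++; /* a new vowel group starts here */
                next_state = (c == 'e') ? S_LONE_E : S_VOWELS;
            } else {
                next_state = S_OTHER;
            }
            break;
        case S_LONE_E:
            if (vowel) {
                next_state = S_VOWELS; /* "ee", "ea": no longer a lone e */
            } else {
                if (!isalpha(c))
                    count--; /* the word ended on a single e */
                next_state = S_OTHER;
            }
            break;
        case S_VOWELS:
            next_state = vowel ? S_VOWELS : S_OTHER;
            break;
        }
        current_state = next_state;

        if (*p == '\0')
            break; /* the terminator was processed as a word boundary */
        p++;
    }
    return count;
}
```

Each case only has to answer one question: given this state and this character, what changes and where do we go next. That makes it easier to see, for instance, that the decrement can happen in exactly one place.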

DigitalRoss
This is a C question, so I think your (pseudo)code would be clearer with break; in the appropriate places.
John Knoeller
Roger, done....
DigitalRoss
A: 

Looking at your code, I suspect some of the logic has gotten lost in the excessive size. Your main snippet appears equivalent to something like this:

/* Needs <ctype.h> for tolower and <string.h> for strchr. */
chrctr = tolower(chrctr);

if (strchr("aeiou", chrctr)) {
    isE = (chrctr == 'e');
    endSylb = !skipSylb;
    skipSylb = 1; // May not be what you want, but it's what you have.
}
else {
    skipSylb = endSylb = 0;
}

Personally, I think trying to count syllables algorithmically is nearly hopeless, but if you really want to, I'd take a look at the steps in the Porter stemmer for some guidance about how to break up English words in a semi-meaningful way. It's intended to strip off suffixes, but I suspect the problems being solved are similar enough that it might provide at least a little inspiration.

Jerry Coffin
Since this is for an assignment, I've defined in my original post just what a syllable is. I understand that in practice it is much harder to define. To understand my code, this is what happens: as soon as a vowel is found, it adds one to the syllable count. This covers lone vowels and groups by setting skipSylb = 1, which means that if the next letter is a vowel, it does NOT add to the count again. I'm not sure what you mean by `endSylb = !skipSylb;` as I've never seen that notation before.
Blackbinary
In `endSylb = !skipSylb`, the '!' means "not", so if `skipSylb == 0`, `endSylb` becomes 1, and if `skipSylb == 1`, `endSylb` becomes 0.
Jerry Coffin
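
To make the equivalence concrete, here is a small sketch with two hypothetical helper functions (not from either poster's code). The first spells out the if/else from the question, the second uses `!`. They agree as long as `skipSylb` only ever holds 0 or 1, which is the case in the question's code.

```c
/* Long form, mirroring the if/else in the question. */
int end_sylb_long(int skipSylb)
{
    int endSylb;
    if (skipSylb != 1)
        endSylb = 1;
    else
        endSylb = 0;
    return endSylb;
}

/* Short form: logical NOT yields 1 for 0 and 0 for any nonzero value. */
int end_sylb_short(int skipSylb)
{
    return !skipSylb;
}
```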