I'm writing a program that reads text from a file and determines the number of sentences, words, and syllables in that file. The catch is that it must read only one character at a time and work with that, which means it can't just store the whole file in an array.
So, with that in mind, here's how my program works:
while(character != EOF)
{
    check if the character is an end-of-sentence marker (?:;.!)
    check if the character is whitespace (' ' \t \n)
    (must be a letter now)
    check if the letter is a vowel
}
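As a rough illustration of that loop, here is a minimal sketch in C. The names `char_class`, `classify`, and `tally_classes` are hypothetical and not from my program; the sketch only shows the read-one-character-and-classify structure, not the counting logic.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical character classes matching the checks in the loop above. */
enum char_class { CC_SENTENCE_END, CC_WHITESPACE, CC_VOWEL, CC_OTHER };

/* Classifies one character in the same order as the pseudocode. */
enum char_class classify(int c)
{
    switch (c) {
    case '?': case ':': case ';': case '.': case '!':
        return CC_SENTENCE_END;
    case ' ': case '\t': case '\n':
        return CC_WHITESPACE;
    default:
        break;
    }
    switch (tolower(c)) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return CC_VOWEL;
    default:
        return CC_OTHER;   /* consonants and anything else */
    }
}

/* Reads fp one character at a time and tallies how often each class occurs. */
void tally_classes(FILE *fp, long counts[4])
{
    int c;   /* int rather than char, so EOF stays distinct from real data */
    while ((c = getc(fp)) != EOF)
        counts[classify(c)]++;
}
```

Only the current character is ever held in memory, which fits the one-character-at-a-time constraint.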
Using a state-machine approach, each time through the loop certain flags are set to 1 or 0, and this affects the counts. I have had no trouble counting the sentences or the words, but the syllables are giving me trouble. The definition of a syllable that I am using is: any vowel or group of consecutive vowels counts as one syllable, except that a single e at the end of a word does not count as a syllable.
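To check expected counts by hand, the definition can be applied to one word at a time. The following is a minimal sketch, not part of my program: `is_vowel` and `syllables_in_word` are hypothetical helpers, and it assumes "a single e" means an e that is not preceded by another vowel (so "tree" keeps its syllable).

```c
#include <ctype.h>
#include <stddef.h>

/* Returns nonzero for a, e, i, o, u in either case. */
int is_vowel(int c)
{
    c = tolower((unsigned char)c);
    return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
}

/*
 * Counts syllables in one word under the stated rule: each run of
 * consecutive vowels is one syllable, and a lone trailing 'e' is not counted.
 * Assumption: a "single e" is an 'e' not preceded by another vowel.
 */
int syllables_in_word(const char *word)
{
    int count = 0;
    int prev_vowel = 0;     /* was the previous character a vowel? */
    int lone_final_e = 0;   /* does the word so far end in a lone 'e'? */

    for (size_t i = 0; word[i] != '\0'; i++) {
        int c = (unsigned char)word[i];
        int vowel = is_vowel(c);

        if (vowel && !prev_vowel)
            count++;                     /* first vowel of a new group */

        lone_final_e = (tolower(c) == 'e') && !prev_vowel;
        prev_vowel = vowel;
    }

    if (lone_final_e)
        count--;                         /* trailing 'e' does not count */
    return count;
}
```

Under this literal reading, "time" gives 1 and "hello" gives 2, while a word such as "the" gives 0. Whether the reference count of 57 treats such words the same way is something I can't confirm.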
With that in mind, I've created code such that:
if character is 'A' || 'E' ... || 'o' || 'u'
    if the last character wasn't a vowel then
        set the flag for the letter being a vowel
        (so that next time through, it doesn't get counted)
        and add one to the syllable count.
    if the last character was a vowel, then don't change the flag and don't
    add to the count.
Now the problem I have is that my syllable count for a given text file is very low. The expected counts are 57 syllables, 36 words, and 3 sentences. I get the sentences and the words right, but my syllable count is only 35.
I also have it set up so that when the program reads one of !:;.? or whitespace, it looks at the last character read, and if that was an e, it takes one off the syllable count. This takes care of an e at the end of a word not counting as a syllable.
So with this in mind, I know there must be something wrong with my methodology to get such a large difference. I must be forgetting something.
Does anyone have some suggestions? I didn't want to include my entire program, but I can include certain blocks if necessary.
EDIT: Some code...
I have if (end-of-sentence marker), then else if (whitespace), then a final else, which means only letters that can form words reach this block. This is the only block of code that should have any effect on the counting of syllables:
if(chrctr == 'A' || chrctr == 'E' || chrctr == 'I' || chrctr == 'O' || chrctr == 'U' || chrctr == 'a' || chrctr == 'e' || chrctr == 'i' || chrctr == 'o' || chrctr == 'u')
{
    if(chrctr == 'E' || chrctr == 'e')
    {
        isE = 1;
    }
    else
    {
        isE = 0;
    }
    if(skipSylb != 1)
    {
        endSylb = 1;
        skipSylb = 1;
    }
    else
    {
        endSylb = 0;
        skipSylb = 1;
    }
}
else
{
    endSylb = 0;
    skipSylb = 0;
}
So to explain: if endSylb is 1, the program later adds one to the syllable count. skipSylb flags whether the last character was also a vowel. If skipSylb is 1, then we are inside a group of vowels and only want to add one to the counter. The isE variable tells the program, next time around, that the last letter was an E. So on the next pass through the while loop, if the character is an end-of-sentence marker or whitespace and the last letter was an E (isE is 1), then we have added one too many syllables.
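One thing worth checking, judging only from the block shown: isE is assigned only inside the vowel branch, so unless it is cleared somewhere not shown, a consonant after an e leaves it at 1. A word like "ten" would then still look as if it ended in e. The sketch below is one possible way to avoid that, under that assumption. The `syl_state` struct and the `on_letter` function are hypothetical wrappers around the same variable names, and the count is incremented directly rather than through endSylb.

```c
#include <ctype.h>

/* Flags mirroring the variable names used in the block above. */
struct syl_state {
    int isE;        /* 1 if the most recent letter was an 'e' */
    int skipSylb;   /* 1 if the most recent letter was a vowel */
    int countSylb;  /* running syllable total */
};

/* Handles one letter; meant to be called only from the final else branch. */
void on_letter(struct syl_state *s, int chrctr)
{
    int lower = tolower(chrctr);
    int vowel = lower == 'a' || lower == 'e' || lower == 'i' ||
                lower == 'o' || lower == 'u';

    /* Updated for every letter, so a consonant after an 'e' clears the flag. */
    s->isE = (lower == 'e');

    if (vowel) {
        if (s->skipSylb != 1)
            s->countSylb++;   /* first vowel of a group: one new syllable */
        s->skipSylb = 1;
    } else {
        s->skipSylb = 0;      /* a consonant ends the vowel group */
    }
}
```

Feeding 't', 'e', 'n' through `on_letter` leaves isE at 0, so a following space would not subtract a syllable for "ten".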
Hopefully that helps.
Since the value is actually lower than it should be, I thought the statements where I subtract from the count might be important too. I use this if statement to decide when to subtract from the count:
if(isE == 1)
{
    countSylb--;
}
This statement runs when the character is whitespace or an end-of-sentence character. I can't think of anything else relevant, but I still feel like I'm not including enough. Let me know if something is unclear.
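Another spot that could lower the count: a sentence usually ends with a marker followed by whitespace (for example ". "), so this check can run twice in a row for the same word. If isE is not cleared in between, one trailing e would be subtracted twice. A minimal sketch of a guarded version follows, assuming isE is not reset elsewhere; `on_delimiter` is a hypothetical helper name.

```c
/*
 * Called when whitespace or an end-of-sentence marker is read.
 * isE and countSylb are the variables from the snippet above.
 */
void on_delimiter(int *isE, int *countSylb)
{
    if (*isE == 1) {
        (*countSylb)--;   /* undo the syllable counted for the trailing 'e' */
        *isE = 0;         /* so ". " or repeated spaces do not subtract again */
    }
}
```

Calling it twice in a row after a word ending in e changes the count only once.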