I have to read text files, but one particular file is giving me trouble. Not only is it huge (an entire ebook), it also contains several accented letters. I read the words in one letter at a time, stopping at punctuation or spaces, by testing each character against the ASCII codes for letters and for punctuation such as the apostrophe. Is there a way I can read in the accented letters as well but keep them separate from other letters? Do I need to add any additional libraries?
Here is my code to get the next word:
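#include <cctype>   // tolower
#include <fstream>  // ifstream
#include <string>
using namespace std;

extern ifstream input; // assumed: the ebook stream, opened elsewhere in the program

string GetNextWord(){
    string w = "";  // used to store each word temporarily
    char c;         // used for each individual character
    // input.get(c) is false at end of file, so the loop cannot run forever
    // when the file ends in the middle of a word
    while(input.get(c)){
        // cast to unsigned char first: passing a negative char (e.g. a byte of an
        // accented letter where char is signed) to tolower is undefined behavior
        c = static_cast<char>(tolower(static_cast<unsigned char>(c))); // forces c to lowercase
        if(!((c >= 'a' && c <= 'z') || c == '\'')) // stop unless c is a lowercase letter or '
            break;
        w += c;     // adds character to word string
    }
    if(!w.empty())  // if there is a word
        return w;   // return the word
    else            // otherwise the string is empty
        return "NOT A WORD!"; // returns a flag to main
}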
It works on every file so far except this one.
You can see the input here: http://www.gutenberg.org/cache/epub/244/pg244.txt