Hello, once again I ask for help. I haven't coded anything for some time!

Now I have a text file filled with random gibberish. I already have a basic idea of how I will count the number of occurrences of each word.

What really stumps me is how to determine which line a word is on. Gut instinct tells me to look for the newline character at the end of each line. However, I have to do this while going through the text file the first time, right? If I do it afterwards, it will do no good.

I am already getting the words with the following code:

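vector<string> words;
string currentWord;

// Test the extraction itself: while(!inputFile.eof()) checks for end of file
// before reading, so a file ending in whitespace pushes the last word twice.
while (inputFile >> currentWord)
{
    words.push_back(currentWord);
}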

This is for a text file with no set structure. The above code gives me a nice little (big) vector of words, but it doesn't tell me which line each word occurs on.

Would I have to get the entire line, then process it into words to make this possible?

A: 

Use a std::map<std::string, int> to count the word occurrences; the int is the number of times the word appears.
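A minimal sketch of that counting loop, assuming the words come from any std::istream (countWords is a hypothetical helper name, not something from the answer):

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Count how many times each whitespace-separated word occurs in `in`.
std::map<std::string, int> countWords(std::istream& in) {
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word)   // stops cleanly on end of file or a failed read
        ++counts[word];
    return counts;
}
```

std::map::operator[] value-initializes a missing entry to 0, which is why ++counts[word] needs no special case for a word's first appearance. Feeding it a std::istringstream is an easy way to try it out.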

If you need line-by-line input, use std::getline(std::istream&, std::string&), like this:

#include <fstream>
#include <string>
#include <vector>

std::vector<std::string> lines;
std::ifstream file(...); // Fill in accordingly.
std::string currentLine;
while (std::getline(file, currentLine))
    lines.push_back(currentLine);

You can split a line apart by putting it into a std::istringstream first and then using operator>>. (Alternatively, you could cobble together some sort of splitter using std::find and other algorithmic primitives.)
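For the algorithmic route, one possible splitter is sketched below with std::find_if / std::find_if_not (C++11) and std::isspace; the name splitWords is made up for illustration:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Split `line` on whitespace without going through an istringstream.
std::vector<std::string> splitWords(const std::string& line) {
    // std::isspace needs a value representable as unsigned char.
    auto isSpace = [](unsigned char c) { return std::isspace(c) != 0; };
    std::vector<std::string> words;
    auto it = line.begin();
    while (it != line.end()) {
        it = std::find_if_not(it, line.end(), isSpace); // start of next word
        auto end = std::find_if(it, line.end(), isSpace); // one past its end
        if (it != end)
            words.emplace_back(it, end);
        it = end;
    }
    return words;
}
```

Runs of spaces or tabs, and leading or trailing whitespace, produce no empty words, which matches what operator>> does.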

EDIT: This is the same thing as in @dash-tom-bang's answer, but modified to be correct with respect to error handling:

vector<pair<string, int> > words; // each word paired with its line number
int currentLine = 1; // or 0, however you wish to count...

string line;
while (getline(inputFile, line))
{
   istringstream inputString(line);
   string word;
   while (inputString >> word)
      words.push_back(make_pair(word, currentLine));
   ++currentLine; // advance once per line read
}
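Since the question ultimately wants both counts and line numbers, the pairs could also be folded straight into a map from each word to the lines it appears on. This is a sketch under that assumption (indexWords is a hypothetical name); a word's total count is then the size of its vector:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Map each word to the 1-based line numbers it appears on. A word used
// twice on one line is recorded twice, so .size() is its total count.
std::map<std::string, std::vector<int>> indexWords(std::istream& in) {
    std::map<std::string, std::vector<int>> index;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {
        ++lineNumber;
        std::istringstream words(line);
        std::string word;
        while (words >> word)
            index[word].push_back(lineNumber);
    }
    return index;
}
```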
Billy ONeal
Thanks for the answer, mine can go away. ;) It's amazing how quickly this stuff evaporates from one's brain without use.
dash-tom-bang
Thanks again. I am very sorry for having changed the code in the old example. I'll give this a try and post back later with results.
Trygle
@Trygle: This is not intended to be a drop in chunk of code. We aren't going to write your program for you. However, we will give you pointers in the right direction.
Billy ONeal
Oh, I know. What fun is coding if everyone does it for you? Thanks for the help, though. I should close this by now.
Trygle
A: 

You're going to have to abandon reading into strings, because operator >>(istream&, string&) discards white space and the contents of the white space (== '\n' or != '\n', that is the question...) is what will give you line numbers.

This is where OOP can save the day. You need to write a class to act as a "front end" for reading from the file. Its job will be to buffer data from the file, and return words one at a time to the caller.

Internally, the class needs to read data from the file a block (say, 4096 bytes) at a time. Then a string GetWord() method (yes, returning by value is fine here) will:

  • First, read any white space characters, taking care to increment the object's lineNumber member every time it hits a \n.
  • Then read non-whitespace characters, putting them into the string object you'll be returning.
  • If it runs out of stuff to read, read the next block and continue.
  • If you hit the end of the file, the string you have is the whole word (which may be empty) and should be returned.
  • If the function returns an empty string, that tells the caller that the end of file has been reached. (Files usually end with whitespace characters, so reading whitespace characters cannot imply that there will be a word later on.)
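The steps above can be sketched roughly as follows. The class name WordReader, the lineNumber() accessor, and the use of std::istream::read to fill each block are assumptions made for illustration, not a fixed design:

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical buffered "front end": hands out words one at a time and
// remembers which line each word started on.
class WordReader {
public:
    explicit WordReader(std::istream& in, std::size_t blockSize = 4096)
        : in_(in), buf_(blockSize), pos_(0), len_(0),
          lineNumber_(1), wordLine_(1) {}

    // Returns the next word, or an empty string at end of file.
    std::string GetWord() {
        int c;
        // Step 1: skip whitespace, counting every '\n' passed.
        while ((c = peekChar()) != kEof && std::isspace(c)) {
            if (c == '\n') ++lineNumber_;
            ++pos_;
        }
        wordLine_ = lineNumber_;
        // Step 2: collect non-whitespace characters. peekChar() refills the
        // block when it runs out, so a word may span two blocks.
        std::string word;
        while ((c = peekChar()) != kEof && !std::isspace(c)) {
            word += static_cast<char>(c);
            ++pos_;
        }
        return word;  // empty only when the input is exhausted
    }

    // Line on which the most recently returned word started (1-based).
    int lineNumber() const { return wordLine_; }

private:
    static const int kEof = -1;

    // Next unread byte as an unsigned char value, or kEof. Reads a new
    // block from the stream when the current one is used up.
    int peekChar() {
        if (pos_ == len_) {
            in_.read(buf_.data(), static_cast<std::streamsize>(buf_.size()));
            len_ = static_cast<std::size_t>(in_.gcount());
            pos_ = 0;
            if (len_ == 0) return kEof;
        }
        return static_cast<unsigned char>(buf_[pos_]);
    }

    std::istream& in_;
    std::vector<char> buf_;
    std::size_t pos_;  // index of the next unread byte in buf_
    std::size_t len_;  // number of valid bytes in buf_
    int lineNumber_;   // current line while scanning
    int wordLine_;     // line of the last word returned
};
```

The caller loops while GetWord() returns a non-empty string and reads lineNumber() after each call. Constructing it with a tiny block size (say 3) is a handy way to check that words split across block boundaries still come out whole.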

Then you can call this method in the same place in your code where you currently have inputFile >> currentWord, and the rest of the code doesn't need to know the details of your block buffering.

An alternative approach is to read things a line at a time, but all the read functions that would work for you require you to create a fixed-size buffer to read into beforehand, and if the line is longer than that buffer, you have to deal with it somehow. It could get more complicated than the class I described.
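To illustrate the kind of handling that paragraph means, here is a sketch using std::istream::getline(char*, std::streamsize), which sets failbit when the buffer fills before a newline is found. The function name and the 8-byte buffer are hypothetical choices, kept small so the chunk-stitching path actually runs:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Read one line into `out` through a fixed-size char buffer, stitching
// chunks together when the line is longer than the buffer.
// Returns false once there is nothing left to read.
bool readLineFixed(std::istream& in, std::string& out) {
    char buf[8];  // deliberately tiny so long lines take several chunks
    out.clear();
    for (;;) {
        in.getline(buf, sizeof buf);
        if (in.gcount() == 0)
            return false;        // nothing extracted: no more lines
        out.append(buf);         // getline NUL-terminates what it stored
        if (in.fail() && !in.eof()) {
            in.clear();          // failbit here means "buffer full, no '\n' yet"
            continue;            // keep reading the same line
        }
        return true;             // consumed '\n' or reached end of input
    }
}
```

For what it's worth, the std::getline(std::istream&, std::string&) free function used in the other answers grows the string as needed, which sidesteps this bookkeeping entirely.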

Mike DeSimone
Billy ONeal
Modularity and data hiding. Putting the ugliness of the buffering in a class so the rest of the code doesn't have to deal with it. There's plenty of good stuff in C++ that doesn't involve `virtual`.
Mike DeSimone
A: 

Short and sweet.

vector< map< string, size_t > > line_word_counts;

string line, word;
while ( getline( cin, line ) ) {
    line_word_counts.push_back( map< string, size_t >() ); // fresh map for this line
    map< string, size_t > &word_counts = line_word_counts.back();

    istringstream line_is( line );
    while ( line_is >> word ) ++ word_counts[ word ];
}

cout << "'Hello' appears on line 5 " << line_word_counts[5-1]["Hello"]
     << " times\n";
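One caveat with the final lookup: operator[] on the vector is unchecked, so it is undefined behavior if the input has fewer than five lines, and operator[] on the map inserts "Hello" with a count of 0 when it is absent. A non-mutating lookup could look like this (countOnLine and the LineCounts alias are hypothetical names):

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using LineCounts = std::vector<std::map<std::string, std::size_t>>;

// How often `word` appears on 1-based line `lineNo`, without inserting
// into any map or indexing past the end of the vector.
std::size_t countOnLine(const LineCounts& counts, std::size_t lineNo,
                        const std::string& word) {
    if (lineNo == 0 || lineNo > counts.size())
        return 0;  // no such line
    const std::map<std::string, std::size_t>& line = counts[lineNo - 1];
    std::map<std::string, std::size_t>::const_iterator it = line.find(word);
    return it == line.end() ? 0 : it->second;
}
```

Taking the container by const reference makes the compiler reject any accidental use of the inserting operator[] inside the helper.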
Potatoswatter