views:

61

answers:

4

Let's say I have a text file containing:

today is today but
tomorrow is today tomorrow

How can I use a map to keep track of which words are repeated, and on which lines they occur? So far I read each string in the file into a temp and store it in the following way:

    map<string,int> storage;

    int count = 1; // line number, starting at the first line of the file

    if(in.is_open()){
        while( !in.eof() ){
            getline(in, line);
            istringstream my_string(line);
            while(my_string.good()){
                string temp;
                my_string >> temp;

                storage[temp] = count;
            }
            count++; // so that every string read from the next line is recorded with that line number
        }
    }
    map<string,int>::iterator m;
    for(m = storage.begin(); m != storage.end(); m++){
        out << m->first << ": " << "line " << m->second << endl;
    }

Right now the output is just:

but: line 1
is: line 2
today: line 2
tomorrow: line 2

But instead it should print (with no repeated strings):

today : line 1 occurred 2 times, line 2 occurred 1 time.
is: line 1 occurred 1 time, line 2 occurred 1 time.
but: line 1 occurred 1 time.
tomorrow: line 2 occurred 2 times.

Note: the order of the strings does not matter.

Any help would be appreciated. Thanks.

+1  A: 

You're trying to get two items of information out of the collection when you only store one item of information in there.

The easiest way to extend your current implementation is to store a struct instead of an int.

So instead of:

storage[temp] = count

you'd do:

storage[temp].linenumber = count;
storage[temp].wordcount++;

where the map is defined:

struct worddata { int linenumber; int wordcount; };
std::map<string, worddata> storage;

Print the results using:

out << m->first << ": " << "line " << m->second.linenumber << " count: " << m->second.wordcount << endl;

Edit: use a typedef for the definitions, e.g.:

typedef std::map<std::string, worddata> MYMAP;
MYMAP storage;

and then declare the iterator as MYMAP::iterator iter;

gbjbaanb
I think what the user wants to do is report how many times a word appears in each line within the file. If they plug in your data type into their algorithm they'll get the last line a word appeared in along with the total count it appeared within the file.
Noah Roberts
Hmm, I did that, but the loop to go through the map doesn't seem to work anymore: for(m = storage.begin(); m != storage.end(); m++){ gives error C2679: binary '=' : no operator found which takes a right-hand operand of type 'std::
eNetik
It was late and I'd just got back from the pub - working out what the user wanted the algorithm to be too was just too much :). @eNetik - you need to modify the iterator definition to match the original map definition. Hint: use a typedef. (I'll update my answer)
gbjbaanb
+2  A: 

map stores a (key, value) pair with a unique key. Meaning that if you assign to the same key more than once, only the last value that you assigned will be stored.

It sounds like, instead of storing the line as the value, you want to store another map of lines -> occurrences.

So you could make your map like this:

typedef int LineNumber;
typedef int WordHits;
typedef map< LineNumber, WordHits> LineHitsMap;
typedef map< string, LineHitsMap > WordHitsMap;
WordHitsMap storage;

Then to insert:

WordHitsMap::iterator wordIt = storage.find(temp);
if(wordIt != storage.end())
{
    LineHitsMap::iterator lineIt = (*wordIt).second.find(count);
    if(lineIt != (*wordIt).second.end())
    {
        (*lineIt).second++;
    }
    else
    {
        (*wordIt).second[count] = 1;
    }
}
else
{
    LineHitsMap lineHitsMap;
    lineHitsMap[count] = 1;
    storage[temp] = lineHitsMap;
}
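
For reference, the insert logic above can probably be shortened, because operator[] on a std::map creates a missing entry with a value-initialized (zero) count. The following is only a sketch of the whole task under that assumption; count_words and report are hypothetical helper names that do not appear in the question, and the read loops test the extraction itself rather than eof().

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

typedef std::map<int, int> LineHitsMap;                 // line number -> hits on that line
typedef std::map<std::string, LineHitsMap> WordHitsMap; // word -> per-line hits

// Hypothetical helper: counts every whitespace-separated word of `in`, line by line.
WordHitsMap count_words(std::istream& in) {
    WordHitsMap storage;
    std::string line;
    int count = 1; // current line number, starting at 1
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string temp;
        while (words >> temp) {
            // operator[] inserts a zero count for a new word or a new line,
            // so no explicit find() is needed before incrementing.
            ++storage[temp][count];
        }
        ++count;
    }
    return storage;
}

// Hypothetical helper: writes one line per word in the format the question asks for.
void report(const WordHitsMap& storage, std::ostream& out) {
    for (WordHitsMap::const_iterator w = storage.begin(); w != storage.end(); ++w) {
        out << w->first << ": ";
        const LineHitsMap& lines = w->second;
        for (LineHitsMap::const_iterator l = lines.begin(); l != lines.end(); ++l) {
            if (l != lines.begin()) out << ", ";
            out << "line " << l->first << " occurred " << l->second
                << (l->second == 1 ? " time" : " times");
        }
        out << ".\n";
    }
}
```

Calling count_words on the opened file stream and passing the result to report with the output stream should produce one line per distinct word, in alphabetical order because std::map keeps its keys sorted.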
bshields
How do I store the line number and WordHits though? Can you show a simple example of how to actually store them in the map?
eNetik
@eNetik I updated with insert code.
bshields
A: 

Your storage data type is insufficient to store all the information you want to report. You could get there by using a vector for count storage but you'd have to do a lot of book-keeping to make sure you actually insert a 0 when a word is not encountered and create the vector with the right size when a new word is encountered. Not a trivial task.

You could switch your count part to a map of numbers, first being line and second being count... That would reduce the complexity of your code but wouldn't exactly be the most efficient method.

At any rate, you can't do what you need to do with just a std::map<string,int>.

Edit: just thought of an alternative version that would be easier to generate but harder to report with: std::vector< std::map<std::string, unsigned int> >. For each new line in a file you'd generate a new map<string,int> and push it onto the vector. You could create a helper type set<string> to contain all the words that appear in a file to use in your reporting.

That's probably how I'd do it anyway except I'd encapsulate all that crap in a class so that I'd just do something like:

my_counter.word_appearance(word,line_no);
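
A rough sketch of what such a class might look like, assuming the vector-of-maps layout plus the helper set described above. Only word_appearance comes from the call shown; the class name WordCounter and the members hits, line_count and words are hypothetical additions for illustration.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical wrapper: one word->count map per line, plus the set of all words seen.
class WordCounter {
public:
    // Records one appearance of `word` on line `line_no` (1-based).
    void word_appearance(const std::string& word, std::size_t line_no) {
        if (line_no == 0) return;                        // ignore invalid line numbers
        if (lines_.size() < line_no) lines_.resize(line_no);
        ++lines_[line_no - 1][word];                     // missing entries start at 0
        words_.insert(word);
    }

    // Returns how often `word` appeared on line `line_no`, or 0 if it never did.
    unsigned int hits(const std::string& word, std::size_t line_no) const {
        if (line_no == 0 || line_no > lines_.size()) return 0;
        const std::map<std::string, unsigned int>& m = lines_[line_no - 1];
        std::map<std::string, unsigned int>::const_iterator it = m.find(word);
        return it == m.end() ? 0 : it->second;
    }

    // Number of lines recorded so far.
    std::size_t line_count() const { return lines_.size(); }

    // Every distinct word seen, for driving the report.
    const std::set<std::string>& words() const { return words_; }

private:
    std::vector< std::map<std::string, unsigned int> > lines_;
    std::set<std::string> words_;
};
```

Reporting would then loop over words() and, for each word, over the line numbers 1 to line_count(), printing only the lines where hits() is non-zero.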
Noah Roberts
he could use a multimap to store many instances of each word, use count() to get the number of words, and store the line number for each occurrence
gbjbaanb
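
The multimap idea from the comment above could be sketched roughly as follows, assuming one (word, line number) entry is inserted per occurrence. The typedef Occurrences and the helper lines_of are hypothetical names. count() gives the total across the file, while per-line counts would still have to be tallied from the returned line numbers.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// word -> line number, with one entry stored per occurrence of the word
typedef std::multimap<std::string, int> Occurrences;

// Hypothetical helper: the line numbers recorded for `word`, one per occurrence.
std::vector<int> lines_of(const Occurrences& occ, const std::string& word) {
    std::vector<int> result;
    // equal_range returns the [first, second) range of entries whose key equals `word`.
    std::pair<Occurrences::const_iterator, Occurrences::const_iterator> range =
        occ.equal_range(word);
    for (Occurrences::const_iterator it = range.first; it != range.second; ++it) {
        result.push_back(it->second);
    }
    return result;
}
```

Inside the read loop the insertion would be storage.insert(std::make_pair(temp, count)), and storage.count(temp) would then return how many times the word appeared in the whole file.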
A: 

Apart from anything else, your loops are all wrong. You should never loop on the eof or good flags, but on the success of the read operation. You want something like:

while( getline(in, line) ){
    istringstream my_string(line);
    string temp;
    while( my_string >> temp ){
        // do something with temp
    }
}
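
To illustrate why the flag-based loop misbehaves, here is a small sketch comparing the two patterns on a single line of text. Both function names are hypothetical. When the line ends in whitespace (or is empty), the stream is still good() before the final extraction, so the first version stores an empty string that was never read.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Pattern from the question: checks the stream state before reading.
std::vector<std::string> split_checking_good(const std::string& line) {
    std::vector<std::string> words;
    std::istringstream my_string(line);
    while (my_string.good()) {
        std::string temp;
        my_string >> temp;     // may fail, leaving temp empty
        words.push_back(temp); // the result is stored even if the read failed
    }
    return words;
}

// Pattern from this answer: the read itself is the loop condition.
std::vector<std::string> split_checking_read(const std::string& line) {
    std::vector<std::string> words;
    std::istringstream my_string(line);
    std::string temp;
    while (my_string >> temp) {
        words.push_back(temp); // only reached after a successful read
    }
    return words;
}
```

For the input "today is " (note the trailing space), the first function returns three entries, the last one empty, while the second returns just the two words.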
anon