Hello guys, I'm writing a program to parse some data saved as text files. What I'm trying to do is find the location of every needle in a haystack, i.e. every occurrence of a substring in the file. I can already read the file in and count the occurrences, but I'd also like to find the index of each one.

Thanks, -Tom

A: 
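#include <string>
#include <vector>
using namespace std;

// str is the string to search, sub is the substring to search for.
// Returns all the positions at which sub occurs within str.
vector<size_t> find_positions(const string& str, const string& sub)
{
    vector<size_t> positions;

    size_t pos = str.find(sub, 0);
    while (pos != string::npos)
    {
        positions.push_back(pos);
        pos = str.find(sub, pos + 1); // pos + 1, so overlapping matches are found too
    }
    return positions;
}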

Edit: I misread your post. I assumed you were searching a string rather than a file. This will still work if you first read the file into a string.
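A minimal sketch of that "read the file into a string first" step, assuming the file fits comfortably in memory. `read_whole_file` and `find_all_in_file` are hypothetical helper names; the search loop is the same `str.find(sub, pos + 1)` loop as above.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file into memory.
// Assumes the file fits in RAM; returns "" if the file cannot be opened.
std::string read_whole_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);   // binary, so indices are raw byte offsets
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Hypothetical helper: offset of every (possibly overlapping) occurrence of sub in the file.
std::vector<std::size_t> find_all_in_file(const std::string& path, const std::string& sub)
{
    const std::string str = read_whole_file(path);
    std::vector<std::size_t> positions;
    if (sub.empty()) return positions;          // an empty needle would "match" at every offset
    for (std::size_t pos = str.find(sub); pos != std::string::npos; pos = str.find(sub, pos + 1))
        positions.push_back(pos);
    return positions;
}
```

Changing `pos + 1` to `pos + sub.size()` would report only non-overlapping matches instead.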

PigBen
@PigBen: what if the file is 100GB long? Does that still work?
Steve Townsend
The file is not very long, so this should work perfectly :) Thanks!
Thomas Havlik
@Steve -- If he's able to read the 100GB file into a string like I said, then yes, it will work.
PigBen
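On the 100GB question: a hedged sketch of a middle ground that never holds more than one chunk (plus a few bytes) in memory. It reads the stream in fixed-size chunks and carries the last `needle.size() - 1` bytes over into the next window, so a match straddling a chunk boundary is still found, and found exactly once. `find_all_chunked` and its `chunk_size` parameter are hypothetical, not taken from either answer; offsets are `unsigned long long` so files over 4GB don't overflow a 32-bit `size_t`.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper: byte offset of every (possibly overlapping) match of needle,
// reading `in` chunk by chunk instead of loading the whole file.
std::vector<unsigned long long> find_all_chunked(std::istream& in, const std::string& needle,
                                                 std::size_t chunk_size = 1 << 20)
{
    std::vector<unsigned long long> positions;
    if (needle.empty()) return positions;
    const std::size_t keep = needle.size() - 1;   // tail bytes carried into the next window
    std::string window;                           // carried-over tail + newest chunk
    unsigned long long window_start = 0;          // file offset of window[0]
    std::vector<char> chunk(std::max(chunk_size, needle.size()));

    while (in.read(chunk.data(), static_cast<std::streamsize>(chunk.size())) || in.gcount() > 0)
    {
        window.append(chunk.data(), static_cast<std::size_t>(in.gcount()));
        for (std::size_t pos = window.find(needle); pos != std::string::npos;
             pos = window.find(needle, pos + 1))
            positions.push_back(window_start + pos);
        // The kept tail is shorter than the needle, so it holds no complete match
        // (no duplicates), but it may hold the start of one that finishes next chunk.
        if (window.size() > keep)
        {
            window_start += window.size() - keep;
            window.erase(0, window.size() - keep);
        }
    }
    return positions;
}
```

With the default chunk size the memory used stays around 1MB regardless of file size; a smaller `chunk_size` just means more, shorter reads.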
A: 

I know an answer has already been accepted, but this also works, and it saves you from having to load the file into a string.

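#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>

using namespace std;

int main(int argc, char* argv[])
{
  const char foo[] = "foo";              // the needle to search for
  const size_t s_len = sizeof(foo) - 1;  // ignore \0
  char block[s_len] = {0};

  if (argc < 2)
  {
    cerr << "usage: " << argv[0] << " <some file>\n";
    return 1;
  }
  ifstream f_in(argv[1], ios::binary);   // binary mode, so tellg() reports byte offsets

  vector<size_t> f_pos;

  while (f_in.good())
  {
    fill(block, block + s_len, 0);       // pedantic, given the gcount() check below
    streampos cpos = f_in.tellg();
    // Read one needle-sized block starting at cpos.
    f_in.read(block, s_len);
    if (f_in.gcount() == static_cast<streamsize>(s_len) && equal(block, block + s_len, foo))
    {
      f_pos.push_back(static_cast<size_t>(cpos)); // match: carry on after it
    }
    else
    {
      f_in.seekg(cpos + streamoff(1));   // no match: rewind to one byte past cpos
    }
  }

  for (size_t p : f_pos)
    cout << p << '\n';                   // print the offset of every match
}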
Nim
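The loop above calls `seekg()` after nearly every byte, which can get slow on large files. One alternative, swapped in here and not used by either answer, is the Knuth-Morris-Pratt algorithm: it scans the stream strictly forward, one character at a time, with no seeking and no buffering beyond the needle's failure table. A minimal sketch; `kmp_table` and `kmp_find_all` are hypothetical helper names. Unlike the loop above, which resumes after a match, this reports overlapping matches, like the `str.find(sub, pos + 1)` loop in the first answer.

```cpp
#include <cstddef>
#include <fstream>   // std::ifstream, for opening the file to scan
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper: KMP failure table. fail[i] is the length of the longest
// proper prefix of needle[0..i] that is also a suffix of it.
std::vector<std::size_t> kmp_table(const std::string& needle)
{
    std::vector<std::size_t> fail(needle.size(), 0);
    for (std::size_t i = 1, k = 0; i < needle.size(); ++i)
    {
        while (k > 0 && needle[i] != needle[k]) k = fail[k - 1];
        if (needle[i] == needle[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Hypothetical helper: reads `in` once, front to back, and returns the byte
// offset of every (possibly overlapping) match of needle.
std::vector<unsigned long long> kmp_find_all(std::istream& in, const std::string& needle)
{
    std::vector<unsigned long long> positions;
    if (needle.empty()) return positions;
    const std::vector<std::size_t> fail = kmp_table(needle);
    std::size_t k = 0;                 // number of needle characters currently matched
    unsigned long long offset = 0;     // offset of the character just read
    char c;
    while (in.get(c))
    {
        while (k > 0 && c != needle[k]) k = fail[k - 1];
        if (c == needle[k]) ++k;
        if (k == needle.size())
        {
            positions.push_back(offset + 1 - needle.size()); // match ends at offset
            k = fail[k - 1];           // fall back so overlapping matches are found
        }
        ++offset;
    }
    return positions;
}
```

Usage would look like `std::ifstream in("data.txt", std::ios::binary); auto hits = kmp_find_all(in, "foo");`, where `data.txt` is a placeholder file name.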