I have an application that needs to read a file word by word, delimited by whitespace. I am using code along these lines:

#include <fstream>
#include <string>

std::ifstream in("words.txt");  // hypothetical file name
std::string word;
while (in.good()) {
    in >> word;
    // Processing, etc.
    ...
}

My issue is that the processing of the words themselves is actually rather light. The major time consumer is a set of MySQL queries I run.

What I was thinking of doing is writing a buffering class that reads something like a kilobyte from the file, initializes a stringstream with it as a buffer, and performs the extraction from that transparently, to avoid a great many I/O operations.
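
Something along these lines is what I had in mind (just a rough sketch, not the actual class yet; the file name and the chunk size are placeholders):

#include <fstream>
#include <sstream>
#include <string>

std::ifstream in("words.txt");  // hypothetical file name
std::string carry;              // possible partial word left over from the previous chunk
std::string word;

while (in) {
    char chunk[1024];
    in.read(chunk, sizeof(chunk));
    std::streamsize got = in.gcount();
    if (got == 0)
        break;

    // Prepend whatever was cut off at the previous chunk boundary.
    std::istringstream buffer(carry + std::string(chunk, got));
    carry.clear();

    while (buffer >> word) {
        // If this chunk ended mid-word and there is more input to read,
        // hold the fragment back so the next chunk can complete it.
        if (buffer.eof() && in) {
            carry = word;
        } else {
            // Processing, etc.
        }
    }
}
if (!carry.empty()) {
    // Processing, etc. (a final word with no trailing whitespace)
}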

Thoughts and advice?

+2  A: 

An istream already works with a buffer class (a streambuf), so it will normally read from the file in fairly large chunks (though the exact size isn't guaranteed). As such, you're probably already getting the effect you're looking for. Handling the buffering on your own is also somewhat non-trivial: when you reach the end of a buffer, chances are you'll be in the middle of a word, so you'll have to copy that partial word to the beginning of your buffer and read more to fill the rest of the buffer before you can process it.

Chances are you should just use a corrected loop that tests the result of the extraction itself, rather than calling good() before reading (the original loop runs the body one extra time with a bogus word after the final extraction fails):

while (in >> word) {
    // process word
}

...but you might improve speed a bit by reading the file directly into a stringstream, and processing the words from there:

#include <fstream>
#include <sstream>
#include <string>

std::ifstream in("words.txt");  // hypothetical file name
std::stringstream buffer;       // a stringstream, so it supports both insertion and extraction
std::string word;

buffer << in.rdbuf();           // slurp the entire file into the in-memory buffer
while (buffer >> word) {
    // process word
}

This can be detrimental with a really large input file though, since it reads the entire file into memory at once.

Jerry Coffin