How to slurp a file into a std::string, i.e., read the whole file at once? Text or binary mode should be specified by the caller. The solution should be standard-compliant, portable and efficient. It should not needlessly copy the string's data, and it should avoid reallocations of memory while reading the string.

One way to do this would be to stat the filesize, resize the std::string and fread() into the std::string's const_cast<char*>()'ed data(). This requires the std::string's data to be contiguous which is not required by the standard but appears to be the case for all known implementations. What is worse, if the file is read in text mode, the std::string's size may not equal the file's size.

Fully correct, standard-compliant and portable solutions could be constructed using std::ifstream's rdbuf() into a std::ostringstream and from there into a std::string. However, this could copy the string data and/or needlessly reallocate memory. Are all relevant standard library implementations smart enough to avoid all unnecessary overhead? Is there another way to do it? Did I miss some hidden boost function that already provides the desired functionality?

Please show your suggestion how to implement void slurp(std::string& data, bool is_binary), taking into account the discussion above.

A: 

Note that you still have some things underspecified. For example, what's the character encoding of the file? Will you attempt to auto-detect (which works only in a few specific cases)? Will you honor e.g. XML headers telling you the encoding of the file?

Also there's no such thing as "text mode" or "binary mode" -- are you thinking FTP?

Jason Cohen
Ferruccio
Usually such things are treated by routines that break strings into lines rather than routines that read data from files. That is, in every environment I've programmed in there's some kind of readAsLines() or breakIntoLines() that is intelligent about such things.
Jason Cohen
+3  A: 
#include <iostream>
#include <sstream>
#include <fstream>

int main()
{
  std::ifstream input("file.txt");
  std::stringstream sstr;

  sstr << input.rdbuf();  // copy the whole file; "input >> sstr" does not compile

  std::cout << sstr.str() << std::endl;
}

or something very close. I don't have a stdlib reference open to double-check myself.

edit: updated for syntax errors. whoops.

edit: yes, I understand I didn't write the slurp function as asked.

Ben Collins
You have a syntax error there, I think you missed a << before std::endl.
Leon Timmermans
The end of file bit isn't set and sstr.cout() doesn't exist. It must be sstr.str()
nutario
+1 for using string streams. Comments above are both valid syntax corrections.
Corin
@nutario: there is a bool() operator in ifstream that returns false when the eof bit is set.
Ben Collins
Shouldn't the while loop be "while(input >> sstr.rdbuf());" ?
John Dibling
+2  A: 

Never write into the std::string's const char * buffer. Never ever! Doing so is a massive mistake.

Instead, reserve() space for the whole string in your std::string, read reasonably sized chunks from your file into a buffer, and append() each chunk. How large the chunks should be depends on your input file size. I'm pretty sure all other portable and STL-compliant mechanisms do the same (though they may look prettier).

Thorsten79
+7  A: 

The shortest variant:

string str((istreambuf_iterator<char>(ifs)), istreambuf_iterator<char>());

Notice the extra parentheses around the first argument. They're necessary for disambiguation: otherwise, C++ will parse this as a function declaration.

Also, it requires the header <iterator>.

/EDIT: Don't take this variant too seriously. It's mainly show and quite slow.

Konrad Rudolph
Could you expand on this answer? How efficient is it? Does it read the file a char at a time? Is there any way to preallocate the string memory?
Martin Beckett
+2  A: 

Something like this shouldn't be too bad:

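void slurp(std::string& data, const std::string& filename, bool is_binary)
{
    // Open at the end of the file so tellg() reports its size.
    std::ios_base::openmode openmode = std::ios::ate | std::ios::in;
    if (is_binary)
        openmode |= std::ios::binary;
    std::ifstream file(filename.c_str(), openmode);
    data.clear();
    if (!file)
        return;  // could not open; tellg() would return -1
    data.reserve(static_cast<std::string::size_type>(file.tellg()));
    file.seekg(0, std::ios::beg);
    data.append(std::istreambuf_iterator<char>(file.rdbuf()),
                std::istreambuf_iterator<char>());
}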

The advantage here is that we do the reserve first so we won't have to grow the string as we read things in. The disadvantage is that we do it char by char. A smarter version could grab the whole read buf and then call underflow.

Matt Price
You should check out the version of this code that uses std::vector for the initial read rather than a string. It is much, much faster.
ceretullis
+1  A: 

You can use the 'std::getline' function, and specify 'eof' as the delimiter. The resulting code is a little bit obscure though:

std::string data;
std::ifstream in( "test.txt" );
std::getline( in, data, std::string::traits_type::to_char_type( 
                  std::string::traits_type::eof() ) );
Martin Cote
+12  A: 

And the fastest (that I know of, discounting memory-mapped files):

string str(static_cast<stringstream const&>(stringstream() << in.rdbuf()).str());

This requires the additional header <sstream> for the string stream. (The static_cast is necessary since operator << returns a plain old ostream& but we know that in reality it’s a stringstream& so the cast is safe.)

Split into multiple lines, with the temporary moved into a variable, we get more readable code:

string slurp(ifstream& in) {
    stringstream sstr;
    sstr << in.rdbuf();
    return sstr.str();
}

Or, once again in a single line:

string slurp(ifstream& in) {
    return static_cast<stringstream const&>(stringstream() << in.rdbuf()).str();
}
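To tie this back to the text/binary requirement in the question, here is a rough sketch that opens the file itself. The name slurp_file, the filename parameter and the bool return value are my own assumptions; the question only asked for void slurp(std::string& data, bool is_binary):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical variant: the filename parameter and bool result are additions.
bool slurp_file(const std::string& filename, std::string& data, bool is_binary)
{
    std::ios_base::openmode mode = std::ios_base::in;
    if (is_binary)
        mode |= std::ios_base::binary;  // no newline translation

    std::ifstream in(filename.c_str(), mode);
    if (!in)
        return false;                   // file could not be opened

    std::ostringstream sstr;
    sstr << in.rdbuf();                 // bulk copy from the file buffer
    data = sstr.str();
    return true;
}
```

This still copies the data once out of the string stream, so it may not satisfy the "no needless copies" requirement on every implementation.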
Konrad Rudolph
Might be worth noting which headers need to be included for this to work.
Xiong Chiamiov
+2  A: 

See this answer on a similar question.

For your convenience, I'm reposting CTT's solution:

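string readFile2(const string &fileName)
{
    // Open at the end so tellg() gives the file size.
    ifstream ifs(fileName.c_str(), ios::in | ios::binary | ios::ate);
    if (!ifs)
        return string();  // could not open the file

    ifstream::pos_type fileSize = ifs.tellg();
    ifs.seekg(0, ios::beg);

    vector<char> bytes(fileSize);
    if (bytes.empty())
        return string();  // &bytes[0] is undefined for an empty vector
    ifs.read(&bytes[0], fileSize);

    return string(&bytes[0], fileSize);
}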

This solution resulted in about 20% faster execution times than the other answers presented here, averaged over 100 runs against the text of Moby Dick (1.3M). Not bad for a portable C++ solution. I would like to see the results of mmap'ing the file ;)

ceretullis
A: 
Kim Gybels