I've got various std::vector instances with numeric data in them, primarily int16_t, int32_t, etc. I'd like to dump this data to a file in as fast a manner as possible. If I use an ostream_iterator, will it write the entire block of memory in a single operation, or will it iterate over the elements of the vector, issuing a write operation for each one?
I guess that's implementation dependent. If you don't get the performance you want, you can always memory-map the result file (e.g. with mmap on POSIX systems) and memcpy the std::vector data into the mapped region.
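For what it's worth, here is a minimal sketch of that idea, assuming a POSIX system (open, ftruncate, mmap, munmap). The function name DumpVecMapped is made up for illustration, and error handling is kept to the bare minimum.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// POSIX-only sketch (hypothetical helper): copies the raw bytes of v into
// a freshly created file through a shared memory mapping.
template <typename T>
bool DumpVecMapped(const char* path, const std::vector<T>& v) {
    const std::size_t bytes = v.size() * sizeof(T);

    int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    if (bytes == 0) {              // mmap rejects a zero-length mapping
        ::close(fd);
        return true;
    }

    // The file must have its final size before it is mapped.
    if (::ftruncate(fd, static_cast<off_t>(bytes)) != 0) {
        ::close(fd);
        return false;
    }

    void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    ::close(fd);                   // the mapping stays valid after close
    if (p == MAP_FAILED) return false;

    std::memcpy(p, v.data(), bytes);
    return ::munmap(p, bytes) == 0;
}
```

Whether this beats a plain buffered write depends on the OS and the data size, so it is worth measuring before committing to it.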
If you construct the ostream_iterator with an ofstream, the output will be buffered:
ofstream ofs("file.txt");
ostream_iterator<int> osi(ofs, ", ");
copy(v.begin(), v.end(), osi);
The ofstream object is buffered, so anything written to the stream is collected in its buffer before being written to disk.
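As a rough illustration of what this produces (the helper names WriteAsText and FormatAsText are made up for this sketch): std::copy pushes the elements through operator<< one at a time, so the output is formatted text, and it is normally the stream's buffer that keeps this from becoming one OS-level write per element.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes each element as formatted text, followed by the delimiter.
// std::copy invokes operator<< once per element.
void WriteAsText(std::ostream& os, const std::vector<std::int32_t>& v) {
    std::copy(v.begin(), v.end(),
              std::ostream_iterator<std::int32_t>(os, ", "));
}

// Hypothetical helper: same call, but into an in-memory stream so the
// resulting text can be inspected.
std::string FormatAsText(const std::vector<std::int32_t>& v) {
    std::ostringstream oss;
    WriteAsText(oss, v);
    return oss.str();
}
```

Note that the delimiter is written after every element, including the last one, so FormatAsText({1, 2, 300}) yields "1, 2, 300, ".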
The quickest (but most horrible) way to dump a vector will be to write it in one operation with ostream::write:
os.write( (char *) &v[0], v.size() * sizeof( v[0] ) );
You can make this a bit nicer with a template function:
template <typename T>
std::ostream & DumpVec( std::ostream & os, const std::vector <T> & v ) {
return os.write( reinterpret_cast<const char *>( &v[0] ), v.size() * sizeof( T ) );
}
which allows you to say things like:
vector <unsigned int> v;
ofstream f( "file.dat" );
...
DumpVec( f, v );
Reading it back in will be a bit problematic unless you prefix the write with the size of the vector somehow (or the vectors are fixed-size), and even then you will have problems across architectures with different endianness and/or 32- vs. 64-bit word sizes, as several people have pointed out.
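A size-prefixed variant might look roughly like the sketch below. The names DumpVecWithSize, LoadVecWithSize, and RoundTripExample are made up here, and the fixed 64-bit count is just one possible choice. It sidesteps the 32- vs. 64-bit size_t issue for the prefix only; it does nothing about endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <vector>

// Writes a fixed-width 64-bit element count, then the raw element bytes.
template <typename T>
std::ostream& DumpVecWithSize(std::ostream& os, const std::vector<T>& v) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte dumps only make sense for trivially copyable types");
    const std::uint64_t count = v.size();
    os.write(reinterpret_cast<const char*>(&count), sizeof count);
    if (!v.empty()) {
        os.write(reinterpret_cast<const char*>(v.data()),
                 static_cast<std::streamsize>(v.size() * sizeof(T)));
    }
    return os;
}

// Reads back what DumpVecWithSize wrote. Returns false on a short read.
// Note: the count is trusted as-is, so a corrupt file can request a huge
// allocation.
template <typename T>
bool LoadVecWithSize(std::istream& is, std::vector<T>& v) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte loads only make sense for trivially copyable types");
    std::uint64_t count = 0;
    if (!is.read(reinterpret_cast<char*>(&count), sizeof count)) return false;
    v.resize(static_cast<std::size_t>(count));
    if (count == 0) return true;
    is.read(reinterpret_cast<char*>(v.data()),
            static_cast<std::streamsize>(v.size() * sizeof(T)));
    return static_cast<bool>(is);
}

// Example round trip through an in-memory binary stream.
bool RoundTripExample() {
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    const std::vector<std::int16_t> in = {1, -2, 300};
    DumpVecWithSize(ss, in);
    std::vector<std::int16_t> out;
    return LoadVecWithSize(ss, out) && out == in;
}
```

With a real file, open both the ofstream and the ifstream with std::ios::binary so no newline translation gets in the way.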
A stream iterator and a vector will definitely not use a block copy in any implementation I'm familiar with. If the vector's item type were a class rather than a POD, for example, a direct copy would be a bad thing. I suspect the ostream will format the output as well, rather than writing the values directly (i.e., ASCII instead of binary output).
You might have better luck with boost::copy, as it's specifically optimized to do block writes when possible, but the most practical solution is to operate on the vector memory directly using &v[0].
It will iterate over the elements. Iterators don't let you mess with more than one item at a time. Also, IIRC, it will convert your integers to their ASCII representations.
If you want to write everything in the vector, in binary, to the file in one step via an ostream, you want something like:
template<class T>
void WriteArray(std::ostream& os, const std::vector<T>& v)
{
os.write(reinterpret_cast<const char*>(&v[0]), v.size() * sizeof(T));
}
Most ofstream implementations I know of do buffer data, so you probably will not end up doing an excessive number of writes. The buffer in the ofstream has to fill up before an actual write is done, and most OSes buffer file data underneath this, too. The interplay of these is not at all transparent from the C++ application level; selection of buffer sizes, etc. is left up to the implementation.
C++ does provide a way to supply your own buffer to an ostream's streambuf. You can try calling pubsetbuf like this:
char *mybuffer = new char[bufsize];
os.rdbuf()->pubsetbuf(mybuffer, bufsize);
The downside is that this doesn't necessarily do anything. Some implementations just ignore it.
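Here is a slightly fuller sketch of the pubsetbuf idea. The helper name WriteWithCustomBuffer is made up. It calls pubsetbuf before open(), on the assumption (true of libstdc++, as far as I know) that a call made after the file is already open may be ignored, and it keeps the buffer alive until the stream has been closed.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical helper: writes raw bytes through an ofstream that has been
// offered a caller-supplied buffer of bufsize bytes.
bool WriteWithCustomBuffer(const char* path,
                           const std::vector<char>& payload,
                           std::size_t bufsize) {
    std::vector<char> buffer(bufsize);   // must outlive the stream's use of it

    std::ofstream os;
    // Offer the buffer before opening the file. An implementation is still
    // free to ignore the request.
    os.rdbuf()->pubsetbuf(buffer.data(),
                          static_cast<std::streamsize>(buffer.size()));

    os.open(path, std::ios::binary | std::ios::trunc);
    if (!os) return false;

    os.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    os.close();                          // flush while buffer is still alive
    return !os.fail();
}
```

Since the effect is implementation-defined, the only way to know whether a larger buffer helps is to time it on the target platform.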
The other option you have if you want to buffer things and still use ostream_iterator is to use an ostringstream, e.g.:
ostringstream buffered_chars;
copy(data.begin(), data.end(), ostream_iterator<char>(buffered_chars, " "));
string buffer(buffered_chars.str());
Then, once all your data is buffered, you can write the entire buffer using one big ostream::write(), POSIX I/O, etc.
This can still be slow, though, since you're doing formatted output, and you have to have two copies of your data in memory at once: the raw data and the formatted, buffered data. If your application pushes the limits of memory already, this isn't the greatest way to go, and you're probably better off using the built-in buffering that ofstream gives you.
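Put together, the ostringstream approach could be sketched like this. WriteFormattedInOneCall is a made-up name, and int32_t is assumed as the element type to match the question.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Formats every element into an in-memory buffer first, then hands the
// whole buffer to the destination stream with a single write() call.
void WriteFormattedInOneCall(std::ostream& os,
                             const std::vector<std::int32_t>& data) {
    std::ostringstream buffered_chars;
    std::copy(data.begin(), data.end(),
              std::ostream_iterator<std::int32_t>(buffered_chars, " "));

    // str() returns a copy, so the formatted text briefly exists alongside
    // the raw data (and the ostringstream's own storage).
    const std::string buffer = buffered_chars.str();
    os.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
}
```

The formatting cost is unchanged; only the number of calls into the destination stream goes down.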
Finally, if you really want performance, the fastest way to do this is to dump the raw memory to disk using ostream::write() as Neil suggests, or to use your OS's I/O functions. The disadvantage here is that your data isn't formatted, your file probably isn't human-readable, and it isn't easily readable on architectures with a different endianness than the one you wrote from. But it will get your data to disk fast and without adding memory requirements to your application.
You haven't written how you want to use the iterators (I'll presume std::copy) and whether you want to write the data in binary or as strings.
I would expect a decent implementation of std::copy to dispatch to std::memcpy for PODs when the iterators are plain pointers (Dinkumware's, for example, does so). However, with ostream iterators, I don't think any implementation of std::copy will do this, as it doesn't have direct access to the ostream's buffer to write into.
The streams themselves are buffered as well, though.
In the end, I would write the simplest possible code first, and measure this. If it's fast enough, move on to the next problem. If this is code of the sort that cannot be fast enough, you'll have to resort to OS-specific tricks anyway.
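If it helps, a crude way to do that measurement is sketched below. The names TimeMs, TimeTextDump, and TimeBinaryDump are made up, and std::chrono::steady_clock measures wall-clock time only. The OS may still be holding the data in its cache when the timer stops, so treat the numbers as a rough comparison rather than true disk throughput.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Runs action once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double TimeMs(F&& action) {
    const auto start = std::chrono::steady_clock::now();
    action();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Formatted output, one element per line, via ostream_iterator.
double TimeTextDump(const char* path, const std::vector<std::int32_t>& v) {
    return TimeMs([&] {
        std::ofstream ofs(path);
        std::copy(v.begin(), v.end(),
                  std::ostream_iterator<std::int32_t>(ofs, "\n"));
    });  // the ofstream is flushed and closed inside the timed region
}

// Raw binary dump with a single ostream::write call.
double TimeBinaryDump(const char* path, const std::vector<std::int32_t>& v) {
    return TimeMs([&] {
        std::ofstream ofs(path, std::ios::binary);
        ofs.write(reinterpret_cast<const char*>(v.data()),
                  static_cast<std::streamsize>(v.size() * sizeof(std::int32_t)));
    });
}
```

Run each variant several times on realistically sized data before drawing conclusions; a single run is easily dominated by noise.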