I've got various std::vector instances with numeric data in them, primarily int16_t, int32_t, etc. I'd like to dump this data to a file in as fast a manner as possible. If I use an ostream_iterator, will it write the entire block of memory in a single operation, or will it iterate over the elements of the vector, issuing a write operation for each one?

A: 

I guess that's implementation-dependent. If you don't get the performance you want, you can always memory-map the result file and memcpy the std::vector data into the mapped region.
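For illustration, a minimal sketch of that memory-map-and-memcpy idea, assuming POSIX mmap (the function name and error handling are placeholders, not part of the original answer):

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <vector>
#include <stdint.h>

bool dump_via_mmap(const char* path, const std::vector<int32_t>& v)
{
    if (v.empty()) return true;                     // nothing to map
    const std::size_t bytes = v.size() * sizeof(int32_t);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    if (ftruncate(fd, bytes) != 0) { close(fd); return false; }   // size the file first
    void* p = mmap(0, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return false; }
    std::memcpy(p, &v[0], bytes);                   // one block copy into the mapping
    munmap(p, bytes);
    close(fd);
    return true;
}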

pau.estalella
No reason to think that would be any faster.
anon
A: 

If you construct the ostream_iterator with an ofstream, the output will be buffered:

ofstream ofs("file.txt");
ostream_iterator<int> osi(ofs, ", ");
copy(v.begin(), v.end(), osi);

The ofstream object is buffered, so anything written to the stream is collected in its buffer before being written to disk.
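A complete, compilable version of the snippet above, for reference (the filename and sample data are just placeholders):

#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>

int main()
{
    std::vector<int> v(1000, 42);        // placeholder data
    std::ofstream ofs("file.txt");
    std::ostream_iterator<int> osi(ofs, ", ");
    // Each element is formatted individually, but all output goes
    // through the ofstream's internal buffer before reaching disk.
    std::copy(v.begin(), v.end(), osi);
}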

jspcal
+2  A: 

The quickest (but most horrible) way to dump a vector will be to write it in one operation with ostream::write:

   os.write( (char *) &v[0], v.size() * sizeof( value_type) );

You can make this a bit nicer with a template function:

template <typename T> 
std::ostream & DumpVec( std::ostream & os, const std::vector <T> & v ) {
    return os.write( (const char *) &v[0], v.size() * sizeof( T ) );
}

which allows you to say things like:

vector <unsigned int> v;
ofstream f( "file.dat" );
...
DumpVec( f, v );

Reading it back in will be a bit problematic, unless you prefix the write with the size of the vector somehow (or the vectors are fixed-size), and even then you will have problems across different endiannesses and/or 32- vs. 64-bit architectures, as several people have pointed out.
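For illustration, one way such a size prefix could be written and read back (a sketch only, using a fixed-width length field; it still assumes the reader shares the writer's endianness):

#include <cstddef>
#include <istream>
#include <ostream>
#include <vector>
#include <stdint.h>

// Write a 64-bit element count, then the raw element bytes.
void WriteWithSize(std::ostream& os, const std::vector<int32_t>& v)
{
    uint64_t n = v.size();
    os.write((const char*)&n, sizeof(n));
    if (n) os.write((const char*)&v[0], n * sizeof(int32_t));
}

// Read the count back, then fill the vector in a single read.
void ReadWithSize(std::istream& is, std::vector<int32_t>& v)
{
    uint64_t n = 0;
    is.read((char*)&n, sizeof(n));
    v.resize((std::size_t)n);
    if (n) is.read((char*)&v[0], n * sizeof(int32_t));
}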

anon
This is the fastest way to get the job done; but note that if you read the file in from another computer with a different endianness, you'll basically get garbage values. If you're concerned about such things, you should probably use a fully-functional, robust serialization library.
Charles Salvia
And if you want to walk on the Dark Side, replace `DumpVec` with `operator <<` so you can do `f << v`. Also, the only way reading it back in will not be problematic is if the vector is the entirety of the file, in which case EOF indirectly tells you the size. It would involve some creative use of `std::vector`'s `reserve()`, `capacity()`, and `resize()`, using a double-capacity-every-time-you-run-out algorithm working together with the `istream::read()` calls.
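A rough sketch of that EOF-driven, capacity-doubling read-back (hypothetical names; it assumes the file contains nothing but the raw elements):

#include <cstddef>
#include <istream>
#include <vector>
#include <stdint.h>

void SlurpVec(std::istream& is, std::vector<int32_t>& v)
{
    v.clear();
    std::size_t filled = 0;
    std::size_t cap = 1024;                     // starting guess, doubled each pass
    for (;;)
    {
        v.resize(cap);
        is.read((char*)&v[filled], (cap - filled) * sizeof(int32_t));
        filled += (std::size_t)is.gcount() / sizeof(int32_t);
        if (!is) break;                         // short read: we hit EOF
        cap *= 2;
    }
    v.resize(filled);                           // trim to what was actually read
}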
Mike D.
+2  A: 

A stream iterator and a vector will definitely not use a block copy in any implementation I'm familiar with. If the vector's element type were a class rather than a POD, for example, a direct copy would be a bad thing. I suspect the ostream will also format the output rather than writing the values directly (i.e., ASCII instead of binary output).

You might have better luck with boost::copy, as it's specifically optimized to do block writes when possible, but the most practical solution is to operate on the vector memory directly using &v[0].

Tim Sylvester
Ah ok that's what I was worried about. I guess since vectors are contiguous I can just dump the memory to the file myself and call it a day.
gct
A: 

It will iterate over the elements. Iterators don't let you mess with more than one item at a time. Also, IIRC, it will convert your integers to their ASCII representations.

If you want to write everything in the vector, in binary, to the file in one step via an ostream, you want something like:

template<class T>
void WriteArray(std::ostream& os, const std::vector<T>& v)
{
    os.write(reinterpret_cast<const char*>(&v[0]), v.size() * sizeof(T));
}
Mike D.
You need to take the size of the contained values into account.
anon
Fixed. Thanks! (blah blah 15 char minimum blah blah)
Mike D.
+1  A: 

Most ofstream implementations I know of do buffer data, so you probably will not end up doing an excessive number of writes. The buffer in the ofstream() has to fill up before an actual write is done, and most OS's buffer file data underneath this, too. The interplay of these is not at all transparent from the C++ application level; selection of buffer sizes, etc. is left up to the implementation.

C++ does provide a way to supply your own buffer to an ostream's streambuf. You can try calling pubsetbuf like this:

char *mybuffer = new char[bufsize];
os.rdbuf()->pubsetbuf(mybuffer, bufsize);

The downside is that this doesn't necessarily do anything. Some implementations just ignore it.
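A slightly fuller sketch of that, with the buffer installed before the file is opened (whether it is actually used remains implementation-defined):

#include <fstream>
#include <vector>

int main()
{
    const std::streamsize bufsize = 1 << 20;       // 1 MiB, an arbitrary choice
    std::vector<char> mybuffer(bufsize);
    std::ofstream os;
    os.rdbuf()->pubsetbuf(&mybuffer[0], bufsize);  // install before any I/O
    os.open("file.dat", std::ios::binary);
    // ... writes to os now go through the larger buffer, if the request was honored ...
}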

The other option you have if you want to buffer things and still use ostream_iterator is to use an ostringstream, e.g.:

ostringstream buffered_chars;
copy(data.begin(), data.end(), ostream_iterator<char>(buffered_chars, " "));
string buffer(buffered_chars.str());

Then once all your data is buffered, you can write the entire buffer using one big ostream::write(), POSIX I/O, etc.
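That single big write could then be as simple as this (f is a hypothetical, already-open ofstream):

f.write(buffer.data(), buffer.size());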

This can still be slow, though, since you're doing formatted output, and you have to have two copies of your data in memory at once: the raw data and the formatted, buffered data. If your application pushes the limits of memory already, this isn't the greatest way to go, and you're probably better off using the built-in buffering that ofstream gives you.

Finally, if you really want performance, the fastest way to do this is to dump the raw memory to disk using ostream::write() as Neil suggests, or to use your OS's I/O functions. The disadvantage here is that your data isn't formatted, your file probably isn't human-readable, and it isn't easily readable on architectures with a different endianness than the one you wrote from. But it will get your data to disk fast and without adding memory requirements to your application.
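For completeness, the raw OS route might look like this on POSIX (a sketch with abbreviated error handling; the function name is a placeholder):

#include <fcntl.h>
#include <unistd.h>
#include <vector>
#include <stdint.h>

bool dump_posix(const char* path, const std::vector<int16_t>& v)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    ssize_t wanted  = (ssize_t)(v.size() * sizeof(int16_t));
    ssize_t written = v.empty() ? 0 : write(fd, &v[0], wanted);  // single syscall
    close(fd);
    return written == wanted;
}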

tgamblin
A: 

You haven't said how you want to use the iterators (I'll presume std::copy) or whether you want to write the data in binary or as text.

I would expect a decent implementation of std::copy to dispatch to std::memcpy for PODs when plain pointers are used as iterators (Dinkumware, for example, does so). However, with ostream iterators, I don't think any implementation of std::copy will do this, as it doesn't have direct access to the ostream's buffer to write into.

The streams themselves do buffer, though.

In the end, I would write the simplest possible code first, and measure this. If it's fast enough, move on to the next problem. If this is code of the sort that cannot be fast enough, you'll have to resort to OS-specific tricks anyway.

sbi