For some graphics work I need to read in a large amount of data as quickly as possible and would ideally like to read and write the data structures directly to disk. Basically I have a load of 3D models in various file formats which take too long to load, so I want to write them out in their "prepared" format as a cache that will load much faster on subsequent runs of the program.

Is it safe to do it like this? My main worry is reading directly into the vector's data. I've removed error checking, hard-coded 4 as the size of the int and so on so that I can give a short working example. I know it's bad code; my question really is whether it is safe in C++ to read a whole array of structures directly into a vector like this. I believe it is, but C++ has so many traps and so much undefined behaviour when you start going low level and dealing directly with raw memory like this.

I realise that number formats and sizes may change across platforms and compilers, but this data will only ever be read and written by the same program, built with the same compiler, to cache data that may be needed on a later run of that program.

#include <fstream>
#include <vector>

using namespace std;

struct Vertex
{
    float x, y, z;
};

typedef vector<Vertex> VertexList;

int main()
{
    // Create a list for testing
    VertexList list;
    Vertex v1 = {1.0f, 2.0f,   3.0f}; list.push_back(v1);
    Vertex v2 = {2.0f, 100.0f, 3.0f}; list.push_back(v2);
    Vertex v3 = {3.0f, 200.0f, 3.0f}; list.push_back(v3);
    Vertex v4 = {4.0f, 300.0f, 3.0f}; list.push_back(v4);

    // Write out a list to a disk file
    ofstream os ("data.dat", ios::binary);

    int size1 = list.size();
    os.write((const char*)&size1, 4);
    os.write((const char*)&list[0], size1 * sizeof(Vertex));
    os.close();


    // Read it back in
    VertexList list2;

    ifstream is("data.dat", ios::binary);
    int size2;
    is.read((char*)&size2, 4);
    list2.resize(size2);

    // Is it safe to read a whole array of structures directly into the vector?
    is.read((char*)&list2[0], size2 * sizeof(Vertex));

}
+4  A: 

std::vector is guaranteed to be contiguous in memory, so yes.
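A minimal sketch of the same round trip, assuming C++11 so that `vector::data()` is available. It replaces the hard-coded 4 with a fixed-width `std::uint64_t` count and skips the bulk copy for an empty vector, since `&v[0]` on an empty vector is undefined. The names `write_vertices` and `read_vertices` are made up for illustration; taking generic streams lets the same code work with `ofstream`/`ifstream` or an in-memory `stringstream`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

struct Vertex
{
    float x, y, z;
};

// Write the element count, then the raw bytes of the vector's contiguous
// storage in a single call. &v[0] would be undefined for an empty vector,
// so the bulk write is skipped when there is nothing to write.
bool write_vertices(std::ostream& os, const std::vector<Vertex>& v)
{
    const std::uint64_t count = v.size();
    os.write(reinterpret_cast<const char*>(&count), sizeof(count));
    if (!v.empty())
        os.write(reinterpret_cast<const char*>(v.data()),
                 static_cast<std::streamsize>(v.size() * sizeof(Vertex)));
    return static_cast<bool>(os);
}

// Read the count, size the vector, then fill its storage in one read.
// A corrupt count could make resize() throw; a file header helps there.
bool read_vertices(std::istream& is, std::vector<Vertex>& v)
{
    std::uint64_t count = 0;
    if (!is.read(reinterpret_cast<char*>(&count), sizeof(count)))
        return false;
    v.resize(static_cast<std::size_t>(count));
    if (!v.empty())
        is.read(reinterpret_cast<char*>(v.data()),
                static_cast<std::streamsize>(v.size() * sizeof(Vertex)));
    return static_cast<bool>(is);
}
```

With the question's files this would be `std::ofstream os("data.dat", std::ios::binary); write_vertices(os, list);` and the matching `std::ifstream` plus `read_vertices` call to load it back.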

Laurynas Biveinis
+6  A: 

As Laurynas says, std::vector is guaranteed to be contiguous, so that should work, but it is potentially non-portable.

On most systems, sizeof(Vertex) will be 12, but it's not uncommon for the struct to be padded, so that sizeof(Vertex) == 16. If you were to write the data on one system and then read that file in on another, there's no guarantee that it will work correctly.
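One way to guard against this, as a sketch assuming C++11: turn the layout assumptions into compile-time checks, so a build where `Vertex` is padded or stops being trivially copyable fails to compile instead of silently writing a different file format. Only the standard `static_assert` and `<type_traits>` are used; `vertex_size_matches` is a hypothetical helper for comparing against a record size stored in the file.

```cpp
#include <cstddef>
#include <type_traits>

struct Vertex
{
    float x, y, z;
};

// Raw byte copies into and out of objects are only well-defined for
// trivially copyable types; this fails the build if Vertex ever gains,
// say, a virtual function or a std::string member.
static_assert(std::is_trivially_copyable<Vertex>::value,
              "Vertex must be trivially copyable to be cached as raw bytes");

// Require that Vertex is exactly three floats with no padding, so each
// record on disk is 3 * sizeof(float) bytes and no padding bytes are written.
static_assert(sizeof(Vertex) == 3 * sizeof(float),
              "unexpected padding in Vertex");

// Hypothetical runtime check for a record size read from a cache file
// header: true only if the writing build used the same sizeof(Vertex).
constexpr bool vertex_size_matches(std::size_t stored_size)
{
    return stored_size == sizeof(Vertex);
}
```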

Peter Alexander
And reading/writing the padded bytes will only slow you down. I'd write a very simple operator<<(ostream ..) and read one float at a time (conceptually).
Jan
Not to mention making your files 33% bigger in this case.
Peter Alexander
@Jan Reading (presumably) a text file and converting the numbers to floats is going to be more expensive than reading a binary file directly, even if the binary file is larger. The text file would also likely be larger, unless all of the values were fewer than four digits long.
KeithB
I'm merging multiple 3D models into one model, which involves not only reading several disk files in complex formats but also quite a lot of math to translate, rotate and scale the coordinates. This only needs doing once, so caching is a big gain. That said, I've now modified my code to read directly into a Direct3D vertex buffer, so the answer is no longer relevant to me, although it's still interesting :)
John Burton
@KeithB: What I meant to propose was implementing a stream operator like JB originally did: `ostream...`. In a file full of vertices you would still only be responsible for reading them in the right order, just like before, but it aids readability and takes care of padding.
Jan
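A sketch of the per-member approach from the comments above, assuming C++11. Each float is still written as raw binary rather than text, so the parsing cost KeithB mentions doesn't apply, but only `x`, `y` and `z` reach the file, so padding never does. The helpers are named functions (`write_vertex`/`read_vertex`, hypothetical names) rather than an `operator<<` overload; that is a design choice made here to avoid confusion with formatted text output on the same stream types.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

struct Vertex
{
    float x, y, z;
};

// Write or read one float's raw bytes (binary, not text).
inline void write_float(std::ostream& os, float f)
{
    os.write(reinterpret_cast<const char*>(&f), sizeof(f));
}

inline void read_float(std::istream& is, float& f)
{
    is.read(reinterpret_cast<char*>(&f), sizeof(f));
}

// Serialize member by member, so the on-disk record is exactly three
// floats regardless of any padding the compiler adds to Vertex.
inline std::ostream& write_vertex(std::ostream& os, const Vertex& v)
{
    write_float(os, v.x);
    write_float(os, v.y);
    write_float(os, v.z);
    return os;
}

// Members must be read back in the same order they were written.
inline std::istream& read_vertex(std::istream& is, Vertex& v)
{
    read_float(is, v.x);
    read_float(is, v.y);
    read_float(is, v.z);
    return is;
}
```

The trade-off is three small `write` calls per vertex instead of one bulk copy; buffered file streams usually make that cheap, but profiling would tell whether it matters for a given model size.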
+1  A: 

If this is used for caching by the same code, I don't see any problem with this. I've used this same technique on multiple systems without a problem (all Unix based). As an extra precaution, you might want to write a struct with known values at the beginning of the file, and check that it reads ok. You might also want to record the size of the struct in the file. This will save a lot of debugging time in the future if the padding ever changes.
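A minimal sketch of such a header, assuming C++11. The fields (magic number, version, `sizeof(Vertex)`, count, and a probe vertex holding known values) and the magic value itself are illustrative choices, not a standard format; `CacheHeader`, `write_header` and `read_header` are hypothetical names. As a side benefit, a byte-swapped magic number also catches an endianness mismatch.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

struct Vertex
{
    float x, y, z;
};

// Hypothetical cache-file header, written once before the vertex data.
struct CacheHeader
{
    std::uint32_t magic;        // identifies the file type
    std::uint32_t version;      // bump whenever the cached layout changes
    std::uint32_t vertex_size;  // sizeof(Vertex) in the writing build
    std::uint32_t count;        // number of vertices that follow
    Vertex        probe;        // known values, checked on load
};

constexpr std::uint32_t kMagic   = 0x56435348;  // arbitrary tag
constexpr std::uint32_t kVersion = 1;
const Vertex kProbe = {1.0f, -2.5f, 1024.0f};

void write_header(std::ostream& os, std::uint32_t count)
{
    const CacheHeader h = {kMagic, kVersion,
                           static_cast<std::uint32_t>(sizeof(Vertex)),
                           count, kProbe};
    os.write(reinterpret_cast<const char*>(&h), sizeof(h));
}

// Returns true and sets count only if every check passes; otherwise the
// caller should discard the cache and rebuild it from the source models.
bool read_header(std::istream& is, std::uint32_t& count)
{
    CacheHeader h;
    if (!is.read(reinterpret_cast<char*>(&h), sizeof(h)))
        return false;
    if (h.magic != kMagic || h.version != kVersion ||
        h.vertex_size != sizeof(Vertex))
        return false;
    // Exact float comparison is intended: the bytes must round-trip unchanged.
    if (h.probe.x != kProbe.x || h.probe.y != kProbe.y || h.probe.z != kProbe.z)
        return false;
    count = h.count;
    return true;
}
```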

KeithB
Yes, I would write a header on the file to ensure that it's only reading back what I expect.
John Burton
+3  A: 

You might be interested in the Boost.Serialization library. It knows how to save/load STL containers to/from disk, among other things. It might be overkill for your simple example, but it might become more useful if you do other types of serialization in your program.

Here's some sample code that does what you're looking for:

#include <algorithm>
#include <cassert>
#include <fstream>
#include <vector>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/vector.hpp>

using namespace std;

struct Vertex
{
    float x, y, z;
};

bool operator==(const Vertex& lhs, const Vertex& rhs)
{
    return lhs.x==rhs.x && lhs.y==rhs.y && lhs.z==rhs.z;
}

namespace boost { namespace serialization {
    template<class Archive>
    void serialize(Archive & ar, Vertex& v, const unsigned int /* version */)
    {
        ar & v.x; ar & v.y; ar & v.z;
    }
} }

typedef vector<Vertex> VertexList;

int main()
{
    // Create a list for testing
    const Vertex v[] = {
        {1.0f, 2.0f,   3.0f},
        {2.0f, 100.0f, 3.0f},
        {3.0f, 200.0f, 3.0f},
        {4.0f, 300.0f, 3.0f}
    };
    VertexList list(v, v + (sizeof(v) / sizeof(v[0])));

    // Write out a list to a disk file
    {
        ofstream os("data.dat", ios::binary);
        boost::archive::binary_oarchive oar(os);
        oar << list;
    }

    // Read it back in
    VertexList list2;

    {
        ifstream is("data.dat", ios::binary);
        boost::archive::binary_iarchive iar(is);
        iar >> list2;
    }

    // Check if vertex lists are equal
    assert(list == list2);

    return 0;
}

Note that I had to implement a serialize function for your Vertex in the boost::serialization namespace. This lets the serialization library know how to serialize Vertex members.

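I've browsed through the boost::archive::binary_oarchive source code, and it seems that it reads/writes the raw vector array data directly from/to the stream buffer, so it should be pretty fast. Note, though, that this bulk-array path is only taken for element types the library treats as bitwise serializable (arithmetic types by default). For a struct like Vertex you would opt in with BOOST_IS_BITWISE_SERIALIZABLE(Vertex) from boost/serialization/is_bitwise_serializable.hpp; otherwise each element goes through serialize() individually.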

Emile Cormier
Thanks. Maybe overkill for what I need, but I'll certainly look into it.
John Burton
+1  A: 

Another alternative to explicitly reading and writing your vector<> from and to a file is to replace the underlying allocator with one that allocates memory from a memory-mapped file. This would let you avoid the intermediate copy that read/write involves. However, this approach does have some overhead, so unless your file is very large it may not make sense for your particular case. Profile as usual to determine whether this approach is a good fit.

There are also some caveats to this approach that are handled very well by the Boost.Interprocess library. Of particular interest to you may be its allocators and containers.
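For comparison, here is a much simpler, read-only use of a memory-mapped file: mapping the raw vertex cache with POSIX `mmap` instead of the Boost.Interprocess allocator approach described above. It assumes a POSIX system and a file containing nothing but `Vertex` records written by the same build; `MappedVertices` is a hypothetical name. Formally, treating the mapped bytes as `Vertex` objects is not something the C++ standard guarantees, though it is common practice for trivially copyable types like this one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct Vertex
{
    float x, y, z;
};

// Read-only view of a file of raw Vertex records, mapped with POSIX mmap.
// Assumes the file holds only vertices (no header) written by this build.
class MappedVertices
{
public:
    explicit MappedVertices(const char* path)
    {
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size),
                             PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                base_ = p;
                bytes_ = static_cast<std::size_t>(st.st_size);
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedVertices() { if (base_) ::munmap(base_, bytes_); }
    MappedVertices(const MappedVertices&) = delete;
    MappedVertices& operator=(const MappedVertices&) = delete;

    bool ok() const { return base_ != nullptr; }
    std::size_t size() const { return bytes_ / sizeof(Vertex); }
    // Pages are loaded by the OS on first access; no read() copy is made.
    const Vertex* data() const { return static_cast<const Vertex*>(base_); }

private:
    void* base_ = nullptr;
    std::size_t bytes_ = 0;
};
```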

Void