Hello.

I need to load large models and other structured binary data on an older CD-based game console as efficiently as possible. What's the best way to do it? The data will be exported from a Python application. This is a pretty elaborate hobby project.

Requirements:

  • no reliance on a fully standard-compliant STL - I might use uSTL though.
  • as little overhead as possible. Aim for a solution so good that it could be used on the original PlayStation, yet as modern and elegant as possible.
  • no backward/forward compatibility necessary.
  • no copying of large chunks around - preferably files get loaded into RAM in the background, and all large chunks are accessed directly from there later.
  • should not rely on the target having the same endianness and alignment, i.e. a C plugin in Python which dumps its structs to disc would not be a very good idea.
  • should allow moving the loaded data around, as with individual files at 1/3 the RAM size, fragmentation might be an issue. No MMU to abuse.
  • robustness is a great bonus, as my attention span is very short, i.e. I'd change the saving part of the code and forget the loading part, or vice versa, so at least a dumb safeguard would be nice.
  • exchangeability between loaded data and runtime-generated data, without runtime overhead and without severe memory management issues, would be a nice bonus.

I kind of have a semi-plan: parse trivial, limited-syntax C headers in Python, use structs with offsets instead of pointers, and have convenience wrapper structs/classes in the main app with getters that convert those offsets into properly typed pointers/references - but I'd like to hear your suggestions.
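For illustration, a rough sketch of the kind of thing I mean - all names here are made up:

#include <stdint.h>

// In-blob representation: fixed-width fields and byte offsets (relative to
// the blob base) instead of pointers, so the data can be written by Python
// and moved around in RAM freely.
struct MeshData
{
    uint32_t vertex_count;
    uint32_t vertices_offset;   // offset from blob base to the float[3] array
    uint32_t index_count;
    uint32_t indices_offset;    // offset from blob base to the uint16 index array
};

// Thin runtime wrapper: getters turn offsets into properly typed pointers.
// The exporter is assumed to keep these offsets suitably aligned.
class Mesh
{
public:
    Mesh(const uint8_t* blob_base, const MeshData* data)
        : base(blob_base), d(data) {}

    uint32_t vertexCount() const { return d->vertex_count; }

    const float* vertices() const
    {
        return reinterpret_cast<const float*>(base + d->vertices_offset);
    }

    const uint16_t* indices() const
    {
        return reinterpret_cast<const uint16_t*>(base + d->indices_offset);
    }

private:
    const uint8_t* base;
    const MeshData* d;
};

Since everything inside the blob is relative to its base pointer, moving the blob in RAM only means updating that one pointer.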

Clarification: the question is primarily about the data-loading framework and memory management issues.

A: 

Consider storing your data as BLOBs in a SQLite DB. SQLite is extremely portable and lightweight, written in ANSI C, and has both C++ and Python interfaces. This takes care of large files, avoids fragmentation, gives you variable-length records with fast access, and so on. The rest is just serialization of structs into these BLOBs.

Eli Bendersky
Thanks, but consider what happens if you try to work with a database stored on a CD spinning in a slow 4x drive with mere kilobytes of buffer. No MMU, no mmap, no RAM to spare, and no real OS. Actually 16MB RAM on my specific platform, but I want to use it to the brim. ;) Also I want to nearly completely avoid de-serialization overhead. I was vague with the specifics of the device because I actually want to use parts of the tech on a number of different game consoles, from the mid-'90s up to the current handheld ones. No offense please, I like SQLite a lot, but it would be suicide.
3yE
+3  A: 

On platforms like the Nintendo GameCube and DS, 3D models are usually stored in a very simple custom format:

  • A brief header, containing a magic number identifying the file, the number of vertices, normals, etc., and optionally a checksum of the data following the header (Adler-32, CRC-16, etc.).
  • A possibly compressed list of 32-bit floating-point 3-tuples for the vertices and normals.
  • A possibly compressed list of edges or faces.
  • All of the data is in the native endian format of the target platform.
  • The compression format is often trivial (Huffman), simple (arithmetic coding), or standard (gzip). All of these require very little memory or computational power.

You could take formats like that as a cue: it's quite a compact representation.
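For illustration, a header along those lines might look something like this (field names invented here, not any console's actual format):

#include <stdint.h>

// Hypothetical model-file header; the exporter writes every field in the
// target platform's native endianness so it can be used as-is after loading.
struct ModelHeader
{
    uint32_t magic;           // identifies the file type, e.g. "MDL1"
    uint32_t vertex_count;    // number of position 3-tuples after the header
    uint32_t normal_count;    // number of normal 3-tuples
    uint32_t face_count;      // number of faces (or edges)
    uint32_t flags;           // e.g. bit 0 = payload is compressed
    uint32_t checksum;        // optional Adler-32/CRC of the payload
};
// The (possibly compressed) vertex, normal and face lists follow directly.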

My suggestion is to use a format most similar to your in-memory data structures, to minimize post-processing and copying. If that means you create the format yourself, so be it. You have extreme needs, so extreme measures are needed.

greyfade
+5  A: 

I note that nowhere in your description do you ask for "ease of programming". :-)

Thus, here's what comes to mind for me as a way of creating this:

  • The data should be in the same on-disk format as it will be in the target's memory, so that the target can simply pull blobs from disk into RAM with no reformatting. Depending on how much freedom you want in placing things in memory, the "blobs" could be the whole file, or could be smaller pieces within it; I don't understand your data well enough to suggest how to subdivide it, but presumably you can. Because you can't rely on the host having the same endianness and alignment as the target, you'll need to be somewhat clever about translating things when writing the files on the host side, but at least this way you only need the cleverness on one side of the transfer rather than on both.

  • In order to provide a bit of assurance that the target-side and host-side code matches, you should write this in a form where you provide a single data description and have some generation code that will generate both the target-side C code and the host-side Python code from it. You could even have your generator generate a small random "version" number in the process, and have the host-side code write this into the file header and the target-side check it, and give you an error if they don't match. (The point of using a random value is that the only information bit you care about is whether they match, and you don't want to have to increment it manually.)
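A sketch of the target-side half of that safeguard, assuming the generator emits a shared constant into both the generated C header and the Python writer (names are hypothetical):

#include <stdint.h>

// --- emitted by the generator ---------------------------------------------
// The same random value is baked into the host-side Python writer, so a
// mismatch means the two sides were generated from different descriptions.
static const uint32_t kFormatTag = 0x6F3A91C7u;   // regenerated on every run

struct FileHeader
{
    uint32_t format_tag;      // must equal kFormatTag
    uint32_t payload_bytes;   // size of the data following the header
};

// --- target-side check ------------------------------------------------------
inline bool header_matches(const FileHeader& header)
{
    return header.format_tag == kFormatTag;
}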

Brooks Moses
Thanks a lot, this is going in the right direction. :) And as succinctly noted, ease, flexibility and reliability of use matter much more to me than "ease of programming" :) and performance matters even more than that. This is going to be a huge long-term project, and I don't want to run into a malloc fiasco at some point; subdivision would not be a perfect guarantee, so the idea is to have a separate large heap for loaded objects that I can compact, perhaps even shrink if the normal heap is full. Then I guess the need for subdivision disappears. If you have any other suggestions, they would be highly welcome.
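Roughly what I have in mind for that loaded-object heap - handles instead of raw pointers, so blocks can be slid together. Purely a sketch, everything here is made up:

#include <stdint.h>
#include <string.h>

// Hypothetical compactable heap for loaded blobs: game code stores small
// handle indices, never raw pointers, so live blocks can be moved.
struct BlobHeap
{
    enum { kMaxBlobs = 256 };

    uint8_t* pool;                 // one big region reserved for loaded data
    uint32_t pool_size;
    uint8_t* block[kMaxBlobs];     // handle -> current address (0 = unused)
    uint32_t size[kMaxBlobs];      // handle -> block size in bytes

    // Look up the current address of a blob; cheap enough to call on every use.
    const uint8_t* get(int handle) const { return block[handle]; }

    // Slide all live blocks down towards the start of the pool, lowest
    // address first, and update the handle table. Handles stay valid,
    // addresses change, and the free space ends up in one piece.
    void compact()
    {
        uint32_t write = 0;
        for (;;)
        {
            int next = -1;   // live block with the lowest not-yet-packed address
            for (int h = 0; h < kMaxBlobs; ++h)
                if (block[h] && block[h] >= pool + write &&
                    (next < 0 || block[h] < block[next]))
                    next = h;
            if (next < 0)
                break;
            if (block[next] != pool + write)
            {
                memmove(pool + write, block[next], size[next]);
                block[next] = pool + write;
            }
            write += size[next];
        }
        // everything from pool + write to pool + pool_size is now free
    }
};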
3yE
+3  A: 

This is a common game development pattern.

The usual approach is to cook the data in an offline pre-processing step. The resulting blobs can be streamed in with minimal overhead. The blobs are platform dependent and should already have the proper alignment and endianness for the target platform.

At runtime, you can simply cast a pointer to the in-memory blob file. You can deal with nested structures as well. If you keep a table of contents with offsets to all the pointer values within the blob, you can then fix up those pointers to point to the proper addresses. This is similar to how DLL loading works.

I've been working on a Ruby library, bbq, that I use to cook data for my iPhone game.

Here's the memory layout I use for the blob header:

// Memory layout
//
// p = beginning of file in memory
// p + 0                        : num_pointers
// p + 4                        : offset 0
// p + 8                        : offset 1
// ...
// p + (num_pointers * 4)       : offset n-1
// p + ((num_pointers + 1) * 4) : num_pointers   // again, so we can figure out
//                                                  what memory to free
// p + ((num_pointers + 2) * 4) : start of cooked data
//

Here's how I load a binary blob file and fix up the pointers:

void* bbq_load(const char* filename)
{
    unsigned char* p;
    int size = LoadFileToMemory(filename, &p);
    if (size <= 0)
        return 0;

    // get the start of the pointer table
    unsigned int* ptr_table = (unsigned int*)p;
    unsigned int num_ptrs = *ptr_table;
    ptr_table++;

    // get the start of the actual data
    // the 2 is to skip past both num_pointer values
    unsigned char* base = p + ((num_ptrs + 2) * sizeof(unsigned int));

    // fix up the pointers
    while ((ptr_table + 1) < (unsigned int*)base)
    {
        // each table entry is the offset (from base) of a pointer-sized
        // field inside the cooked data
        unsigned int* ptr = (unsigned int*)(base + *ptr_table);
        // the field holds an offset relative to its own location;
        // turn it into an absolute address (assumes 32-bit pointers)
        *ptr = (unsigned int)((unsigned char*)ptr + *ptr);
        ptr_table++;
    }

    return base;
}
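For illustration, usage on the target might look like this (assuming the root structure is cooked at the very start of the data and pointers are 32-bit; Level and Enemy are made-up names):

// All pointer fields inside the blob were fixed up by bbq_load(), so the
// loaded data can be used in place, with no copying or parsing.
struct Enemy;   // whatever the cooked data actually contains

struct Level
{
    int    enemy_count;
    Enemy* enemies;      // stored as an offset in the file, now a real pointer
};

void load_level()
{
    Level* level = (Level*)bbq_load("level1.bin");
    if (!level)
        return;
    // level->enemies already points inside the loaded blob
}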

My bbq library isn't quite ready for prime time, but it could give you some ideas on how to write one yourself in Python.

Good Luck!

hyperlogic