ansaurus

Question

Creating a packed Binary Representation of a Set of Files?

Answer 1

+2 A:

That's fine. If you can load everything into virtual memory and let swapping handle it, then you can use any format, really. If you want random access to just one record (so e.g. you can load lazily, although uncompressed memmap is also lazy), then you probably want to keep the index in memory.

Most people use a library that gives them access to a .zip, .jar, .pak (Quake format), or other similar (compressed or not) archive format, as if it were part of the filesystem (that is, records are accessed by string keys). I'd definitely go that way if you can find an already-made library, e.g. truezip for Java. Apache Commons has one, but I don't know how easy it is to integrate with .NET (it's a big C code base, I believe). ZipFS looks like it's an actual .NET zip file mounter that only holds the headers in memory.

Or, with probably only a little less convenience, you could use DotNetZip directly.
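To illustrate the "records accessed by string keys" idea, here is a minimal sketch. It uses Python's standard zipfile module as a stand-in for the .NET libraries mentioned above; the entry names and the helper functions build_archive and read_entry are made up for the example.

```python
import io
import zipfile


def build_archive(files):
    """Pack a {name: bytes} mapping into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def read_entry(archive_bytes, name):
    """Look up one entry by its string key and decompress only that entry."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)


# Hypothetical asset names and contents, for illustration only.
packed = build_archive({
    "textures/tank.png": b"fake png bytes",
    "sounds/shot.wav": b"fake wav bytes",
})
tank = read_entry(packed, "textures/tank.png")
```

Opening the archive only parses its directory of entries, so pulling out a single asset does not require decompressing the others.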

wrang-wrang 2009-09-12 22:06:45

Answer 2

+1 A:

Do not waste your time inventing your own storage format.

You can use SharpZipLib or another free compression library for .net. With it you can also pack multiple files into one archive and extract the files you want separately on demand.

codymanix 2009-09-12 23:25:36

I am doing this for learning purposes. Not everyone hates reinventing the wheel.

UberJumper 2009-09-12 23:32:24

That's fine, but even the pros often just use off-the-shelf compression. Besides which, you'll have plenty of learning to do on the game itself! :)

Kylotan 2009-09-14 13:04:48

Being able to load data efficiently can have a significant impact on the game. It IS part of the game itself. Loading data from disk is about the slowest operation you can do, so there are good reasons for optimizing it.

BigSandwich 2009-09-16 00:56:50

I think standard zip algorithms have been optimized for decades now.

codymanix 2009-09-16 13:57:19

Answer 3

A:

If you want to do this for learning purposes, then the WAD format is a good place to start. However, I'd propose using a chunked file format.
It would basically follow your proposed format (i.e. header, TOC, etc.), but each data entry would have a chunk ID that identifies what type of data it is.
This has lots of benefits, mainly that your data format can evolve independently of your code: the code simply skips chunks that it doesn't understand. This allows your tools development to proceed whilst keeping backwards compatibility on your data in your game.
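A rough sketch of the chunk-skipping idea, in Python. The chunk layout (a 4-byte ID followed by a 32-bit little-endian payload length) and the IDs TEXT and NEWX are assumptions made for this example, not part of any existing format.

```python
import io
import struct

# Assumed chunk header: 4-byte chunk ID, then a 32-bit payload length.
CHUNK_HEADER = struct.Struct("<4sI")


def write_chunk(stream, chunk_id, payload):
    """Write one chunk: header first, then the raw payload."""
    stream.write(CHUNK_HEADER.pack(chunk_id, len(payload)))
    stream.write(payload)


def read_known_chunks(stream, handlers):
    """Decode chunks that have a handler and skip over all the others."""
    results = []
    while True:
        header = stream.read(CHUNK_HEADER.size)
        if len(header) < CHUNK_HEADER.size:
            break  # end of file
        chunk_id, length = CHUNK_HEADER.unpack(header)
        if chunk_id in handlers:
            results.append(handlers[chunk_id](stream.read(length)))
        else:
            # Unknown chunk type: the stored length says how far to jump.
            stream.seek(length, io.SEEK_CUR)
    return results


stream = io.BytesIO()
write_chunk(stream, b"TEXT", b"hello")
write_chunk(stream, b"NEWX", b"data from a newer tool")  # reader has no handler
write_chunk(stream, b"TEXT", b"world")
stream.seek(0)
decoded = read_known_chunks(stream, {b"TEXT": lambda payload: payload.decode("ascii")})
```

Because every chunk carries its own length, an older reader can step over chunk types that a newer tool has added without knowing anything about their contents.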

I'd also recommend having an extra 32-bit 'flags' entry in your TOC, which would allow you to use a bitfield to enable options like compression type, encryption, etc.
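As a sketch of what such a flags field might look like, assuming a TOC entry made of three 32-bit values (offset, size, flags). The flag names and bit positions are invented for the example.

```python
import struct
from enum import IntFlag


class EntryFlags(IntFlag):
    """Hypothetical option bits for one TOC entry."""
    NONE = 0
    COMPRESSED_DEFLATE = 1 << 0
    ENCRYPTED = 1 << 1


# Assumed TOC entry layout: offset, size, flags, each 32-bit little-endian.
TOC_ENTRY = struct.Struct("<III")


def pack_entry(offset, size, flags):
    """Serialize one TOC entry to 12 bytes."""
    return TOC_ENTRY.pack(offset, size, int(flags))


def unpack_entry(raw):
    """Parse 12 bytes back into (offset, size, flags)."""
    offset, size, flags = TOC_ENTRY.unpack(raw)
    return offset, size, EntryFlags(flags)


raw = pack_entry(128, 4096, EntryFlags.COMPRESSED_DEFLATE | EntryFlags.ENCRYPTED)
offset, size, flags = unpack_entry(raw)
is_compressed = bool(flags & EntryFlags.COMPRESSED_DEFLATE)
```

Unused bits stay zero, so new options can be added later without changing the size of the entry.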

Hope that helps

zebrabox 2009-09-13 21:42:17

Answer 4

+1 A:

Your design looks good to me, though I assume that you meant 32 bits for size rather than 32 bytes!

I think that your design would be best for situations where you want to load up all your assets in one go, because it's kind of a sequential design. If you want to load up just a few assets at a time (maybe because each game level uses only a subset of the assets) then it would be somewhat less efficient, because you would have to read through each asset in turn to find the ones that you want.

In that case you might want to try a more indexed design, maybe something like this:

[HEADER]
    [Miscellaneous header stuff]
    [Offset to index from start of file]
    [Number of entries in index]
[RECORD 1]
    [Asset data]
[RECORD 2]
    [Asset data]
.
.
[RECORD N]
    [Asset data]
[INDEX]
    [ID or filename of asset 1]
    [Size of asset 1]
    [Offset to asset 1 from start of file]
    [Other asset 1 flags or whatever]
    [ID or filename of asset 2]
    [Size of asset 2]
    [Offset to asset 2 from start of file]
    [Other asset 2 flags or whatever]
    .
    .

This would allow for better random access of assets, because now you just have to search through your index (which you would load into memory) rather than through your whole file (which might not fit into memory). If you wanted to get fancy you could use a tree or hashtable for the index.

The reason for putting the index at the end of the file rather than the front is that it makes it easier to add another asset to your pack file without having to rebuild the whole thing. Otherwise, the extra entry in your index would throw out all your offsets.
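Here is a minimal sketch of that append behaviour in Python. The concrete field sizes are assumptions: a 12-byte header (magic, index offset, entry count) and fixed 76-byte index entries (64-byte NUL-padded name, size, offset, flags). The function names are made up for the example.

```python
import io
import struct

HEADER = struct.Struct("<4sII")         # magic, index offset, number of entries
INDEX_ENTRY = struct.Struct("<64sIII")  # name (NUL padded), size, offset, flags
MAGIC = b"PACK"


def _write_index(stream, entries):
    """Write the index at the current position, then patch the header."""
    index_offset = stream.tell()
    for name, size, offset in entries:
        # Names longer than 64 bytes would be silently truncated by struct.
        stream.write(INDEX_ENTRY.pack(name.encode("utf-8"), size, offset, 0))
    stream.seek(0)
    stream.write(HEADER.pack(MAGIC, index_offset, len(entries)))


def write_pack(stream, assets):
    """Write a placeholder header, the asset records, then the index."""
    stream.write(HEADER.pack(MAGIC, 0, 0))
    entries = []
    for name, data in assets.items():
        entries.append((name, len(data), stream.tell()))
        stream.write(data)
    _write_index(stream, entries)


def read_index(stream):
    """Return (index_offset, [(name, size, offset), ...])."""
    stream.seek(0)
    magic, index_offset, count = HEADER.unpack(stream.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a pack file")
    stream.seek(index_offset)
    entries = []
    for _ in range(count):
        raw_name, size, offset, _flags = INDEX_ENTRY.unpack(stream.read(INDEX_ENTRY.size))
        entries.append((raw_name.rstrip(b"\0").decode("utf-8"), size, offset))
    return index_offset, entries


def append_asset(stream, name, data):
    """Overwrite the old index with the new asset, then rewrite the index."""
    index_offset, entries = read_index(stream)
    stream.seek(index_offset)
    stream.write(data)
    entries.append((name, len(data), index_offset))
    _write_index(stream, entries)


pack = io.BytesIO()
write_pack(pack, {"a.txt": b"AAAA", "b.txt": b"BB"})
_, before = read_index(pack)
append_asset(pack, "c.txt", b"CCC")
_, after = read_index(pack)
```

The new asset lands where the old index used to start, and only the index and the header are rewritten. The offsets of the existing records stay exactly as they were.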

EDIT: to respond to comments...

What I had in mind was that you would only access the assets via the index, so hopefully you would never run off the end of the assets when reading them. Perhaps an example of a typical use case would help.

Say you wanted to read in the texture called "TankTexture.png". Here is how I think that you would go about it:

1. Open the pack file.
2. Read in the fixed-size header.
3. Extract the index offset and number of entries from the header.
4. Seek to the start of the index.
5. Read the index into an array (fixed index entry size times number of entries).
6. Search through the index for the asset called "TankTexture.png".
7. Extract the asset offset and size from the index entry.
8. Seek to the start of the asset.
9. Read in the number of bytes given by the asset size.

Of course, for subsequent assets you would need only steps 6-9.
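The steps above could be sketched like this in Python. The field sizes are assumptions (12-byte header, fixed 76-byte index entries with a 64-byte NUL-padded name), build_demo_pack exists only to give the reader something to read, and the index is held in a dict rather than an array so the search is a hash lookup.

```python
import io
import struct

HEADER = struct.Struct("<4sII")         # magic, index offset, number of entries
INDEX_ENTRY = struct.Struct("<64sIII")  # name (NUL padded), size, offset, flags


def build_demo_pack(assets):
    """Demo helper only: write header, records, then the index."""
    stream = io.BytesIO()
    stream.write(HEADER.pack(b"PACK", 0, 0))  # placeholder, patched below
    entries = []
    for name, data in assets.items():
        entries.append((name.encode("utf-8"), len(data), stream.tell(), 0))
        stream.write(data)
    index_offset = stream.tell()
    for entry in entries:
        stream.write(INDEX_ENTRY.pack(*entry))
    stream.seek(0)
    stream.write(HEADER.pack(b"PACK", index_offset, len(entries)))
    stream.seek(0)
    return stream


def load_index(stream):
    """Steps 2-5: read the header, seek to the index, read it in one go."""
    _magic, index_offset, count = HEADER.unpack(stream.read(HEADER.size))
    stream.seek(index_offset)
    raw = stream.read(INDEX_ENTRY.size * count)
    index = {}
    for raw_name, size, offset, _flags in INDEX_ENTRY.iter_unpack(raw):
        index[raw_name.rstrip(b"\0").decode("utf-8")] = (offset, size)
    return index


def read_asset(stream, index, name):
    """Steps 6-9: look up the entry, seek, and read exactly `size` bytes."""
    offset, size = index[name]
    stream.seek(offset)
    return stream.read(size)


# Step 1 would normally be open(path, "rb"); an in-memory pack is used here.
pack = build_demo_pack({"TankTexture.png": b"\x89PNG fake", "Level1.map": b"map data"})
index = load_index(pack)
texture = read_asset(pack, index, "TankTexture.png")
```

The size stored in the index entry is what tells the reader when to stop: it never scans the asset data looking for an end marker.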

I hope that helps to explain what I was thinking. Let me know if you have any other questions.

Incredulous Monk 2009-09-15 07:54:15

How could I make it so that the program would know when to stop reading asset data?

UberJumper 2009-09-15 16:22:20

Answer 5

A:

I'd say your format is a good choice. Ideally, you want to pull in all your assets in one read. For instance, you'd want all your data for level 3 in the same package; that way you can load all your level data in one read without seeking. It's really OK to have a single asset in more than one package. You just need to handle the case where an asset is already loaded and skip over it.

How you split up your data should depend on the dependencies between your data (i.e. if a script needs a certain model, they should both be in the same package) and how granular you need to make your reads (e.g. can you read all your level data in one go? Then you can put your enemies in the level package. But if your game streams in the world, maybe you need separate packages for enemies.)

Really, tracking your data dependencies is the hard part. At build time you want to know the dependencies of every piece of data you pull in. At run time you just want to read in your package and have the assets show up in memory. You also need to track dependencies at run time because you'll need to know what's safe to UNLOAD at any given time.
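One simple way to handle both the "already loaded" case and the "safe to unload" question is reference counting per asset. This is only a sketch under that assumption: the AssetCache class, the package contents, and the asset names are all made up, and a package is modelled as a plain dict of name to bytes.

```python
class AssetCache:
    """Keeps one copy of each asset and counts how many packages use it."""

    def __init__(self):
        self.assets = {}      # asset name -> data
        self.ref_counts = {}  # asset name -> number of loaded packages using it

    def load_package(self, package):
        for name, data in package.items():
            if name not in self.assets:
                self.assets[name] = data  # first time this asset is seen
            # Already-loaded assets are skipped, but the reference is counted.
            self.ref_counts[name] = self.ref_counts.get(name, 0) + 1

    def unload_package(self, package):
        for name in package:
            self.ref_counts[name] -= 1
            if self.ref_counts[name] == 0:
                # No loaded package needs this asset any more.
                del self.ref_counts[name]
                del self.assets[name]


# Hypothetical packages that share one asset.
level3 = {"tank.model": b"model", "tank.script": b"script"}
level4 = {"tank.model": b"model", "boss.model": b"boss"}

cache = AssetCache()
cache.load_package(level3)
cache.load_package(level4)
cache.unload_package(level3)
remaining = sorted(cache.assets)
```

After level 3 is unloaded, the shared tank model stays in memory because level 4 still references it, while the script that only level 3 used is released.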

BigSandwich 2009-09-16 01:18:03
