I need to come up with a file format for a new application I am writing. This file will need to hold a bunch of other files, which are mostly text but can be other formats as well. Naturally, a compressed tar file seems to fit the bill. The problem is that I want to be able to retrieve some data from the file very quickly, and getting just one particular file out of a tar.gz file seems to take longer than it should. I am assuming that this is because it has to decompress the entire archive even though I want just one file. With a regular uncompressed tar file I can get that data really quickly. Let's say the file I need quickly is called data.dat.

For example the command...

tar -xzf myfile.tar.gz data.dat

... is what takes a lot longer than I'd like.

MP3 files have id3 data and jpeg files have exif data that can be read in quickly without opening the entire file. I would like my data.dat file to be available in a similar way.

I was thinking that I could leave it uncompressed and separate from the rest of the files in myfile.tar.gz. I could then create a tar file containing data.dat and myfile.tar.gz, and hopefully that data could be retrieved faster because it sits at the head of the outer tar file and is uncompressed.

Does this sound right?... putting a compressed tar inside of a tar file?
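
To make the idea concrete, here is a rough sketch of the layout I have in mind, using Python's tarfile module. The function names (build_nested, read_quick) and the in-memory payloads are just placeholders for illustration, not anything I have settled on. The outer tar is uncompressed with data.dat as its first member, and everything else goes into an inner myfile.tar.gz.

```python
import io
import tarfile


def _add_bytes(archive, name, payload):
    """Add an in-memory bytes payload to an open tar archive."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))


def build_nested(outer_path, quick_name, quick_bytes, other_files):
    """Write an uncompressed outer tar: quick file first, then a gzipped inner tar."""
    # Build the compressed inner archive in memory.
    inner_buf = io.BytesIO()
    with tarfile.open(fileobj=inner_buf, mode="w:gz") as inner:
        for name, payload in other_files.items():
            _add_bytes(inner, name, payload)

    # The outer archive is plain tar, so its first member is readable
    # without any decompression.
    with tarfile.open(outer_path, mode="w") as outer:
        _add_bytes(outer, quick_name, quick_bytes)
        _add_bytes(outer, "myfile.tar.gz", inner_buf.getvalue())


def read_quick(outer_path, quick_name):
    """Read only the first member of the outer tar and return its bytes."""
    with tarfile.open(outer_path, mode="r:") as outer:
        first = outer.next()  # parses only the first header
        if first is None or first.name != quick_name:
            raise KeyError(quick_name)
        return outer.extractfile(first).read()
```

With this layout, read_quick("bundle.tar", "data.dat") should only need to read the first header and the data that follows it, while the rest of the files stay compressed inside the inner archive.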

Basically, my need is to have an archive type of file with quick access to one particular file. Tar does this just fine, but I'd also like to have that data compressed and as soon as I do that, I no longer have quick access. Are there other archive formats that will give me that quick access I need?

As a side note, this application will be written in Python. If the solution calls for re-inventing the wheel with my own binary format, I am familiar with C and would have no problem writing the Python module in C. Ideally I'd just use tar, dd, cat, gzip, etc., though.

Thanks, ~Eric

+2  A: 

ZIP seems to be appropriate for your situation. Files are compressed individually, and the archive keeps a central directory of its entries, which means you can access any one file without decompressing everything stored before it.

In Python, you can use the standard-library zipfile module.
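
A minimal sketch of how that might look, assuming the archive is called myfile.zip and the file you need quickly is data.dat. The helper names build_archive and read_member are made up for this example; only the zipfile calls are the real API.

```python
import zipfile


def build_archive(path, members):
    """Write each name -> bytes pair as its own deflate-compressed entry."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)


def read_member(path, name):
    """Return the bytes of a single entry; other entries are not decompressed."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(name)
```

For example, after build_archive("myfile.zip", {"data.dat": b"...", "notes.txt": b"..."}), a call to read_member("myfile.zip", "data.dat") looks the entry up in the directory and decompresses only that one entry. Whether this is fast enough for your case is something you would need to measure.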

Matthew Flaschen
Thanks for the idea. I'll give it a go tomorrow and see how that works out performance-wise (quick access) and see how zip stacks up against a gzipped tar.
eric.frederich