What version control systems would be useful or have extra features to support projects that are mainly binary files like mp3, wav, or proprietary application-specific file types?

+8  A: 

Most VC systems just punt when it comes to binary files -- they can't be diffed for a human to compare, and they can't be merged. The best feature to have is binary-delta storage of changes, so that a few small tweaks to a 100MB file don't become 300MB in your repository.

I found this post on storing binary files in Bazaar. In summary, it's difficult to store binary files without massive storage use because most formats can't be effectively diffed:

Short answer, sure... we store binary deltas, but I wouldn't call them optimal binary deltas.

[...]

If you are having problems with SVN, then I don't think binary diffs would help you much anyway, considering SVN has binary diffs.

[...]

But the truth is, (most?, many?) binary files don't binary diff that well anyway. Frequently they are compressed, which means a modification near the beginning tends to have a chain reaction over a large distance (possibly the whole rest of the file).

So if the files are small or don't change much, go ahead and use VCS. If they're large and change often, find or write a specialized tool for managing them.
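The point in the quote about compressed formats can be checked with a small experiment. This is only a sketch: the payload is made up, zlib stands in for whatever compression a real file format uses, and differing_bytes is a hypothetical helper that does a naive position-by-position comparison, not a real delta algorithm.

```python
import zlib


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Made-up payload standing in for an uncompressed asset.
original = b"".join(b"sample %05d: some repetitive payload\n" % i for i in range(2000))

# Change a single byte near the beginning of the file.
edited = original[:10] + b"X" + original[11:]

# Compare the two versions before and after compression.
raw_diff = differing_bytes(original, edited)
packed_diff = differing_bytes(zlib.compress(original), zlib.compress(edited))

print("bytes differing, uncompressed:", raw_diff)
print("bytes differing, compressed:  ", packed_diff)
```

On typical input the second number comes out far larger than the first, which is why a delta between two compressed files is often nearly as big as the file itself.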

John Millikin
my answer below @Owen discusses a special tool I wrote to handle versioning a binary. could be helpful.
Owen
doh - I thought putting @Owen will automatically link to my answer. Guess not.
Owen
+1 For the quote and the wisdom. I'd think just like that, though I have VCS'd a 1G disk image from time to time and it was painful (although I could get back previous revisions without fear, which was nice to know).
Adam Hawes
+1  A: 

Subversion could do it. The file size of the repository may get big rather quickly though.

RKitson
+1  A: 

For generic binaries, as @John Millikin said, it's basically a punt. Some systems, such as Perforce, integrate (http://www.perforce.com/perforce/products/integrations.html) with several third-party products, such as image manipulation programs.

Kris Kumler
+2  A: 

Most VCSs just store each version of a binary file in full, partly because binary diffs can be very large, so the cost of reconstituting a binary file from deltas can make it not worth storing the deltas. Things are different nowadays with super-fast CPUs, and Subversion now stores binary files as deltas.

The rsync algorithm tends to work well with text files that change, but binary files (e.g. zipped ones) do not 'compress' nearly so well. I don't know how well the Subversion algorithm works, but they say it works equally well on binary as on text.

gbjbaanb
+1  A: 

A few years ago my project was using SVN to version control an app we wrote in VBA for Microsoft Access. So, all of the code was inside an Access database, which is a binary file. Not good to have all your code inside that. As pointed out, SVN doesn't do a great job of handling binary files. You can certainly check out and commit, but there's no diff or merging.

I ended up writing a custom program in VB6 that extracted all the forms, reports, and code from the binary Access file, and then versioned those. Then I had to write another custom program in VB to piece it all back together into a functional Access file to be deployed.

If you're working with MP3 files, maybe you can do something similar: extract the ID3 info to text files and then version those text files.
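As a rough illustration of that idea, here is a minimal sketch that reads only the old fixed-layout ID3v1 tag (the last 128 bytes of the file, starting with "TAG") and renders it as diff-friendly text. The function names read_id3v1 and to_text are made up for this example, and ID3v2 tags, which most current files carry, would need a proper tagging library instead.

```python
from typing import Dict, Optional


def read_id3v1(data: bytes) -> Optional[Dict[str, str]]:
    """Return the ID3v1 fields found at the end of data, or None if absent."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed width and padded with NUL bytes (sometimes spaces).
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": str(tag[127]),  # numeric genre index
    }


def to_text(fields: Dict[str, str]) -> str:
    """Render the fields one per line, sorted by name, so a text diff is readable."""
    return "".join(f"{key}: {value}\n" for key, value in sorted(fields.items()))
```

The output of to_text could be written next to each MP3 (for example as song.mp3.tags.txt, a made-up naming scheme) and committed, so that metadata changes show up as ordinary text diffs.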

Owen
+2  A: 

Try Git. It won't store reverse deltas, but chances are you don't really want that anyway. It will allow your repository to contain binaries, and will track when you put a new one in. It's not trying to be space-efficient, but it is effective.

Tim Ottinger
A: 

You could check out bsdiff, which claims to support executable diffs (specifically) well. I am still testing integrating it into a custom ASP mini-VCS, so not much experience using it - but from the paper, it looks like it would handle most non-stream-compressed binary files well.

Simon Buchan
A: 

I store all my pictures in Git. They don't change much, so space is not as much of an issue as it would be for someone who edits heavily. My favourite features are that Git stores blobs by their hash and that it does diffs efficiently (where possible) over (effectively) all files in the repository. This means that I can move files around, make copies and change metadata without bloating my repository.
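To show why copies and moves are cheap, here is a small sketch of how Git names a blob: it hashes a short header plus the file content, and the file name plays no part. This assumes the default SHA-1 object format; git_blob_id is a name made up for this example, and its result should match what git hash-object prints for the same bytes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id Git assigns to a blob with this content (SHA-1 repos)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Stand-in bytes for a picture; the file name never enters the hash.
picture = b"\x89PNG fake image bytes for illustration"

# A copy stored under another path has the same id, so it is stored only once.
same_picture_elsewhere = bytes(picture)
print(git_blob_id(picture) == git_blob_id(same_picture_elsewhere))
```

Because the id depends only on content, renaming a picture or keeping it in two directories adds tree entries but no second blob.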

Andrew Aylett