Possible Duplicate:
compress binaries in SVN ?

exact duplicate by same author: compress binaries in SVN?


Hi,

I want to build a script that wraps commit and checkout: it should compress binary files before committing and uncompress them right after checkout.

What is the way to do it? Is the IMPORT command preferred over COMMIT because there is no delta comparison? I know it wouldn't be space-efficient, but still?

Thanks, Oded.

+1  A: 

Compressing files will actually increase the space taken by your SVN repository.

Why? The SVN server tries to only store the deltas resulting from binary diffing. So normally only the parts of the file that were changed need to be stored.

If you compress the files, however, even the slightest change will alter the compressed result substantially. The SVN server will then need to store most or all of the compressed file for each commit, instead of just the changed part.

Wim Coenen
Not exactly, because I only compress binary files, not folders that contain all types of files. And if I do that, I want to use the import command. I am taking disk space into consideration, but I'm still thinking about it. I'm still looking for a way to do this.
Oded
It doesn't matter that the files are binary. SVN can generate deltas for uncompressed binary files. Compressing files only makes sense if you add them once and then never commit any changes, or if the files change completely each time anyway, e.g. when editing images or video.
Wim Coenen
+1  A: 

The interaction between Subversion's binary delta algorithms, compression in tracked files and the server's own internal use of compression can be complex.

Here's an example:

I took a copy of an x86 emacs binary (about 10MB, 4MB compressed with gzip) as my "binary file". I wrote a little program which "edits" a binary file by overwriting 4 consecutive bytes at a random position with random data.
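The original "edit" program isn't shown, so the following is only a minimal sketch of what such a tool might look like in Python. The function name `random_edit` and its parameters are made up for illustration.

```python
import os
import random


def random_edit(path, width=4, rng=random):
    """Overwrite `width` consecutive bytes at a random offset with random data.

    Returns the offset that was overwritten. The file size never changes.
    (Hypothetical helper; not the program used in the experiment above.)
    """
    size = os.path.getsize(path)
    if size < width:
        raise ValueError("file is smaller than the edit width")
    # Pick an offset so that the whole edit fits inside the file.
    offset = rng.randrange(size - width + 1)
    with open(path, "r+b") as f:  # read/write, binary, no truncation
        f.seek(offset)
        f.write(os.urandom(width))
    return offset
```

Calling `random_edit("emacs")` once per simulated commit would reproduce the kind of small, localized change described above.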

I then wrote three scripts to simulate 100 commits in the following three fashions:

the file is compressed with gzip in the repository

For each repetition: we decompress the file, then perform our edit, then recompress it and then check it in.

Final repository size: 9.6 MB

(This was better than I expected until I realized that because of the way gzip works, the bytes before the random edit (half the file, on average) will be identical to those of the previous version, even after compression.)
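The effect described in the parenthesis can be checked with a rough sketch like the one below. It uses Python's `zlib` module (the same DEFLATE algorithm that gzip uses) rather than the `gzip` tool itself, and synthetic data instead of the emacs binary; `make_sample`, `overwrite` and `common_prefix_len` are illustrative names, not part of the original experiment.

```python
import random
import zlib


def make_sample(size=1_000_000, seed=0):
    """Build reproducible, moderately compressible test data (an assumption;
    a real binary will behave differently in detail)."""
    rng = random.Random(seed)
    return bytes(rng.choices(b"abcdefgh", k=size))


def overwrite(data, offset, patch):
    """Return a copy of `data` with `patch` written at `offset`."""
    return data[:offset] + patch + data[offset + len(patch):]


def common_prefix_len(a, b):
    """Number of leading bytes that `a` and `b` share."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n


if __name__ == "__main__":
    original = make_sample()
    # Edit 4 bytes three quarters of the way into the file.
    edited = overwrite(original, 750_000, b"\x00\x01\x02\x03")
    c_orig = zlib.compress(original)
    c_edit = zlib.compress(edited)
    shared = common_prefix_len(c_orig, c_edit)
    print(f"compressed size: {len(c_orig)} bytes")
    print(f"identical compressed prefix: {shared} bytes")
```

The compressed streams should match up to roughly the point of the edit and diverge after it, which is why a binary delta between successive compressed versions can still save some space.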

the file is not compressed in the repository

For each repetition: We simply perform our edit and then check in the changes.

Final repository size: 5.1 MB

the file is imported from scratch every time

For each repetition: we copy the binary (not using svn copy) to a new file, edit this copy, add it and commit the changes. This is equivalent to an import since there is no historical connection to the previous copy of the file.

Final repository size: 403 MB

Just to give you a feel for Subversion's server-side compression, I repeated this test, only this time I compressed the binary files on the client side before adding and committing them each time.

Final repository size: 392 MB

So, whatever Subversion is doing, it appears to be about as good as gzip.


Your questions make it sound like you're assuming that compression on the client side will help you. It may very well not do so.

In my experience it's only worth doing when:

  1. The file is large.
  2. The compression you are using is considerably tighter than what Subversion manages. (e.g. if you're using bzip2 or lzma)
  3. The file is rarely edited.
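
The first two conditions can be checked roughly before deciding. The sketch below is only an illustration: it uses `zlib` as a stand-in for whatever Subversion does on the server (an assumption, based on the gzip-like result above), and the function names and both thresholds are arbitrary choices, not recommendations. Whether the file is rarely edited cannot be read from its contents, so that condition is left to you.

```python
import bz2
import lzma
import zlib


def compression_report(data):
    """Return the size in bytes of `data` raw and under three compressors."""
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),  # stand-in for Subversion (assumption)
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


def worth_precompressing(data, min_size=1_000_000, max_ratio=0.8):
    """Check conditions 1 and 2 from the list above.

    min_size and max_ratio are hypothetical thresholds: the file must be at
    least `min_size` bytes, and bzip2 or lzma must get it down to at most
    `max_ratio` times the zlib result.
    """
    sizes = compression_report(data)
    tightest = min(sizes["bz2"], sizes["lzma"])
    is_large = sizes["raw"] >= min_size
    is_much_tighter = tightest <= max_ratio * sizes["zlib"]
    return is_large and is_much_tighter
```

For example, `worth_precompressing(open("emacs", "rb").read())` would tell you whether a tighter compressor buys enough on that file to be worth considering.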
bendin