I want something in git that is similar to Mercurial's Bigfiles Extension (note: I know of git-bigfiles, but that is unrelated).

Basically I want to store large binaries in my git repository, but I don't want to get every version ever of the large binary when I do a clone. I only want to download the large binaries when I checkout a specific revision containing those large files.

+2  A: 

Here are a few options to consider:

Shallow clones: You can pass the --depth <depth> parameter to git clone to get a shallow clone of the repository. For example, if <depth> is 1, the clone fetches only the history and objects needed for the most recent commit. However, such repositories have awkward restrictions on what you can do with them, as outlined in the git clone man page:

        --depth <depth>
           Create a shallow clone with a history truncated to the specified
           number of revisions. A shallow repository has a number of
           limitations (you cannot clone or fetch from it, nor push from nor
           into it), but is adequate if you are only interested in the recent
           history of a large project with a long history, and would want to
           send in fixes as patches.

In fact, as discussed in this thread, that's something of an overstatement - there are useful situations where pushing from a shallow clone still works, and it's possible that will fit your workflow.
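To make the shallow-clone behaviour concrete, here's a small self-contained sketch (repository names and paths are made up for illustration): it builds a throwaway repository with two commits, then takes a depth-1 clone and counts the history that came over. Note that git ignores --depth for plain local-path clones, so the sketch clones via a file:// URL.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A stand-in "remote" repository with two commits of history.
git init -q origin-repo
cd origin-repo
git -c user.name=you -c user.email=you@example.com \
    commit -q --allow-empty -m "first"
git -c user.name=you -c user.email=you@example.com \
    commit -q --allow-empty -m "second"
cd ..

# Shallow clone: only the most recent commit is fetched.
# (--depth is ignored for plain local paths, hence the file:// URL.)
git clone -q --depth 1 "file://$tmp/origin-repo" shallow-repo
cd shallow-repo
git rev-list --count HEAD    # prints 1, not 2
```

Running `git log` in the shallow clone shows only the "second" commit; the rest of the history simply isn't there.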

Scott Chacon's "git media" extension: the author describes this in an answer to this similar question and in the README on github: http://github.com/schacon/git-media .

Shallow submodules: you could keep all your large files in a separate git repository and add that as a shallow submodule of your main repository. This would have the advantage that the restrictions of shallow clones apply only to the repository with the large files, not to your code.
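Here's a sketch of that layout, with made-up repository names. It uses the --shallow-submodules option of git clone (a newer convenience than this answer; the same effect can be had by shallow-cloning the submodule by hand), and the protocol.file.allow setting that recent git versions require for file:// submodules in a local test like this:

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
cd "$tmp"
export GIT_AUTHOR_NAME=you GIT_AUTHOR_EMAIL=you@example.com
export GIT_COMMITTER_NAME=you GIT_COMMITTER_EMAIL=you@example.com

# The repository holding the large files, with two commits of history.
git init -q big.git
(cd big.git &&
 echo v1 > blob.bin && git add blob.bin && git commit -q -m "blob v1" &&
 echo v2 > blob.bin && git commit -q -am "blob v2")

# The main repository, with big.git attached as a submodule at media/.
git init -q main.git
(cd main.git &&
 git -c protocol.file.allow=always submodule --quiet add \
     "file://$tmp/big.git" media &&
 git commit -q -m "add media submodule")

# Clone the main repo; --shallow-submodules fetches each submodule at depth 1,
# so the clone gets the current large files without their full history.
git -c protocol.file.allow=always clone -q --recurse-submodules \
    --shallow-submodules "file://$tmp/main.git" clone
git -C clone/media rev-list --count HEAD    # prints 1: history truncated
```

Your code repository stays a full clone, while only the media submodule is shallow.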

There are also any number of ways of doing this with hooks that (for example) rsync your large files into place, but I assume there are good reasons that you want to keep these files under git's control in the first place.

I hope that's of some help.

Mark Longair
git-media looks like exactly what I want, but it's not quite finished. I think I got it working with a little hacking, though. Now I just need to add support for uploading files to where I want them to go.
davr
For future reference, here's my fork of git-media with some needed changes: http://github.com/davr/git-media
davr
Great, that looks very useful! As another point of interest, the thread here (and particularly the slides linked to in the first mail) is interesting on the subject of git dealing with large files: http://git.661346.n2.nabble.com/Git-and-Media-repositories-td1446700.html
Mark Longair
Wow... so apparently ruby sucks. I ported the 'filter' scripts from ruby to a bash shell script, and it runs about 20x faster now (no more 10-second delays when running 'git status'; it's now more like half a second, which is much more usable).
davr