We have recently started using git and had a nasty problem when someone committed a large (~1.5GB) file, which then caused git to crash on various 32-bit OSes. This seems to be a known bug (git mmaps files into memory, which fails if it can't get enough contiguous address space), and it isn't going to be fixed any time soon.

The easy (for us) solution would be to get git to reject any commits larger than 100MB or so, but I can't figure out a way to do that.

EDIT: The problem comes from the accidental submission of a large file, in this case a large dump of program output. The aim is to avoid accidental submissions, because if a developer does accidentally commit a large file, getting it back out of the repository costs an afternoon during which no-one can do any work, and everyone then has to fix up all of their local branches.

+1  A: 

If you have control over your committers' toolchain, it may be straightforward to modify git commit so that it performs a reasonableness test on the file size prior to the "real" commit. Since such a change in the core would burden all git users on every commit, and the alternative strategy of "banish anyone who would commit a 1.5GB change" has an appealing simplicity, I suspect such a test will never be accepted in the core. I suggest you weigh the burden of maintaining a local fork of git -- nannygit -- against the burden of repairing a crashed git following an overambitious commit.
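If you do go the nannygit route, it needn't be a full fork of git: a thin wrapper script earlier in $PATH that intercepts "git commit" and sanity-checks the staged file sizes before handing off to the real binary would do much the same job. A rough sketch, where the 100MB limit and the path to the real git are assumptions (it only checks what is already staged, so something like "git commit -a" on an unstaged file would slip past it):

#!/bin/sh
# Hypothetical "nannygit" wrapper: install earlier in $PATH than the real git.
# It intercepts "git commit", refuses if any staged file exceeds the limit,
# and passes every other command straight through.
REAL_GIT=/usr/bin/git                 # assumed location of the real binary
LIMIT=$((100 * 1024 * 1024))          # 100MB, as suggested in the question

if [ "$1" = commit ]; then
    # Look at every file staged as added or modified and check its size
    # in the index (":$file" names the staged version of the file).
    "$REAL_GIT" diff --cached --name-only --diff-filter=AM |
    while read -r file; do
        size=$("$REAL_GIT" cat-file -s ":$file")
        if [ "$size" -gt "$LIMIT" ]; then
            echo "nannygit: refusing to commit '$file' ($size bytes)" >&2
            exit 1
        fi
    done || exit 1
fi

exec "$REAL_GIT" "$@"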

I must admit I am curious about how a 1.5 GB commit came to be. Are video files involved?

Thomas L Holaday
+1  A: 

When exactly did the problem occur? When they committed the file originally, or when it got pushed elsewhere? If you have a staging repo that everyone pushes to, you could implement an update hook to scan the changing refs for large files, along with other checks such as permissions.

Very rough and ready example:

git --no-pager log --pretty=oneline --name-status $2..$3 -- | \
  perl -MGit -lne 'if (/^[0-9a-f]{40}/) { ($rev, $message) = split(/\s+/, $_, 2) }
     else { ($action, $file) = split(/\s+/, $_, 2); next unless $action eq "A"; 
       $filesize = Git::command_oneline("cat-file", "-s", "$rev:$file");
       print "$rev added $file ($filesize bytes)"; die "$file too big" if ($filesize > 1024*1024*1024) }';

(just goes to show, everything can be done with a Perl one-liner, although it might take multiple lines ;))

Called in the way that $GIT_DIR/hooks/update is called (args are ref-name, old-rev, new-rev; e.g. "refs/heads/master master~2 master"), this will show the files added and abort if one of them is too big.
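
For comparison, here is a sketch of the same kind of update hook in plain shell, using only git plumbing (the 100MB limit is just an example, and a real hook would want more robust error handling):

#!/bin/sh
# Sketch of a $GIT_DIR/hooks/update hook that refuses any ref update
# introducing a blob larger than the limit.
refname=$1
oldrev=$2
newrev=$3
limit=$((100 * 1024 * 1024))

# A brand-new ref arrives with an all-zero old rev; in that case check
# everything reachable from the new rev.
zero=0000000000000000000000000000000000000000
if [ "$oldrev" = "$zero" ]; then
    range=$newrev
else
    range=$oldrev..$newrev
fi

# Walk every object introduced by this update and check blob sizes.
git rev-list --objects "$range" |
while read -r sha path; do
    [ "$(git cat-file -t "$sha")" = blob ] || continue
    size=$(git cat-file -s "$sha")
    if [ "$size" -gt "$limit" ]; then
        echo "Refusing $refname: '$path' ($sha) is $size bytes" >&2
        exit 1
    fi
done || exit 1

With many objects you'd want a single git cat-file --batch-check instead of one cat-file call per object, but this shows the shape of the check.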

Note that I'd say that if you're going to police this sort of thing, you need a centralised point at which to do it. If you trust your team to just exchange changes with each other, you should trust them to learn that adding giant binary files is a bad thing.

araqnid
+1  A: 

You can distribute a pre-commit hook that blocks such commits on the client side. On central repositories you can have a pre-receive hook that rejects large blobs by analyzing the received data and refusing to update the refs that would reference them. The data will still be received, but since you reject the ref updates, all of the newly received objects remain unreferenced and can be picked up and dropped by git gc.

I don't have a ready-made script for you, though.
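
A minimal sketch of what that client-side pre-commit hook could look like, assuming a 100MB limit as in the question (it goes in .git/hooks/pre-commit and must be executable):

#!/bin/sh
# Sketch of a client-side pre-commit hook: refuse the commit if any staged
# file exceeds the limit.
limit=$((100 * 1024 * 1024))

# Diff against HEAD if it exists, otherwise against the empty tree
# (covers the repository's very first commit).
if git rev-parse --verify HEAD >/dev/null 2>&1; then
    against=HEAD
else
    against=$(git hash-object -t tree /dev/null)
fi

# Check the staged (index) version of every added or modified file.
git diff --cached --name-only --diff-filter=AM "$against" |
while read -r file; do
    size=$(git cat-file -s ":$file")
    if [ "$size" -gt "$limit" ]; then
        echo "pre-commit: '$file' is $size bytes, over the limit; aborting." >&2
        exit 1
    fi
done || exit 1

Since developers can bypass it with git commit --no-verify, or simply not install it, the server-side pre-receive/update check is still the one to rely on.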

robinr