
I have a GitHub repo that had two branches - master & release.

The release branch contained binary distribution files that were contributing to a very large repo size (> 250MB), so I decided to clean things up.

First I deleted the remote release branch, via "git push origin :release"

Then I deleted the local release branch. First I tried "git branch -d release", but git said "error: The branch 'release' is not an ancestor of your current HEAD." Which is true, so then I did "git branch -D release" to force it to be deleted.
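
For reference, those two deletions in one place (newer git also accepts an explicit --delete form for the remote branch):

    git push origin :release    # delete the remote branch (same as: git push origin --delete release)
    git branch -D release       # force-delete the local branch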

But my repository size, both locally and on GitHub, was still huge. So then I ran through the usual list of git commands, like "git gc --prune=today --aggressive", with no luck.

By following Charles Bailey's instructions at SO 1029969 I was able to get a list of SHA1s for the biggest blobs. I then used the script from SO 460331 to find the blobs...and the five biggest don't exist, though smaller blobs are found, so I know the script is working.
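
The approach there boils down to roughly this (the pack file name and SHA1 are placeholders):

    # list objects in the pack, sorted by size, biggest last
    git verify-pack -v .git/objects/pack/pack-*.idx | sort -k 3 -n | tail -10
    # try to map a SHA1 back to a path (only works while the object is still reachable)
    git rev-list --objects --all | grep <sha1>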

I think these blobs are the binaries from the release branch, and they somehow got left behind after that branch was deleted. What's the right way to get rid of them?

+2  A: 

Try git-filter-branch - it doesn't remove big blobs directly, but it can remove the big files you specify from the whole repo. For me it reduced the repo size from hundreds of MB to 12 MB.
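
A minimal sketch of that kind of invocation (the path is just an example):

    # rewrite every commit so the given file is removed from history
    git filter-branch --index-filter 'git rm --cached --ignore-unmatch path/to/big-binary.zip' -- --all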

W55tKQbuRu28Q4xv
Now _that_ is a scary command :) I'll have to give it a try when my git-fu feels stronger.
kkrugler
+2  A: 

As mentioned in this SO answer, git gc can actually increase the size of the repo!

See also this thread:

Now git has a safety mechanism to not delete unreferenced objects right away when running 'git gc'.
By default unreferenced objects are kept around for a period of 2 weeks. This is to make it easy for you to recover accidentally deleted branches or commits, or to avoid a race where a just-created object that is in the process of being written but not yet referenced could be deleted by a 'git gc' process running in parallel.

So to give that grace period to packed but unreferenced objects, the repack process pushes those unreferenced objects out of the pack into their loose form so they can be aged and eventually pruned.
Objects becoming unreferenced are usually not that many though. Having 404855 unreferenced objects is quite a lot, and being sent those objects in the first place via a clone is stupid and a complete waste of network bandwidth.

Anyway... To solve your problem, you simply need to run 'git gc' with the --prune=now argument to disable that grace period and get rid of those unreferenced objects right away (safe only if no other git activities are taking place at the same time which should be easy to ensure on a workstation).
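
Concretely, something like:

    git count-objects -v     # shows how many loose objects (and how much space) the repo currently holds
    git gc --prune=now       # repack and delete unreferenced objects immediately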

And BTW, consider using 'git gc --aggressive' with a later git version (or 'git repack -a -f -d --window=250 --depth=250') to repack more thoroughly.

The same thread mentions:

 git config pack.deltaCacheSize 1

That limits the delta cache size to one byte (effectively disabling it) instead of the default of 0 which means unlimited. With that I'm able to repack that repository using the above git repack command on an x86-64 system with 4GB of RAM and using 4 threads (this is a quad core). Resident memory usage grows to nearly 3.3GB though.

If your machine is SMP and you don't have sufficient RAM then you can reduce the number of threads to only one:

git config pack.threads 1

Additionally, you can further limit memory usage with the --window-memory argument to 'git repack'.
For example, using --window-memory=128M should keep a reasonable upper bound on the delta search memory usage, although this can result in less optimal delta matches if the repo contains lots of large files.
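
Putting those suggestions together, a low-memory repack might look roughly like this (the 128M window-memory limit and single thread are just example values):

    git config pack.deltaCacheSize 1     # effectively disable the delta cache
    git config pack.threads 1            # single thread to cap memory use
    git repack -a -d -f --window-memory=128m --window=250 --depth=250
    git gc --prune=now                   # drop the now-unreferenced loose objects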


On the filter-branch front, you can consider (with caution) this script:

#!/bin/bash
set -o errexit

# Author: David Underhill
# Script to permanently delete files/folders from your git repository.  To use 
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2

if [ $# -eq 0 ]; then
    exit 0
fi

# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi

# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD

# remove the temporary history git-filter-branch otherwise leaves behind for a long time
rm -rf .git/refs/original/ && git reflog expire --all &&  git gc --aggressive --prune
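
If you save that as git-delete-history somewhere on your PATH, usage looks like this (the paths are hypothetical examples):

    cd /path/to/repo
    git-delete-history release/dist.zip release/docs/manual.pdf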
VonC
http://stackoverflow.com/questions/359424/detach-subdirectory-into-separate-git-repository is also a good start for the `filter-branch` command usage.
VonC
Hi VonC - I'd tried git gc --prune=now with no luck. It really looks like a git bug, in that I wound up with unreferenced blobs locally following a branch deletion, but these aren't there with a fresh clone of the GitHub repo...so it's just a local repo problem. But I have additional files that I want to clear out, so the script you referenced above is great - thanks!
kkrugler
+1  A: 

git gc --prune=now, or the low-level git prune --expire now.

Jakub Narębski
+1  A: 

Each time your HEAD moves, git tracks this in the reflog. If you removed commits, you still have "dangling commits" because they are still referenced by the reflog for ~30 days. This is the safety-net when you delete commits by accident.

You can use the git reflog command to remove specific commits, repack, etc., or just use the high-level command:

git gc --prune=now
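
For example, if you are sure you no longer need anything the reflog is keeping alive, a more explicit variant is:

    git reflog expire --expire=now --all   # drop all reflog entries
    git gc --prune=now                     # then collect the objects they were keeping alive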
vdboor