I have a large 33 MB file, and I want to permanently delete its oldest revisions so that only the latest X revisions are kept around. How can I do that?

My bare repository has grown huge because of it.

I have tried the following, but it removes the file entirely:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_manual.txt' HEAD

To identify the large files in my repository I use git-large-blob by Aristotle Pagaltzis.

+7  A: 

I think you are on the right track with the "git filter-branch" command you tried. The problem is you haven't told it to keep the file in any commits, so it is removed from all of them. Now, I don't think there is a way to directly tell git-filter-branch to skip any commits. However, since the commands are run in a shell context, it shouldn't be too difficult to use the shell to remove all but the last X number of revisions. Something like this:

KEEP=10 I=0 NUM_COMMITS=$(git rev-list master | wc -l) \
git filter-branch --index-filter \
'if [ "$I" -lt "$((NUM_COMMITS - KEEP))" ]; then git rm --cached --ignore-unmatch big_manual.txt; fi; I=$((I + 1))'

That would keep big_manual.txt in the last 10 commits.
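
One way to check the result afterwards is to count how many commits still contain the file. This is only a minimal sketch and not part of the original command; the function name count_commits_with_file is made up for illustration, and it assumes you run it from inside the repository.

```shell
# Hypothetical helper: count the commits reachable from a revision
# (default: HEAD) whose tree still contains the given path.
count_commits_with_file() {
    file=$1
    rev=${2:-HEAD}
    n=0
    for commit in $(git rev-list "$rev"); do
        # "git cat-file -e" exits 0 only if the object <commit>:<path> exists
        if git cat-file -e "$commit:$file" 2>/dev/null; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

After a rewrite with KEEP=10 on a linear history, `count_commits_with_file big_manual.txt` should report no more than 10.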

That being said, like Charles has mentioned, I'm not sure this is the best approach, since you're in effect undoing the whole point of VCS by deleting old versions.

Have you already tried optimizing the git repository with 'git-gc' and/or 'git-repack'? If not, those might be worth a try.
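
Keep in mind that after a filter-branch rewrite the old objects are still referenced by the backup refs under refs/original/ and by the reflogs, so the repository will not shrink until those are gone. Here is a minimal sketch of that cleanup, along the lines of the repository-shrinking checklist in the git-filter-branch documentation. The function name reclaim_space is made up for illustration, and because the old objects are discarded for good, it is probably wise to try it on a copy first.

```shell
# Hypothetical helper: drop everything that still points at the
# pre-rewrite history, then garbage-collect the unreachable objects.
reclaim_space() {
    repo=${1:-.}
    # filter-branch saves the original refs under refs/original/
    git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git -C "$repo" update-ref -d "$ref"
        done
    # reflog entries also keep the old commits alive
    git -C "$repo" reflog expire --expire=now --all
    # prune unreachable objects immediately instead of after the grace period
    git -C "$repo" gc --quiet --prune=now
}
```

Usage would be something like `reclaim_space /path/to/repo.git`. Other clones that already fetched the old history keep their own copy of the big blobs until they are re-cloned or cleaned up the same way.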

Dan Moulding
This is the solution! It walked through all 312 revisions and discarded the oldest revisions, perfect. This was very educational: for loops, rev-list, and calling filter-branch without any commit ID, which seems unintuitive (I will have to investigate how that magic works), but it worked. Thank you for that. Sometimes I use git-gc and fsck, but it's not yet something I have automated. Let's not talk about my opinion about VCS :-)
neoneye
"Let's not talk about my opinion about VCS :-)" Fair enough :) I'm glad this worked for you. As for the magic of not specifying a revision, git-filter-branch internally calls git-rev-list to get the list of commits to rewrite. It will pass "HEAD" to git-rev-list as a default ref if you don't specify one. So not specifying anything is the same as specifying "HEAD" (as you did in your example).
Dan Moulding
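
Since the counter in the answer is computed from "git rev-list master" while filter-branch itself defaults to HEAD, the two only agree when HEAD and master point at the same commit. A small sanity check could look like the sketch below; the function name same_tip is made up for illustration.

```shell
# Hypothetical helper: succeed only if two revisions resolve to the same
# commit, in which case "git rev-list" walks the same commits for both.
# Returns 2 if either revision cannot be resolved.
same_tip() {
    a=$(git rev-parse --verify --quiet "$1^{commit}") || return 2
    b=$(git rev-parse --verify --quiet "$2^{commit}") || return 2
    [ "$a" = "$b" ]
}

# Example: only trust the commit count when this check passes.
# same_tip master HEAD && echo "master and HEAD match"
```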
+6  A: 
Jakub Narębski
interesting. will try.
neoneye
yeah, the grafts mechanism indeed seems to be the intended way to do it. Thank you for making me aware of this. Unfortunately I don't have time to experiment with it today.
neoneye
The grafts method could work in some cases, but it will get rid of the history for all files. In this case, neoneye wants to remove history for only *some* files. So I'm not sure grafts would be a suitable solution. And shallow clone is out of the question because shallow repositories are crippled (see git-clone docs for a description of their limitations).
Dan Moulding
Dan, yes, good point: I need a solution that only removes history for a single file. OK, so I won't do any experimenting with grafts.
neoneye
+3  A: 

You might want to consider using git submodules. That way you can keep the images and other big files in another git repository, and the repository that holds the source code can refer to a particular revision of that other repository.

That helps keep the two repositories in sync, because the parent repository contains a link to a particular revision of the sub-repository. It also lets you remove or rebase old revisions in the sub-repository without affecting the parent repository where your source code lives: removing old revisions there does not disturb the parent's history, because you only update which sub-repository revision the link in the parent points to.
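
As a rough illustration of that setup, the sketch below registers a separate big-file repository as a submodule and commits the link. The function name add_big_files_submodule and its arguments are placeholders, and it assumes you run it from the top level of a non-bare clone of the parent repository.

```shell
# Hypothetical helper: register another repository as a submodule at the
# given path and record the pinned revision in the parent repository.
add_big_files_submodule() {
    url=$1     # location of the big-file repository (placeholder)
    path=$2    # directory inside the parent repository
    # "submodule add" clones the repository and stages .gitmodules plus the link
    git submodule add "$url" "$path" &&
        git commit -q -m "Add $path as a submodule"
}
```

The parent then stores only a commit ID for that path, so a history rewrite in the big-file repository is followed by a single new commit in the parent that points the link at the new revision.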

Esko Luontola
Good point. I didn't know about git submodules.
neoneye