ansaurus

Question

How to split a git repository while preserving subdirectories?

Answer 1

+2 A:

You could indeed use the subdirectory filter followed by an index filter to put the contents back into a subdirectory, but why bother, when you could just use the index filter by itself?

Here's an example from the man page:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD

This just removes one filename; what you want to do is remove everything but a given subdirectory. If you want to be cautious, you could explicitly list each path to remove, but if you want to just go all-in, you can just do something like this:

git filter-branch --index-filter 'git ls-tree --name-only --full-tree $GIT_COMMIT | grep -v "^directory-to-keep$" | xargs git rm --cached -r' -- --all

I expect there's probably a more elegant way; if anyone has something please suggest it!

A few notes on that command:

filter-branch internally sets GIT_COMMIT to the SHA-1 of the commit currently being rewritten.
I wouldn't have expected --full-tree to be necessary, but apparently filter-branch runs the index-filter from the .git-rewrite/t directory instead of the top level of the repo.
grep is probably overkill for excluding a single name, but I don't think it's a speed issue.
--all applies this to all refs; I figure you really do want that. (The -- separates it from the filter-branch options.)
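
To make the selection step in that pipeline concrete, here is a minimal sketch that runs only the grep part against a made-up list of top-level names (the kind of output `git ls-tree --name-only` would produce). The sample names and the `keep` and `selected` variables are assumptions for illustration; no repository is touched.

```shell
#!/bin/sh
# Name of the one top-level directory that should survive the rewrite.
keep="directory-to-keep"

# Made-up top-level entries, one per line, standing in for the
# output of `git ls-tree --name-only --full-tree $GIT_COMMIT`.
# The anchored pattern (^...$) excludes only the exact name, so a
# similarly named entry such as directory-to-keep-old is still selected.
selected=$(printf '%s\n' \
    directory-to-keep \
    directory-to-keep-old \
    docs \
    README | grep -v "^${keep}\$")

# Everything printed here is what would be handed to `git rm --cached -r`.
printf '%s\n' "$selected"
```

Two caveats, as far as I can tell: xargs splits its input on whitespace by default, so top-level names containing spaces would be passed on as separate arguments; and GNU xargs still runs the command once when its input is empty, so a commit whose only top-level entry is the kept directory would call `git rm` with no paths, which fails.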

Edit: thanks to Thomas, here's a commit filter to remove the now-empty commits. It can be used in the same command (just place it between the index filter and the --):

--commit-filter 'if [ "$1" = "$(git rev-parse $3^{tree})" ]; then skip_commit "$@"; else git commit-tree "$@"; fi' --remap-to-ancestor

The --remap-to-ancestor option keeps you from losing refs which pointed to skipped commits. (For example, if the tag v2.0 pointed to a commit which didn't touch this subdirectory, you'd probably want it remapped to the nearest ancestor which did, instead of just removing it.)
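
The test inside that commit filter boils down to "is this commit's tree identical to its parent's tree?". The following is a rough sketch of the same comparison outside filter-branch, run in a throwaway repository. The helper name `is_empty_commit`, the file name, and the commit messages are made up for illustration, and the sketch compares existing commits rather than the `$1`/`$3` arguments the filter receives.

```shell
#!/bin/sh
# Build a throwaway repository with one real commit and one empty commit.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo hello > file.txt
git add file.txt
git commit -q -m "add file"
git commit -q --allow-empty -m "no content change"

# Hypothetical helper: succeeds when the commit's tree is identical to
# its first parent's tree, i.e. the commit changed nothing.
# For a root commit the parent lookup prints nothing, so the test fails
# and the commit counts as non-empty.
is_empty_commit() {
    [ "$(git rev-parse -q --verify "$1^{tree}")" = \
      "$(git rev-parse -q --verify "$1~1^{tree}")" ]
}

for c in HEAD HEAD~1; do
    if is_empty_commit "$c"; then
        echo "$c: tree unchanged, would be skipped"
    else
        echo "$c: tree differs, would be kept"
    fi
done
```

In the real filter the comparison is made against the already rewritten parent, which is why commits that only touched removed directories end up with an unchanged tree and get skipped.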

Jefromi 2010-05-10 17:57:09

Apart from the nested single quotes (that I took the liberty to replace), this worked almost perfectly. The only problem was that empty commits to now nonexistent directories remained in the log. I removed these using `git filter-branch -f --commit-filter 'if [ z$1 = z\`git rev-parse $3^{tree}\` ]; then skip_commit "$@"; else git commit-tree "$@"; fi' "$@"` that I found at http://github.com/jwiegley/git-scripts/blob/master/git-remove-empty-commits

Thomas 2010-05-10 18:43:44

@Thomas: Thanks for fixing my careless mistake! Also, you should be able to use the commit filter in the same command as the index filter. The filters are run in the order shown in the documentation; commit-filter is naturally after the filters which modify the contents of the commit. You probably also want to use `--remap-to-ancestor`, which will cause refs pointing to skipped commits to be moved to the nearest ancestor instead of excluding them.

Jefromi 2010-05-10 19:01:40
