I have a Git repository which contains a number of subdirectories. Now I have found that one of the subdirectories is unrelated to the others and should be detached to a separate repository.

How can I do this while keeping the history of the files within the subdirectory?

I guess I could make a clone and remove the unwanted parts of each clone, but I suppose this would give me the complete tree when checking out an older revision etc. This might be acceptable, but I would prefer to be able to pretend that the two repositories don't have a shared history.

Just to make it clear, I have the following structure:

XYZ/
    .git/
    XY1/
    ABC/
    XY2/

But I would like this instead:

XYZ/
    .git/
    XY1/
    XY2/
ABC/
    .git/
+83  A: 

You want to clone your repository and then use git filter-branch to mark everything but the subdirectory you want in your new repo to be garbage-collected. To clone your local repository:

 $ git clone --no-hardlinks /XYZ /ABC

The --no-hardlinks switch makes git use real file copies instead of hardlinking when cloning a local repository. The garbage collection and pruning actions will only work on blobs (file contents), not links.

Then just filter-branch and reset to exclude the other files, so they can be pruned:

 $ cd /ABC
 $ git filter-branch --subdirectory-filter ABC HEAD
 $ git reset --hard
 $ git gc --aggressive
 $ git prune

and now you have a local git repository of the ABC sub-directory with all its history preserved.

EDIT -- For most uses, git filter-branch should have the added parameter -- --all. (Yes, that's really dash dash space dash dash all. These need to be the last parameters of the command.) As Matli discovered, this keeps the project branches and tags included in the new repo.
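For anyone who wants to rehearse this before touching a real repository, here is a minimal sketch that runs the same steps on a throwaway copy. The scratch directory from mktemp, the file names, the tag v1, and the demo identity are all made up for illustration; FILTER_BRANCH_SQUELCH_WARNING is only there to quiet the notice that newer Git versions print.

```shell
#!/bin/sh
set -eu
# Assumption: newer Git prints a notice for filter-branch and pauses;
# this variable silences it and is ignored by older versions.
export FILTER_BRANCH_SQUELCH_WARNING=1
# Throwaway identity so the commits succeed without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a toy XYZ repository (hypothetical file names) in a scratch directory.
work=$(mktemp -d)
git init -q "$work/XYZ"
cd "$work/XYZ"
mkdir XY1 ABC XY2
echo one > XY1/a.txt
echo lib > ABC/lib.txt
echo two > XY2/b.txt
git add .
git commit -q -m "initial layout"
echo more >> ABC/lib.txt
git commit -q -am "update ABC"
git tag v1

# The steps from the answer, with "-- --all" so branches and tags are rewritten too.
git clone -q --no-hardlinks "$work/XYZ" "$work/ABC"
cd "$work/ABC"
git filter-branch --subdirectory-filter ABC -- --all
git reset -q --hard
git gc -q --aggressive
git prune

# Inspect the result: the files of ABC should now sit at the top level.
git ls-files
git log --oneline
git ls-tree --name-only v1
```

If the listing shows only the files that used to live under ABC, the same commands can be pointed at the real paths.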

Paul
Very good answer. Thanks! And to really get exactly what I wanted, I added "-- --all" to the filter-branch command.
matli
Good point! That would apply to most people using git filter-branch this way. In my case, I was segregating a library which had its own series of release numbers in tags, so I didn't want the project tags in my new repo. I'll edit that into the answer.
Paul
As of today, filter-branch is not supported on Windows. It looks like it is coming soon though. Check the msysgit discussion group (at google groups) for details.
Osman
Why do you need `--no-hardlinks`? Removing one hardlink won't affect the other file. Git objects are immutable too. Only if you'd change owner/file permissions you need `--no-hardlinks`.
vdboor
An additional step I would recommend would be "git remote rm origin". This would keep pushes from going back to the original repository, if I'm not mistaken.
Tom
+15  A: 

Paul's answer above creates a new repository containing /ABC, but does not remove /ABC from within /XYZ. The following command will remove /ABC from within /XYZ:

git filter-branch --tree-filter "rm -rf ABC" --prune-empty HEAD

Of course, test it in a 'clone --no-hardlinks' repository first, and follow it with the reset, gc and prune commands Paul lists.
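A rough sketch of such a trial run, assuming a toy repository built in a scratch directory (the paths, file names, and demo identity are invented for the example):

```shell
#!/bin/sh
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # quiets the notice in newer Git
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Toy stand-in for /XYZ with one commit that touches only ABC.
work=$(mktemp -d)
git init -q "$work/XYZ"
cd "$work/XYZ"
mkdir XY1 ABC XY2
echo one > XY1/a.txt
echo lib > ABC/lib.txt
echo two > XY2/b.txt
git add .
git commit -q -m "initial layout"
echo more >> ABC/lib.txt
git commit -q -am "update ABC only"

# Rehearse on a clone rather than on the real repository.
git clone -q --no-hardlinks "$work/XYZ" "$work/XYZ-trial"
cd "$work/XYZ-trial"
git filter-branch --tree-filter "rm -rf ABC" --prune-empty HEAD
git reset -q --hard

# Inspect the rewritten history of HEAD.
git ls-tree -r --name-only HEAD   # remaining paths
git log --oneline -- ABC          # commits that still touch ABC
git rev-list --count HEAD         # commits left after --prune-empty
```

The commit that only changed files under ABC becomes empty after the filter, so --prune-empty drops it from the rewritten history.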

pgs
make that `git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch ABC" --prune-empty HEAD` and it will be **much** faster. index-filter works on the index, while tree-filter has to check out and stage **everything for every commit**.
fmarc
in some cases messing up the history of repository XYZ is overkill ... just a simple "rm -rf ABC; git rm -r ABC; git commit -m'extracted ABC into its own repo'" would work better for most people.
Evgeny
A: 

You might need something like "git reflog expire --expire=now --all" before the garbage collection to actually clean the files out. git filter-branch just removes references in the history, but doesn't remove the reflog entries that hold the data. Of course, test this first.

My disk usage dropped dramatically in doing this, though my initial conditions were somewhat different. Perhaps --subdirectory-filter negates this need, but I doubt it.
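To see the reflog effect in isolation, here is a small sketch that does not use filter-branch at all. It swaps in a simpler situation: a commit made on a detached HEAD and then abandoned, so that only the reflog still records it. The repository path, file names, and demo identity are hypothetical.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
echo keep > keep.txt
git add keep.txt
git commit -q -m "kept commit"
branch=$(git symbolic-ref --short HEAD)

# Make a commit on a detached HEAD, then walk away from it. No branch or
# tag names it afterwards; only HEAD's reflog still records it.
git checkout -q --detach
echo scratch > junk.txt
git add junk.txt
git commit -q -m "soon unreachable"
lost=$(git rev-parse HEAD)
git checkout -q "$branch"

# First attempt: garbage-collect without touching the reflog.
git gc -q --prune=now
if git cat-file -e "$lost" 2>/dev/null; then before=present; else before=gone; fi

# Second attempt: expire the reflog first, then collect again.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$lost" 2>/dev/null; then after=present; else after=gone; fi

echo "before expiring the reflog: $before"
echo "after expiring the reflog:  $after"
```

After a filter-branch run the old commits are usually also held by other refs (the backups under refs/original and, in a clone, the remote-tracking refs), so expiring the reflog may be necessary without being sufficient.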

+3  A: 

To add to Paul's answer, I found that to ultimately recover the space, I had to push HEAD to a clean repository; that trims down the size of the .git/objects/pack directory.

i.e.

$ mkdir ...ABC.git
$ cd ...ABC.git
$ git init --bare

After the gc and prune steps, also do:

$ git push ...ABC.git HEAD

Then you can do

$ git clone ...ABC.git

and the size of ABC/.git is reduced.

Actually, some of the time-consuming steps (e.g. git gc) aren't needed when you push to a clean repository, i.e.:

$ git clone --no-hardlinks /XYZ /ABC
$ cd /ABC
$ git filter-branch --subdirectory-filter ABC HEAD
$ git reset --hard
$ git push ...ABC.git HEAD
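
As a sanity check, the effect of the push can be measured on a toy repository. This is only a sketch: the scratch paths (including the bare repository at "$work/ABC.git"), the file names, the demo identity, and the total_objects helper are all invented for the example.

```shell
#!/bin/sh
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # quiets the notice in newer Git
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/XYZ"
cd "$work/XYZ"
mkdir XY1 ABC XY2
echo one > XY1/a.txt
echo lib > ABC/lib.txt
echo two > XY2/b.txt
git add .
git commit -q -m "initial layout"

# Hypothetical helper: count every object in a repository, loose or packed.
total_objects() {
    git -C "$1" count-objects -v |
        awk '/^(count|in-pack):/ { n += $2 } END { print n }'
}

# Filter a clone, skipping gc and prune.
git clone -q --no-hardlinks "$work/XYZ" "$work/ABC"
cd "$work/ABC"
git filter-branch --subdirectory-filter ABC HEAD
git reset -q --hard

# Push HEAD into an empty bare repository, then clone that instead.
git init -q --bare "$work/ABC.git"
git push -q "$work/ABC.git" HEAD
git clone -q "$work/ABC.git" "$work/ABC-clean"

echo "objects in filtered clone: $(total_objects "$work/ABC")"
echo "objects after the push:    $(total_objects "$work/ABC.git")"
```

The push transfers only what is reachable from HEAD, which is why the bare repository ends up smaller than the filtered clone even though no gc was run.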
Case Larsen
+6  A: 

I’ve found that in order to properly delete the old history from the new repository, you have to do a little more work after the filter-branch step.

  1. Do the clone and the filter:

    git clone --no-hardlinks foo bar; cd bar
    git filter-branch --subdirectory-filter subdir/you/want
    
  2. Remove every reference to the old history. “origin” was keeping track of your clone, and “original” is where filter-branch saves the old stuff:

    git remote rm origin
    git update-ref -d refs/original/refs/heads/master
    git reflog expire --expire=now --all
    
  3. Even now, your history might be stuck in a packfile that fsck won’t touch. Tear it to shreds, creating a new packfile and deleting the unused objects:

    git repack -ad
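
The three steps can be tried end to end on a scratch repository. In this sketch the directory sub, the file names, and the demo identity are hypothetical, the source repository is repacked first so that the clone inherits a packfile the way a long-lived repository would, and the branch name is looked up instead of being assumed to be master.

```shell
#!/bin/sh
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # quiets the notice in newer Git
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/foo"
cd "$work/foo"
mkdir sub other
echo lib > sub/lib.txt
echo misc > other/misc.txt
git add .
git commit -q -m "initial layout"
git repack -q -a -d          # everything now lives in one packfile
old=$(git rev-parse HEAD)    # a commit from the history we want to lose

# 1. Clone and filter.
git clone -q --no-hardlinks "$work/foo" "$work/bar"
cd "$work/bar"
git filter-branch --subdirectory-filter sub

# 2. Remove every reference to the old history.
git remote rm origin
git update-ref -d "refs/original/$(git symbolic-ref HEAD)"
git reflog expire --expire=now --all

# 3. Write a new packfile and delete the old one.
git repack -q -a -d

if git cat-file -e "$old" 2>/dev/null; then
    echo "old commit still present"
else
    echo "old commit is gone"
fi
```

Note that objects which were never packed stay behind as loose files after the repack; a git prune would be needed to remove those as well.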
    
jleedev
+7  A: 

Apparently I require 50 reputation points to 'comment', which prevents me from being appropriately helpful.

Would someone with reputation please add this as a comment to Paul's answer above, and refer to it from pgs' answer as well?

Thanks!


git-filter-branch set_ident() calls LANG=C LC_ALL=C sed ... for maximum compatibility with non UTF-8 aware sed.

Therefore, git-filter-branch dies on commits with UTF-8 characters in the author: or committer: fields (see git show --pretty=raw <commit SHA1>).

As a workaround, use a UTF-8-aware sed (e.g. GNU sed) and edit git-filter-branch:

LANG=en_US.UTF-8 sed -ne ...

RIchard Michael
A: 

Use this filter command to remove a subdirectory, while preserving your tags and branches:

git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch DIR" --prune-empty --tag-name-filter cat -- --all
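
A quick way to check the result is to look at every branch and tag afterwards. The sketch below does that on a toy repository; the repository path, file names, the feature branch, the lightweight tag v1, and the demo identity are assumptions made for the example.

```shell
#!/bin/sh
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # quiets the notice in newer Git
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
main=$(git symbolic-ref --short HEAD)
mkdir DIR
echo keep > keep.txt
echo drop > DIR/x.txt
git add .
git commit -q -m "initial layout"
git tag v1
git checkout -q -b feature
echo more > DIR/y.txt
echo extra > extra.txt
git add .
git commit -q -m "feature work"
git checkout -q "$main"

git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch DIR" \
    --prune-empty --tag-name-filter cat -- --all

# Check every branch and tag. refs/original is skipped on purpose,
# because filter-branch keeps its backups of the old refs there.
leftover=no
for ref in $(git for-each-ref --format='%(refname)' refs/heads refs/tags); do
    if git ls-tree -r --name-only "$ref" | grep -q '^DIR/'; then
        leftover=yes
    fi
done
echo "DIR left on some branch or tag: $leftover"
git for-each-ref --format='%(refname)' refs/heads refs/tags
```

A plain git log --all would still show DIR at this point, since --all also walks the backup refs under refs/original.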
Casey