Update: I tried to simplify the real example here to get a clear explanation of my options, but that didn't really work. The examples linked below have so far been too general for me to get even this simple example working.

I was able to do this type of thing with SVN all the time and got quite skilled at it. Now I'm finding it extremely difficult in Git and starting to believe that my history is basically too munged together to be able to pull it apart.

Real-world problem: I have around a dozen files that have been moved and renamed. Their history is intermixed with the history of hundreds of other files whose history I want to remove completely.

In SVN, I would use a sequence of dump/include-filter/exclude-filter/load to trim the repository down, and occasionally I might need to rename paths manually in the dump file itself before loading.

Something like this and I would have been done:

SET Includes=trunk/src/Foo.aaa trunk/src/Foo.bbb trunk/src/Foo trunk/src/Bar
SET Excludes=trunk/src/Bar/Blah.aaa trunk/src/Foo/Blah.aaa

svnadmin dump FooSrc > Full.dump 2> Dump.log
svndumpfilter include %Includes% --skip-missing-merge-sources --renumber-revs --drop-empty-revs < Full.dump > Filter_1.dump 2> Filter_1.log
svndumpfilter exclude %Excludes% --skip-missing-merge-sources --renumber-revs --drop-empty-revs < Filter_1.dump > Filter_2.dump 2> Filter_2.log
svnadmin create FooDest
svnadmin load FooDest --ignore-uuid < Filter_2.dump > Load.log 2> Load_Errors.log

Does anyone have a good example of this that is more than just a trivial removal of a single file or export of a single subdirectory?

The simplest way I can define the set of files is with a list of 7 directory paths. Everything inside those directories needs to be kept, and everything outside them needs to be pruned from the history.
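Because the files were moved and renamed, a list of their current directories only reaches back as far as each move; the keep list would also need every former location. A minimal sketch for collecting those, assuming Python 3.7+ and that it runs inside the original repository (parse_name_status and historical_paths are hypothetical helper names): it asks `git log --follow --name-status` about one file at a time and gathers both sides of every rename.

```python
import subprocess
from typing import Set


def parse_name_status(log_output: str) -> Set[str]:
    """Collect every path named in `git log --name-status` output.

    Lines look like "M<TAB>path" or, for renames/copies, "R087<TAB>old<TAB>new";
    every path after the status column is kept. Blank separator lines are skipped.
    """
    paths: Set[str] = set()
    for line in log_output.splitlines():
        if not line.strip():
            continue
        _status, *names = line.split("\t")
        paths.update(names)
    return paths


def historical_paths(path: str) -> Set[str]:
    """Every path `path` has had, as far as Git's rename detection can follow it."""
    # --format= suppresses the commit headers, leaving only the name-status lines.
    out = subprocess.run(
        ["git", "log", "--follow", "--name-status", "--format=", "--", path],
        check=True,
        stdout=subprocess.PIPE,
        text=True,
    ).stdout
    return parse_name_status(out)
```

Running historical_paths for each of the dozen files and adding the directories of the returned paths to the keep list would carry the pre-move history along. Note that --follow only works for a single path, relies on similarity-based rename detection, and that paths with unusual characters are printed C-quoted, which this sketch does not undo.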


Simplified problem:

I have a Git repository which has a handful of files that I'd like to extract into its own repository. The problem is these files were created and modified throughout the history of the original repository, so I am having trouble figuring out how to cleanly extract them.

Here is a gist of what my history looks like (only with more commits and many more files to ignore). As you can see, I obviously didn't plan for these files to be cherry-picked out of the history later:

commit 4a09d3f977a8595d9e3f61766a5fd743e4265a56

M    src/Foo/Bar/FileToExtract2.foo
A    src/Foo/Bar/FileToExtract3.bar
D    src/Foo/AnotherFileToIgnore.txt

commit 05d26f23518083270cc45bf037ced29bec45e064

M    src/Foo/Blah/IgnoreThisOneToo.foo
M    src/Foo/AnotherFileToIgnore.txt

commit 343187228f4bd8e4427395453034c34ebd9a95f3

M    src/Foo/Bar/FileToExtract1.txt
M    src/Foo/AnotherFileToIgnore.txt

commit 46a0129104ac31291462f657292aab43f8883d8d

A    src/Foo/Bar/FileToExtract1.txt
A    src/Foo/Bar/FileToExtract2.foo
M    src/Foo/FileToIgnore.txt

commit 3fe6af56f0d8dc42fcb5b0bafee41bff534ba2cc

A    src/ReadMe.txt
A    src/IgnoreMe.foo
A    src/Foo/FileToIgnore.txt
A    src/Foo/Blah/IgnoreThisOneToo.foo
A    src/Foo/AnotherFileToIgnore.txt

In the end, what I want to have is a clean repository with the complete history of just the files in src/Foo/Bar/. The rest can be ignored. I'm also okay with keeping this repository as is (i.e. no history rewrite) and just committing a delete for that entire directory.

In SVN, I would use svnadmin dump, svndumpfilter, and svnadmin load. If I was careful, I could even manually edit the dump file to clean up paths, etc.

I've been looking through the Git commands and am unable to see a way of doing this. Any help would be greatly appreciated.

+3  A: 

You can use git filter-branch and detach the directory Foo into its own repository.
See:

VonC
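For the single-directory case the invocation is short. A sketch that builds and runs it from Python, assuming Python 3 and a throwaway clone that is safe to rewrite (subdirectory_filter_cmd and run_subdirectory_filter are hypothetical names); `-- --all` asks filter-branch to rewrite every ref rather than only the current branch:

```python
import subprocess
from typing import List


def subdirectory_filter_cmd(directory: str) -> List[str]:
    """git filter-branch call that keeps only the history touching `directory`
    and makes that directory the new repository root."""
    return ["git", "filter-branch", "--subdirectory-filter", directory, "--", "--all"]


def run_subdirectory_filter(directory: str) -> None:
    # Run from the top of a throwaway clone: filter-branch rewrites history in place.
    subprocess.run(subdirectory_filter_cmd(directory), check=True)
```

With src/Foo/Bar, the rewritten history has that directory's contents at the repository root, so src/Foo/Bar/FileToExtract1.txt becomes FileToExtract1.txt. Only one directory can be given, which is the limitation raised in the comments.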
Excellent. I think this is exactly what I was looking for. I'll try it out tonight. Thanks!
McKAMEY
It looks like this is what I need, but what if I want to include more than one subdirectory in the new pruned repository? It appears that `git filter-branch --subdirectory-filter <directory>` only takes one value?
McKAMEY
@McKAMEY: http://stackoverflow.com/questions/1425892/how-do-you-merge-two-git-repositories is a good start: submodules or subtree merge.
VonC
Ahh, I think I see: so I would produce a repository for each subdirectory, then merge them back together into a single one? Do the histories get interlaced, or are they stacked one after another?
McKAMEY
@McKAMEY: I believe the history is stacked when using subtree merge. But I am sure the histories of two submodules remain independent of one another (see the true nature of submodules: http://stackoverflow.com/questions/1979167/git-submodule-update/1979194#1979194). In your case, though, subtree merge would make more sense.
VonC
@McKAMEY: They are independent (two roots, i.e. parentless commits).
Jakub Narębski
The histories of the 7 directories I need to extract are related (frequently changed within the same commit) so I don't think extracting them individually and then stacking their histories will work. The histories would be out of order.
McKAMEY
@McKAMEY: I believe Jakub was correcting me in his comment above by stating that subtree merge would generate 2 independent roots.
VonC
What if commits affect both subdirectories? Would this split those commits into independent ones?
McKAMEY
@McKAMEY: for subtree merge, past commits would be split, each in their own history branch. But future commits (after subtree merge) should not be split and affect all repositories in one new commit. All this needs to be tested though, as I haven't directly played with subtree merge in a long time.
VonC
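For the multi-directory case asked about in the comments, one alternative to splitting and re-merging is a single `git filter-branch --index-filter` pass that removes everything outside the kept directories from each commit's index. Since every commit is rewritten once, commits that touched several of the directories stay single commits in their original order. A minimal sketch of such an index filter, assuming Python 3; the script name prune_index.py and its helper functions are hypothetical:

```python
import os
import subprocess
import sys
from typing import List, Sequence


def is_kept(path: str, keep_dirs: Sequence[str]) -> bool:
    """True if path lies inside one of keep_dirs (matched on whole path components,
    so "src/Foo/Bar" keeps "src/Foo/Bar/x" but not "src/Foo/BarBaz.txt")."""
    return any(path.startswith(d.rstrip("/") + "/") for d in keep_dirs)


def paths_to_remove(index_paths: Sequence[str], keep_dirs: Sequence[str]) -> List[str]:
    """Index entries that lie outside every kept directory."""
    return [p for p in index_paths if not is_kept(p, keep_dirs)]


def prune_index(keep_dirs: Sequence[str], batch_size: int = 500) -> None:
    """Meant to run as a filter-branch --index-filter: drop every index entry
    outside keep_dirs from the commit currently being rewritten."""
    # -z prints raw NUL-terminated paths, so no unquoting is needed.
    raw = subprocess.run(["git", "ls-files", "-z"], check=True, stdout=subprocess.PIPE).stdout
    paths = [os.fsdecode(p) for p in raw.split(b"\0") if p]
    doomed = paths_to_remove(paths, keep_dirs)
    # Remove in batches to stay under command-line length limits;
    # --literal-pathspecs stops names containing * or ? from acting as globs.
    for i in range(0, len(doomed), batch_size):
        subprocess.run(
            ["git", "--literal-pathspecs", "rm", "--cached", "--quiet", "--",
             *doomed[i:i + batch_size]],
            check=True,
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): prune_index.py DIR [DIR ...]
    # With no arguments nothing runs, rather than emptying every commit.
    prune_index(sys.argv[1:])
```

In a throwaway clone it could be invoked as `git filter-branch --prune-empty --index-filter 'python3 /abs/path/prune_index.py src/Foo/Bar DIR2 DIR3' -- --all`, where DIR2, DIR3 stand for the remaining directories. An absolute path is the safe choice because filter-branch runs filters from a temporary directory, and --prune-empty drops commits left with no changes. Unlike --subdirectory-filter, the kept directories stay at their original paths.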
+2  A: 

The equivalent of SVN's svnadmin dump, svndumpfilter and svnadmin load would be git fast-export, your own filter script (see examples), and git fast-import.

Jakub Narębski
@VonC: Thanks for providing links.
Jakub Narębski
I have 7 directories that contain all of the files I need to extract. What do I put between export and import to filter out everything but these 7 directories? I need the equivalent of `svndumpfilter include`.
McKAMEY
The "see examples" link only provides one example, `sed "s|refs/heads/master|refs/heads/other|"`, for the filter script. How would I go about telling Git to remove all files that aren't inside one of my 7 subdirectories?
McKAMEY
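Filling in the "own script" step for the 7-directory case: a minimal sketch of a filter for the fast-export stream, assuming Python 3, that fast-export runs without -M/-C (so renames arrive as plain D and M lines), and that the stream uses counted `data <n>` blocks with no inline data, which is what fast-export emits. It copies every data payload verbatim, so file contents and commit messages are never mistaken for commands, and drops only M/D file changes whose paths fall outside the kept directories. filter_stream, filter_bytes, is_kept and the script name keep_dirs_filter.py are hypothetical.

```python
import io
import os
import sys
from typing import BinaryIO, Sequence


def is_kept(path: bytes, keep_dirs: Sequence[bytes]) -> bool:
    """True if path lies inside one of keep_dirs (compared by whole components)."""
    # fast-export C-quotes unusual paths ("..."). Dropping the opening quote is
    # enough for a prefix test, assuming the kept directory names themselves
    # contain nothing that needs quoting.
    if path.startswith(b'"'):
        path = path[1:]
    return any(path.startswith(d.rstrip(b"/") + b"/") for d in keep_dirs)


def filter_stream(src: BinaryIO, dst: BinaryIO, keep_dirs: Sequence[bytes]) -> None:
    """Copy a fast-export stream, dropping M/D file changes outside keep_dirs."""
    while True:
        line = src.readline()
        if not line:
            break
        if line.startswith(b"data "):
            # "data <count>" is followed by exactly <count> raw bytes; copy them
            # untouched so they are never parsed as commands.
            dst.write(line)
            dst.write(src.read(int(line[5:].strip())))
            continue
        if line.startswith(b"M "):
            # "M <mode> <dataref> <path>": the path is everything after the 2nd space.
            path = line[2:].rstrip(b"\n").split(b" ", 2)[2]
            if not is_kept(path, keep_dirs):
                continue
        elif line.startswith(b"D "):
            # "D <path>"
            if not is_kept(line[2:].rstrip(b"\n"), keep_dirs):
                continue
        dst.write(line)


def filter_bytes(stream: bytes, keep_dirs: Sequence[bytes]) -> bytes:
    """Convenience wrapper for filtering an in-memory stream."""
    out = io.BytesIO()
    filter_stream(io.BytesIO(stream), out, keep_dirs)
    return out.getvalue()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): keep_dirs_filter.py DIR [DIR ...]
    # Reads the fast-export stream on stdin and writes the filtered stream to stdout.
    filter_stream(sys.stdin.buffer, sys.stdout.buffer, [os.fsencode(a) for a in sys.argv[1:]])
```

A possible pipeline, run from the original repository after `git init ../FooDest`: `git fast-export --all | python3 keep_dirs_filter.py src/Foo/Bar DIR2 DIR3 | git -C ../FooDest fast-import`, then check out a branch in the new repository. Commits that only touched dropped paths come through as empty commits (fast-import does not prune them), and blobs of dropped files are still imported but unreachable, so they can be garbage-collected later.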