Hi,

I started a project some months ago and stored everything within a main directory. In my main directory "Project" there are several subdirectories containing different things: Project/paper contains a document written in LaTeX, and Project/sourcecode/RailsApp contains my Rails app.

"Project" is GITified and there have been a lot of commits in both the "paper" and "RailsApp" directories. Now, as I'd like to use cruisecontrol.rb for my "RailsApp", I wonder if there is a way to make a submodule out of "RailsApp" without losing the history.

Any suggestions?

+1  A: 

If you want to transfer some subset of files to a new repository but keep the history, you're going to end up with a completely new history. It would work roughly as follows:

  1. Create new repository.
  2. For each revision of your old repository, merge the changes to your module into the new repository. This will create a "copy" of your existing project history.

It should be somewhat straightforward to automate this if you don't mind writing a small but hairy script. Straightforward, yes, but also painful. People have done history rewriting in Git before, so you can search for existing approaches.

Alternatively: clone the repository, and delete the paper in the clone, delete the app in the original. This would take one minute, it's guaranteed to work, and you can get back to more important things than trying to purify your git history. And don't worry about the hard drive space taken up by redundant copies of history.
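
The clone-and-delete approach can be sketched on a throwaway repository as below. The directory layout follows the question; the file names, commit messages, and the demo identity are made up for illustration. Note that the app stays at sourcecode/RailsApp inside the clone rather than moving to the repository root.

```shell
#!/bin/sh
# Sketch only: builds a scratch repo shaped like the one in the question.
set -e
# Hypothetical identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo/Project"
cd "$demo/Project"
mkdir -p paper sourcecode/RailsApp
echo "draft" > paper/paper.tex
git add . && git commit -q -m "Start paper"
echo "puts 'hi'" > sourcecode/RailsApp/app.rb
git add . && git commit -q -m "Start Rails app"

# Clone: the copy carries the full history of both directories.
git clone -q "$demo/Project" "$demo/RailsApp"

# Delete the paper in the clone...
git -C "$demo/RailsApp" rm -q -r paper
git -C "$demo/RailsApp" commit -q -m "Remove paper from the app repository"

# ...and delete the app in the original.
git -C "$demo/Project" rm -q -r sourcecode/RailsApp
git -C "$demo/Project" commit -q -m "Move Rails app to its own repository"
```

Both repositories still contain the old commits for the other half, which is the redundancy this answer says not to worry about.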

Dietrich Epp
+5  A: 

Check out git filter-branch. The examples section of the man page shows how to extract a subdirectory into its own project while keeping all of its history and discarding the history of other files and directories (just what you're looking for).
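
As a rough sketch of that man-page recipe, the script below builds a scratch repository with the layout from the question and runs --subdirectory-filter on a clone of it. The file names, commit messages, and demo identity are assumptions for illustration; FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation notice that newer Git versions print.

```shell
#!/bin/sh
# Sketch only: scratch repo shaped like the one in the question.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
# Hypothetical identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo/Project"
cd "$demo/Project"
mkdir -p paper sourcecode/RailsApp
echo "draft" > paper/paper.tex
git add . && git commit -q -m "Start paper"
echo "puts 'hi'" > sourcecode/RailsApp/app.rb
git add . && git commit -q -m "Start Rails app"
echo "more" >> paper/paper.tex
git commit -q -am "Extend paper"

# Rewrite a clone, never the original repository.
git clone -q "$demo/Project" "$demo/RailsApp"
cd "$demo/RailsApp"

# Make sourcecode/RailsApp the new repository root and keep only
# the commits that touched it.
git filter-branch --subdirectory-filter sourcecode/RailsApp HEAD

git log --oneline
```

After the rewrite, the clone holds the app's files at its top level and can be added back to the original project with git submodule add.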

Pat Notz
+3  A: 

One way of doing this is the inverse - remove everything but the file you want to keep.

Basically, make a copy of the repository, then use git filter-branch to remove everything but the file/folders you want to keep.

For example, I have a project from which I wish to extract the file tvnamer.py to a new repository:

git filter-branch --tree-filter 'for f in *; do if [ "$f" != "tvnamer.py" ]; then rm -rf "$f"; fi; done' HEAD

That uses git filter-branch --tree-filter to go through each commit, run the command, and recommit the resulting directory's content. This is extremely destructive (so you should only do this on a copy of your repository!), and can take a while (about 1 minute on a repository with 300 commits and about 20 files).

The above command just runs the following shell-script on each revision, which you'd have to modify of course (to make it exclude your sub-directory instead of tvnamer.py):

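# Delete every top-level entry except the one to keep.
# "$f" is quoted so names containing spaces are handled correctly.
for f in *; do
    if [ "$f" != "tvnamer.py" ]; then
        rm -rf "$f"
    fi
done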

The biggest obvious problem is that it leaves every commit and its message in place, even those unrelated to the remaining file. The git-remove-empty-commits script fixes this:

git filter-branch --commit-filter 'if [ z$1 = z`git rev-parse $3^{tree}` ]; then skip_commit "$@"; else git commit-tree "$@"; fi'

To run filter-branch a second time you need the -f (force) argument, because the first run leaves a backup of the original refs in refs/original/ and filter-branch refuses to overwrite it otherwise.

Of course this will never be perfect, for example if your commit messages mention other files, but it's about as close as git currently allows (as far as I'm aware anyway).

Again, only ever run this on a copy of your repository! - but in summary, to remove all files but "thisismyfilename.txt":

git filter-branch --tree-filter 'for f in *; do if [ "$f" != "thisismyfilename.txt" ]; then rm -rf "$f"; fi; done' HEAD
git filter-branch -f --commit-filter 'if [ z$1 = z`git rev-parse $3^{tree}` ]; then skip_commit "$@"; else git commit-tree "$@"; fi'
dbr
+3  A: 

Nowadays there's a much easier way to do it than manually using git filter-branch: git subtree.
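
A minimal sketch of the git subtree route, assuming git subtree is installed (it lives in Git's contrib area and some distributions package it separately). The scratch repository, file names, demo identity, and the branch name railsapp-only are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: scratch repo shaped like the one in the question.
set -e
# Hypothetical identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo/Project"
cd "$demo/Project"
mkdir -p paper sourcecode/RailsApp
echo "draft" > paper/paper.tex
git add . && git commit -q -m "Start paper"
echo "puts 'hi'" > sourcecode/RailsApp/app.rb
git add . && git commit -q -m "Start Rails app"

# Build a new branch whose history contains only sourcecode/RailsApp,
# with that directory as the root. The existing branches are untouched.
git subtree split --prefix=sourcecode/RailsApp -b railsapp-only

# List what ended up at the root of the split-out branch.
git ls-tree --name-only railsapp-only
```

The railsapp-only branch can then be pushed or pulled into a fresh repository, which in turn can be added to Project as a submodule.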

apenwarr