Hi,

I can't seem to grok the different solutions I've found and studied for tracking external code. Let alone understand how to apply them to my use case...

Would you guys be so kind as to shed some light on this and help me with my specific use case? What would be the best solution for the following, concrete problem? (I'm not gonna attempt to generalize my problem, since I might make wrong assumptions about stuff, especially since I'm so new to all this...)

I'm building a website in Django (a web framework in Python). Now, there are a lot of 3rd party plugins available for use with Django (Django calls them 'apps') that you can drop into your project. Some of these apps might require a bit of modification to get them working the way I want. But if you start making modifications to 3rd party code, you introduce the problem of updating that code when newer versions appear AND at the same time keeping your local modifications.

So, the way I would do that in Subversion is by using vendor branches. My repository layout would look like this:

/trunk
  ...
  /apps
    /blog-app
  ...
/tags
  ...
/branches
  ...
/vendor
  /django-apps
    /blog-app
      /1.2
      /1.3
      /current
    /other-app
      /3.2
      /current

In this case /trunk/apps/blog-app would have been svn copy'd from one of the tags in /vendor/django-apps/blog-app, say v1.2. Suppose I now want to upgrade the version in trunk to v1.3. As you can see, I have already updated /vendor/django-apps/blog-app/current (using svn_load_dirs) and 'tagged' it (svn copy) as /vendor/django-apps/blog-app/1.3. Now I can update /trunk/apps/blog-app by svn merge'ing the changes between /vendor/django-apps/blog-app/1.2 and /vendor/django-apps/blog-app/1.3 into /trunk/apps/blog-app. This will keep my local changes. (For people unfamiliar with this process, it is described in the Subversion handbook: http://svnbook.red-bean.com/en/1.5/svn.advanced.vendorbr.html)

Now I want to do this whole process in Git. How can I do this?

Let me re-iterate the requirements:

  • I must be able to place the external code in an arbitrary position in the tree
  • I must be able to modify the external code and keep (commit) these modifications in my Git repo
  • I must be able to easily update the external code, should a new version be released, whilst keeping my changes

Extra (for bonus points ;-) ):

  • Preferably I want to do this without something like svn_load_dirs. I think it should be possible to track the apps and their updates straight from their repository (most 3rd party Django apps are kept in Subversion). That would give me the added benefit of being able to view individual commit messages between releases, and of fixing merge conflicts more easily, since I could deal with a lot of small commits instead of the one artificial commit created by svn_load_dirs. I think one would do this with svn:externals in Subversion, but I have never worked with that before...

A solution where a combination of both methods could be used would be even more preferable, since there might be app developers who don't use source control or don't make their repos available publicly. (Meaning both svn_load_dirs-like behavior and tracking straight from a Subversion repository (or another Git repository).)

I think I would have to use subtrees, submodules, rebase, branches, ... or a combination of those, but smack me down if I know which one(s) or how to do it :S

I'm eagerly awaiting your responses! Please be as verbose as possible when replying, since I already had a hard time understanding other examples found online.

Thanks in advance

A: 

I use git submodules to track reusable apps in my Django projects, but it is kind of messy in the long run.

It is messy for deployment because you can't get a clean archive of the whole tree (with submodules) using git archive. There are some tricks, but nothing perfect. Besides, the submodule update mechanism is not that good for working with submodule branches.

You might want to take a look at virtualenv and pip, because they have recently been improved to work with external repositories.

pip : http://pip.openplans.org/ and working with pip/virtualenv : http://www.b-list.org/weblog/2008/dec/15/pip/

Grégoire Cachet
How messy does it become? Where does the mess come from? Can you give me a URL to this 'pig' thing? Googling for 'git pig' or 'python pig' gives mixed results (about snakes eating pigs etc).
hopla
sorry, I made a typo on pig: it is pip.
Grégoire Cachet
Ok, thanks. Would I be able to make local changes using virtualenv or pip? (and keep those changes when I update the 3rd party code)
hopla
+6  A: 

There are two separate problems here:

  1. How do you maintain local forks of remote projects, and
  2. How do you keep a copy of remote projects in your own tree?

Problem 1 is pretty easy by itself. Just do something like:

git clone git://example.com/foo.git
cd foo
git remote add upstream git://example.com/foo.git
git remote rm origin
git remote add origin ssh://.../my-forked-foo.git
git push origin master

You can then work on your forked repository normally. When you want to merge in upstream changes, run:

git pull upstream master
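To see the fork-and-merge cycle end to end without touching a real server, here is a minimal sketch that simulates it in a temporary directory. The directory names (upstream, fork), the file names and the demo identity are made up for illustration, and the local "upstream" repository only stands in for git://example.com/foo.git. It passes --no-rebase explicitly because newer Git versions may refuse to pull diverged branches when no strategy has been chosen.

```shell
#!/bin/sh
# Sketch: a local fork that keeps its own commit while merging upstream.
set -e
work=$(mktemp -d)
cd "$work"

# Throwaway identity so commits work even if git is not configured.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "upstream" plays the role of the remote project.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # use the branch name from the answer
echo "v1" > README
git add README
git commit -q -m "upstream: v1"
cd "$work"

# The fork: clone it, call the source remote "upstream", commit a local change.
git clone -q upstream fork
cd fork
git remote rename origin upstream
echo "local tweak" > LOCAL
git add LOCAL
git commit -q -m "fork: local modification"

# Meanwhile upstream publishes a new version.
cd "$work/upstream"
echo "v2" > README
git commit -q -am "upstream: v2"

# Merge upstream into the fork; the local commit survives the merge.
cd "$work/fork"
git pull -q --no-rebase --no-edit upstream master
cat README LOCAL
```

After the pull, README should carry upstream's new content and LOCAL should still be present, with a merge commit joining the two histories.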

As for problem 2, one option is to use submodules. For this, cd into your main project, and run:

git submodule add ssh://.../my-forked-foo.git local/path/for/foo

If I use git submodules, what do I need to know?

You may find git submodules to be a little bit tricky at times. Here are some things to keep in mind:

  1. Always commit the submodule before committing the parent.
  2. Always push the submodule before pushing the parent.
  3. Make sure that the submodule's HEAD points to a branch before committing to it. (If you're a bash user, I recommend using git-completion to put the current branch name in your prompt.)
  4. Always run 'git submodule update' after switching branches or pulling changes.

You can work around (4) to a certain extent by using an alias created by one of my coworkers:

git config --global alias.pull-recursive '!git pull && git submodule update --init'

...and then running:

git pull-recursive
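For point 3 in the list above, a quick check before committing inside a submodule can help, since 'git submodule update' normally leaves the submodule on a specific commit rather than on a branch. The following is only a sketch: head_state is a hypothetical helper name, and the throwaway repository "sub" stands in for a real submodule checkout.

```shell
#!/bin/sh
# Sketch: detect whether HEAD is on a branch or detached.
set -e
work=$(mktemp -d)
cd "$work"

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A throwaway repository standing in for the submodule's working copy.
git init -q sub
cd sub
git symbolic-ref HEAD refs/heads/master
echo one > file
git add file
git commit -q -m "one"

# Hypothetical helper: git symbolic-ref fails (quietly, with -q) when
# HEAD is detached, and prints the branch name otherwise.
head_state() {
    if branch=$(git symbolic-ref -q --short HEAD); then
        echo "on branch $branch"
    else
        echo "detached"
    fi
}

on_branch=$(head_state)

# Simulate the state a submodule is usually left in after an update.
git checkout -q --detach
detached=$(head_state)

# Getting back onto a branch before committing.
git checkout -q master
echo "$on_branch / $detached"
```

If the helper reports a detached HEAD, check out a branch (as in the last step) before making commits you intend to keep.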

If git submodules are so tricky, what are the advantages?

  1. You can check out the main project without checking out the submodules. This is useful when the submodules are huge, and you don't need them on certain platforms.
  2. If you have experienced git users, it's possible to have multiple forks of your submodule, and link them with different forks of your main project.
  3. Someday, somebody might actually fix git submodules to work more gracefully. The deepest parts of the submodule implementation are actually quite good; it's just the upper-level tools that are broken.

git submodules aren't for me. What next?

If you don't want to use git submodules, you might want to look into git merge's subtree strategy. This keeps everything in one repository.
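As a rough sketch of what that can look like, the script below imports a stand-in "blog-app" repository under apps/blog-app, makes a local change, and then merges a newer upstream release while keeping that change. Everything here is an assumption for illustration: the repository and file names, the remote name blog-upstream, and a Git recent enough to know --allow-unrelated-histories. It uses the explicit -X subtree=<path> merge option rather than letting -s subtree guess the directory, which tends to be more predictable on small trees.

```shell
#!/bin/sh
# Sketch: vendor-branch style tracking with a subtree merge, in one repo.
set -e
work=$(mktemp -d)
cd "$work"

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the third-party app's own repository (release 1.2).
git init -q blog-app
cd blog-app
git symbolic-ref HEAD refs/heads/master
echo "class Post: pass" > models.py
git add models.py
git commit -q -m "blog-app 1.2"

# The main project, with some history of its own.
cd "$work"
git init -q project
cd project
git symbolic-ref HEAD refs/heads/master
echo "# manage" > manage.py
git add manage.py
git commit -q -m "project: initial commit"

# Import the app under apps/blog-app while recording upstream as a parent.
git remote add blog-upstream "$work/blog-app"
git fetch -q blog-upstream
git merge -q -s ours --no-commit --allow-unrelated-histories blog-upstream/master
git read-tree --prefix=apps/blog-app/ -u blog-upstream/master
git commit -q -m "Import blog-app 1.2 under apps/blog-app"

# A local modification to the vendored code.
echo "# local tweak" >> apps/blog-app/models.py
git commit -q -am "Local change to blog-app"

# Upstream releases 1.3.
cd "$work/blog-app"
echo "def index(): pass" > views.py
git add views.py
git commit -q -m "blog-app 1.3"

# Pull the new release into the subdirectory; the local tweak is kept.
cd "$work/project"
git fetch -q blog-upstream
git merge -q --no-edit -X subtree=apps/blog-app blog-upstream/master
ls apps/blog-app
```

Because upstream's commits become part of the project's history, git log shows the individual upstream commits between releases, and conflicts are resolved with an ordinary merge.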

What if the upstream repository uses Subversion?

This is pretty easy if you know how to use git svn:

git svn clone -s https://example.com/foo
cd foo
git remote add origin ssh://.../my-forked-foo.git
git push origin master

Then set up a local tracking branch in git.

git push origin master:local-fork
git checkout -b local-fork origin/local-fork

Then, to merge from upstream, run:

git svn fetch
git merge trunk

(I haven't tested this code, but it's more-or-less how we maintain one submodule with an upstream SVN repository.)

Don't use git svn rebase, because it will make it very difficult to use git submodule in the parent project without losing data. Just treat the Subversion branches as read-only mirrors of upstream, and merge from them explicitly.

If you need to access the upstream Subversion repository on another machine, try:

git svn init -s https://example.com/foo
git svn fetch

You should then be able to merge changes from upstream as before.

emk
The trick here is using submodules with git-svn.
Ryan Graham
Ryan: I added some untested Subversion examples, based on a submodule I set up a few days ago. If it breaks, let me know and I'll fix it.
emk
emk: this is a very nice overview, thanks! I'm still not having my 'aha!' moment, but I think I will just have to try things out. Given the fact that I won't be pushing my changes back, I think the subtree merge method would be my best bet? Can I use that in combination with git-svn?
hopla
hopla: You should be able to use subtree merge with git svn, but I haven't tried it myself. In general, subtree mixes all the branches for all your projects in one repo. Submodules are clunkier, but they keep projects distinct. You may find them easier if you have lots of branches. Good luck!
emk
A: 

I've looked around a bit more and stumbled upon Braid. It's a tool that automates vendor branches in Git. It can use Git or SVN repos.

By crawling through the source I found out that it uses the subtree strategy, and it seems to make it really simple! Plus, it seems to fulfill all my requirements!

Before I jump in and use it: does anyone here have any experience with Braid? I would like to find out about possible cons if there are any. Also, if you haven't used Braid, but have some expertise in Git, what do you think about it, at first sight?

hopla