How do I set up a Git project to contain other projects?

For example, I am working on an online mapping app. We developed a GPS tool together with an outfit in SF. We simultaneously developed a Python Geomapping script together with a different concern (that only cares about geomapping). Our own core files unite the two, and build upon them for the app we need.

Each of the projects must exist by itself - the folks that have interest in the GPS only have interest in GPS - but the "parent" project which includes all of the others must be accessible as a project.

I've spent some time trying to understand submodules, but they appear to have too much independence for what is needed.

Also, if possible, it would be nice if each of those projects could contain one or two overlapping scripts. Could one Git project include a file that is not part of its 'root' so that when this file is updated by either team both can benefit?

Is this doable with Git? With Mercurial? Does the host (GitHub, Gitorious) matter?

I have the idea of using Subversion for the 'parent' - ignoring the .git folders, and using Git for the projects (ignoring .svn folders) - but that is only a last resort.

edit:

To explain why I don't want Submodules:

  1. When users download, the zip does not include the submodules (here & here). The same happens when collaborators try to set up the project. This is a show stopper.
  2. Submodules are frozen - they do not (easily) pick up the latest version of the project that is being pointed to.
  3. Other reasons as pointed out in the fantastic answers below and in this monologue at NoPugs.

Subtree-merging (introduced to me by Paul, below) will not do: It is difficult to update the source [of a subtree] from within the project it is merged into, and that source must reside outside of the 'root' folder of the project. Being a web app, it is vital that all my pages link internally to a folder within them, and that testing and updates be done directly within that folder. (Hope this is clear and useful to others.)

Still studying setting up 'remote branches' but other ideas are still welcome.

+2  A: 

git-submodule might be what you are looking for:

http://book.git-scm.com/5_submodules.html

http://git-scm.org/gitwiki/GitSubmoduleTutorial

Caotic
+5  A: 

I haven't found submodules to be particularly useful on the (small) projects I've worked on. Once you've set them up, working on the whole project requires adding additional parameters to almost every command and the syntax isn't completely regular. I imagine if I worked on larger projects with more submodules, I'd see it as a more beneficial tradeoff.

There are two possibilities that keep the sub-projects as independent git repos that you pull from into your main (integration) repo:

  • Using subtree merge to bring your external projects into separate subdirectories in your main repo that includes your core files. This makes it easy to update the main project from the external projects, but complicated to send changes back to the external projects. I think of this as a good way to include project dependencies, but it wouldn't work so well with shared files. Another simple explanation.

  • Set up each project as a remote branch in your main repo and merge from each of them into your master (integration) branch that also contains your core files. This requires some discipline: if you make any changes to the external projects in your main repo, they must be made in the branch and then merged into the master; and you never want to merge into the project branches. This makes it easy to send changes back to the external projects and is a perfectly acceptable use of branches in Git.

    Your shared scripts can be handled as another independent branch in your main directory which your external partners can pull from and push to as a remote branch.
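A minimal sketch of the second (remote branch) approach, assuming a partner repository that is simulated here by a local directory. The names gps, gps-vendor, from-integration and the file contents are all hypothetical, and --allow-unrelated-histories assumes Git 2.9 or later. Note that with this approach the external project's files are merged at the top level of the integration repo, next to the core files.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for the partner's GPS repository (hypothetical content).
git init -q "$work/gps"
git -C "$work/gps" symbolic-ref HEAD refs/heads/master
echo "gps v1" > "$work/gps/gps.txt"
git -C "$work/gps" add gps.txt
git -C "$work/gps" commit -q -m "gps: initial version"

# The main (integration) repo with our own core files.
git init -q "$work/main"
git -C "$work/main" symbolic-ref HEAD refs/heads/master
cd "$work/main"
echo "core" > core.txt
git add core.txt
git commit -q -m "core: initial version"

# Track the external project as a remote and give it a local project branch.
git remote add gps "$work/gps"
git fetch -q gps
git branch gps-vendor gps/master

# Merge the project branch into master (never the other way round).
git merge -q --allow-unrelated-histories -m "merge gps into master" gps-vendor

# A change to the external project is made on its branch, not on master...
git checkout -q gps-vendor
echo "gps v2" > gps.txt
git commit -q -am "gps: fix made in the integration repo"

# ...sent back to the partner (here as a new branch they can review)...
git push -q gps gps-vendor:from-integration

# ...and then merged into master.
git checkout -q master
git merge -q -m "merge gps update into master" gps-vendor
```

Because master is never merged into gps-vendor, the core files never leak into what gets pushed back to the partner.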

If you try to run SVN & Git in the same directory, you make it very hard to use branching in either system, because SVN does branching by copying file directories while Git tracks pointers. Neither system would automatically see branches you make in the other. I think that 'solution' is more trouble than it is worth.

Paul
After too much time trying, I still can't get your second method to work. Setting up the branch and merging it in was OK, but beyond that I'm lost. Would it be possible to describe a more complete workflow? Would it be proper to ask this as its topic and start a new thread?
SamGoody
What are you trying to do that you can't? Basically you are creating vendor branches, which are fairly common in svn. (Actually 'vendor branch' seems to have two different uses -- an included repo from a vendor like you want, or a release branch for a specific vendor.) I'm traveling with very little net access, but will try to get something more written for this.
Paul
I started a new thread on the subject at http://stackoverflow.com/questions/769786/vendor-branches-in-git, please excuse my obvious frustration there. Thanks very much.
SamGoody
SamGoody
+2  A: 

I've used git to stitch together my own github hosted project and an external UI library that I wanted to use. The library is hosted in a subversion repository on sourceforge.

I used git-submodule and git-svn and it worked reasonably well. The downsides were:

  1. In order to keep up to date with the library repository, I had to perform a new commit to update the submodule git hash "pointer". This is because git submodules, unlike svn:externals, are pinned to a particular commit id. This may not be a downside if you actually want to pin a stable version, but I was working with code that was WIP.

  2. The initial pull of a git repo with submodules requires an additional step with "git submodule init" (followed by "git submodule update" to actually fetch the contents). This is not an issue for you, but others using your code will have to remember, or be told, to perform this step before compiling/running/testing your code.

  3. If you use the command line it is easy to screw up your repository with git-add. This is because you type git add subm<tab> to complete to git add submodule, but it auto-completes to git add submodule/ - note the trailing slash. If you execute the command with the trailing slash, then it blitzes the submodule and adds all its contained files instead. This is mitigated by using git-gui, git add . or just training yourself to delete the slash (it happened to me enough times that I trained myself to remove it)

  4. Submodule commits can mess up git rebase -i. I forget the exact details, but it is especially bad if you have a "dirty" submodule and you run a rebase-interactive. Normally with a dirty tree you can't rebase, but submodules are not checked. Having several submodule commits in a rebase group also causes problems. The last submodule hash gets committed to the first pick on your list, and this is pretty tricky to fix later. This can be worked around with a more careful workflow (i.e. carefully deciding when to do your submodule commits...) but can be a PITA.
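To illustrate downside 2, here is a rough sketch of what a fresh clone looks like for someone else. The repositories are throwaway local directories with hypothetical names (lib, parent, clone1, clone2); GIT_ALLOW_PROTOCOL is set only because recent Git versions refuse local paths as submodule URLs by default, and it would not be needed with a hosted URL.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Demo-only: allow the local file transport for submodules.
export GIT_ALLOW_PROTOCOL=file
work=$(mktemp -d)

# Stand-in for the external library (hypothetical content).
git init -q "$work/lib"
echo "lib v1" > "$work/lib/lib.txt"
git -C "$work/lib" add lib.txt
git -C "$work/lib" commit -q -m "lib: initial version"

# The parent project, with the library added as a submodule.
git init -q "$work/parent"
cd "$work/parent"
echo "core" > core.txt
git add core.txt
git commit -q -m "core: initial version"
git submodule --quiet add "$work/lib" subproject
git commit -q -m "add subproject as a submodule"

# Option 1: a plain clone leaves subproject/ empty until the extra step is run.
git clone -q "$work/parent" "$work/clone1"
git -C "$work/clone1" submodule --quiet update --init

# Option 2: ask clone to fetch the submodules in one go.
git clone -q --recurse-submodules "$work/parent" "$work/clone2"
```

Either way the collaborator has to know about the extra step or flag; a downloaded zip of the parent still would not contain the submodule contents.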

The steps to set this up were something along the lines of:

  1. Run git svn clone https://project.svn.sourceforge.net/svnroot/project/project/trunk
  2. Push that as a "real" git project to e.g. github
  3. Now in your own git repository, run git submodule init
  4. git submodule add git://github.com/project subproject
  5. Push that out too, to your own repo this time.

That is it, more or less. You will have a new directory "subproject", which in your case would be the geomapping library.

Each time you need to update the geomapping code, you would run something like:

cd subproject
git svn rebase
git push  # this updates the git mirror of the subproject (git svn has no "push" subcommand)
cd ..
git add subproject # careful with the trailing slash!
git commit -m "update subproject"
git push # this pushes the commit that updates the subproject
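For a submodule that points at a plain git repository (no svn involved), Git versions released after this thread was written can move the submodule to the newest upstream commit with "git submodule update --remote". A sketch under that assumption, again using throwaway local repositories with hypothetical names; the pointer update still has to be committed in the parent.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Demo-only: allow the local file transport for submodules.
export GIT_ALLOW_PROTOCOL=file
work=$(mktemp -d)

# Stand-in for the external library, on a branch explicitly named master.
git init -q "$work/lib"
git -C "$work/lib" symbolic-ref HEAD refs/heads/master
echo "lib v1" > "$work/lib/lib.txt"
git -C "$work/lib" add lib.txt
git -C "$work/lib" commit -q -m "lib: initial version"

# Parent project; -b records which upstream branch the submodule follows.
git init -q "$work/parent"
cd "$work/parent"
echo "core" > core.txt
git add core.txt
git commit -q -m "core: initial version"
git submodule --quiet add -b master "$work/lib" subproject
git commit -q -m "add subproject as a submodule"

# Upstream moves on.
echo "lib v2" > "$work/lib/lib.txt"
git -C "$work/lib" commit -q -am "lib: second version"

# Fetch the submodule and check out the tip of its tracked branch.
git submodule --quiet update --remote subproject
git add subproject   # no trailing slash
git commit -q -m "update subproject"
```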

I've not seen too many tutorials on a git submodule workflow, so I hope this helps you decide.

rq
Paul
Agreed. Once you figure out that they are really just pinned svn:externals, it is pretty easy. The gotchas until you grok submodules are enormous warts on the UI though.
rq
+1  A: 

From the little I've read about Externals, it appears to be a port of SVN 'externals' to Git.

This solves some of the problems with Git submodules, including updating to the latest version automatically.

While I have no experience with SVN externals or with this project, it might be a better solution for some than any of the others posted.

Alternatively, Braid (Softpedia page), which looks like it can be used with GitHub, may be another way for some to skin the cat.

SamGoody
Looks interesting, thx for the link
rq
A: 

It depends on what kind of project you're working on, and what tools, if any, need to interact with your SCM. Rails, for example, often uses Capistrano to deploy, and Capistrano makes certain assumptions about what your directory structure will look like relative to the root of your repository. In this case, if you have several interrelated Rails apps, you need to use submodules. Each app gets its own repository, and then you have a larger repository that manages each of the independent repositories as submodules.

Even if you don't have tools that make these kinds of assumptions, good repository design requires that you break things up a bit if there's even the slightest possibility that some day you might want to use a subsection of a larger project independently, or reuse some large swath of code within some independent project.

Extracting out some subsection of a repository as its own separate entity while maintaining version history is difficult in git, and so it's a good idea to plan ahead.

As for your specific question, honestly, I'd call that a perfect example of where a couple submodules would be ideal. With regard to sharing scripts, if for some reason that's actually an issue that proves problematic, you can always use a symlink.
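A small sketch of the symlink idea, assuming a platform where symlinks are available. For brevity the two projects are plain directories in one throwaway repository rather than separate submodules, and the names projectA, projectB and shared.sh are hypothetical. The link is relative so that it still resolves after the repository is cloned somewhere else.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/main"
cd "$work/main"
mkdir projectA projectB

# The real script lives in project B.
echo 'echo shared' > projectB/shared.sh

# Project A gets a relative symlink to it.
ln -s ../projectB/shared.sh projectA/shared.sh

git add projectA projectB
git commit -q -m "share a script via a relative symlink"

# Git records the link itself (mode 120000), not a copy of the file.
git ls-files -s projectA/shared.sh
```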

Bob Aman
How would I use a symlink to offer a download of the complete program including submodules? The term "symlink" is new to me (a Win/Mint user) but from a quick Google I don't see what it would solve. And my environment is PHP, with a consideration of moving to Python (TurboGears).
SamGoody
Programming language is irrelevant in this case, but obviously, Windows doesn't have symlinks. Though in order to even run git, you need symlinks to begin with, and I believe that the port of git for Windows basically obtains that symlink support through msys. Essentially, if project A needs a script that's in project B, but they're separate submodules, you can create a symlink within project A to the script in project B, and check the symlink into git. The symlink should be relative.
Bob Aman