A GIT project has within it a second project whose content is being worked on independently.

Submodules cannot be used for the smaller project, as the subproject must be included whenever users clone or download the 'parent'.

Subtree-merging cannot be used, as the subproject is being actively developed, and subtree merging makes it very difficult to merge those updates back into the original project.

I have been informed that the solution is known in the SVN world as "Vendor Branches", and that it is so simple to do in GIT that it does not even need addressing. Half-baked tutorials abound on the 'net.

Nonetheless, I cannot seem to get it to work.

Can someone please (pretty please?) explain how I can create a structure whereby one project exists within another, and both can be developed and updated from the same working directory? Ideally [or rather: it is quite important, if unsupported], when a client attempts to download the 'parent' project, he should be given the latest version of the subproject automatically.

Please do NOT explain to me how I should use submodules or subtree-merges or even SVN:Externals. This thread is the outgrowth of the following SO thread, and if something was missed there, please DO post it there. This thread is trying to get an understanding of how to do Vendor Branches, and the longer, clearer, and more dumbed-down an explanation I receive, the happier I will be.


The following rant is off-topic, but perhaps those knowledgeable can correct my (mis)conceptions: I find GIT:

  1. Very very very poorly documented. Especially if you (gasp) don't come from Ruby.
  2. Without good software (Win - gitGUI hangs, crashes, et al to the point of unusability. GitBash is better, but hardly stable. Tested on two computers)
  3. lacking any concept of subprojects. Submodules are woefully out of touch with reality [despite SVN having 'externals' to copy], other solutions are no better, and there isn't even a way to update or checkout one part of the codebase without touching the rest.
+4  A: 

I think submodules are the way to go when it comes to "vendor branch".
Here is how you should use submod... hmmm, just kidding.

Just a thought; you want:

  • to develop both the main project and the sub-project within the same directory (which is called a "system approach": you develop, tag and merge the whole system)
  • or to view your sub-project as a "vendor branch" (a branch which gives you access to a well-defined version of a vendor's external component - or "set of files" - and which is only updated with each new release of that external component: that is called a "component approach", where the whole system is viewed as a collection of separate components developed on their own)

The two approaches are not compatible:

  • The first strategy is compatible with a subtree-merge: you are working both on project and sub-project.
  • The second one is used with submodules, but submodules are used to define a configuration (the list of tags you need to work with): each Git submodule, unlike svn:externals, is pinned to a particular commit id, and that is what allows you to define a configuration (as in S*C*M: "software configuration management")
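
To make the "pinned to a particular commit id" point concrete, here is a minimal sketch in a throwaway directory. The repository names (`sub`, `parent`) and the path `vendor/sub` are made up for illustration, and the `protocol.file.allow` setting is only there because the demo clones from a local path.

```shell
#!/bin/sh
# Sketch: a submodule entry records one exact commit of the sub-project.
# Names and paths (sub, parent, vendor/sub) are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Identity for the throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The sub-project, with one commit.
git init -q sub
(cd sub && echo v1 > lib.txt && git add lib.txt && git commit -q -m "sub: v1")
pinned=$(git -C sub rev-parse HEAD)

# The parent project registers the sub-project at vendor/sub.
git init -q parent
cd parent
# protocol.file.allow is needed only because this demo uses a local path as URL.
git -c protocol.file.allow=always submodule --quiet add "$work/sub" vendor/sub
git commit -q -m "parent: add sub as a submodule"

# The sub-project moves on...
(cd "$work/sub" && echo v2 > lib.txt && git commit -q -am "sub: v2")
latest=$(git -C "$work/sub" rev-parse HEAD)

# ...but the parent still references the commit it was pinned to.
recorded=$(git rev-parse HEAD:vendor/sub)
echo "pinned:   $pinned"
echo "recorded: $recorded"
echo "latest:   $latest"
```

The parent only moves to the newer sub-project commit when someone updates the submodule checkout and commits that change in the parent.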

I like the second approach because most of the time, when you have a project and a sub-project, their lifecycle is different (they are not developed at the same rhythm, not tagged together at the same time, nor with the same name).

What really prevents that approach ("component-based") in your question is the "both can be developed and updated from the same working directory" part.
I would really urge you to reconsider that requirement, as most IDEs are perfectly capable of dealing with multiple "sources" directories, and the sub-project development can be done in its own dedicated environment.


samgoody adds:

Imagine an eMap plugin for both Joomla and ModX. Both the plugin and the Joomla-specific code (which is part of Joomla, not of eMap) are developed while the plugin is inside Joomla. All paths are relative, the structure is rigid, and they must be distributed together - even though each project has its own lifecycle.

If I understand correctly, you are in a configuration where the development environment (the set of files you are working on) is much the same as the distribution environment (the same set of files is copied onto the release platform).

It all comes down to a granularity issue:

  • if both sets of files cannot exist one without the other, then they should be viewed as one big project (and subtree-merged), but that forces them to be tagged and merged as one.
  • if one depends on the other (which can be developed alone), then they should each be in their own Git repository and project, the first one depending on a specific commit of the second as a sub-module: if the sub-module is defined in the right subtree of the first component, all relative paths are respected.


samgoody adds:

The original thread listed issues with submodules - primarily that GitHub's download doesn't include them (vital to me) and that they get stuck on a particular commit.

I am not sure GitHub's download is still an issue: the "Guides: Developing with Submodules" article does mention:

Best of all: people cloning your my-awesome-framework fork will have no problem pulling down your my-fantastic-plugin submodule, as you’ve registered the public clone URL for the submodule. The commands

$ git submodule init
$ git submodule update

will pull the submodules into the current repository.
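
As a rough illustration of what those two commands do for someone cloning the parent, here is a sketch with made-up local repositories (`plugin`, `framework`, `clone-a`, `clone-b` are all hypothetical names). It also shows `git clone --recurse-submodules`, which does the same in one step. The `protocol.file.allow` setting is only needed because the demo uses local paths as URLs.

```shell
#!/bin/sh
# Sketch: a plain clone leaves the submodule directory empty until
# "git submodule init" and "git submodule update" are run.
# All repository names here are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A plugin repository and a framework that registers it as a submodule.
git init -q plugin
(cd plugin && echo v1 > plugin.txt && git add plugin.txt &&
 git commit -q -m "plugin: v1")
git init -q framework
(cd framework &&
 git -c protocol.file.allow=always submodule --quiet add "$work/plugin" plugins/my-plugin &&
 git commit -q -m "framework: register plugin submodule")

# Plain clone: the submodule path exists but has no files yet.
git clone -q framework clone-a
before=$(ls -A clone-a/plugins/my-plugin | wc -l)

# The two commands quoted above fetch and check out the pinned commit.
(cd clone-a &&
 git submodule --quiet init &&
 git -c protocol.file.allow=always submodule --quiet update)

# One-step alternative for people cloning the parent.
git -c protocol.file.allow=always clone -q --recurse-submodules framework clone-b

echo "files before init/update: $before"
ls clone-a/plugins/my-plugin clone-b/plugins/my-plugin
```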

As for the "they get stuck on a particular commit": that is the whole point of a submodule, allowing you to work with a configuration (a list of tagged versions of components) instead of the latest, potentially unstable, set of files.

samgoody mentions:

I need to avoid both subtrees and submodules (see question), and would rather address this need without arguing too much if the approach is justified

Your requirement is a perfectly legitimate one, and I do not want to judge its justification: my previous answers are only here to provide a larger context and try to illustrate the options usually available with a generic SCM tool.

Subtree merge should be the answer here, but it would mean merging back only the commits made to files of the main project, and not the commits made to the sub-project. If you can manage that kind of partial merge, I reckon it is the right path to follow.
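
For reference, the mechanics of a subtree merge look roughly like the sketch below. It only covers the direction "sub-project into main project", not the partial merge back, and every name in it (`plugin`, `parent`, `plugins/emap`, the `main` branch) is an assumption made for the demo.

```shell
#!/bin/sh
# Sketch of a subtree merge: import another repository's history under a
# sub-directory, then pull later updates into the same place.
# Repository names, branch names and paths are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The independently developed sub-project.
git init -q plugin
(cd plugin && echo "plugin v1" > emap.php && git add emap.php &&
 git commit -q -m "plugin: v1" && git branch -M main)

# The main project.
git init -q parent
cd parent
echo "parent" > index.php
git add index.php
git commit -q -m "parent: initial"
git branch -M main

# First import: record a merge, then read the plugin tree into plugins/emap/.
git remote add plugin "$work/plugin"
git fetch -q plugin
git merge -q -s ours --no-commit --allow-unrelated-histories plugin/main
git read-tree --prefix=plugins/emap/ -u plugin/main
git commit -q -m "Merge plugin into plugins/emap/"

# The sub-project publishes a new version...
(cd "$work/plugin" && echo "plugin v2" > emap.php &&
 git commit -q -am "plugin: v2")

# ...and the main project merges it into the same sub-directory.
git fetch -q plugin
git merge -q -X subtree=plugins/emap -m "Update plugins/emap/ from plugin" plugin/main

cat plugins/emap/emap.php
```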

I do not see however a native Git way to do what you want that does not use subtree-merge or submodule.
I hope a true Git guru will post here a more adequate answer.

VonC
Imagine an eMap plugin for both Joomla and ModX. Both the plugin and the Joomla-specific code (which is part of Joomla, not of eMap) are developed while the plugin is inside Joomla. All paths are relative, the structure is rigid, and they must be distributed together - even though each project has its own lifecycle. I'm not up-to-date with IDEs; our office switched last year from Eclipse to Notepad++ for HTM/PHP/JS, and from Flex to FD for AS [Productivity has gained]. Am I missing something big?
SamGoody
Your answer: If the projects are interdependent, subtree-merge. Otherwise use submodules. I need to avoid both subtrees and submodules (see question), and would rather address this need without arguing too much if the approach is justified. In my case the larger project must include the subproject but not vice versa. The original thread listed issues with submodules - primarily that GitHub's download doesn't include them (vital to me) and that they get stuck on a particular commit. Any ideas?
SamGoody
+3  A: 

Hi samgoody, I finally have a few hours access to the net before I head back to the mountains. We'll see if I have anything to contribute that brings clarity to your situation.

My (probably oversimplified) understanding is you have (offsite) vendor(s) developing plug-in(s) for your project where your (in-house) team is developing code for your main project using an externally sourced framework. Vendor doesn't make changes to your code and probably doesn't need your bleeding edge development, but does need your stable code to develop and test their work. Your team doesn't make changes to the framework, but does sometimes contribute changes to the plug-in.

  1. Like VonC (who usually thinks things thru very thoroughly) I don't think Git has a perfect fit for your requirements. And like him, I think using subtree merge pattern is the closest fit. I'm not a Git guru, but I have been successful at bending Git to a wide range of needs. Maybe Git doesn't meet your needs:

    • SVN will let you have multiple repos within one, which seems important for you. I think this would mean either using externals or the Vendor Branch pattern to come close to what you want.

    • Mercurial has an extension, Forest, for using nested repos, which seems to fit your mental model better. I chose Git over Mercurial 15 months ago, but HG was stable and for many uses I think it is comparable to Git. I don't know how stable the extension is.

  2. If I were in your situation, I'd use two Git repos -- one for the Plugin and one for the MainProject. The vendor would do development in the Plugin repo and would have a release branch that they pull current versions of the plug-in into without the rest of the development environment. That branch would be pulled into the MainProject repo as a vendor branch, and then merged into your main development branch. When your team works on a change to the plug-in, they develop it in a feature branch off of your main development branch and submit it to the vendor repo as patches. This gives you a very clean workflow, relatively easy to set up and learn, while keeping the development history segregated.

    I'm not trying to be argumentative, but simply to say this is Git's best fit for my understanding of your situation. The easiest way to set this up would use the subtree merge, but this does not run changes thru it in both directions, which was my objection to using that pattern.

  3. If your team is really actively involved in the plugin development or you really want to have the development history of both project and plug-in integrated in one Git repo, then just use one Git repo. You can extract the plug-in and its history for the records of your vendor as explained here, from time to time. This may give you less encapsulation than you intend, but Git is not designed for encapsulation -- Git's data structure is based on tracking changes within one whole project.
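
The two-repository workflow from item 2 could look something like the following. This is a sketch under my own assumptions, not a tested recipe for your exact layout: the repository names (`Plugin`, `MainProject`), the remote name `upstream`, the branch names (`release`, `vendor`, `main`, `fix-emap`) and the path `plugins/emap/` are all invented for the demo, and the import step uses the subtree merge mentioned above.

```shell
#!/bin/sh
# Sketch: vendor branch in MainProject, in-house fix sent back as a patch.
# All repository, remote, branch and path names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Vendor side: the Plugin repo with a 'release' branch.
git init -q Plugin
(cd Plugin && echo "emap v1" > emap.php && git add emap.php &&
 git commit -q -m "emap: v1" && git branch -M release)

# In-house side: the MainProject repo.
git init -q MainProject
cd MainProject
echo "main" > index.php
git add index.php
git commit -q -m "main: initial"
git branch -M main

# 1. Vendor branch: a local branch that mirrors the vendor's release branch.
git remote add upstream "$work/Plugin"
git fetch -q upstream
git branch -q vendor upstream/release

# 2. Merge the vendor branch into main, placed under plugins/emap/.
git merge -q -s ours --no-commit --allow-unrelated-histories vendor
git read-tree --prefix=plugins/emap/ -u vendor
git commit -q -m "Import vendor plugin into plugins/emap/"

# 3. An in-house plug-in change, on a feature branch off main.
git checkout -q -b fix-emap
echo "emap v1 + in-house fix" > plugins/emap/emap.php
git commit -q -am "emap: in-house fix"

# 4. Export it as a patch whose paths are relative to the plug-in root.
git format-patch -q --relative=plugins/emap/ -o "$work/patches" main..fix-emap

# 5. The vendor applies the patch in the Plugin repo.
(cd "$work/Plugin" && git am -q "$work"/patches/*.patch)

# 6. Next vendor release: refresh the vendor branch, merge it into main.
git checkout -q main
git fetch -q upstream
git branch -q -f vendor upstream/release
git merge -q -X subtree=plugins/emap -m "Update plugin from vendor" vendor

cat plugins/emap/emap.php
```

The point of the separate `vendor` branch is that it only ever contains what the vendor released, so main's history shows clearly which plug-in changes came from the vendor and which were made in-house.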

Maybe I've misunderstood your situation and none of this applies. If so I apologize. Thanks for the details that you and VonC have worked out, which have filled in many holes that I originally had in trying to understand your question.

Paul
On a personal note - You say "I'm not a Git guru" - I notice that your answer is the accepted answer on virtually every Git related Q' on the site. I do very much appreciate your help. "VonC (who usually thinks things thru very thoroughly)" - I should think so. How the heck does someone get 25,000+ points in a site with contributions from world experts?! I'm in awe! I appreciate his help as well. You guys rock. Now, if I can just get this to work... :)
SamGoody
"..I'd use two Git repos" - I'm trying this, but (me the newb) am having trouble understanding. From this post I gather that to update the vendor project, I should subtree-merge their latest commits to the parent, create a branch of the parent called "vendor", and make my changes. They then pull in those changes from my "vendor" branch into their project. I didn't realize this was doable, and don't understand the point in making it a branch. Also didn't the other post avoid a subtree-merge?
SamGoody
Please don't get upset, but do tell me if there is someplace to see the complete line by line workflow of this in action. (a theoretical example) [1. clone git://project.git 2. merge branch origin 3. commit origin master etc.] Thank you very much.
SamGoody