Ok I hope that this will end up sounding like a reasonable question.

From what I understand of Subversion, if you have a repo that contains multiple projects, you can branch individual projects within that repo (see SVN Red book - Using Branches).

However what I don't quite follow is what happens when you create a branch in one of the distributed systems (Git, Hg, Bazaar - I don't think it matters which one). Can you branch just a sub-directory of the repo, or when you create the branch are you branching the entire repo?

This question is part of a larger one that I posted on superuser (choice and setup of version control) and has come about as I am trying to figure out how best to version control a large hierarchical layout of independent projects.

It may be that, for distributed systems, what I would like to do is best handled by a sub-project mechanism of some sort. That is something I am not clear on, although I have heard the term mentioned in regard to git.

+1  A: 

With bazaar, if you create two branches in a shared repository, any common history they have lives within the repository and not the branch itself - the branch merely references it. This saves disk space for repositories that have many branches of the same projects for different features as well as speeds up the creation of new branches (you're not having to duplicate the files containing branch history). It's been a while since I looked at hg and git, but I do not believe they have a feature identical to this.

Bazaar does not have sub-projects. A branch is a whole, contiguous unit. You cannot branch portions of it. I believe git and hg both have sub-branches, though.

sechastain
How exactly is what you describe different from hg or git?
Jefromi
@sechastain: I'm not sure whether you're referring to having two branches in the same repository or in different repositories. In the case of the same repository, Git will only hold one copy of each object -- it'll never duplicate data when you create a new branch. For multiple repositories, the default is to copy everything but it's also possible to have the new repository reference the old one for its objects. I believe this is the case for Hg too. Also, git doesn't support sub-branches as described here; you can use sub-modules though (which are themselves separate git repositories).
Andrew Aylett
@Andrew this is where my git knowledge becomes suspect, as I'm not an avid user. But as I understand it, a directory that contains a branch in git can actually contain many branches, only one of which is active (i.e. is reflected in the content of the files of that directory's tree). In this manner, yes, the revision history for the branches in the directory is shared in git. In bazaar, a directory that contains a branch contains one and only one branch. If that branch directory is sitting inside a repository directory, then history between other branches in the same repo directory is shared.
sechastain
@sechastain: Ah. A branch is an attribute of a repository in git, not a directory. It's weird to hear the phrase "directory that contains a branch" in context of git.
Jefromi
A: 

In general, distributed version control systems only let you create a new branch from the whole of an existing branch, rather than (as Subversion does) letting you copy a small part of what you're working on. Git at least (and I think some of the others) allows you to reference sub-modules (which are git repositories in their own right).

Git does allow you to do pretty much anything you want, even if it's not particularly useful or obvious (and even if the tools won't really support you in doing it). There's no technical reason why all the branches in a Git repository need to have a common parent or have anything to do with each other at all. There's also nothing stopping you constructing a commit consisting of a sub-tree of its parent commit and Git's change tracking and merging will actually probably cope quite well in this case.

Mercurial at least differs from Git in this regard, as the Mercurial workflow seems tailored to trying to keep separate branches in separate repositories while the git workflow is quite happy with having many branches in the same repository.

Andrew Aylett
Andrew, you said "create a new branch out of the whole of an existing branch". I think the crux of my question is about what an "existing branch" is. Is it seen as the whole repo so that you conceptually have entire parallel repos that you flick between when you change branches?
Peter M
@Peter: Yes. In a DVCS, a repo is conceptually one project unit.
Santa
@Peter, @Santa: However, a git repository may well contain several different branches, but only one may be checked out as the current working copy. If you want to check out two branches at once, clone the repo to a different directory and check out the right branch there.
Andrew Aylett
A: 

With git, a branch is simply a pointer to the commit at the tip of the branch. It doesn't contain any information of its own. So, your history might look like this:

- o - o - o - o - o (branchA)
           \
            o - o (branchB)

Each o there is a commit, which represents the state of the entire repository at that point. The two branches thus in general represent different states of the entire repo, though it could be that they only differ in the contents of one subdirectory. There certainly won't be any wasted space, though; if two commits use the same version of a given file, they internally point to the same object for its contents.
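To make the "pointer to a commit" idea concrete, here is a toy model in Python. It is only a sketch of the concept, not git's real object format: the `objects`, `commits` and `branches` dictionaries, the `store_blob` and `commit` helpers, and the file paths are all made up for illustration.

```python
import hashlib

objects = {}   # content-addressed store: object id -> file contents
commits = {}   # commit id -> {"parent": id or None, "tree": {path: object id}}
branches = {}  # branch name -> commit id (the whole "branch" is this one entry)


def store_blob(data: str) -> str:
    """Store file contents once, keyed by a hash of the contents."""
    oid = hashlib.sha1(data.encode()).hexdigest()
    objects[oid] = data
    return oid


def commit(branch: str, files: dict) -> str:
    """Record a snapshot of every file and move the branch pointer to it."""
    parent = branches.get(branch)
    tree = {path: store_blob(data) for path, data in files.items()}
    cid = hashlib.sha1(repr((parent, sorted(tree.items()))).encode()).hexdigest()
    commits[cid] = {"parent": parent, "tree": tree}
    branches[branch] = cid
    return cid


base = commit("branchA", {"proj1/a.txt": "one", "proj2/b.txt": "two"})

# Creating a branch copies nothing: it only adds a second pointer.
branches["branchB"] = branches["branchA"]

# branchB changes one subdirectory, yet its commit still snapshots the whole tree.
tip_b = commit("branchB", {"proj1/a.txt": "one", "proj2/b.txt": "two, edited"})

print(branches["branchA"] == base)     # True: branchA did not move
print(sorted(commits[tip_b]["tree"]))  # both projects appear in branchB's snapshot
print(len(objects))                    # 3: the unchanged file is stored only once
```

The point of the sketch is that the branch entry never names a subdirectory; it names a commit, and the commit covers everything.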

Depending on what you're actually trying to do, you could be interested in using submodules, which are essentially a mechanism for placing repos inside of repos, so that you can have a meta-project repository which contains sub-project (embedded) repositories.

Jefromi
Jefromi, I understand the space considerations of branches, but I think you are clarifying my actual question - that a branch is a state of an entire repo. I'm going to have to look more into submodules
Peter M
@Peter M: There are a ton of questions here about submodules; hopefully you should be able to find what you need.
Jefromi
@Peter: Technically speaking, a "commit" is a state of an entire repo. A branch is a series of commits that diverges from another branch, having a common parent at one point in the repository's commit history.
Santa
@Santa: Technically, a branch is merely a pointer to a commit, as I said in my answer. It may or may not have diverged from another branch.
Jefromi
+1  A: 

Subversion being centralized, you can organize your projects within one repo as you want. Since branches are emulated as directories with SVN, you end up mixing:

  • history isolation (which is the main purpose of a branch: you isolate the versions of a set of files from other versions from the same set of files)
  • "component" isolation (a component or module being a group of files each in their own directory)

But with a DVCS, each repository is its own component (or module).
I.e. you don't want to put all your projects within one repo.
Rather you are using submodules (Git) or subrepos (Hg).

That leaves you with the branch as a pure history isolation:
When you branch, the history of the whole repo gets a new branch, ready to record (reference) any new commit you will make.
There is no "cheap copy", just a new pointer made.
Note: Mercurial has a more complex branching model which can involve cloning a repo to create a new branch, but the general principle behind branching stands.
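As a rough illustration of "each repository is its own component", the following toy Python model treats every repo as nothing more than a set of branch pointers, and the main project as something that pins each sub-repo to one commit. The `Repo` class, the repo names, the paths and the commit ids are all invented for the example; real submodules and subrepos carry more state than this.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy repository: just a name and its branch pointers."""
    name: str
    branches: dict = field(default_factory=dict)  # branch name -> commit id


# Two independent component repositories (names and ids are made up).
lib = Repo("libfoo", {"master": "a1"})
app = Repo("app", {"master": "b1"})

# The main project pins each sub-repo path to one specific commit id.
superproject = Repo("main", {"master": "m1"})
pinned = {
    "components/libfoo": lib.branches["master"],
    "components/app": app.branches["master"],
}

# Branching inside one component touches only that repository's pointers.
lib.branches["feature"] = lib.branches["master"]
lib.branches["feature"] = "a2"  # a new commit made on the feature branch

print(sorted(lib.branches))           # ['feature', 'master']
print(sorted(app.branches))           # ['master']
print(sorted(superproject.branches))  # ['master']
print(pinned["components/libfoo"])    # a1: the pin moves only when the main project records a new one
```

Component isolation comes from the repo boundary, and history isolation comes from the branch pointers inside each repo.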

VonC
VonC - I am now beginning to understand that the DVCS is pitched at the module level. But this has implications that I am not happy with as per my question on SU. I effectively have 100-200 separate modules and I can see issues with maintaining that many DVCS repos unless I go down a very structured path in order to ensure a consistent level of maintenance quality.
Peter M
@Peter: There is no actual issue (beside initializing those 200 repos). You can develop your submodules directly within *one* main project: http://stackoverflow.com/questions/1979167/git-submodule-update/1979194#1979194. Note: each branch you will create within each of those submodules will be valid only within that submodule.
VonC
VonC - it is the mechanism of initializing those repos that concerns me, and I don't have any DVCS experience to know if my fears are unfounded. It seems like I have to repeat a manual task N times over, or spend effort describing and building the ultimate automated system that will build N sub-projects as desired.
Peter M
@Peter: As illustrated in https://git.wiki.kernel.org/index.php/GitSubmoduleTutorial, the process is painless and very quick: 200 `git init`, and in the main project, 200 `git submodule add`. A simple loop is all you need.
VonC
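The loop mentioned above could look something like this dry run in Python, which only builds and prints the `git init` and `git submodule add` command lines and executes nothing. The module paths and the `BASE_URL` location are hypothetical, and `plan` is a helper invented for this sketch; in practice each module repository would also need its content committed before it is useful as a submodule.

```python
import shlex

# Hypothetical module paths inside the main project; an uneven hierarchy is fine.
MODULES = [
    "drivers/serial",
    "drivers/net/ethernet",
    "tools/flasher",
]

# Hypothetical location where each module's own repository lives.
BASE_URL = "/srv/git"


def plan(modules, base_url):
    """Return the git commands as argument lists, without running anything."""
    commands = []
    for path in modules:
        repo = f"{base_url}/{path}"
        # One repository per module ...
        commands.append(["git", "init", repo])
        # ... registered in the main project at its place in the hierarchy.
        commands.append(["git", "submodule", "add", repo, path])
    return commands


for argv in plan(MODULES, BASE_URL):
    print(shlex.join(argv))
```

Because the list of paths is plain data, an irregular directory layout costs nothing extra: the loop does not care how deep each entry is.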
@VonC - making 200 repos is not hard. What is hard is making 200 repos in an arbitrary hierarchical structure. See my SU question for what I am attempting to implement: http://superuser.com/questions/144503/choice-and-setup-of-version-control. And note that what I have described there is an idealised structure. In the real world it is not so uniform.
Peter M
@Peter: What makes you think it is hard to make those 200 repos in an arbitrary hierarchical structure? In fact, it is in a CVCS where you'd have to think about this when laying out your directory structure inside one giant master repo (which is ugly, IMO). Each project, being its own repo, as in a DVCS, is more suited to the "chaos".
Santa
@Peter: You may find Google's 'repo' tool worth taking a look at: they developed it to help with this situation. Unfortunately, most of the documentation is android-specific: http://source.android.com/source/git-repo.html
Andrew Aylett