Pros and cons of different branching models in DVCS

The Big Three of distributed version control (Git, Bazaar, and Mercurial) each treat branching fairly differently. In Bazaar, for example, branches are separate repos (actually, divergent copies of the parent repo); on your file system, different branches live in different directories. In Git, on the other hand, you can have multiple branches existing in the same repo (and therefore in the same directory on your file system). Mercurial supports both behaviors, the latter with named branches.

What are the pros and cons associated with these different branching models? In my mind, Bazaar's approach of one branch, one repo makes branching more of a pain than Git's approach (e.g. to use a branch in Bazaar, I have to first create the branch, then cd out of my current working copy, then check out the new branch, like I would in SVN).

I don't know much about branching models in VCSs other than Git. I'd say that in any DVCS you can implement branching by cloning (you create a branch by making a clone). Mercurial's so-called "named branches" are (from what I understand) in fact commit labels that are merely interpreted as a branch, sometimes requiring local numbering of revisions to resolve ambiguity. Mercurial "bookmarks" are, I think, quite similar to Git branches. The two DVCSs that have a very different concept of branching are Monotone and Darcs. I think the "branching by copying" that Subversion uses, where the separation between project name and branch name is only a convention, is a wrong idea.

In Git, revisions form a directed acyclic graph (DAG) of commits. It is directed because commits have parents. This is an important point: edges in the DAG of commits go from a commit to its parent (or, in the case of a merge commit, to its two or more parents). The graph of commits is acyclic, which means that there is no chain (no path) that begins and ends with the same object.

The Git glossary defines a "branch" as an active line of development. This idea is behind the implementation of branches in Git.

The most recent commit on a branch is referred to as the tip of that branch. The tip of the branch is referenced by a branch head, which is just a symbolic name for this commit. In its "loose" form, such a branch head (for example, for a branch named 'master') is just a file in the refs/heads/ directory inside the git repository (inside the .git dir), which contains a reference to the current tip of the branch: the SHA-1 identifier of the commit (as a hexadecimal string).

When you create a new commit in Git, the tip of the currently checked out branch moves forward. In other words, the new commit is created on top of the tip of the current branch, and the branch head advances to the new commit (somewhat similar to how a pointer to the top of a stack might advance).
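To make the "branch head advances" idea concrete, here is a toy Python model. It is only a sketch: the class ToyRepo and its methods are made up for illustration, and the ids are hashes of a Python repr, not real Git object ids.

```python
import hashlib


class ToyRepo:
    """Hypothetical toy model: refs map branch names to tips, HEAD names a branch."""

    def __init__(self):
        self.commits = {}     # commit id -> (tuple of parent ids, message)
        self.refs = {}        # branch name -> id of the tip commit
        self.head = "master"  # HEAD is a symbolic reference to a branch name

    def commit(self, message):
        # The new commit's parent is the current tip of the checked out branch.
        tip = self.refs.get(self.head)
        parents = (tip,) if tip is not None else ()
        # Assumption: a stand-in id; real Git hashes the full commit object.
        commit_id = hashlib.sha1(repr((parents, message)).encode()).hexdigest()
        self.commits[commit_id] = (parents, message)
        # The branch head advances to the new commit; HEAD itself is unchanged.
        self.refs[self.head] = commit_id
        return commit_id

    def branch(self, name):
        # A new branch is just another name for the current tip.
        self.refs[name] = self.refs[self.head]

    def checkout(self, name):
        if name not in self.refs:
            raise KeyError(name)
        # Switching branches only repoints HEAD.
        self.head = name


repo = ToyRepo()
first = repo.commit("initial")
repo.branch("topic")
repo.checkout("topic")
second = repo.commit("work on topic")
```

After this runs, 'master' still points at the first commit, while 'topic' (the checked out branch) has advanced to the second one, whose parent is the first.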

A single git repository can track an arbitrary number of branches, but your working tree (if you have one) is associated with just one of them (the "current" or "checked out" branch). The current branch is given by the HEAD pointer. HEAD is (usually) a pointer to the currently checked out branch (to a branch head name), just like branch heads are pointers to the tips of branches.

For example, if the currently checked out branch is 'master', then the .git/HEAD file (representing HEAD) would contain a single LF-terminated line with ref: refs/heads/master (a symbolic reference to refs/heads/master), and .git/refs/heads/master (the head of the 'master' branch) would contain, for example, the LF-terminated line 0b127cb8ab975e43398a2b449563ccb78c437255, which is the SHA-1 identifier of the tip of the 'master' branch. (That is, if the current branch is not "packed"; if it is, you have to look in .git/packed-refs instead.)
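The lookup described above can be sketched in Python. This is a simplified illustration, not how Git itself is implemented: resolve_head is a hypothetical helper, it only handles the plain file layout described here (loose refs plus packed-refs), and the demo builds a fake .git directory in a temporary folder rather than touching a real repository.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir):
    """Return (ref name, commit id) by reading HEAD and the ref it names."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        # HEAD holds a commit id directly instead of naming a branch.
        return None, head
    ref = head[len("ref: "):]
    loose = git_dir / ref
    if loose.exists():
        # Loose form: the file content is the commit id of the branch tip.
        return ref, loose.read_text().strip()
    packed = git_dir / "packed-refs"
    if packed.exists():
        for line in packed.read_text().splitlines():
            # Skip the header comment and "^" lines (peeled tag targets).
            if line.startswith("#") or line.startswith("^"):
                continue
            parts = line.split(" ", 1)
            if len(parts) == 2 and parts[1] == ref:
                return ref, parts[0]
    # The branch is named by HEAD but has no commit yet.
    return ref, None


# Demo on a fake layout mirroring the example in the text.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    (git_dir / "refs" / "heads" / "master").write_text(
        "0b127cb8ab975e43398a2b449563ccb78c437255\n"
    )
    result = resolve_head(git_dir)
```

In a real repository you would normally ask Git rather than read these files yourself: "git symbolic-ref HEAD" prints the branch that HEAD points to, and "git rev-parse HEAD" prints the commit id.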

Some commands in Git, such as "git commit" or "git reset", change the branch head; others, such as "git checkout", change HEAD (the symbolic reference to the current branch).

The "git log branch" command shows all commits reachable from the branch tip, which means the tip of the branch, its parent, the parent (or parents) of that parent commit, and so on. It shows a part of the DAG of commits.
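"Reachable from the branch tip" is just a graph walk along parent edges. Here is a minimal sketch on a hand-written graph; the commit names A, B, C, D, M and the reachable function are invented for the example, and the visiting order is not the order "git log" would print.

```python
def reachable(parents, tip):
    """Return all commits reachable from tip by following parent edges."""
    visited = set()
    order = []
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in visited:
            continue  # a commit can be reached along several paths
        visited.add(commit)
        order.append(commit)
        stack.extend(parents[commit])
    return order


# Hypothetical history: B follows A, C and D both follow B, M merges C and D.
parents = {
    "A": (),
    "B": ("A",),
    "C": ("B",),
    "D": ("B",),
    "M": ("C", "D"),
}

from_merge = reachable(parents, "M")
from_c = reachable(parents, "C")
```

Walking from M finds all five commits, while walking from C finds only C, B and A: D and M are not ancestors of C, so they are not part of that branch's history.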

In Git, deleting a branch simply means removing a branch head. That might make some commits "invisible", i.e. unreachable from refs (branches and tags), which means that at some point those commits could get garbage collected and removed from the repository. But if "git branch -d <branchname>" succeeds, the branch was fully merged, so no commits would be lost; you can force branch deletion with "git branch -D <branchname>". Renaming a branch is simply a matter of renaming the branch head, the symbolic reference (symbolic name) of the branch tip; branch names are not saved anywhere in the commit object.
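To see which commits a deletion would leave unreachable, you can compare what the refs reach before and after removing one of them. This is a rough sketch on a made-up graph: the helper names are hypothetical, and the check is simpler than the one "git branch -d" performs (which looks at whether the branch is merged, not at every ref in the repository).

```python
def ancestors(parents, tips):
    """Set of commits reachable from any of the given tips."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def lost_if_deleted(parents, refs, name):
    """Commits that no remaining ref would reach once refs[name] is removed."""
    before = ancestors(parents, refs.values())
    remaining = [tip for ref, tip in refs.items() if ref != name]
    return before - ancestors(parents, remaining)


# Hypothetical history: C and D both follow B, which follows A.
parents = {"A": (), "B": ("A",), "C": ("B",), "D": ("B",)}
refs = {"master": "C", "topic": "D", "merged": "B"}

lost_topic = lost_if_deleted(parents, refs, "topic")
lost_merged = lost_if_deleted(parents, refs, "merged")
```

Deleting 'topic' would orphan D, because no other ref reaches it. Deleting 'merged' loses nothing, since B is still an ancestor of both remaining tips.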

Git also has the concept of reflogs, which are a local history of where a branch tip pointed (and when). For example, if you amend a commit with "git commit --amend", the branch tip gets replaced with the amended commit, and HEAD^ is the same parent commit both before and after amending, while the reflog has entries for both the version before amending and the version after. If you rewind history using "git reset", the reflog contains the old branch tip from before the rewind.

In short, the reflog adds safety and easy recovery to git commands.
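The reflog idea can be modelled as a list of tip movements kept next to the branch head. The sketch below is a toy: the Branch class and the commit names are invented, and it leaves out the timestamp and identity that real reflog entries also record.

```python
class Branch:
    """Hypothetical branch head that logs every movement of its tip."""

    def __init__(self, tip):
        self.tip = tip
        # Each entry: (old tip, new tip, reason for the move).
        self.reflog = [(None, tip, "branch: created")]

    def move(self, new_tip, reason):
        self.reflog.append((self.tip, new_tip, reason))
        self.tip = new_tip


# Hypothetical history: B follows A, and master points at B.
parents = {"A": (), "B": ("A",)}
master = Branch("B")

# Amending replaces B with a new commit B2 that has the same parent.
parents["B2"] = parents["B"]
master.move("B2", "commit (amend)")

# Rewinding moves the tip back to A; B2 is no longer reachable from it.
master.move("A", "reset: moving to A")

# The old tips are still listed in the reflog, so they can be found again.
previous_tips = [old for old, new, reason in master.reflog if old is not None]
```

Here previous_tips ends up listing B and B2, the two tips that the amend and the reset moved away from. In real Git, "git reflog" shows this history for HEAD.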
