views:

1135

answers:

6

I've heard that many of the distributed VCSs (git, mercurial, etc) are better at merging than traditional ones like Subversion. What does this mean? What sort of things do they do to make merging better? Could those things be done in a traditional VCS?

Bonus question: does SVN 1.5's merge-tracking level the playing field at all?

+2  A: 

Flippant answer: Why are some programming languages better at text/math than others?

Real answer: because they have to be. Distributed VCSs do much of their merging at a point where neither of the authors of the conflicting code can tweak the merge manually, because the merge is being done by a third party. As a result, the merge tool has to get it right most of the time.

In contrast, with SVN you are doing something funky (and wrong?) if you ever end up merging something where you didn't write one side or the other.

IIRC most VCSs can shell out the merge to whatever tool you ask them to use, so there is (theoretically) nothing preventing SVN from using the Git/Mercurial merge engines. YMMV

BCS
Perhaps I should have asked *how* are they better, rather than *why*. Interesting point about outsourcing the merge.
JW
That's silly, all version control systems have this problem. With SVN it's always at checkin time. You've got some changes you'd like to check in and -surprise!- someone's already committed conflicting changes. You've got to commit both your changes, and the merge of their changes as one commit.
Peter Burns
@Peter Burns: With SVN you almost always have one of the authors doing the merge. With Git/Hg/etc the merge may be done by someone who didn't write either side. I'll edit to make this more clear.
BCS
I still disagree with your new post. The only time a merge is done in git et al. is when you're explicitly merging two divergent trees. True, SVN forces an author to merge before he commits, but if you use branches in SVN, then you have the same potential for a merge to be done by a third party.
Peter Burns
@Peter Burns: The branch case is not the norm and, if I understand correctly how you are supposed to use it, a branch is "owned" by someone who is invested in the changes made in it and who will, more likely than not, be the person doing the merge back in. With git, there is no such expectation.
BCS
If I understand git correctly, any commit whatsoever is a merging of two divergent trees. If not, then I guess I fail to see the basic rationale for git et al.
BCS
It appears that you've been misled. Only some commits in git are merges. It may be clarifying to note that in distributed VCSs there is a distinction between committing changes and publishing them.
Peter Burns
s/commit/accepting of publishes/ to my last. Nomenclature aside, I think my point still holds: distributed VCSs end up merging two third-party edits a lot more often than traditional ones do.
BCS
Ok, I'm clearly belaboring my point, so this is the last comment I'll make. A third party merging two other people's code is often a bad idea, and distributed VCSs don't need to do this. I've used git for a year but I've never needed to merge two other people's code together, nor have I seen this done.
Peter Burns
+4  A: 

The merge tracking in 1.5 is better than no merge tracking, but it is still very much a manual process. I do like the way that it records which revisions are and aren't merged, but it's nowhere near perfect.

Merge has a nice dialog in 1.5. You can pick which revisions you wish to merge individually, or the whole branch. You then trigger the merge, which occurs locally (and takes FOREVER) and then gives you a bunch of files to read through. You need to check each file for logically correct behaviour (preferably by running unit tests on the files), and if you have conflicts you have to resolve them. Once you're happy, you commit your changes, and at that point the branch is considered merged.

If you do it piecemeal, SVN will remember what you have previously said that you have merged, allowing you to merge the remaining revisions later. I found the process and the result of some of the merges to be strange, to say the least, however...

Spence
+3  A: 

These version control systems can do better because they have more information.

SVN pre-1.5, along with most VCSs before the latest generation, doesn't actually record anywhere that you merged two commits. It remembers that the two branches share a common ancestor from way back when they first branched off, but it doesn't know about any more recent merges that could be used as common ground.

I know nothing of SVN post 1.5 though, so maybe they've improved on this.

Peter Burns
Check out http://revctrl.org/CategoryMergeAlgorithm for some (cursory) descriptions of merge algorithms. Some of these algorithms make complicated use of directed-acyclic-graph history that SVN simply does not keep. Don't forget merging /renamed directory trees/ as well.
joeforker
+4  A: 

I asked a similar question a while ago that you might find useful:

http://stackoverflow.com/questions/43995/why-is-branching-and-merging-easier-in-mercurial-than-in-subversion

Nick Pierpoint
+3  A: 

SVN's merging capabilities are decent, and the simple merging scenarios work fine - e.g. release branch and trunk, where trunk tracks the commits on the RB.

More complex scenarios get complicated fast. For example, let's start with a stable branch (stable) and trunk.

You want to demo a new feature, and prefer to base it on stable as it's, well, more stable than trunk, but you want all your commits to be propagated to trunk as well, while the rest of the developers are still fixing things in stable and developing things on trunk.

So you create a demo branch, and the merging graph looks like:

  • stable -> demo -> trunk (you)
  • stable -> trunk (other developers)

But what happens when you merge changes from stable into demo, then merge demo to trunk, while all the time other developers are also merging stable into trunk? SVN gets confused with the merges from stable being merged twice into trunk.

There are ways around this, but with git/Bazaar/Mercurial this simply doesn't happen - they realize whether the commits have already been merged because they ID each commit across the merging paths it takes.
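To illustrate the idea of identifying commits across merge paths, here is a minimal Python sketch. It is a simplified model, not how git, Bazaar or Mercurial are actually implemented: the commit names (`b0`, `s1`, `d1`, ...) and the helper functions `reachable` and `incoming` are hypothetical. Each commit records its parents, so "what still needs merging" is just the set of commits reachable from the source but not from the target.

```python
def reachable(parents, tip):
    """Return tip plus every commit reachable from it via parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def incoming(parents, target, source):
    """Commits that merging source into target would actually bring in."""
    return reachable(parents, source) - reachable(parents, target)


# Hypothetical history for the stable/demo/trunk scenario above.
history = {
    "b0": [],            # shared starting point
    "s1": ["b0"],        # fix on stable
    "s2": ["s1"],        # another fix on stable
    "d1": ["s1"],        # demo branch created from stable, first demo commit
    "d2": ["d1", "s2"],  # you merge stable into demo
    "t1": ["b0"],        # development on trunk
    "t2": ["t1", "s2"],  # other developers merge stable into trunk
}

# Merging demo into trunk: the stable commits are already reachable from
# trunk, so only the demo-specific commits are left to bring in.
to_merge = incoming(history, target="t2", source="d2")
```

In this model `s1` and `s2` never show up in `to_merge`, no matter how many paths they took to reach trunk, because they are identified by commit rather than by the branch they arrived through.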

orip
+7  A: 

Most answers seem to be about Subversion, so here you have one about Git (and other DVCSs).

In a distributed version control system, when you merge one branch into another you create a new merge commit, which remembers how you resolved the merge and remembers all parents of the merge. This information was simply lacking in Subversion prior to version 1.5; you had to use additional tools such as SVK or svnmerge for this. This information is very important when doing repeated merges.

Thanks to this information, distributed version control systems (DVCSs) can automatically find the common ancestor (or common ancestors), also known as the merge base, for any two branches. Take a look at the ASCII-art diagram of revisions below (I hope that it didn't get too horribly mangled):

---O---*---*----M---*---*---1
    \          /
     \---*---A/--*----2

If we want to merge branch '2' into branch '1', the common ancestor we would want to use to generate the merge is the version (commit) marked 'A'. However, if the version control system didn't record information about merge parents ('M' is a previous merge of the same branches), it wouldn't be able to find that it is commit 'A', and it would find commit 'O' as the common ancestor (merge base) instead... which would repeat already included changes and result in a large merge conflict.

Distributed version control systems had to do it right, i.e. they had to make merging very easy (without needing to mark/tag merge parents and supply merge information by hand) from the very beginning, because the way to get somebody else's code into a project was not to give him/her commit access, but to pull from his/her repository: get commits from the other repository and perform a merge.

You can find information about merging in Subversion 1.5 in the Subversion 1.5 Release Notes. Issues of note: you need different (!) options to merge a branch into trunk than to merge trunk into a branch, i.e. not all branches are equal (in distributed version control systems they are [usually] technically equivalent).
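The effect of recording merge parents can be sketched in a few lines of Python. This is only an illustrative model of the diagram above, not Git's actual merge-base code: the unnamed `*` commits are given made-up names (`x1`..`x4`, `b1`, `b2`), and `ancestors` and `merge_bases` are hypothetical helpers.

```python
def ancestors(parents, commit):
    """Return commit plus everything reachable from it via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


# The diagram, with both parents of merge commit 'M' recorded.
with_merge_parents = {
    "O": [], "x1": ["O"], "x2": ["x1"],
    "b1": ["O"], "A": ["b1"],
    "M": ["x2", "A"],              # the merge remembers 'A' as a parent
    "x3": ["M"], "x4": ["x3"], "1": ["x4"],
    "b2": ["A"], "2": ["b2"],
}

# The same history, but the merge is stored as an ordinary commit.
without_merge_parents = dict(with_merge_parents, M=["x2"])

base_tracked = merge_bases(with_merge_parents, "1", "2")
base_untracked = merge_bases(without_merge_parents, "1", "2")
```

With the second parent of 'M' recorded, the only merge base is 'A'; drop that one link and the search falls all the way back to 'O', which is exactly the repeated-merge problem described above.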

Jakub Narębski