I read this on Joel on Software:

With distributed version control, the distributed part is actually not the most interesting part.

The interesting part is that these systems think in terms of changes, not in terms of versions.

and at HgInit:

When we have to merge, Subversion tries to look at both revisions—my modified code, and your modified code—and it tries to guess how to smash them together in one big unholy mess. It usually fails, producing pages and pages of “merge conflicts” that aren’t really conflicts, simply places where Subversion failed to figure out what we did.

By contrast, while we were working separately in Mercurial, Mercurial was busy keeping a series of changesets. And so, when we want to merge our code together, Mercurial actually has a whole lot more information: it knows what each of us changed and can reapply those changes, rather than just looking at the final product and trying to guess how to put it together.

By looking at SVN's repository folder, I have the impression that Subversion maintains each revision as a changeset. And from what I know, Hg uses both changesets and snapshots, while Git uses purely snapshots to store the data.

If my assumption is correct, then there must be other ways that make merging in DVCS easy. What are those?

* Update:

  • I am more interested in the technical perspective, but answers from non-technical perspective are acceptable
  • Corrections:
    1. Git's conceptual model is purely based on snapshots. The snapshots can be stored as diffs of other snapshots, it's just that the diffs are purely for storage optimization. – Rafał Dowgird's comment
  • From non-technical perspective:
    1. It's simply cultural: a DVCS wouldn't work at all if merging were hard, so DVCS developers invest a lot of time and effort into making merging easy. CVCS users OTOH are used to crappy merging, so there's no incentive for the developers to make it work. (Why make something good when your users pay you equally well for something crap?)
      ...
      To recap: the whole point of a DVCS is to have many decentralized repositories and constantly merge changes back and forth. Without good merging, a DVCS simply is useless. A CVCS however, can still survive with crappy merging, especially if the vendor can condition its users to avoid branching. – Jörg W Mittag's answer
  • From technical perspective:
    1. recording a real DAG of the history does help! I think the main difference is that CVCS didn't always record a merge as a changeset with several parents, losing some information. – tonfa's comment
    2. because of merge tracking, and the more fundamental fact that each revision knows its parents. ... When each revision (each commit), including merge commits, knows its parents (for merge commits that means having/remembering more than one parent, i.e. merge tracking), you can reconstruct a diagram (DAG = Directed Acyclic Graph) of the revision history. If you know the graph of revisions, you can find the common ancestor of the commits you want to merge. And when your DVCS knows by itself how to find the common ancestor, you don't need to provide it as an argument, as you do, for example, in CVS.
      .
      Note that there might be more than one common ancestor of two (or more) commits. Git makes use of the so-called "recursive" merge strategy, which merges the merge bases (common ancestors) until you are left with one virtual / effective common ancestor (somewhat simplified), and can then do a simple 3-way merge. – Jakub Narębski's answer
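To make Rafał's correction concrete, here is a toy sketch (the `SnapshotStore` class is hypothetical and has nothing to do with Git's actual pack format): every revision is conceptually a full snapshot, but all revisions after the first are physically stored as line deltas against their predecessor, computed with Python's `difflib`. Callers only ever get whole snapshots back; the deltas are purely a storage detail.

```python
import difflib


class SnapshotStore:
    """Toy history store (hypothetical, not Git's format).

    Conceptually, every revision is a complete snapshot (a list of lines).
    Physically, revision 0 is stored in full and every later revision as a
    delta against its predecessor; get() always rebuilds the full snapshot.
    """

    def __init__(self):
        self._entries = []  # each entry: ("full", lines) or ("delta", ops)

    def put(self, lines):
        lines = list(lines)
        if not self._entries:
            self._entries.append(("full", lines))
        else:
            base = self.get(len(self._entries) - 1)
            ops = []
            matcher = difflib.SequenceMatcher(None, base, lines, autojunk=False)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag == "equal":
                    ops.append(("copy", i1, i2))          # reuse lines from the base
                else:
                    ops.append(("insert", lines[j1:j2]))  # new lines (empty for deletions)
            self._entries.append(("delta", ops))
        return len(self._entries) - 1

    def get(self, rev):
        kind, data = self._entries[rev]
        if kind == "full":
            return list(data)
        base = self.get(rev - 1)  # rebuild the predecessor, then apply the delta
        out = []
        for op in data:
            if op[0] == "copy":
                out.extend(base[op[1]:op[2]])
            else:
                out.extend(op[1])
        return out


store = SnapshotStore()
r0 = store.put(["a", "b", "c"])
r1 = store.put(["a", "B", "c", "d"])
print(store.get(r1))  # ['a', 'B', 'c', 'd']
```

Whether a revision happens to be stored as a delta never changes what `get()` returns, which is the sense in which the model stays snapshot-based.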
+19  A: 

There's nothing in particular in DVCSs that makes merging easier. It's simply cultural: a DVCS wouldn't work at all if merging were hard, so DVCS developers invest a lot of time and effort into making merging easy. CVCS users OTOH are used to crappy merging, so there's no incentive for the developers to make it work. (Why make something good when your users pay you equally well for something crap?)

Linus Torvalds said in one of his Git talks that when he was using CVS at Transmeta, they set aside an entire week during a development cycle for merging. And everybody just accepted this as the normal state of affairs. Nowadays, during a merge window, Linus does hundreds of merges within just a few hours.

CVCSs could have just as good merging capabilities as DVCSs, if CVCS users simply went to their vendors and said that this crap is unacceptable. But they are caught in the Blub paradox: they simply don't know that it is unacceptable, because they have never seen a working merge system. They don't know that there is something better out there.

And when they do try out a DVCS, they magically attribute all the goodness to the "D" part.

Theoretically, due to its centralized nature, a CVCS should have better merge capabilities, because it has a global view of the entire history, unlike a DVCS, where every repository only has a tiny fragment.

To recap: the whole point of a DVCS is to have many decentralized repositories and constantly merge changes back and forth. Without good merging, a DVCS simply is useless. A CVCS however, can still survive with crappy merging, especially if the vendor can condition its users to avoid branching.

So, just like with everything else in software engineering, it's a matter of effort.

Jörg W Mittag
recording a real DAG of the history does help! I think the main difference is that CVCS didn't always record a merge as a changeset with several parents, losing some information.
tonfa
@tonfa: You're right, of course. But again, that's not really a limitation of CVCSs, just laziness on the part of the developers. There's no reason that a CVCS couldn't record the full DAG including merges. The fact that it took Subversion ten years to record merges, especially since there have been third-party tools available for at least five years, speaks volumes. I mean, they did it without any changes to the data format! In other words: everything they needed was already there ten years ago.
Jörg W Mittag
I think round trips to the server also contribute. Since Hg has a complete local history, the info is right there vs Subversion.
msemack
@msemack: You could perform the merge on the server by adding a new command to the SVN network protocol: "merge A into B". Alternatively, you could cache the entire history on the client. Subversion *already* caches *one* revision; there is no reason why it couldn't cache *all* revisions. (And while they're at it: clean up the cache, because currently Subversion needs more disk space caching *one* revision than Git and Mercurial need for caching 1000. A Mercurial checkout of Subversion (~20000 revs) is only slightly larger than a Subversion checkout of Subversion (1 rev).)
Jörg W Mittag
+10  A: 

In Git and other DVCSs, merges are easy not because of some mystical series-of-changesets view that Joel rambles about (unless you are using Darcs, with its theory of patches, or some Darcs-inspired DVCS; they are a minority, though), but because of merge tracking, and the more fundamental fact that each revision knows its parents. For that you need (I think) whole-tree / full-repository commits... which unfortunately limits the ability to do partial checkouts and to make a commit touching only a subset of files.

When each revision (each commit), including merge commits, knows its parents (for merge commits that means having/remembering more than one parent, i.e. merge tracking), you can reconstruct a diagram (DAG = Directed Acyclic Graph) of the revision history. If you know the graph of revisions, you can find the common ancestor of the commits you want to merge. And when your DVCS knows by itself how to find the common ancestor, you don't need to provide it as an argument, as you do, for example, in CVS.
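A minimal Python sketch of that idea, assuming the history is just a dict mapping each revision id to its list of parents (the revision names and the example graph are made up). The merge base falls out of the graph alone; and if the merge commit forgets its second parent, i.e. the merge is not recorded, the base moves back to an older revision, so already-merged changes would have to be merged again.

```python
from collections import deque


def ancestors(parents, rev):
    """All ancestors of rev, including rev itself.

    `parents` maps revision id -> list of parent ids; a merge commit
    simply has more than one parent.
    """
    seen = {rev}
    queue = deque([rev])
    while queue:
        for p in parents[queue.popleft()]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def merge_bases(parents, a, b):
    """Best common ancestors: common ancestors of a and b that are not
    themselves ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


# A - B - C ------ F - H    main (F merges the topic branch)
#      \          /
#       D ------ E - G      topic (G is new work after the merge)
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"],
    "F": ["C", "E"], "G": ["E"], "H": ["F"],
}
print(merge_bases(parents, "H", "G"))                   # {'E'}

# Same history, but F does not remember its second parent:
print(merge_bases(dict(parents, F=["C"]), "H", "G"))    # {'B'}
```

With the merge recorded, the next merge of `G` into `H` only has to consider what changed since `E`; without it, D and E are replayed against a base of B.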

Note that there might be more than one common ancestor of two (or more) commits. Git makes use of the so-called "recursive" merge strategy, which merges the merge bases (common ancestors) until you are left with one virtual / effective common ancestor (somewhat simplified), and can then do a simple 3-way merge.
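A toy model of the criss-cross case, not Git's implementation: "files" are dict entries, the 3-way merge works per key, and a conflict simply raises `ValueError` (a real implementation has to do something smarter, especially while building the virtual base). In this history both B and C are best common ancestors of X and Y. Using either one alone as the base yields a spurious conflict, while first merging B and C into a virtual base (the `virtual(...)` ids are hypothetical) lets the final 3-way merge succeed.

```python
from collections import deque


def ancestors(parents, rev):
    """All ancestors of rev (including rev) in a dict-of-parents DAG."""
    seen, queue = {rev}, deque([rev])
    while queue:
        for p in parents[queue.popleft()]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def three_way(base, ours, theirs):
    """Per-key 3-way merge; a missing key (None) means 'file absent'."""
    merged = {}
    for key in base.keys() | ours.keys() | theirs.keys():
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t or t == b:
            value = o          # identical on both sides, or only ours changed
        elif o == b:
            value = t          # only theirs changed
        else:
            raise ValueError(f"conflict on {key!r}")
        if value is not None:
            merged[key] = value
    return merged


def recursive_merge(parents, content, a, b):
    """Fold multiple merge bases into one virtual base by merging them
    (recursively), then do a plain 3-way merge of a and b against it."""
    bases = sorted(merge_bases(parents, a, b))
    if not bases:
        return three_way({}, content[a], content[b])
    parents, content = dict(parents), dict(content)  # don't mutate the caller's data
    current = bases[0]
    for other in bases[1:]:
        virtual = f"virtual({current}+{other})"       # hypothetical id for the virtual commit
        content[virtual] = recursive_merge(parents, content, current, other)
        parents[virtual] = [current, other]
        current = virtual
    return three_way(content[current], content[a], content[b])


# Criss-cross: D merges C into B's line, E merges B into C's line.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"],
           "E": ["C", "B"], "X": ["D"], "Y": ["E"]}
content = {
    "A": {"f": "1", "g": "1"},
    "B": {"f": "2", "g": "1"},   # B edits f
    "C": {"f": "1", "g": "2"},   # C edits g
    "D": {"f": "2", "g": "2"},
    "E": {"f": "2", "g": "2"},
    "X": {"f": "3", "g": "2"},   # X edits f again
    "Y": {"f": "2", "g": "3"},   # Y edits g again
}
print(sorted(merge_bases(parents, "X", "Y")))                      # ['B', 'C']
print(sorted(recursive_merge(parents, content, "X", "Y").items()))  # [('f', '3'), ('g', '3')]
```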

Git's use of rename detection was developed to handle merges involving file renames. (This supports Jörg W Mittag's argument that DVCSs have better merge support because they had to have it, as merges are much more common there than in a CVCS, where merging is hidden in the 'update' command of the update-then-commit workflow; cf. Understanding Version Control (WIP) by Eric S. Raymond.)
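Roughly how content-based rename detection can work, as a sketch: `detect_renames` is a hypothetical helper that pairs deleted and added paths by content similarity using `difflib.SequenceMatcher.ratio()`, which is not Git's scoring. The 0.5 threshold only echoes the default similarity threshold of Git's `-M` option. Once a merge knows that `util.py` became `helpers.py`, it can apply the other side's edits to `util.py` onto `helpers.py` instead of reporting a modify/delete conflict.

```python
import difflib


def detect_renames(old, new, threshold=0.5):
    """Pair paths deleted from `old` with paths added in `new` whose contents
    are similar enough. `old` and `new` map path -> file text."""
    deleted = [p for p in old if p not in new]
    added = [p for p in new if p not in old]
    candidates = []
    for d in deleted:
        for a in added:
            score = difflib.SequenceMatcher(None, old[d], new[a]).ratio()
            if score >= threshold:
                candidates.append((score, d, a))
    renames, used = {}, set()
    # Greedily accept the most similar pairs first; each path is used once.
    for score, d, a in sorted(candidates, reverse=True):
        if d not in renames and a not in used:
            renames[d] = a
            used.add(a)
    return renames


old = {"util.py": "def helper():\n    return 1\n", "main.py": "import util\n"}
new = {"helpers.py": "def helper():\n    return 2\n", "main.py": "import util\n"}
print(detect_renames(old, new))  # {'util.py': 'helpers.py'}
```

Note that the rename is found even though the file was also edited; matching on exact content only would miss it.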

Jakub Narębski
A: 

I think the DAG of changesets, as mentioned by others, makes a big difference. DVCS:es require split history (and merges) at a fundamental level, whereas I suppose CVCS:es (which are older) were built from day 1 to track revisions and files first, with merge support added as an afterthought.

So:

  • Merging is easy to do and track when tags/branches are tracked separately from the directory tree of sources, so the entire repo can be merged in one go.
  • Since DVCS:es have local repos, these are easy to create, so it turns out to be easy to keep different modules in different repos instead of tracking them all inside one big repo (so repo-wide merges don't cause the same disruptions as they would in svn/cvs, where one repo often contains many unrelated modules which need separate merge histories).
  • CVS/SVN allow different files in the working directory to come from different revisions, while DVCS:es usually have one revision for the entire WC, always (i.e. even if a file is reverted to an earlier version, it will show as modified in status, since it differs from the file in the checked-out revision; SVN/CVS do not always show this).

Mixing these concepts (as Subversion does) is, I believe, a big mistake. For instance, Subversion has branches/tags inside the source tree, so you have to track which revisions of files have been merged to other files. This is clearly more complex than just tracking which revisions have been merged.
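With a DAG, "which revisions have already been merged?" is answered by reachability alone, with no per-path bookkeeping like Subversion's `svn:mergeinfo` property. A small sketch (the graph and the `unmerged` helper are made up for illustration): what merging `source` into `target` would still bring in is the set of revisions reachable from `source` but not from `target`, which is what Git's `target..source` range notation selects.

```python
from collections import deque


def ancestors(parents, rev):
    """All ancestors of rev (including rev) in a dict-of-parents DAG."""
    seen, queue = {rev}, deque([rev])
    while queue:
        for p in parents[queue.popleft()]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def unmerged(parents, source, target):
    """Revisions a merge of `source` into `target` would still bring in."""
    return ancestors(parents, source) - ancestors(parents, target)


# T1 -- T2 -- T3     trunk (T3 is a merge with parents T2 and B1)
#  \         /
#   B1 -----'
#    \
#     B2 -- B3       branch
parents = {"T1": [], "T2": ["T1"], "B1": ["T1"], "T3": ["T2", "B1"],
           "B2": ["B1"], "B3": ["B2"]}
print(sorted(unmerged(parents, "B3", "T3")))  # ['B2', 'B3']

# After a second merge T4 (parents T3 and B3), nothing is left to merge.
print(unmerged(dict(parents, T4=["T3", "B3"]), "B3", "T4"))  # set()
```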

So, summarizing:

  • DVCS:es need easy merges and have based their feature set on that. Design decisions were made so that these merges are easy to do and track (via the DAG), and other features (branches/tags/submodules) are implemented to suit that, not the other way around.
  • CVCS:es had some features from the start (such as modules) that made some things easy, but make repo-wide merges very tricky to implement.

At least this is what I feel from my experience with cvs, svn, git and hg. (There are probably other CVCS:es which have got this right too.)

Marcus Lindblom
Well... Subversion tracks directories as first-class objects in version history. A great decision at the time (as it made it possible to track copies and deletes easily, which were very hard before), but not one that makes merging and rename handling easier. Mixed-revision working copies were common in the pre-svn, file-based version control world, but are another thing that makes merging harder. The new DVCSes made other choices, learning from the past; this resolves some scenarios, but at the same time introduces other issues. (But none of this is DVCS- or CVCS-specific; it's just implementation.)
Bert Huijben
+5  A: 

Part of the reason is of course the technical argument that DVCSes store more information than SVN does (DAG, copies) and also have a simpler internal model, which is why they are able to perform more accurate merges, as mentioned in the other responses.

However probably an even more important difference is that because you have a local repository, you can make frequent, small commits, and also frequently pull and merge incoming changes. This is caused more by the ‘human factor’, the differences in the way a human works with a centralised VCS versus a DVCS.

With SVN, if you update and there are conflicts, SVN will merge what it can and insert markers in your code where it can't. The big problem with this is that your code will no longer be in a workable state until you resolve all the conflicts.

This distracts you from the work you are trying to achieve, so typically SVN users do not merge while they are working on a task. Combine this with the fact that SVN users also tend to let changes accumulate in a single large commit for fear of breaking other people's working copies, and there will be large periods of time between the branch and the merge.
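A simplified, line-based sketch of what such an update does when it merges (hypothetical code, far cruder than Subversion's or Mercurial's real merge machinery; the marker labels are made up): each side's changes relative to the common base are applied where they don't overlap, and overlapping, differing edits are wrapped in conflict markers, which is exactly what leaves the file broken until someone resolves them.

```python
import difflib


def _changes(base, other):
    """(i1, i2, new_lines) for every non-equal region of base -> other."""
    sm = difflib.SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def _apply(base, start, end, side_changes):
    """One side's version of base[start:end], given its changes in that range."""
    out, pos = [], start
    for i1, i2, new_lines in side_changes:
        out.extend(base[pos:i1])
        out.extend(new_lines)
        pos = i2
    out.extend(base[pos:end])
    return out


def merge3(base, ours, theirs):
    """Line-based 3-way merge; returns (merged_lines, had_conflict)."""
    tagged = [c + ("ours",) for c in _changes(base, ours)] + \
             [c + ("theirs",) for c in _changes(base, theirs)]
    tagged.sort(key=lambda c: (c[0], c[1]))
    out, pos, conflict, i = [], 0, False, 0
    while i < len(tagged):
        # Cluster changes whose base ranges overlap or touch; touching edits
        # are conservatively treated as potentially conflicting.
        start, end = tagged[i][0], tagged[i][1]
        cluster = []
        while i < len(tagged) and tagged[i][0] <= end:
            end = max(end, tagged[i][1])
            cluster.append(tagged[i])
            i += 1
        ours_ch = [c[:3] for c in cluster if c[3] == "ours"]
        theirs_ch = [c[:3] for c in cluster if c[3] == "theirs"]
        ours_text = _apply(base, start, end, ours_ch)
        theirs_text = _apply(base, start, end, theirs_ch)
        out.extend(base[pos:start])          # unchanged lines before the cluster
        if not theirs_ch or ours_text == theirs_text:
            out.extend(ours_text)
        elif not ours_ch:
            out.extend(theirs_text)
        else:
            conflict = True
            out += ["<<<<<<< ours", *ours_text, "=======", *theirs_text, ">>>>>>> theirs"]
        pos = end
    out.extend(base[pos:])                   # unchanged lines after the last cluster
    return out, conflict


base = ["a", "b", "c", "d"]
merged, conflict = merge3(base, ["a", "B", "c", "d"], ["a", "X", "c", "d"])
print(conflict)  # True
print("\n".join(merged))
```

Non-overlapping edits (say, one side changing line 2 and the other line 4) merge cleanly; only the overlapping edit to line 2 ends up between markers.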

With Mercurial, you can merge incoming changes much more frequently in between your smaller incremental commits. This will by definition result in fewer merge conflicts, because you will be working on a more up-to-date codebase.

And if there turns out to be a conflict, you can decide to postpone the merge and do it at your own leisure. This in particular makes the merging so much less annoying.

Laurens Holst
Note that above I am mainly talking about anonymous branches (SVN working copies, and thus the merging performed by `svn update`), but this applies to named branches (SVN branches) as well.
Laurens Holst
+4  A: 

Whoa, attack of the 5-paragraph essays!

In short, nothing makes it easy. It is hard, and my experience indicates that errors do occur. But:

  • DVCS forces you to deal with merging, which means taking a few minutes to familiarize yourself with the tools that exist to help you out. That alone helps.

  • DVCS encourages you to merge frequently, which helps too.

The snippet of hginit that you quoted, claiming that Subversion is unable to do three-way merges and that Mercurial merges by looking at all the changesets in both branches, is simply wrong on both counts.

Jason Orendorff
In short, I think there is both a technical component, better merging algorithms, as well as a workflow component, DVCSes better support frequent merges.
Laurens Holst
DVCSes also have a simpler view of a working copy. In Subversion you can have mixed revision working copies and partial working copies, which are certainly not components that make merging easier.
Bert Huijben
@Laurens What better merging algorithms do you mean? Unless I am wildly mistaken, Mercurial doesn't do anything any fancier than `svn up`, in terms of merging all the changes it can. And I think both programs handle conflicts by delegating the problem to some external merge program (or, failing that, putting conflict markers in the files).
Jason Orendorff
Maybe I’m mistaken, but even though SVN tracks copies, I’ve never actually seen a merge against copies going right.
Laurens Holst
+1  A: 

One thing I find easier with DVCS is that each developer can merge their own changes into whichever repository they desire. It's much easier to handle merge conflicts when you're merging your own code. I've worked in places where some poor soul had to fix merge conflicts by tracking down each developer involved.

Also with a DVCS you can do things like clone a repository, merge work from two developers into the clone, test the changes, then merge from the clone back into the main repository.

Pretty cool stuff.

BeWarned
A: 

Maybe DVCS users just never do things that make merging hard, like refactorings that change and rename/copy most files in the project, or redesigning from scratch APIs that are used in hundreds of files.

Ha
Do you use DVCS?
afriza
Ya, that must be it. -1.
Laurens Holst
+1  A: 

One point is that svn merging is subtly broken (see http://blogs.open.collab.net/svn/2008/07/subversion-merg.html). I suspect this is related to svn recording mergeinfo even on cherry-picking merges. Add a few plain bugs in handling border cases, and svn, as the current poster child of CVCSs, makes them all look bad compared to the DVCSs, which just got it right.

Andreas Krey