I would like to know the difference between the versioning approaches taken by git (or other DVCSs) and Subversion (or other CVCSs).

Here is what I found on http://www.xsteve.at/prg/vc_svn/svn.txt regarding this topic:

Subversion manages versioned trees as first-order objects (the repository is an array of trees), and the changesets are things that are derived (by comparing adjacent trees). Systems like Arch or Bitkeeper are built the other way around: they're designed to manage changesets as first-order objects (the repository is a bag of patches), and trees are derived by composing sets of patches together.
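To make the quoted distinction concrete, here is a toy sketch of the two models. It assumes a tree is just a dict of path to file contents and a changeset is a dict of path to new contents (None meaning "deleted"); neither Subversion nor Arch stores data this way. The point is only which side is stored and which is derived.

```python
from typing import Dict, List, Optional

# Illustrative assumptions: a tree is {path: contents}; a changeset is
# {path: new contents}, with None meaning "file deleted".
Tree = Dict[str, str]
Changeset = Dict[str, Optional[str]]


def diff_trees(old: Tree, new: Tree) -> Changeset:
    """Derive a changeset by comparing two adjacent trees."""
    return {
        path: new.get(path)  # None when the file was deleted
        for path in old.keys() | new.keys()
        if old.get(path) != new.get(path)
    }


def apply_changeset(tree: Tree, changes: Changeset) -> Tree:
    """Derive a new tree by applying one changeset to an existing tree."""
    result = dict(tree)
    for path, contents in changes.items():
        if contents is None:
            result.pop(path, None)
        else:
            result[path] = contents
    return result


# Tree-based model (the quote's description of Subversion):
# the repository is an array of trees; changesets are derived on demand.
tree_repo: List[Tree] = [
    {},
    {"README": "hello\n"},
    {"README": "hello, world\n", "main.c": "int main(void) { return 0; }\n"},
]
changesets = [diff_trees(a, b) for a, b in zip(tree_repo, tree_repo[1:])]

# Changeset-based model (the quote's description of Arch/BitKeeper):
# the repository is a sequence of patches; trees are derived by composing them.
patch_repo: List[Changeset] = changesets


def rebuild_tree(revision: int) -> Tree:
    """Compose the first `revision` patches, starting from an empty tree."""
    tree: Tree = {}
    for changes in patch_repo[:revision]:
        tree = apply_changeset(tree, changes)
    return tree
```

Both repositories carry the same information (rebuild_tree(i) equals tree_repo[i] for every revision), so the difference is which operations are cheap: reading a revision is direct in the first model, while listing what a revision changed is direct in the second.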

But it's not clear to me how a Subversion repository stores changes: whether it contains the oldest version of each versioned file, and so on. Why couldn't we generate a bunch of patches, as in the case of git, for example? This is always mentioned as a fundamental difference between svn and git that makes merges simpler or harder, but unfortunately I still don't get the idea.

+4  A: 

In principle, Git treats version trees as first-order objects. That is, you deal with a graph of commit objects, each of which points to exactly one tree object recording the state at that revision.

Note that how these are actually stored can be very different. Git started out simply compressing each file, tree, and commit object individually. As I understand it, packing objects into a single file and storing only deltas for some of them was added later.
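As a rough sketch of that original "compress each object individually" scheme, the snippet below computes a Git object ID and the corresponding loose-object bytes. It assumes the classic SHA-1 object format: the ID is the SHA-1 of the header "<type> <size>\0" plus the raw content, and the file under .git/objects is the zlib-compressed form of those same bytes. The helper names are hypothetical, and the exact compressed bytes Git writes can differ with zlib settings even though they decompress to the same thing.

```python
import hashlib
import zlib


def _with_header(obj_type: str, data: bytes) -> bytes:
    """Prefix raw content with Git's "<type> <size>\\0" object header."""
    return f"{obj_type} {len(data)}".encode() + b"\0" + data


def git_object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 over the *uncompressed* header plus content (SHA-1 repositories)."""
    return hashlib.sha1(_with_header(obj_type, data)).hexdigest()


def loose_object_bytes(obj_type: str, data: bytes) -> bytes:
    """Each loose object is stored as its own zlib-compressed file."""
    return zlib.compress(_with_header(obj_type, data))


def loose_object_path(object_id: str) -> str:
    """Loose objects live under .git/objects/<first 2 hex chars>/<remaining 38>."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"
```

For example, git_object_id("blob", b"") yields e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known ID of the empty blob that git hash-object reports for an empty file. Nothing here is a delta: every version of every file is a complete, independently compressed snapshot.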

So although patches seem to be ubiquitous in git's user interface, they have no relation to how the data is stored: the deltas stored in pack files are binary-level deltas, not text-style diffs at all. Git applies deltas to reconstruct objects and then diffs them again to produce a patch on demand. This is in contrast to, for instance, CVS, which inherited a latest-version-plus-reverse-deltas storage system from RCS.
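For contrast, here is a simplified sketch of the RCS-style "latest version plus reverse deltas" idea for a single file on one line of development. It is an assumption-laden illustration: real RCS stores ed-style diff scripts (and forward deltas on branches), whereas this uses Python's difflib and a made-up delta format. The class and function names are hypothetical.

```python
import difflib
from typing import List, Tuple, Union

# Hypothetical delta format: ("copy", start, end) copies lines of the newer
# revision; ("insert", lines) adds lines that only the older revision has.
DeltaOp = Union[Tuple[str, int, int], Tuple[str, List[str]]]


def reverse_delta(newer: List[str], older: List[str]) -> List[DeltaOp]:
    """Describe how to rebuild `older` from `newer`."""
    ops: List[DeltaOp] = []
    matcher = difflib.SequenceMatcher(a=newer, b=older, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": lines present only in older
            ops.append(("insert", older[j1:j2]))
        # "delete": lines present only in newer are simply not copied
    return ops


def apply_delta(newer: List[str], ops: List[DeltaOp]) -> List[str]:
    out: List[str] = []
    for op in ops:
        if op[0] == "copy":
            out.extend(newer[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class ReverseDeltaFile:
    """Latest revision stored in full; older revisions as reverse deltas."""

    def __init__(self, first: List[str]):
        self.head = first
        self.deltas: List[List[DeltaOp]] = []  # deltas[k] rebuilds rev k from rev k+1

    def commit(self, new: List[str]) -> None:
        self.deltas.append(reverse_delta(new, self.head))
        self.head = new

    def checkout(self, revision: int) -> List[str]:
        """Walk backwards from the head, applying one reverse delta per step."""
        text = self.head
        for k in range(len(self.deltas) - 1, revision - 1, -1):
            text = apply_delta(text, self.deltas[k])
        return text
```

The design makes checking out the latest revision free and older revisions progressively more expensive, which suits the common case. Here the stored deltas really are line-based diffs, unlike Git's binary pack deltas.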

Based on what you quoted, it appears that Git and SVN are actually more similar to each other than either is to, for example, CVS.

araqnid
+3  A: 

There's a nice explanation of the main differences between changeset-based and snapshot-based VCSs on Martin's blog. I won't repeat it here.

However, I would stress one point that may not be obvious at first: changeset-based VCSs make it really easy to track merges, which is much more difficult for a snapshot-based system like Subversion.

In a changeset-based VCS, merges are simply changesets (or commits, as they're called in git) that have more than one parent changeset. The graphical representation of the repository is usually a DAG (directed acyclic graph) where the nodes represent changesets and the arrows represent parent-child relationships. When you see a node with more than one parent, you know exactly which lines of development were merged there.
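A small sketch of why merges are easy to see in such a DAG: given a hypothetical history stored as commit id to parent ids, a merge is just a commit with two or more parents, and the commits it brought in can be computed by graph reachability. The commit ids and helper names below are made up; following Git's convention, the first parent is assumed to be the branch that was merged into.

```python
from typing import Dict, List, Set

# Hypothetical history: each commit id maps to its list of parent ids.
parents: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],       # trunk continues
    "D": ["B"],       # feature branch forks at B
    "E": ["D"],
    "F": ["C", "E"],  # merge of the feature branch into trunk
}


def merge_commits(graph: Dict[str, List[str]]) -> List[str]:
    """A merge is simply a commit with more than one parent."""
    return sorted(c for c, ps in graph.items() if len(ps) > 1)


def ancestors(graph: Dict[str, List[str]], commit: str) -> Set[str]:
    """All commits reachable from `commit` (inclusive) via parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def merged_in(graph: Dict[str, List[str]], merge: str) -> Set[str]:
    """Commits brought in by the non-first parents that the first parent lacked."""
    first, *others = graph[merge]
    brought: Set[str] = set()
    for p in others:
        brought |= ancestors(graph, p)
    return brought - ancestors(graph, first)
```

With this history, merge_commits(parents) finds only F, and merged_in(parents, "F") returns D and E: the merge history is read straight off the graph, with no extra bookkeeping.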

In Subversion, "merge tracking" is something new. Up to and including version 1.4 there was no such concept, so to know the history of merges you had to record it in the log messages of your commits. Version 1.5 implemented merge tracking to make it easier to perform repeated merges from one branch to another without forcing the user to specify revision ranges and the like. It is implemented with a property (svn:mergeinfo) on the directory receiving the merge, which records which revisions have already been merged from which branches. This is enough to infer which revisions should be merged in subsequent merges, but it doesn't make it easy to draw graphs of the merge history, which is something you frequently want to see when working on a complex project with several developers.
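The following sketch shows how a property like svn:mergeinfo is enough to work out what is still left to merge. It assumes the usual textual form of the property: one "source-path:revision-ranges" entry per line, with inclusive ranges such as 10-12 and an optional trailing "*" for non-inheritable ranges (ignored here). The list of revisions that touched the source branch is a hypothetical input; Subversion itself reports this via svn mergeinfo --show-revs eligible.

```python
from typing import Dict, Iterable, List, Set


def parse_mergeinfo(value: str) -> Dict[str, Set[int]]:
    """Parse an svn:mergeinfo value into {source path: merged revisions}."""
    merged: Dict[str, Set[int]] = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon: paths come first, ranges never contain ':'.
        path, _, ranges = line.rpartition(":")
        revs = merged.setdefault(path, set())
        for item in ranges.split(","):
            item = item.rstrip("*")  # non-inheritable marker, ignored in this sketch
            if "-" in item:
                start, end = item.split("-")
                revs.update(range(int(start), int(end) + 1))  # ranges are inclusive
            else:
                revs.add(int(item))
    return merged


def eligible_revisions(mergeinfo: Dict[str, Set[int]], source: str,
                       source_revs: Iterable[int]) -> List[int]:
    """Revisions that changed `source` but are not yet recorded as merged."""
    return sorted(set(source_revs) - mergeinfo.get(source, set()))
```

For example, with "/branches/feature:10-12,15" recorded and a feature branch changed in r10, r11, r12, r14, r15 and r17, only r14 and r17 remain eligible. Note what the property does not record: which commit on the target performed each merge, which is exactly the information needed to draw a merge graph.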

Gnustavo
Welcome to Stack Overflow, Gnustavo. Thanks for your great answer!
altern