What are the benefits and drawbacks of using Centralized versus Distributed Version Control Systems (DVCS)? Have you run into any problems with DVCS, and how did you safeguard against them? Keep the discussion tool-agnostic and flaming to a minimum.

For those wondering what DVCS tools are available, here is a list of the best known free/open source DVCSs:

+2  A: 

The main problem (aside from the obvious bandwidth issue) is ownership.

That is, making sure that different (geographic) sites are not working on the same element as each other.

Ideally, the tool is able to assign ownership to a file, a branch or even a repository.

To answer the comments on this answer: you really want the tool to tell you who owns what, and then communicate (through phone, IM or mail) with the distant site.
If you have no ownership mechanism... you will "communicate", but often too late ;) (i.e. after having done concurrent development on an identical set of files in the same branch. The commit can get messy.)
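To illustrate what "the tool tells you who owns what" could look like, here is a minimal sketch of an ownership table keyed by path prefix. The site names, paths, and functions are hypothetical; real tools attach ownership to files, branches, or whole repositories in their own ways.

```python
from typing import Dict, Optional

# Hypothetical ownership table: path prefix -> site that owns it.
OWNERS: Dict[str, str] = {
    "src/": "paris",
    "src/reports/": "bangalore",
}


def owner_of(path: str, owners: Dict[str, str] = OWNERS) -> Optional[str]:
    """Return the owning site for `path`, using the longest matching prefix."""
    matches = [prefix for prefix in owners if path.startswith(prefix)]
    if not matches:
        return None  # nobody claims this path
    return owners[max(matches, key=len)]


def check_edit(site: str, path: str) -> bool:
    """True if `site` may edit `path` without coordinating with another site first."""
    owner = owner_of(path)
    return owner is None or owner == site


print(owner_of("src/reports/q1.py"))  # → bangalore
```

The longest-prefix rule lets a more specific claim (a subdirectory owned by the distant site) override a broader one.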

VonC
This can be remedied via a technique known as "communication" :)
bk1e
Well... When 'communication' involves talking to some Indians in Bangalore, with 3/2 hours of time shift, and an impossible English accent, communication can be pretty hard ;) (no disrespect to my Indian colleagues, they rock ;) )
VonC
Indians do know how to write English, right? Try communicating with IM or email. :) But yeah, the time shift is a pain.
Spoike
First of all, Indians are great to work with ;) For the rest, I just updated my answer
VonC
+2  A: 

For me this is another discussion about personal taste, and it's rather difficult to be really objective. I personally prefer Mercurial over the other DVCSs. To name just a couple of my own reasons: I like writing hooks in the same language Mercurial is written in (Python), and I like its smaller network overhead.
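Because Mercurial is written in Python, an in-process hook is just a Python function. Below is a minimal sketch, assuming a `pretxncommit` hook. The file path, the function name, and the 10-character rule are hypothetical. Mercurial calls the function with `ui`, `repo`, `hooktype` and keyword arguments such as `node`, and it treats a true return value as failure.

```python
# hooks.py: a hypothetical in-process Mercurial hook.
# Enable it in .hg/hgrc (the path below is an assumption; adjust it):
#   [hooks]
#   pretxncommit.checkmsg = python:/path/to/hooks.py:check_message


def check_message(ui, repo, hooktype, node=None, **kwargs):
    """Reject commits whose message is shorter than 10 characters.

    Returning a true value from a pretxn* hook makes Mercurial
    roll back the transaction, so the commit is refused.
    """
    ctx = repo[node]                 # changeset being committed
    message = ctx.description()      # bytes under Python 3 Mercurial
    if len(message.strip()) < 10:
        ui.warn(b"commit message too short\n")
        return True                  # failure: abort the commit
    return False                     # success: let the commit through
```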

unexist
+1  A: 

I have a feeling that Mercurial (and other DVCS) are more sophisticated than the centralised ones. For instance, merging a branch in Mercurial keeps the complete history of the branch whereas in SVN you have to go to the branch directory to see the history.

Roman Plášil
+17  A: 

From my answer to a different question:

Distributed version control systems (DVCSs) solve different problems than Centralized VCSs. Comparing them is like comparing hammers and screwdrivers.

Centralized VCS systems are designed with the intent that there is One True Source that is Blessed, and therefore Good. All developers work (checkout) from that source, and then add (commit) their changes, which then become similarly Blessed. The only real difference between CVS, Subversion, ClearCase, Perforce, VisualSourceSafe and all the other CVCSes is in the workflow, performance, and integration that each product offers.

Distributed VCS systems are designed with the intent that one repository is as good as any other, and that merges from one repository to another are just another form of communication. Any semantic value as to which repository should be trusted is imposed from the outside by process, not by the software itself.

The real choice between using one type or the other is organizational -- if your project or organization wants centralized control, then a DVCS is a non-starter. If your developers are expected to work all over the country/world, without secure broadband connections to a central repository, then DVCS is probably your salvation. If you need both, you're fsck'd.

Craig Trader
You can have a DVCS locally to manage your personal branches and such, and later convert (export) them to the standard centralized repos. Many people are finding this a very comfortable working model when they are forced to use a centralized system.
Vinko Vrsalovic
Distributed version control systems are a clean superset of centralized ones. If you want the centralized control you get with a centralized version control system, you can have that with a distributed one. In fact, I would argue that it's even easier to exercise because of the way most distributed version control systems allow you to work with changesets as a discrete entity that you can add or remove at will.
Omnifarious
@Omnifarious, go back and read what I wrote, because I said that, though not in so many words. Again, it comes down to the organizational needs and biases. Companies that have been happy with ClearCase or Perforce for many years aren't going to bother looking at other solutions, much less tools that are outside of their comfort zone. (Fought those wars many times, with the scars to prove it.)
Craig Trader
A: 

W. Craig Trader's answer sums up most of it; however, I find that personal work style makes a huge difference as well. Where I currently work we use Subversion as our One True Source; however, many developers use git-svn on their personal machines to compensate for workflow issues we have (a failure of management, but that's another story). In any case, it's really about balancing which feature sets make you most productive against what the organization needs (centralized authentication, for example).

+9  A: 

During my search for the right SCM, I found the following links to be of great help:

  1. Better SCM Initiative : Comparison. Comparison of about 26 version control systems.
  2. Comparison of revision control software. Wikipedia article comparing about 38 version control systems covering topics like technical differences, features, user interfaces, and more.
  3. Distributed version control systems. Another comparison, but focussed mainly on distributed systems.
Nobby
Note that the information about Git in "Better SCM Initiative : Comparison" isn't entirely correct. Also, the set of features seems to me geared towards centralized VCSs and CVS-like systems.
Jakub Narębski
+13  A: 

To those who think distributed systems don't allow authoritative copies please note that there are plenty of places where distributed systems have authoritative copies, the perfect example is probably Linus' kernel tree. Sure lots of people have their own trees but almost all of them flow toward Linus' tree.

That said, I used to think that distributed SCMs were only useful for lots of developers doing different things, but I have recently decided that anything a centralized repository can do, a distributed one can do better.

For example, say you are a solo developer working on your own personal project. A centralized repository might be an obvious choice, but consider this scenario. You are away from network access (on a plane, at a park, etc.) and want to work on your project. You have your local copy, so you can work fine, but you really want to commit because you have finished one feature and want to move on to another, or you found a bug to fix, or whatever. The point is that with a centralized repo you end up either mashing all the changes together and committing them in a non-logical changeset, or manually splitting them out later.

With a distributed repo you go about business as usual: commit, move on, and when you have net access again, push to your "one true repo". Nothing about your workflow changes.

Not to mention the other nice thing about distributed repos: full history available always. You need to look at the revision logs when away from the net? You need to annotate the source to see how a bug was introduced? All possible with distributed repos.
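Both questions in the previous paragraph can be answered from the local clone alone. As a toy illustration, the sketch below searches a hypothetical list of one file's snapshots and reports the commits where the number of occurrences of a string changes. That is roughly the question Git's `git log -S<string>` answers, but this function and its data layout are assumptions for illustration only.

```python
from typing import List, Tuple

# Hypothetical local history of one file: (commit id, contents), oldest first.
History = List[Tuple[str, str]]


def pickaxe(history: History, needle: str) -> List[str]:
    """Return the commits where the number of occurrences of `needle` changes."""
    hits: List[str] = []
    previous = 0
    for commit_id, contents in history:
        count = contents.count(needle)
        if count != previous:        # the string was added or removed here
            hits.append(commit_id)
        previous = count
    return hits


history = [
    ("c1", "x = 1\n"),
    ("c2", "x = 1\ny = x / 0\n"),   # bug introduced
    ("c3", "x = 1\n"),              # bug removed
]
print(pickaxe(history, "/ 0"))  # → ['c2', 'c3']
```

No network round trip is needed: every snapshot is already on disk in the clone.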

Please, please don't believe that distributed vs. centralized is about ownership or authoritative copies or anything like that. The reality is that distributed is the next step in the evolution of SCMs.

manicmethod
+7  A: 

W. Craig Trader said this about DVCS and CVCS:

If you need both, you're fsck'd.

I wouldn't say you're fsck'd when using both. In practice, developers who use DVCS tools usually merge their changes (or send pull requests) against a central location, usually a release branch in a release repository. There is some irony in developers who use a DVCS but in the end stick with a centralized workflow; you can start to wonder whether the distributed approach really is better than the centralized one.

There are some advantages with DVCS over a CVCS:

  • The notion of uniquely recognizable commits makes sending patches between peers painless. I.e. you make the patch as a commit and share it with other developers who need it. Later, when everyone wants to merge together, that particular commit is recognized and can be compared between branches, with less chance of a merge conflict. Developers tend to send patches to each other by USB stick or e-mail regardless of which versioning tool they use. Unfortunately, in the CVCS case, version control registers the commits as separate, failing to recognize that the changes are the same, which leads to a higher chance of merge conflict.

  • You can have local experimental branches (cloned repositories can also be considered branches) that you don't need to show to others. That means breaking changes don't affect other developers as long as you haven't pushed anything upstream. In a CVCS, when you have a breaking change, you may have to work offline until you've fixed it and only commit the changes then. This effectively defeats the purpose of using version control as a safety net, but it is a necessary evil in a CVCS.

  • In today's world, companies often work with off-shore developers (or, even better, with developers who work from home). A DVCS helps these kinds of projects because it eliminates the need for a reliable network connection, since everyone has their own repo.
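To make "uniquely recognizable commits" concrete: Git names every object by a SHA-1 hash of its type, size and content, so the same commit object has the same id in every clone. A commit's content also includes its parent ids, author and timestamp, so the id identifies that exact commit when it is shared (by pull, push or bundle); re-applying the same diff by hand generally produces a new id. Below is a minimal sketch of the hashing rule. The function name is hypothetical, and the commit bodies in the example are simplified, not Git's full commit format.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """Git-style object id: SHA-1 over b'<type> <size>\\0' followed by the content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has the same id in every Git repository:
print(object_id("blob", b""))  # → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# A commit's content names its parent, so the same message on a
# different parent yields a different id (simplified commit bodies).
a = object_id("commit", b"parent aaa\n\nFix parser\n")
b = object_id("commit", b"parent bbb\n\nFix parser\n")
print(a != b)  # → True
```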

…and some disadvantages that usually have workarounds:

  • Who has the latest revision? In a CVCS, the trunk usually has the latest revision, but in a DVCS it may not be obvious. The workaround is a rule of conduct: the developers on a project agree on which repo to merge their work against.

  • Pessimistic locks, i.e. locking a file when it is checked out, are usually not possible in a DVCS, because changes can happen concurrently in different repositories. File locking exists in version control because developers want to avoid merge conflicts. However, locking has the disadvantage of slowing development down, as two developers can't work on the same piece of code simultaneously (as with a long-transaction model), and it isn't a foolproof guarantee against merge conflicts. Regardless of version control, the only sane ways to combat big merge conflicts are to have good code architecture (like low coupling and high cohesion) and to divide up your work tasks so that they have low impact on the code (which is easier said than done).

  • In proprietary projects it would be disastrous if the whole repository became publicly available, even more so if a disgruntled or malicious programmer got hold of a cloned repository. Source code leakage is a severe pain for proprietary businesses. A DVCS makes this plain simple, as you only need to clone the repository, while some CM systems (such as ClearCase) try to restrict that access. However, in my opinion, if your company culture is dysfunctional enough, then no version control in the world will help you against source code leakage.
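On the "who has the latest revision?" point: in a DVCS, "latest" is a question about the commit graph rather than a single revision counter. Two heads are either in line (one is an ancestor of the other, so the other is newer) or diverged (neither is newer), and only the diverged case needs the agreed-upon merge target. Below is a minimal sketch over a hypothetical parent map; the graph layout and function names are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List

# Hypothetical commit graph: commit id -> list of parent ids.
Graph = Dict[str, List[str]]


def is_ancestor(graph: Graph, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parents."""
    queue = deque([descendant])
    seen = set()
    while queue:
        node = queue.popleft()
        if node == ancestor:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return False


def compare_heads(graph: Graph, a: str, b: str) -> str:
    """Classify two heads as the same, one newer than the other, or diverged."""
    if a == b:
        return "same"
    if is_ancestor(graph, a, b):
        return "b is newer"
    if is_ancestor(graph, b, a):
        return "a is newer"
    return "diverged"  # needs a merge; no head is "the latest"


graph = {"c1": [], "c2": ["c1"], "c3": ["c2"], "x": ["c2"]}
print(compare_heads(graph, "c2", "c3"))  # → b is newer
print(compare_heads(graph, "c3", "x"))   # → diverged
```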

Spoike
A: 

A centralised system doesn't necessarily prevent you from using separate branches to do development on. There doesn't need to be a single true copy of the code base, rather different developers or teams can have different branches, legacy branches could exist etc.

What it does generally mean is that the repository is centrally managed - but that's generally an advantage in a company with a competent IT department because it means there's only one place to backup and only one place to manage storage in.

MarkR
With DSCMs every developer has a copy of the repository. How much backup do you want?
Ikke
+5  A: 

To some extent, the two schemes are equivalent:

  • A distributed VCS can trivially emulate a centralised one if you just always push your changes to some designated upstream repository after every local commit.
  • A centralised VCS won't usually be able to emulate a distributed one quite as naturally, but you can get something very similar if you use something like quilt on top of it. Quilt, if you're not familiar with it, is a tool for managing large sets of patches on top of some upstream project. The idea here is that the DVCS commit command is implemented by creating a new patch, and the push command is implemented by committing every outstanding patch to the centralised VCS and then discarding the patch files. This sounds a bit awkward, but in practice it actually works rather nicely.
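The commit/push mapping described in the second bullet can be modelled as a small patch queue. This is a toy sketch: `CentralServer` and `PatchQueue` are hypothetical stand-ins, not quilt's actual commands or any VCS's API. A local "commit" only records a patch, and "push" commits every outstanding patch to the central server in order and then discards the patch files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CentralServer:
    """Stand-in for a centralized VCS that numbers revisions sequentially."""
    revisions: List[str] = field(default_factory=list)

    def commit(self, change: str) -> int:
        self.revisions.append(change)
        return len(self.revisions)  # new revision number


@dataclass
class PatchQueue:
    """Hypothetical quilt-like layer: a local 'commit' is a new patch."""
    server: CentralServer
    patches: List[str] = field(default_factory=list)

    def local_commit(self, change: str) -> None:
        # No server contact: the change is only recorded as a patch.
        self.patches.append(change)

    def push(self) -> List[int]:
        """Commit outstanding patches oldest-first, then discard them."""
        revisions = [self.server.commit(patch) for patch in self.patches]
        self.patches.clear()
        return revisions


server = CentralServer()
queue = PatchQueue(server)
queue.local_commit("add feature A")
queue.local_commit("fix bug in A")
print(queue.push())  # → [1, 2]
```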

Having said that, there are a couple of things which DVCSes traditionally do very well and which most centralised VCSes make a bit of a hash of. The most important of these is probably branching: a DVCS will make it very easy to branch the repository or to merge branches which are no longer needed, and will keep track of history while you do so. There's no particular reason why a centralised scheme would have trouble with this, but historically nobody seems to have quite gotten it right yet. Whether that's actually a problem for you depends on how you're going to organise development, but for many people it's a significant consideration.
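Keeping track of history is what lets a DVCS merge branches cleanly: a three-way merge compares both branch tips against their common ancestor, the merge base. Below is a minimal sketch of finding merge bases in a commit graph, assuming a hypothetical dict of commit id to parent ids. Real tools (for example `git merge-base`) are far more efficient, but they answer the same question.

```python
from typing import Dict, List, Set

# Hypothetical commit graph: commit id -> list of parent ids.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, node: str) -> Set[str]:
    """All commits reachable from `node` via parent links, including `node`."""
    stack, seen = [node], set()
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


def merge_bases(graph: Graph, a: str, b: str) -> Set[str]:
    """Best common ancestors: common ancestors that are not behind another one."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {
        c for c in common
        if not any(c != other and c in ancestors(graph, other) for other in common)
    }


# root <- base <- f1 <- f2   (feature branch)
#           ^---- m1         (mainline)
graph = {"root": [], "base": ["root"], "f1": ["base"], "f2": ["f1"], "m1": ["base"]}
print(merge_bases(graph, "f2", "m1"))  # → {'base'}
```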

The other posited advantage of DVCSes is that they work offline. I've never really had much use for that; I mostly do development either at the office (so the repository's on the local network) or at home (so there's ADSL). If you do a lot of development on laptops while traveling then this might be more of a consideration for you.

There aren't actually very many gotchas which are specific to DVCSes. There's a slightly greater tendency for people to go quiet, because you can commit without pushing and it's easy to end up polishing things in private, but apart from that we haven't had very many problems. This may be because we have a significant number of open source developers, who are usually familiar with the patch-trading model of development, but incoming closed source developers also seem to pick things up reasonably quickly.

+8  A: 

Not really a comparison, but here is what big projects are using:

Centralized VCSes

  • Subversion

    Apache, KDE, GNOME, GCC, Python, Ruby, Samba, MPlayer, Mono, Mediawiki, Django, Zope, Plone, Xiph, GnuPG, CUPS, vim, ... and many, many more.

  • CVS

    GNU Emacs, SQLite, FreeBSD, ...

Distributed VCSes

  • Mercurial (hg)

    Mozilla and Mozdev, OpenJDK (Java), OpenSolaris, ALSA, NTFS-3G, Dovecot, MoinMoin, mutt, PETSc, Octave, FEniCS, Aptitude, XEmacs, Xen, Xine...

  • git

    Linux kernel, Perl, Ruby on Rails, Android, Wine, Fedora, X.org, VLC.

  • bzr

    Apt, Mailman, MySQL, Squid, ... also promoted within Ubuntu.

  • darcs

    ghc, ion, xmonad, ... popular within Haskell community.

jetxee
This answer needs to be upvoted.
Spoike
The task was to "Keep the discussion tool agnostic", so this answer does not really help.
Weidenrinde
A: 

Centralized systems

  1. Entire processing is done at the server, i.e. the mainframe computer.

  2. Database control is done at the center, i.e. the central server.

  3. Security is high.

  4. It requires high maintenance and hardware costs.

  5. Since all processing is done at the center, the load on the server system is increased.

  6. There are terminal nodes.

  7. If the central server system breaks down, the entire system collapses.

  8. There is no resource sharing.

  9. It generally requires low networking cost.

  10. All backup and recovery is centralized.

  11. System software is easily upgraded.

  12. OS software installation is not needed for terminal nodes.

Here everything is the opposite for distributed systems.

I think you're talking about a different kind of centralized system than the one that relates to VCSs. Most of these points are not relevant to the discussion.
calmh
+2  A: 

I have been using subversion for many years now and I was really happy with it.

Then the Git buzz started and I just had to test it. For me, the main selling point was branching. Oh boy. Now I no longer need to clean my repository, go back a few versions, or do any of the silly things I did when using Subversion. Everything is cheap in a DVCS. I have only tried Fossil and Git, but I have used Perforce, CVS and Subversion, and it looks like all DVCSs have really cheap branching and tagging. I no longer need to copy all the code to one side, so merging is just a breeze.

Any DVCS can be set up with a central server, but what you get, among other things, is this:

You can check in any small change you like; as Linus says, if you need more than one sentence to describe what you just did, you are doing too much. You can have your way with the code: branch, merge, clone and test, all locally, without causing anyone to download huge amounts of data. And you only need to push the final changes to the central server.

And you can work with no network.

So in short, using version control is always a good thing. Using a DVCS is cheaper (in KB and bandwidth), and I think it is more fun to use.

To check out Git: http://git-scm.com/
To check out Fossil: http://www.fossil-scm.org
To check out Mercurial: http://www.selenic.com/mercurial/

Now I can only recommend DVCS systems, and you can easily use them with a central server.

Trausti Thor Johannsson
+1  A: 

Another plus for distributed SCM, even in a solo-developer scenario, is if you, like many of us out there, have more than one machine you work on.

Let's say you have a set of common scripts. If each machine you work on has a clone, you can update and change your scripts on demand. It gives you:

  1. a time saver, especially with ssh keys
  2. a way to branch differences between different systems (e.g. Red Hat vs Debian, BSD vs Linux, etc)
rev
+2  A: 

Distributed VCSs are appealing in many ways, but one disadvantage that will be important to my company is the issue of managing non-mergeable files (typically binary, e.g. Excel documents). Subversion deals with this by supporting the "svn:needs-lock" property, which means you must get a lock for a non-mergeable file before you edit it. It works well. But that workflow requires a centralised repository model, which is contrary to the DVCS concept.

So if you want to use a DVCS, it is not really appropriate for managing files that are non-mergeable.
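Deciding which files need this treatment can be partly automated. Below is a minimal sketch of a common heuristic, assuming that a NUL byte near the start of a file marks it as binary (Git uses a similar check on the first 8000 bytes when deciding whether to show a textual diff). The function names and the sample file contents are hypothetical. Files flagged this way are the candidates you would then give the `svn:needs-lock` property.

```python
from typing import Dict, List


def looks_binary(data: bytes, probe: int = 8000) -> bool:
    """Treat content as binary if its first `probe` bytes contain a NUL byte."""
    return b"\0" in data[:probe]


def lock_candidates(files: Dict[str, bytes]) -> List[str]:
    """Sorted paths whose contents look binary, i.e. probably non-mergeable."""
    return sorted(path for path, data in files.items() if looks_binary(data))


files = {
    "budget.xlsx": b"PK\x03\x04\x14\x00\x06\x00",  # zip-based, contains NULs
    "notes.txt": b"plain text\n",
}
print(lock_candidates(files))  # → ['budget.xlsx']
```

A heuristic like this can misclassify some files (for example UTF-16 text, which contains NUL bytes), so treat its output as a suggestion rather than a rule.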

Craig McQueen
+1  A: 

Everybody these days is on the bandwagon about how DVCSs are superior, but Craig's comment is important. In a DVCS, each person has the entire history of the branch. If you are working with a lot of binary files (for example, image files or FLAs), this requires a huge amount of space, and you can't do diffs.

elmonty