Does a Distributed Version Control System really have no centralised repository?

+7 Q:

It might seem a silly question, but how do you get a working directory set up without a server to check out from? And how does a business keep a safe, backed-up copy of the repo?

I assume then there must be a central repo... but then how exactly is it 'distributed'? I always thought of a server-client (SVN) Vs peer-2-peer (GIT) distinction, but I don't believe that can be correct unless tools like GIT are dependent on torrent-style technology?

+4 A:

With distributed version control, you can have a complete copy of the entire history (the entire repository) embedded as part of your local (checked out) copy.

In addition, most projects will have some central repository that will also have a copy of everything. This means that at some point you will need to push your changes from your local repository to the central one. But it also means that you can work locally to your heart's content, and then only push the changes that you want to push, and you only need to push them when you are ready.

For example, look at the Linux kernel: lots of people will check out (that is, "clone") a kernel tree from somewhere. It might be Linus's tree, or it might be one of the other trees floating around kernel.org or the internet. But Linus's tree exists both on kernel.org and (presumably) on Linus's computer(s), as well as on the computers of anyone else who has pulled from there.

Joel's latest blog post described the advantages (and major difference from systems like Subversion) of a DVCS best:

When you manage changes instead of managing versions, merging works better, and therefore, you can branch any time your organizational goals require it, because merging back will be a piece of cake.

So you put a copy of the tree on some central server somewhere that other people can pull from (or on your private server so you have a backup) and when you feel like it, you push some bits across to there. Then if someone wants a copy, they can clone from there.

Adam Batkin 2010-03-19 09:43:33

+2 A:

Technically, a DVCS doesn't need a centralised repository.

In real life, an application (e.g. a Linux kernel) must be built from a single agreed collection of sources before being delivered.

So a DVCS doesn't impose any source management policy, and leaves such decisions to the project managers.

mouviciel 2010-03-19 09:52:57

+1 A:

The "peer-to-peer" analogy actually refers to how you get changes from another repo.

With a DVCS, you fetch changes, and then choose to merge them if you want.
With a centralised VCS, you update, and the incoming changes directly affect your workspace.

Since you can fetch from any other "peer" repository (a repo which shares the same first commit), you could consider this a peer-to-peer model, since:

Peers are both suppliers and consumers of resources, in contrast to the traditional client-server model where only servers supply, and clients consume.
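To make the fetch-then-merge distinction concrete, here is a minimal toy sketch in Python. It is not how Git or Mercurial store anything: the dict layout and the names `root_of`, `are_peers`, `fetch` and `fast_forward` are all made up for illustration, and the "merge" is simplified to moving the head forward.

```python
# Toy model only. A "repo" is a dict with:
#   "commits": commit id -> parent id (None for the first commit)
#   "head":    the commit the workspace currently reflects
# All names here are hypothetical, not a real DVCS API.

def root_of(repo):
    """Walk parents from the head back to the first commit."""
    commit = repo["head"]
    while repo["commits"][commit] is not None:
        commit = repo["commits"][commit]
    return commit

def are_peers(a, b):
    """Two repos count as peers if their histories start at the same commit."""
    return root_of(a) == root_of(b)

def fetch(local, remote):
    """Copy commits we lack into our store; the head is left untouched."""
    if not are_peers(local, remote):
        raise ValueError("unrelated histories")
    for commit, parent in remote["commits"].items():
        local["commits"].setdefault(commit, parent)
    return remote["head"]  # something we *could* merge later

def fast_forward(local, new_head):
    """The separate, deliberate step: move our head to a fetched commit."""
    local["head"] = new_head

alice = {"commits": {"c1": None, "c2": "c1"}, "head": "c2"}
bob = {"commits": {"c1": None, "c2": "c1", "c3": "c2"}, "head": "c3"}

fetched_head = fetch(alice, bob)
head_after_fetch = alice["head"]   # workspace has not moved yet
fast_forward(alice, fetched_head)  # only now does it change
```

The point of the sketch is that `fetch` only adds to what the repository knows about; nothing in the workspace changes until the second step is taken on purpose.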

VonC 2010-03-19 09:54:51

+6 A:

Re: 'torrent-style technology' - you're confusing two issues: one of network topology (peer-to-peer vs. server/client) and one of server authority. This is understandable because the terms are almost identical. But nothing about distributed source control imposes any requirements on the network connection model - you could be distributing changesets via email if you prefer. The important thing with distributed version control is that each person essentially runs their own server and merges changes in from the other servers. Of course, you need to be able to get your initial clone from somewhere, and how you know where that 'somewhere' is falls outside the scope of the system itself. There is no 'tracker' program or anything - typically someone has a public repository somewhere with the address published on a web site. But once you've cloned it, your copy is a full one that is capable of being the basis for someone else's clone.
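As a rough illustration of that last point, here is a toy Python sketch in which a clone is simply a deep copy of the whole history. The `clone` function and the repo layout are assumptions made for the example, not a real DVCS interface.

```python
import copy

def clone(source):
    """Hypothetical clone: a full, independent copy of the history."""
    return copy.deepcopy(source)

# The publicly advertised repository (toy data).
published = {"commits": ["init", "add readme", "fix typo"]}

mine = clone(published)   # my initial clone from the published address
friend = clone(mine)      # someone else clones from *my* copy instead

# Local work on my clone does not affect the other two.
mine["commits"].append("local experiment")
```

Here `friend` ends up with the same complete history as `published` without ever contacting it, which is the sense in which any clone can serve as the basis for another.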

Kylotan 2010-03-19 10:01:02

+1 for "authority", which is the key word here.

Tadeusz A. Kadłubowski 2010-03-19 10:05:42

+4 A:

There's an important distinction to make here: is there a central server for technical reasons, or is there one by convention?

Technically, all clones of a git repository are equivalent. All of them allow changes, check-ins, branches, and merging with each other. There's no single repository that is somehow "more true" than any other.

By social convention most projects using git have a central repository that's considered the authoritative repository, representing the official state of the project.

Compare that with a more traditional VCS such as SVN: here the central repository is technically very different from the local checkout that each developer may have. The local check-out can only do VCS operations in relation to the central repository. Without the central repository, the developer can't commit.
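The difference can be sketched with a toy Python model. The classes `CentralServer`, `Checkout` and `Clone` are invented for this example and do not mirror the real SVN or git internals; they only capture where the history lives.

```python
class CentralServer:
    """Toy central repository: the only place history is stored."""
    def __init__(self):
        self.history = []
        self.online = True

class Checkout:
    """Centralised working copy: files only, history stays on the server."""
    def __init__(self, server):
        self.server = server

    def commit(self, message):
        if not self.server.online:
            raise ConnectionError("cannot commit without the central repository")
        self.server.history.append(message)

class Clone:
    """Distributed clone: carries its own history, so commits are local."""
    def __init__(self, history=()):
        self.history = list(history)

    def commit(self, message):
        self.history.append(message)

server = CentralServer()
checkout = Checkout(server)
clone = Clone(server.history)

server.online = False  # simulate the central server being unreachable

try:
    checkout.commit("fix bug")
    central_commit_ok = True
except ConnectionError:
    central_commit_ok = False

clone.commit("fix bug")  # works, nothing else is needed
```

With the server unreachable, the checkout cannot record anything, while the clone keeps accepting commits that can be exchanged with other repositories later.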

Joachim Sauer 2010-03-19 10:11:50

+5 A:

Does a Distributed Version Control System really have no centralised repository?

There is no enforced central repository - it is only by convention. Most projects do have a central repository, but each repository is equal in the sense that they have the full history, and can push and pull patches between each other.

One way to think of it is that a centralised VCS is fixed in a star topology: one central hub acts as the server with the complete repository, with one or more clients hanging off it. The clients typically only have a copy of the most recent clean checkout, and limited history (if any). So most operations require a round-trip to the server. Branching is achieved by creating branches within the one repository.

In a distributed VCS, there is no limit to the topology of your network. You can theoretically have any shape you like. You can have a separate repository per team or sub-project, and stage commits. You can have a stable repository and an unstable repository, and lots of feature branches, and so on. And there is no client/server distinction - all nodes are equal. Each repository is self-contained and complete, and can push and/or pull changes from any other. To get started, you clone an existing repository (make your own copy to work from), and start making changes. Once you make your first commit, you effectively have a branch. Fortunately, it is usually very easy to merge your changes back when you're done.
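As a small sketch of "any shape you like", the toy Python below treats each repository as a set of commit ids and a pull as a set union. The repository names and the `pull` helper are assumptions for illustration; real merges are of course more involved than a union.

```python
def pull(dest, src):
    """Toy pull: dest gains every commit that src has."""
    dest.update(src)

# Four repositories that all started from the same initial commit.
repos = {
    "stable": {"c1"},
    "unstable": {"c1"},
    "alice": {"c1"},
    "bob": {"c1"},
}

repos["alice"].add("a1")   # Alice commits locally
repos["bob"].add("b1")     # Bob commits locally

pull(repos["bob"], repos["alice"])        # developer to developer
pull(repos["unstable"], repos["bob"])     # developer to integration repo
pull(repos["stable"], repos["unstable"])  # integration repo to stable
```

Nothing in the model singles out one repository as the server. The path alice, bob, unstable, stable is just one arrangement, and Alice still lacks Bob's commit until she chooses to pull from him.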

But what normally happens is you have one repository which is on a central server, which makes it easier for people to get started, and to keep track of where the latest changes are.

how do you get a working directory set up without a server to check out from?

Your repository has to start somewhere with a source tree. So there is always a first repository, with the initial series of checkins. Let's say you want to work on Murky. You would clone the repository, which gives you a complete repository of your own, with all the history and checkins. You make some changes (thus creating a branch), and when you're done, you push your changes back, where they get merged. Both systems are acting as peers, and they push and pull changesets between each other.

Both Mercurial and Git keep the repository in a hidden subdirectory, so the one directory tree contains both your working copy (which can be in whatever state you like), and the repo itself.

And how does a business keep a safe backed up copy of the repo?

As above, you simply have a nominated master repository which has all the latest merged changes, and back it up like anything else. You can even have multiple backup repos, or have automated clones on physically separate boxes. In some ways, backing up is easier.

I assume then there must be a central repo... but then how exactly is it 'distributed'? I always thought of a server-client (SVN) Vs peer-2-peer (GIT) distinction, but I don't believe that can be correct unless tools like GIT are dependent on torrent-style technology?

It is not distributed in the sense that different clients have different parts, like peer-to-peer file sharing. It is really just in contrast to the centralised model.

All DVCS repositories are first-class citizens. It becomes a social or managerial question of how to arrange them, rather than a technical issue.

gavinb 2010-03-19 10:26:36

Technically you don't need a central server: you can just exchange commits with your peers and that's it.

Logically (just take a look at github.com) there will ALWAYS be (at least) one central repository, some sort of "master copy" you have to rely on. I guess that for the Linux kernel, Linus's repo is the master one into which changes are ultimately accepted, isn't it?

I think this will be especially true for companies embracing DVCS: they won't rely on developers' "copies" but on centralized ones, although, obviously, there could be MORE than just one copy (which is very good for avoiding disaster too :-P, and happens rather naturally with DVCS).

pablo 2010-03-19 23:33:17
