views:

640

answers:

7

If our organisation were to switch from a central-server VCS like subversion to a distributed VCS like git, how do I make sure that all my code is safe from hardware failure?

With a central-server VCS I just need to back up the repository every day. If we were using a DVCS then there'd be loads of code branches on all the developer machines, and if that hardware were to fail (or a dev were to lose his laptop or have it stolen) then we wouldn't have any backups.

Note that I don't consider it a good option to "make the developers push branches to a server" -- that's tedious and the developers will end up not doing it.

Is there a common way around this problem?

Some clarification:

With a natively-central-server VCS then everything has to be on the central server except the developer's most recent changes. So, for example, if a developer decides to branch to do a bugfix, that branch is on the central server and available for backup immediately.

If we're using a DVCS then the developer can do a local branch (and in fact many local branches). None of those branches are on the central server and available for backup until the developer thinks, "oh yeah, I should push that to the central server".

So the difference I'm seeing (correct me if I'm wrong!): Half-implemented features and bugfixes will probably not be available for backup on the central server if we're using a DVCS, but they are with a normal VCS. How do I keep that code safe?

+1  A: 

It's not uncommon to use a "central" server as an authority in DVCS, which also provides you the place to do your backups.

Brad Wilson
A: 

You could have developer home directories mount remote devices over the local network. Then you only have to worry about making the network storage safe. Or maybe you could use something like Dropbox to copy your local repo elsewhere seamlessly.

Jonathan Tran
<blockquote>home directories mount remote devices over the local network</blockquote>We've tried that before, and it's usually disastrous because of network lag. That, and it means a whole lot more stuff for the backup tapes.
Stewart Johnson
+3  A: 

I think it's a fallacy that using a distributed VCS necessarily means that you must use it in a completely distributed fashion. It's completely valid to set up a common git repository and tell everybody that repository is the official one. For normal development workflow, developers would pull changes from the common repository and update their own repositories. Only in the case of two developers actively collaborating on a specific feature might they need to pull changes directly from each other.

With more than a few developers working on a project, it would be seriously tedious to have to remember to pull changes from everybody else. What would you do if you didn't have a central repository?

At work we have a backup solution that backs up everybody's working directories daily, and writes the whole lot to a DVD weekly. So, although we have a central repository, each individual one is backed up too.

Greg Hewgill
Greg - I've clarified the question to highlight that I'm talking about branches for half-implemented features and bugfixes. VCS or DVCS, there would need to be a central server for releases and so forth anyway.
Stewart Johnson
+10  A: 

I think that you will find that in practice developers will prefer using a central repository to pushing and pulling between each other's local repositories. Once you've cloned a central repository, fetching and pushing are trivial commands while working on any tracking branches. Adding half a dozen remotes for all your colleagues' local repositories is a pain, and these repositories may not always be accessible (switched off, on a laptop taken home, etc.).

At some point, if you are all working on the same project, all the work needs to be integrated. This means that you need an integration branch where all the changes come together. This naturally needs to be somewhere accessible by all the developers; it doesn't belong, for example, on the lead developer's laptop.

Once you've set up a central repository you can use a cvs/svn-style workflow to check in and update. cvs update becomes git fetch followed by git rebase if you have local changes, or just git pull if you don't. cvs commit becomes git commit followed by git push.

With this setup you are in a similar position to the one you have with your fully centralized VCS. Once developers submit their changes (git push), which they need to do for them to be visible to the rest of the team, the changes are on the central server and will be backed up.
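
To make that mapping concrete, here is a rough sketch of the round trip run against a throwaway repository. The names (alice, bob, central.git, a branch called main) and the temp directory are made up for the demo; only the fetch/rebase/push commands in the last section correspond to the workflow described above.

```shell
#!/bin/sh
# Demo of the cvs-style round trip against a throwaway "central" repository.
# All names and paths here are illustrative assumptions.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/central.git"
git --git-dir="$tmp/central.git" symbolic-ref HEAD refs/heads/main

clone_as() {      # clone_as <name>: clone central and set a local identity
    git clone -q "$tmp/central.git" "$tmp/$1" 2>/dev/null
    git -C "$tmp/$1" symbolic-ref HEAD refs/heads/main
    git -C "$tmp/$1" config user.name "$1"
    git -C "$tmp/$1" config user.email "$1@example.invalid"
}
commit_file() {   # commit_file <name> <file> <message>
    echo "$3" > "$tmp/$1/$2"
    git -C "$tmp/$1" add "$2"
    git -C "$tmp/$1" commit -q -m "$3"
}

# Setup: alice seeds the central repository, bob clones it.
clone_as alice
commit_file alice a.txt "initial import"
git -C "$tmp/alice" push -q origin main
clone_as bob

# Both keep working; alice publishes first.
commit_file bob b.txt "bob: local work"
commit_file alice a2.txt "alice: second change"
git -C "$tmp/alice" push -q origin main

# "cvs update" with local changes: fetch, then rebase onto the central branch.
git -C "$tmp/bob" fetch -q origin
git -C "$tmp/bob" rebase -q origin/main

# "cvs commit": the commit already exists locally, so just push it.
git -C "$tmp/bob" push -q origin main

count=$(git --git-dir="$tmp/central.git" rev-list --count main)
echo "commits on central main: $count"
```

After the final push, bob's commit sits on top of alice's in the central repository, which is the point at which a server-side backup would pick it up.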

What takes discipline in both cases is preventing developers from keeping long-running changes out of the central repository. Most of us have probably worked in a situation where one developer is working on feature 'x' which needs a fundamental change in some core code. The change will cause everyone else to need to completely rebuild, but the feature isn't ready for the main stream yet, so he just keeps it checked out until a suitable point in time.

The situation is very similar in both cases, although there are some practical differences. Using git, because you get to perform local commits and can manage local history, the need to push to the central repository may not be felt as much by the individual developer as with something like cvs.

On the other hand, local commits can be turned into an advantage. Pushing all local commits to a safe place on the central repository should not be very difficult. Local branches can be stored in a developer-specific ref namespace.

For example, for Joe Bloggs, an alias could be made in his local repository to perform something like the following in response to (e.g.) git mybackup.

git push origin +refs/heads/*:refs/jbloggs/*

This is a single command that can be used at any point (such as the end of the day) to make sure that all his local changes are safely backed up.
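
As a sketch of how such an alias might be wired up: the alias name mybackup and the refs/jbloggs/ namespace come from the example above, while the throwaway repositories, the branch names and the identity are assumptions made so the snippet can be run on its own.

```shell
#!/bin/sh
# Register a "mybackup" alias and exercise it against a throwaway bare
# repository standing in for the central server (paths are illustrative).
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/central.git"
git init -q "$tmp/joe"
cd "$tmp/joe"
git symbolic-ref HEAD refs/heads/main
git config user.name "Joe Bloggs"
git config user.email "jbloggs@example.invalid"
git remote add origin "$tmp/central.git"

# Two local branches that have never been pushed anywhere.
echo "half done" > feature.txt
git add feature.txt
git commit -q -m "WIP: half-implemented feature"
git branch bugfix-123

# The alias: force-push every local branch into refs/jbloggs/* on origin.
# Git splits alias arguments itself, so the * is not expanded by the shell.
git config alias.mybackup 'push --quiet origin +refs/heads/*:refs/jbloggs/*'
git mybackup

# Show what the server now holds for Joe.
backed_up=$(git --git-dir="$tmp/central.git" for-each-ref --format='%(refname)' refs/jbloggs)
echo "$backed_up"
```

Because the refs live outside refs/heads/, they do not show up as ordinary branches for everyone else who clones or fetches with the default refspec, but they are still part of the repository that gets backed up.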

This helps with all sorts of disasters. If Joe's machine blows up, he can use another machine, fetch his saved commits and carry on from where he left off. Joe's ill? Fred can fetch Joe's branches to grab that 'must have' fix that he made yesterday but didn't have a chance to test against master.
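
The retrieval side could look roughly like this. The refs/jbloggs/ namespace is the one from the push example; the refs/remotes/jbloggs/ destination, the hotfix branch name and the temp directories are assumptions for the demo.

```shell
#!/bin/sh
# Fetch a colleague's backed-up branches out of a private ref namespace.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/central.git"

# Setup only: Joe commits locally and backs up his branches.
git init -q "$tmp/joe"
cd "$tmp/joe"
git symbolic-ref HEAD refs/heads/hotfix
git config user.name "Joe Bloggs"
git config user.email "jbloggs@example.invalid"
echo "must-have fix" > fix.txt
git add fix.txt
git commit -q -m "Must-have fix, not yet tested against master"
git push -q "$tmp/central.git" '+refs/heads/*:refs/jbloggs/*'
joe_tip=$(git rev-parse HEAD)

# Fred (or Joe on a replacement machine) pulls the backup refs down.
git init -q "$tmp/fred"
cd "$tmp/fred"
git remote add origin "$tmp/central.git"
git fetch -q origin '+refs/jbloggs/*:refs/remotes/jbloggs/*'

# Create a local branch from the recovered ref and carry on.
git checkout -q --no-track -b hotfix refs/remotes/jbloggs/hotfix
fred_tip=$(git rev-parse HEAD)
echo "recovered $fred_tip"
```

The explicit refspec is needed because a default fetch only brings down refs/heads/*; anything stored elsewhere has to be asked for by name.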

To go back to the original question. Does there need to be a difference between dVCS and centralized VCS? You say that half-implemented features and bugfixes will not end up on the central repository in the dVCS case but I would contend that there need be no difference.

I have seen many cases where a half-implemented feature stays on one developer's working box when using centralized VCS. It either takes a policy that allows half-written features to be checked in to the main stream, or a decision has to be made to create a central branch.

In the dVCS the same thing can happen, but the same decision should be made. If there is important but incomplete work, it needs to be saved centrally. The advantage of git is that creating this central branch is almost trivial.

Charles Bailey
A: 

All developers on your team can have their own branches on the server as well (can be per ticket or just per dev, etc). This way they don't break the build in master branch but they still get to push their work in progress to the server that gets backed up.

My own git_remote_branch tool may come in handy for that kind of workflow (note that it requires Ruby). It helps with manipulating remote branches.

As a side note on repo safety: on your server you can set up a hook that does a simple git push to another machine. On a repository that receives pushes this should be a post-receive hook rather than post-commit, since post-commit only runs for commits made locally. You get an up-to-date backup after each push!
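
A minimal sketch of such a server-side hook, assuming a second bare repository as the backup target. Here both repositories live in a temp directory so the snippet can run by itself; in practice the mirror would be a URL on another machine, and mirror.git is a made-up name.

```shell
#!/bin/sh
# Install a post-receive hook on a "central" bare repo that mirrors every
# accepted push to a second bare repo (local stand-ins for two machines).
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/central.git"
git init -q --bare "$tmp/mirror.git"

mkdir -p "$tmp/central.git/hooks"
cat > "$tmp/central.git/hooks/post-receive" <<EOF
#!/bin/sh
# Runs on the server after a push is accepted; copies all refs to the mirror.
exec git push --quiet --mirror "$tmp/mirror.git"
EOF
chmod +x "$tmp/central.git/hooks/post-receive"

# A developer pushes to central as usual.
git init -q "$tmp/dev"
cd "$tmp/dev"
git symbolic-ref HEAD refs/heads/main
git config user.name "Dev"
git config user.email "dev@example.invalid"
echo "work" > file.txt
git add file.txt
git commit -q -m "Some work"
git push -q "$tmp/central.git" main

central_tip=$(git --git-dir="$tmp/central.git" rev-parse main)
mirror_tip=$(git --git-dir="$tmp/mirror.git" rev-parse main)
echo "central=$central_tip mirror=$mirror_tip"
```

Note that --mirror also deletes refs on the backup that have been deleted on the server, so this gives you a replica rather than a history of past states.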

webmat
A: 

We use rsync to back up the individual developers' .git directories to a directory on the server. This is set up using wrapper scripts around git clone, and the post-commit etc. hooks.

Because it is done in the post-* hooks, developers don't need to remember to do it manually. And because we use rsync with a timeout, if the server goes down or the user is working remotely, they can still work.
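
Our actual scripts aren't reproduced here, but a post-commit hook along these lines gives the general idea. This is only a sketch: the backup.dest config key is a hypothetical name invented for the example, and a local directory stands in for the server.

```shell
#!/bin/sh
# Install a best-effort post-commit hook that rsyncs .git to a backup location.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/backup"
git init -q "$tmp/joe"
cd "$tmp/joe"
git symbolic-ref HEAD refs/heads/main
git config user.name "Joe Bloggs"
git config user.email "jbloggs@example.invalid"
# Hypothetical config key read by the hook; would normally be host:/path.
git config backup.dest "$tmp/backup/joe.git"

mkdir -p .git/hooks
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
# Skip quietly if no destination is configured.
dest=$(git config --get backup.dest) || exit 0
# --timeout bounds stalled transfers; any failure is swallowed so that a
# dead server or an offline laptop never gets in the way of committing.
rsync -a --delete --timeout=10 "$(git rev-parse --git-dir)/" "$dest/" \
    >/dev/null 2>&1 || true
EOF
chmod +x .git/hooks/post-commit

echo "work in progress" > wip.txt
git add wip.txt
git commit -q -m "WIP"
ls "$tmp/backup"
```

Hooks are not copied by git clone, which is why a wrapper around clone (or some other install step) is needed to put the hook in place on each developer's machine.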

Luuk Paulussen
A: 

I find this question to be a little bit bizarre. Assuming you're using a non-distributed version control system, such as CVS, you will have a repository on the central server and work in progress on developers' servers. How do you back up the repository? How do you back up developers' work in progress? The answer to those questions is exactly what you have to do to handle your question.

Using distributed version control, repositories on developers' servers are just work in progress. Do you want to back it up? Then back it up! It's as simple as that.

We have an automated backup system that grabs any directories off our machines which we specify, so I add any repositories and working copies on my machine to that list, including both git and CVS repositories.

By the way, if you are using distributed version control in a company releasing a product, then you will have a central repository. It's the one you release from. It might not be on a special server; it might be on some developer's hard drive. But the repository you release from is the central repository. (I suppose if you haven't released, yet, you might not have one, yet.) I kind of feel that all projects have one or more central repositories. (And really if they have more than one, it's two projects and one is a fork.) This goes for open source as well.

Even if you didn't have a central repository, the solution is the same: back up work on developers' machines. You should have been doing that anyway. The fact that the work in progress is in distributed repositories instead of CVS working copies or straight nonversioned directories is immaterial.

skiphoppy
We don't back up developer workstations (it's expensive when you've got 100s of them); instead we encourage developers to check in a few times a day. Then we just have to back up the server. That's not an option with git.
Stewart Johnson
You are still in exactly the same boat, asking exactly the same question: do you back up developer work in progress, or not? You've chosen not to. Distributed version control doesn't make that situation any worse or better.
skiphoppy
The thing to realize is that distributed version control doesn't distribute your code across many machines. The only thing distributed across many machines is works in progress, which you're already not backing up. Somewhere will be the repository or repositories you release from; back those up.
skiphoppy