I've recently started getting into Git on a personal project, and I can see how a DVCS might benefit us at work (which is a large enterprise software company, currently running Perforce). Feature work in my team for example mostly consists of developers creating their own branches; sometimes these are shared between small teams of developers. I think it would be more efficient in this instance to use a DVCS.

In the more general case, though, I'd be interested to hear from people that use a DVCS at work, in medium to large teams.

  1. How do you deal with N-way merges? Is this even a common scenario? Mercurial only supports N-way merges by doing (N-1) 2-way merges (and I've read that this is the preferred approach in other DVCSs), which sounds like a very laborious process for even relatively small N.
  2. Do you use a single central authoritative repository, or is it truly P2P?
  3. Do developers often push and pull code to and from each other, or does everything go via the central repository?
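For concreteness on question 1, here is a sketch of what an N-way merge looks like in Git, which (unlike Mercurial) does support a single merge commit with more than two parents, the so-called "octopus" merge. The repo, branch names, and files are all invented for the demo:

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q repo && cd repo
git config user.email dev@example.com
git config user.name Dev
echo base > base.txt && git add base.txt && git commit -qm base
main=$(git symbolic-ref --short HEAD)
# Three branches that each diverge from the same base commit:
for b in feature-a feature-b feature-c; do
  git checkout -qb "$b" "$main"
  echo "$b" > "$b.txt" && git add "$b.txt" && git commit -qm "$b"
done
git checkout -q "$main"
# One merge commit with more than two parents -- the "octopus" merge:
git merge -q -m 'octopus merge' feature-a feature-b feature-c
git log -1 --format='%P' | wc -w    # number of parent commits (> 2)
```

In practice octopus merges only work when none of the branches conflict; with conflicts you fall back to pairwise merging anyway, which is why many teams rarely see true N-way merges.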
A: 

Here is one example (by no means a "universal" one):

We have central VCSs (ClearCase or Subversion, depending on the project), and we use them for "official" development efforts (dev, patches, fixes), where the number of branches is limited and well-identified.

However, for refactoring work involving a lot of intermediate states where nothing works, and where many developers need their own activity-based branch or branches, some Git repositories are set up between those developers in a P2P way.
Once the work achieves some kind of 0.1 stability and the merges die down, it is re-imported into the central VCS, where the work can go on in an "orderly" central fashion.

Since Git on Windows works well (MSysGit), we manage to get small initial developments done quickly on the side that way.

We are still evaluating Git for a full-scale project development though.

VonC
I've found the Windows support for Git pretty poor, IMO. MSysGit does seem to provide a good base, but the fact it's all built on MinGW (and historically was just run in Cygwin) is a bit sucky. TortoiseGit (http://code.google.com/p/tortoisegit/) helps a fair bit, but it's still beta (possibly alpha, even).
alastairs
A: 

It's probably best to look into how the Linux kernel developers work. They have quite a complex workflow where changes are submitted from many sources, and then trusted developers for each subsystem (called lieutenants) pull in the changes and, when they're happy, submit them to Linus, who eventually either pulls them into his tree or rejects them. Of course it's more complex than that, but that's the general overview.

David Plumpton
+9  A: 

My team at my previous employer used Git, and it worked well for us. We weren't all that large (maybe 16 or so, with maybe 8 really active committers?), but I have answers to your questions:

  1. N-Way merges aren't terribly common. We came up with some conventions about branch naming that allowed us to write scripts that eased the "release engineering" process (I use scare quotes because we didn't have a release engineer), and people would create private feature branches, but we rarely had an issue with merging more than two branches (see the next one).
  2. (and #3). We had a central repository on a development server for three reasons: (a) the development machine had a RAID5 (more fault tolerant) and nightly backups (dev workstations were not backed up nightly), (b) production releases were built on the development server, and (c) having a central repository simplified scripting. As a result, N-way merges simply never happened. The closest thing we had to N-way was when someone merged laterally and then merged vertically.

Git was a really great thing for us because of its high degree of flexibility; however, we did have to establish some conventions (branch and tag names, repo locations, scripts, process, etc.) or it might have been a little chaotic. Once we got the conventions set up, the flexibility we had was just fantastic.

Update: our conventions basically were thus:

  • a directory on our NFS server that housed all central repositories
  • we had several projects that shared components, so we broke them out into libraries, essentially, with their own repositories, and the deliverable projects just included them as git submodules.
  • there were version strings and release names imposed on us from above, so we just used variants of those as branch names
  • similarly, for tags, they followed the process-dictated release names
  • the deliverable projects contained a properties file which I read into the shell scripts, and that allowed me to write a single script to manage the release process for all the projects, even though each one had slight variations on the process - the variations were accounted for in those property files
  • I wrote scripts that would rebuild a deliverable package from any tag
  • using git allowed us to control access using PAM and/or normal user permissions (ssh, etc)
  • There were other conventions that are harder to put in a bulleted list, like when merges should happen. Really, me and another guy were sort of the in-house "git gurus", and we helped everyone figure out how to use branches and when to merge.
  • getting people to commit in small chunks and not drop diff-bombs in the master branch was a challenge. One guy dropped about two solid weeks of work into one commit, and we eventually had to unravel it all. A huge waste of time, and frustrating to all.
  • informative and detailed comments to go with commits
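As an illustration of the properties-file convention described above, here is a minimal sketch. All the names here (`release.properties`, `PROJECT_NAME`, the tag) are invented for the example, not Ben's actual scripts: each project carries a small key=value file, a single release script sources it, and `git archive` rebuilds a deliverable from any tag:

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q proj && cd proj
git config user.email dev@example.com
git config user.name Dev
# Hypothetical per-project settings file, committed with the project:
cat > release.properties <<'EOF'
PROJECT_NAME=widget-server
PACKAGE_FORMAT=tar.gz
EOF
echo 'hello' > main.txt
git add . && git commit -qm 'initial'
git tag v1.0.0
# The "single script" part: read the per-project settings, then
# rebuild a deliverable package from the tag with git archive.
. ./release.properties
git archive --format=tar --prefix="${PROJECT_NAME}-v1.0.0/" v1.0.0 \
  | gzip > "${PROJECT_NAME}-v1.0.0.${PACKAGE_FORMAT}"
ls "${PROJECT_NAME}-v1.0.0.tar.gz"
```

Because `git archive` works from any tag or commit, the same script can regenerate any historical release without touching the working tree.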

There were other things that you learn as your team gets experienced and learns to work with each other, but this was enough to get us started.

Ben Collins
I would welcome more info about your conventions and examples of your flexibility.
Norman Ramsey
Me too :-) I was in the middle of a really waffly comment before @Norman came in with that one!
alastairs
Well, one way that Git gives you a lot of flexibility is that it has dozens of programs instead of just one big one. That sounds messy, and it is, but that allows you to write some really powerful scripts by piping output from one command into another. It pretty much allows POSIX sh to be the extension language of Git. If you are proficient with shell scripting, that's a very, very powerful thing.
Ben Collins
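A tiny example of that pipe-friendly design (using a throwaway repo with invented names): counting commits per author with nothing but `git log` and standard POSIX tools:

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.email a@example.com && git config user.name Alice
echo 1 > f && git add f && git commit -qm one
echo 2 > f && git commit -qam two
git config user.email b@example.com && git config user.name Bob
echo 3 > f && git commit -qam three
# One command's output piped into the next, shell-script style:
git log --format='%an' | sort | uniq -c | sort -rn
```

The same pattern scales up: `git rev-list`, `git diff-tree`, `git for-each-ref`, and friends all emit line-oriented output designed to be consumed by other scripts.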
Hmmm, interesting you mention "diff bombs"... We have this problem with our Perforce deployment; developers often use branches to work on features or large bugs (no problem there), but then when it comes to integrating that branch back to the trunk you end up with a HUGE commit that takes forever to review (and often goes several rounds, too). I would have thought something like git that merges the branch's history back in as well would be a better solution. Maybe it's one of those "hard" problems to solve :-)
alastairs
No....branching fixes the diff bomb problem in Git. What this guy did was do work for two weeks without committing anything. Then he just rolled up everything into one commit. Git is really flexible, but a commit is the atomic unit. The only way to fix it at that point is to merge his commit without committing the results, and selectively breaking up the changes for the merge commit.
Ben Collins
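A sketch of that unraveling, using `git merge --squash` as one way to bring the branch's changes in without committing them (repo, branch, and file names are invented):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q repo && cd repo
git config user.email dev@example.com && git config user.name Dev
echo base > a.txt && git add a.txt && git commit -qm base
main=$(git symbolic-ref --short HEAD)
# The "diff bomb": two weeks of work rolled into one commit.
git checkout -qb big-feature
echo one > parser.txt && echo two > lexer.txt
git add . && git commit -qm 'two weeks of work in one commit'
git checkout -q "$main"
# Bring the changes into the work tree without committing them...
git merge -q --squash big-feature
git reset -q                        # ...unstage everything...
# ...then re-commit in reviewable pieces:
git add parser.txt && git commit -qm 'parser changes from big-feature'
git add lexer.txt  && git commit -qm 'lexer changes from big-feature'
```

For finer-grained splitting, `git add -p` lets you stage individual hunks within a file rather than whole files.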
"Diff bombs" can be lessened by pulling mainline into the dev branch regularly and not doing the final merge in mainline. I.e. you integrate mainline into the branch and test it there. Then merging back into mainline is a non-event. Additionally, you can then do partial merges back into mainline before the final commit. If you work on a "stale" version in isolation (in the branch) you are asking for trouble! Same for p4 or hg.
Nick
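Nick's workflow as commands, in a throwaway repo (branch names invented): integrate mainline into the feature branch first and test there, so the final merge back is a non-event:

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q repo && cd repo
git config user.email dev@example.com && git config user.name Dev
echo base > base.txt && git add base.txt && git commit -qm base
main=$(git symbolic-ref --short HEAD)
git checkout -qb feature-x
echo feature > feature.txt && git add feature.txt && git commit -qm 'feature work'
# Meanwhile, mainline moves on:
git checkout -q "$main"
echo more > mainline.txt && git add mainline.txt && git commit -qm 'mainline moves on'
# Integrate mainline into the branch, resolve conflicts and test *there*...
git checkout -q feature-x
git merge -q -m 'sync with mainline' "$main"
# ...so merging back into mainline is a fast-forward "non-event":
git checkout -q "$main"
git merge -q feature-x
```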
@Nick: you don't have to do regular merges in git to avoid diff bombs. You just have to enforce a convention of small commits. How long a branch goes without merging is really immaterial to the diff bomb problem.
Ben Collins
+1 for Diff Bombs. Hilarious.
Andres Jaan Tack
+2  A: 

I've been working for several years with the Glasgow Haskell Compiler team using Darcs. I've recently (several months) started using git for my own copy of the repo, both for performance and to improve my education.

  1. How do you deal with N-way merges?

    There are no N-way merges. Each developer originates a stream of patches, and streams are merged one at a time at each repo. So if N developers make changes simultaneously, they get merged pairwise.

  2. Do you use a single central authoritative repository?

    Absolutely. It's the only way to tell what's GHC and what isn't.

  3. Do developers often push and pull code to and from each other, or does everything go via the central repository?

    I think it depends on the developers and the VCS you are using. On the GHC project almost all the pulls and pushes I see go through the central repository. But there's a heavyweight (self-administered) gatekeeper on pushes to the central repo, and if a colleague has a bug fix I need now, I'll pull it direct from his or her repo. With darcs it is very easy to pull just a single patch (rather than the whole state as in git), and I know that my fellow developers, who have more experience with darcs, use this feature a lot more than I do---and they like it a lot.

    With git, when I am working closely with one other developer, I will frequently create a new branch just for the purpose of sharing it with one other person. That branch will never hit the central repo.
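That kind of peer-to-peer branch sharing might look like this in git (paths, names, and the branch are all invented for the sketch): Alice publishes a branch in her working repo, and Bob adds her repo as a remote and fetches just that branch, never touching the central repository:

```shell
set -e
dir=$(mktemp -d); cd "$dir"
# Alice's working repository, with a branch meant only for Bob:
git init -q alice && cd alice
git config user.email alice@example.com && git config user.name Alice
echo fix > fix.txt && git add fix.txt && git commit -qm 'urgent fix'
git branch shared-fix
cd ..
# Bob's repository: he adds Alice's repo as a remote and pulls that one branch.
git init -q bob && cd bob
git config user.email bob@example.com && git config user.name Bob
git remote add alice ../alice
git fetch -q alice shared-fix:shared-fix
git log -1 --format='%s' shared-fix   # prints: urgent fix
```

In a real team the remote URL would be an ssh path to the colleague's machine rather than a local directory.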

Norman Ramsey
A: 

The fairly famous "Tech Talk: Linus Torvalds on git" explains how it is used for Linux (about as big a team as I can think of).

If I recall correctly, its use was likened to a military chain of command: each module has a maintainer who handles pull requests from developers, and then there are a few "most trusted" people who deal with pulling data from the module maintainers into the official kernel.org git repository.

"Linux: Managing the Kernel Source With 'git'" also explains it, although again it's hardly a concise explanation.

dbr
+6  A: 
Wim Coenen
What might "trusted lieutenants" translate to in a business/proprietary environment? A feature lead? Project manager? ;-) I'm not sure I see this working so well in my team for example, where we have a central Perforce repository to which everyone has access. It seems inefficient (and not very distributed...) to make a single person responsible for integrating to the blessed repository.
alastairs
The other thing about our team is that we don't have clear-cut feature responsibilities, as such. Sure, we each have our own expertise on different areas, but we can move about, working on different areas of the product for different projects. This would appear to invalidate the integration manager/dictator-lieutenant model for us.
alastairs
In a business environment, it's probably a better idea to split up projects if they become too large for a single integration manager. But don't overestimate the work an integration manager has to do: the developers are the ones who make sure that their public changes merge cleanly to the HEAD of the blessed repository.
Wim Coenen
If you can't or don't want to burden anyone with the role of integration manager, then you obviously need to stay with the centralized model where everybody pushes/pulls from the central repository. Developers can still pull from each other if they want to share changes which are experimental or not ready.
Wim Coenen
+1 because I really like the term "blessed repository". Quite amusing.
Ben Collins
+1 for this. Hadn't thought about it so clearly until I saw that picture. This could also be separate teams etc, where one team integrates work of the others. (I.e the lowest unit might be a team, not a single developer, if they're all well behaved.)
Marcus Lindblom