When I look at the Mercurial repositories of some well-known projects, like TortoiseHg and Python, I can see multiple people committing changes, yet the timeline always looks pretty clean, with just one branch moving forward.

However, say you have 14 people working on the same product. Won't this quickly turn into a branch nightmare, with 14 parallel branches at any given time?

For instance, take just two people, with the product at changeset X. Both developers start working on separate features on Monday morning, so both start from the same parent changeset.

When they commit, we have two branches. With 14 people, we would quickly have 10+ (it might not be 14...) branches that need to be merged back into default.

Or... What am I not seeing here? Perhaps it's not really a problem?


Edit: I see there's some confusion as to what I'm really asking about here, so let me clarify.

I know full well that Mercurial easily handles multiple branches and merging and that, as one answer states, even when people work on the same files, they don't often work on the same lines, and even then a conflict is easily handled. I also know that if two people end up in merge hell because they changed a lot of the same code in the same files, there has been a planning failure: we've given two features in the exact same place to two developers, instead of having them work together or just giving both features to one developer in the first place.

So that's not it.

What I'm curious about is how these open source projects manage such a clean history. It's not important to me (as one comment wondered) that the history is clean. We do work in parallel, and if the repository is able to reflect that, so much the better (in my opinion). However, the repositories I've looked at don't show that. They seem to be working along the Subversion model, where you can't commit before you've updated and merged, in which case the history is just one straight line.

So how do they do it?

Are they "rebasing" the changes so that they appear to follow the latest tip of the branch even though they were originally committed a bit further back in the branch history? Transplanting changesets to make them appear to have been committed in the main branch to begin with?

Or are the projects I've looked at either so slow (at the moment, I didn't look far back in the history) at adding new things that in reality they've only been working one person at a time?

Or are they pushing changes to one central maintainer who reviews and then integrates? It doesn't look like that since many of the projects I looked at had different names on the changesets.

+2  A: 

I don't know how the TortoiseHg team does things, but you can use Mercurial's rebase extension to "detach" a branch and drop it on the top of the tip, creating a single branch.

In practice, though, I don't get concerned about multiple branches, as long as I don't see more heads than there should be. Merging is not really a big deal.

Michael Petrotta
+3  A: 

The Linux kernel is stored in thousands of repositories and probably millions of branches, and this doesn't seem to pose a problem. For large projects you need a repository strategy (e.g., the dictator–lieutenants strategy), but having many branches is the main strength of the modern DVCSes and not a problem at all.

Philipp
For example, the tip repo (http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=heads) has hundreds of branches, whereas Linus' repo only has the master branch.
Philipp
+3  A: 

Yes, we'll have to merge. To avoid extra heads on the main repository, merging should be done by the developer in the child repository.

So before you push your code to the parent repository, you first pull the latest changes, merge on your side and (try to) push. This should avoid unwanted heads in the master repo.

Andreas_D
+6  A: 

"Or... What am I not seeing here? Perhaps it's not really a problem?"

It's not really a problem. In a large project even when people work on the same feature, they don't usually work on the same file. When they work on the same file, they don't usually modify the same lines. And when they modify the same lines, then a merge should be done manually (for the affected lines).

This means in practice that 80+% of the merges can be done automagically by Mercurial itself.

Let's take an example:

you have:

[branch 1]  [branch 2]
        \    /
         \  /
        [base]

Edit: for clarity, by branch I refer here to unnamed branches.

If a file is changed in branch 1 but is the same in branch 2 as in base, then the version from branch 1 is chosen. If the file is modified in both branch 1 and branch 2, the files are merged line by line using the same rule: if line 1 of file1 in branch 1 differs from line 1 of file1 in base, while branch 2 and base have the same line 1, then line 1 from branch 1 is chosen (and so on for the remaining lines).

For the lines that are modified in both branches, Mercurial interrupts the automated merging process and prompts the user to choose which lines to use, or edit the lines manually.
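
The line-by-line rule described above can be sketched in a few lines of Python. This is only an illustration, not Mercurial's actual merge code: the function name merge_lines and the sample lines are made up, and the sketch assumes all three versions have the same number of lines (no insertions or deletions), which a real merge tool does not require.

```python
def merge_lines(base, ours, theirs):
    """Three-way merge of equal-length line lists (simplified sketch).

    Returns (merged, conflicts). Lines changed differently on both sides
    are left as None in merged, and their indices are listed in conflicts
    so a person can resolve them by hand.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles versions with equal line counts")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # unchanged, or both sides made the same change
            merged.append(o)
        elif o == b:      # only "theirs" (branch 2) changed this line
            merged.append(t)
        elif t == b:      # only "ours" (branch 1) changed this line
            merged.append(o)
        else:             # both changed it differently: needs a human
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts


base   = ["alpha", "beta", "gamma", "delta"]
ours   = ["ALPHA", "beta", "gamma", "delta one"]   # branch 1
theirs = ["alpha", "beta", "GAMMA", "delta two"]   # branch 2

merged, conflicts = merge_lines(base, ours, theirs)
print(merged)     # ['ALPHA', 'beta', 'GAMMA', None]
print(conflicts)  # [3]
```

Only the last line, which both branches changed in different ways, is left for manual resolution. The other three are settled automatically.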

Since deciding which lines to use is best done by the person(s) who modified those lines, a good practice is to have the person who implemented a feature perform the merge. That means that if you and I work on the same project, I implement my feature, then pull from a central/common repository (to get the latest version that everyone uses), then merge my new version with the pulled changes, then publish the result to the common repository (at this point, the common repository has one main branch, with my changes merged into it). Then you pull that from the server and do the same with your changes.

This implies that everyone is capable of doing whatever they want in their local repository, and the common/official repository has one branch. It also means that you need to decide on a time frame when people should merge their changes in.

I used to have three or four repositories on my machine, already compiled against different product versions (different branches of the repository), and a few different branches in my main repository (one for refactoring, one for development and so on). Whenever I brought one branch to a stable state (say, finished a refactoring), I would pull from the server, merge that branch into the pulled changes, then push it back to the server and let everyone know that if they had made any changes to the affected files, they should pull from the server first.

We used to synchronize implemented features every Monday morning. It took us about an hour to merge everything (on bad days it would take two members of the team two hours or so), then we made a weekly build on the server to give to QA, and everyone pulled the week's changes onto their machine to use as a new base for the week. This was for a team of eight developers.

utnapistim
By *branches* you seem to mean unnamed branches, which are what you get when multiple changesets share a single parent changeset. When those branches are merged (and the merge changeset is committed), the two branches are still there when you push to the central server. Although you only have one head, you still have a divergence in the history. And a clean history seemed important to the OP. Or is there an implied rebase I'm missing here?
mizipzor
A clean history isn't important to me, I just wonder how these projects I look at manage such a straight history with multiple developers. I mean, I manage to get several parallel branches (unnamed) myself just by having a couple of computers and working on them all. I merge in changes to synchronize, but I still get parallel branches in the history graph. Yet those projects don't seem to have even that. Am I just undisciplined?
Lasse V. Karlsen
@mizipzor: I added an edit specifying that I meant unnamed branches. Thanks.
utnapistim
@Lasse V. Karlsen - I don't know how they manage a straight-line history (I have multiple branches on my single developer projects also).
utnapistim
+4  A: 

In your updated question it seems that you are more interested in ways of tidying up the history. When you have a history and want to make it into a single, neat, straight line, you want to use rebase, transplant and/or Mercurial Queues. Check out the docs for those three and you should get an idea of the workflow.

Edit: Since I'm waiting for a compile, here is a specific example of what I mean:

> hg init
> echo test > a.txt
> hg addremove && hg commit -m "added a.txt"
> echo test > b.txt
> hg addremove && hg commit -m "added b.txt"
> hg update 0 # go back to initial revision
> echo test > c.txt
> hg addremove && hg commit -m "added c.txt"

Running hg glog (from the graphlog extension; newer Mercurial versions provide the same view as hg log -G) now shows this diverging history with two branches:

@  changeset:   2:c79893255a0f
|  tag:         tip
|  parent:      0:7e1679006144
|  user:        mizipzor
|  date:        Mon Jul 05 12:20:37 2010 +0200
|  summary:     added c.txt
|
| o  changeset:   1:74f6483b38f4
|/   user:        mizipzor
|    date:        Mon Jul 05 12:20:07 2010 +0200
|    summary:     added b.txt
|
o  changeset:   0:7e1679006144
   user:        mizipzor
   date:        Mon Jul 05 12:19:41 2010 +0200
   summary:     added a.txt

Do a rebase (the rebase extension must be enabled), making changeset 1 into a child of 2 rather than 0:

> hg rebase -s 1 -d 2

Now let's check the history again:

@  changeset:   2:ea0c9a705a70
|  tag:         tip
|  user:        mizipzor
|  date:        Mon Jul 05 12:20:07 2010 +0200
|  summary:     added b.txt
|
o  changeset:   1:c79893255a0f
|  user:        mizipzor
|  date:        Mon Jul 05 12:20:37 2010 +0200
|  summary:     added c.txt
|
o  changeset:   0:7e1679006144
   user:        mizipzor
   date:        Mon Jul 05 12:19:41 2010 +0200
   summary:     added a.txt

Presto! Single line. :)

Also note that I didn't do a merge. When you rebase like this, you will have to deal with merge conflicts and everything else just as if you had done a merge, because that's pretty much what happens under the hood. Experiment with this in a small test repo. For example, try changing the file added in revision 0 rather than just adding more files.
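
To see why a rebase gives a straight line where a merge does not, here is a rough Python model of the three-changeset example above. It is a sketch, not anything Mercurial exposes: a history is just a dict mapping a changeset label to its list of parents, and the labels (a, b, c, m, b_rebased) and the helpers heads and is_linear are made up for illustration.

```python
def heads(parents):
    """Changesets that no other changeset names as a parent."""
    referenced = {p for ps in parents.values() for p in ps}
    return sorted(set(parents) - referenced)


def is_linear(parents):
    """True if no changeset has two parents or two children."""
    child_count = {}
    for ps in parents.values():
        if len(ps) > 1:            # a merge changeset
            return False
        for p in ps:
            child_count[p] = child_count.get(p, 0) + 1
    return all(n <= 1 for n in child_count.values())


# "a" added a.txt; "b" and "c" were both committed on top of "a".
diverged = {"a": [], "b": ["a"], "c": ["a"]}

# Merging b and c adds a changeset "m" with two parents.
merged = {"a": [], "b": ["a"], "c": ["a"], "m": ["b", "c"]}

# Rebasing b onto c re-creates b as a child of c.
rebased = {"a": [], "c": ["a"], "b_rebased": ["c"]}

for name, history in [("diverged", diverged), ("merged", merged), ("rebased", rebased)]:
    print(name, heads(history), is_linear(history))
# diverged ['b', 'c'] False
# merged ['m'] False
# rebased ['b_rebased'] True
```

Both the merge and the rebase get you back to a single head, but only the rebased history is a straight line. The merged one still shows the fork at "a".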

mizipzor
This seems to be the most plausible way to get a clean history like the one I'm seeing, yes. I'll accept this answer, even though there are several good ones here.
Lasse V. Karlsen
+1  A: 

I'm a Mercurial developer, so let me explain how we/I do it.

In the Mercurial project we accept contributions in the form of patches sent to the mailing list. When we apply those with hg import, we do an implicit rebase to the tip of the branch we are working on. This helps a lot with keeping the history clean.

As for my own changes, I use rebase or mq to linearize things before I push them, again to keep the history tidy. It's basically a matter of doing

hg push  # abort: creates new remote head
hg pull
hg rebase
hg push

You can combine the pull and rebase if you like (hg pull --rebase) but I've always liked to take one step at a time.
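
The push, pull, rebase, push cycle can be modelled roughly like this in Python. It is a toy, not Mercurial's implementation: a repository is a dict from changeset label to parent list, the "no new head" check is reduced to comparing head counts, and the names push, heads, mine, theirs and mine_rebased are all invented for the sketch.

```python
def heads(parents):
    """Changesets that no other changeset names as a parent."""
    referenced = {p for ps in parents.values() for p in ps}
    return set(parents) - referenced


def push(remote, local):
    """Toy push: copy local changesets to remote unless that adds a head."""
    combined = {**remote, **local}
    if len(heads(combined)) > len(heads(remote)):
        raise RuntimeError("push would create a new remote head")
    remote.update(local)


remote = {"x": [], "theirs": ["x"]}   # a colleague has already pushed "theirs"
local = {"x": [], "mine": ["x"]}      # my change still sits on the old tip "x"

try:
    push(remote, local)               # first attempt is refused
    outcome = "pushed"
except RuntimeError as err:
    outcome = str(err)
print(outcome)

# Stand-in for "hg pull" followed by "hg rebase": take the remote history
# and re-create my change on top of its tip.
local = {**remote, "mine_rebased": ["theirs"]}
push(remote, local)                   # second attempt goes through
print(heads(remote))
```

After the second push the remote still has a single head, and the history runs x, theirs, mine_rebased in a straight line.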

By the way, there are some disagreements about this practice of linearizing the history -- some believe that the history should show how things really happened, with all the branches and merges and whatnot. I find that as long as you don't mess with public changesets, then it's okay and useful to linearize history.

Martin Geisler
I agree, the changeset history should reflect how things really happened. I just wondered how they got such a clean history, seeing as even a single developer (myself) forgets to update across all machines from time to time and ends up with parallel unnamed branches.
Lasse V. Karlsen