What git branching models actually work - the final question

+21 Q:

In our company we have successfully deployed git and we are currently using a simple trunk/release/hotfixes branching model. However, this has its problems, and there are some key points of confusion in the community that it would be awesome to have answered here. Maybe my hopes for an Alexander stroke are too great, and quite possibly I'll decompose this question into more manageable issues, but here's my first shot.

Workflows / branching models - below are the three main descriptions of this I have seen, but they are partially contradicting each other or don't go far enough to sort out the subsequent issues we've run into (as described below). Thus our team so far defaults to not so great solutions. Are you doing something better?
Merging vs rebasing (tangled vs sequential history) - the opinions on this are as confusing as it gets. Should one pull --rebase, or wait with merging back to the mainline until the task is finished? Personally I lean towards merging since this preserves a visual illustration of the base on which a task was started and finished, and I even prefer merge --no-ff for this purpose. It has other drawbacks, however. Also, many haven't realized a useful property of merging: it isn't commutative (merging a topic branch into master does not mean merging master into the topic branch).
I am looking for a natural workflow - sometimes mistakes happen because our procedures don't capture a specific situation with simple rules. For example, a fix needed for earlier releases should of course be based sufficiently downstream that it can be merged upstream into all the branches that need it (is the usage of these terms clear enough?). However, it happens that a fix makes it into master before the developer realizes it should have been placed further downstream, and if that is already pushed (even worse, merged, or something is based on it) then the remaining option is cherry-picking, with its associated perils... What simple rules like this do you use? This also includes the awkwardness of one topic branch necessarily excluding other topic branches (assuming they are branched from a common baseline). Developers don't want to finish a feature and start another one feeling like the code they just wrote is not there anymore.
How to avoid creating merge conflicts (due to cherry-pick)? What seems like a sure way to create a merge conflict is to cherry-pick between branches; can they never be merged again? Would applying the same commit in revert (how to do this?) in either branch possibly solve this situation? This is one reason I do not dare to push for a largely merge-based workflow.
How to decompose into topical branches? - We realize that it would be awesome to assemble a finished integration from topic branches, but often the work of our developers is not clearly defined (sometimes as simple as "poking around"), and if some code has already gone into a "misc" topic, it cannot be taken out of there again, according to the question above? How do you work with defining/approving/graduating/releasing your topic branches?
Proper procedures like code review and graduating would of course be lovely, but we simply cannot keep things untangled enough to manage this - any suggestions? Integration branches, illustration please?
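
To make the merge --no-ff point concrete, here is a minimal sketch in a throwaway repository. The branch and file names (topic, feature.txt) are made up for illustration. It shows that --no-ff records a merge commit whose first parent is the old master tip and whose second parent is the topic tip, and that merging topic into master leaves topic itself untouched.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of defaults

echo base > base.txt
git add base.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

# Hypothetical topic branch with a single commit.
git checkout -q -b topic
echo feature > feature.txt
git add feature.txt
git commit -q -m "add feature"
tip=$(git rev-parse HEAD)

# A fast-forward would be possible here; --no-ff forces a merge commit anyway.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'topic'" topic

first_parent=$(git rev-parse HEAD^1)    # where master was before the merge
second_parent=$(git rev-parse HEAD^2)   # where topic ended
topic_after=$(git rev-parse topic)      # topic is not moved by the merge
```

Running git log --graph --oneline afterwards shows the task as a visible side branch, which is the "visual illustration" referred to above.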

Vote and comment as much as you'd like, I'll try to keep the issue page clear and informative enough. Thanks!

Below is a list of related topics on stackoverflow I have checked out:

Update: I accepted VonC's answer because I found it the most useful, but the one from Ninefingers was also very good. I have still to compile this into something comprehensive, but essentially the most striking point was probably the reference to Linus's advice on rebase and merges, once I read it thoroughly. Thanks for all your awesome comments!

+8 A:

The most troubling thing developers new to DVCS need to grasp is the publication process:

you can import (fetch/pull) whatever remote repo you need
you can publish (push) to any (bare) repo you want

From that, you can respect a few rules to make your questions easier:

only rebase a branch if it hasn't been pushed (not pushed since the last rebase)
only push to a bare repo (since Git 1.7, pushing to the checked-out branch of a non-bare repo is refused by default)
follow Linus's advice on rebase and merges
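
As a rough sketch of the first rule, one way to check whether a branch still has commits that no remote knows about is to list what is reachable from the branch but not from any remote-tracking ref. The repository layout and names below (central.git, dev, topic) are assumptions made up for the demo, and this check is only a heuristic: it reflects what your last fetch or push has seen.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare central.git               # hypothetical shared bare repo
git clone -q central.git dev 2>/dev/null     # silence the "empty repository" warning
cd dev
git symbolic-ref HEAD refs/heads/master

echo base > file.txt
git add file.txt
git commit -q -m "base"
git push -q origin master

git checkout -q -b topic
echo work >> file.txt
git commit -q -am "topic work"

# Commits on topic that no remote-tracking branch contains: still safe to rebase.
before=$(git rev-list --count topic --not --remotes)

git push -q origin topic

# After the push nothing is private anymore, so rebasing would rewrite published history.
after=$(git rev-list --count topic --not --remotes)

echo "unpublished commits before push: $before, after push: $after"
```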

Now:

Workflows / branching models:

each workflow is there to support a release management process, and that is tailored for each project.
What I can add to the workflow you mention is: each developer should not create a feature branch, only a "current dev" branch, because the truth is: the developer often doesn't know what exactly his/her branch will produce: one feature, several (because it ended up being too complex a feature), none (because not ready in time for release), another feature (because the original one had "morphed"),...

Only an "integrator" should establish official feature branches on a "central" repo, which can then be fetched by developers to rebase/merge the part of their work that fits that feature.

Merging vs rebasing (tangled vs sequential history):

I like my answer you mention ("Workflow description for git usage for in-house development")

I am looking for a natural workflow:

for fixes, it can help to associate each fix with a ticket from a bug tracker, which helps the developer remember where (i.e. on which branch, i.e. a dedicated branch "for fixes") he/she should commit such modifications.
Then hooks can help protect a central repo against pushes of non-validated bug-fixes or from branches from which one shouldn't push. (no specific solution here, all this needs to be adapted to your environment)

How to avoid creating merge conflicts (due to cherry-pick)?

As stated by Jakub Narębski in his answer, cherry-picking should be reserved for rare situations where it is required.
If your setup involves a lot of cherry-picking (i.e. "it is not rare"), then something is off.
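
For what it's worth, a cherry-pick does not by itself make a later merge impossible. The sketch below, in a throwaway repository with made-up branch and file names, cherry-picks a fix from master to a release branch and then merges release back into master. Because both sides end up with the identical change, the merge is clean; conflicts tend to appear when the cherry-picked lines are modified further on one side. It also uses git cherry, which marks commits whose patch already exists upstream with a leading "-".

```shell
#!/bin/sh
set -eu
# Throwaway identity so the commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

printf 'one\ntwo\nthree\n' > file.txt
git add file.txt
git commit -q -m "base"
git branch release                       # hypothetical older release line

# The fix lands on master first, as in the situation described in the question.
printf 'one\nTWO\nthree\n' > file.txt
git commit -q -am "fix: uppercase two"
fix=$(git rev-parse HEAD)

# The release branch has its own work, then receives the fix by cherry-pick.
git checkout -q release
echo notes > RELEASE.txt
git add RELEASE.txt
git commit -q -m "release notes"
git cherry-pick "$fix" >/dev/null

# Count release commits whose patch is already present on master ("-" lines).
equivalent=$(git cherry master release | grep -c '^-')

# Merging release into master afterwards is clean: both sides made the same change.
git checkout -q master
git merge -q -m "Merge release into master" release
```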

Would applying the same commit in revert (how to do this?)

git revert should take care of that, but that is not ideal.
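
A minimal sketch of what git revert does, in a throwaway repository with made-up file names: it adds a new commit that undoes an earlier one, so history is extended rather than rewritten.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

echo one > file.txt
git add file.txt
git commit -q -m "base"

echo two >> file.txt
git commit -q -am "unwanted change"
bad=$(git rev-parse HEAD)

# Record a new commit that applies the inverse of the unwanted one.
git revert --no-edit "$bad" >/dev/null

content=$(cat file.txt)
commits=$(git rev-list --count HEAD)   # base + unwanted change + revert
```

One reason this is "not ideal": if another branch that contains the original commit is merged later, git considers that change already merged, so the merge does not bring it back and the revert itself has to be reverted.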

How to decompose into topical branches?

As long as a branch has not yet been pushed anywhere, a developer should reorganize its history of commits (once he/she finally sees the development take a more definitive and stable shape) into:

several branches if needed (one per clearly identified feature)
a coherent set of commits within one branch (see Trimming Git Checkins)
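
As one possible way to get a coherent set of commits on an unpushed branch, here is a sketch using git commit --fixup together with git rebase --interactive --autosquash. The ticket names and files are invented for the demo, and GIT_SEQUENCE_EDITOR=true is only there to accept the proposed todo list without opening an editor; normally you would review it by hand.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

echo a > a.txt
git add a.txt
git commit -q -m "base"

# Hypothetical unpushed "current dev" branch.
git checkout -q -b dev
echo feature > feature.txt
git add feature.txt
git commit -q -m "TICKET-1: add feature"
target=$(git rev-parse HEAD)

echo other > other.txt
git add other.txt
git commit -q -m "TICKET-2: unrelated work"

# A late correction for TICKET-1, recorded as "fixup! TICKET-1: add feature".
echo more >> feature.txt
git commit -q -a --fixup "$target"

# Autosquash moves the fixup next to its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master

commits=$(git rev-list --count master..dev)
fixups=$(git log --format=%s master..dev | grep -c '^fixup!' || true)
```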

Proper procedures like code review and graduating ?

Integration branches (in a dedicated integration repo) can help the developer to:

rebase his/her development on top of that remote integration branch (pull --rebase)
resolve any conflicts locally
push the development to that repo
check with the integrator that it doesn't result in a mess ;)
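
The steps above could look roughly like this. It is only a sketch: the repository names (integration.git, integrator, dev) and file names are assumptions, and the example is arranged so that the rebase has no conflicts to resolve.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical dedicated integration repo (bare), with master as its main branch.
git init -q --bare integration.git
git --git-dir=integration.git symbolic-ref HEAD refs/heads/master

# The integrator seeds the integration branch.
git clone -q integration.git integrator 2>/dev/null
cd integrator
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "base"
git push -q origin master
cd ..

# A developer clones and commits locally.
git clone -q integration.git dev
cd dev
echo dev > dev.txt
git add dev.txt
git commit -q -m "developer work"

# Meanwhile someone else's work reaches the integration branch.
cd ../integrator
echo other > other.txt
git add other.txt
git commit -q -m "someone else's work"
git push -q origin master

# The developer replays local commits on top of the integration branch, then pushes.
cd ../dev
git pull -q --rebase origin master
git push -q origin master

merges=$(git rev-list --count --merges HEAD)   # history stays linear
total=$(git rev-list --count HEAD)
```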

VonC 2010-04-12 12:01:40

Thanks VonC, I'll consider your answer ASAP!

UncleCJ 2010-04-12 12:05:14

@UncleCJ: as you can see, this is not exactly a *final answer* to your "final question" ;)

VonC 2010-04-12 12:06:25

I understand, and I have a fine sense of irony as well, it's ok ;-)

UncleCJ 2010-04-12 12:26:19

Ok, I've learnt one key thing from Linus' post, "you must never pull into a branch that isn't already in good shape", but I still find it tricky to trim your checkins to achieve strictly sequential history (is this necessary or not?!). It is so easy for two developers to mistakenly create an innocent merge and leave it there... Also, I still find it impossible to nail down exactly what "upstream" and "downstream" mean.

UncleCJ 2010-04-12 13:04:25

@UncleCJ upstream is just where you regularly pull from, from my post, wherever all the commits end up (the release version or trunk in SVN parlance). Downstream is everyone below them. Sending stuff upstream is the process of getting it merged into the release repo (like linux-2.6) and downstream is the changes from there going out, or from your repository as say the manager of developing such a feature to your minions... I mean team.

Ninefingers 2010-04-12 13:10:21

I've read this post more clearly now and I agree entirely.

Ninefingers 2010-04-12 13:31:21

@UncleCJ: "I still find it tricky to trim your checkins to achieve strictly sequential history": it is easier with Git 1.7 and its `rebase --interactive --autosquash`, which automatically moves any commit whose message starts with "fixup! " or "squash! " followed by the beginning of another commit's message, so that it sits right after that commit. If the fixes related to a ticket are committed that way (for instance with the ticket number at the start of the original message), then even if they were not made sequentially at the time, autosquash allows for a quick reordering of those commits.

VonC 2010-04-12 13:35:44

@UncleCJ: "strictly sequential history (is this necessary or not?!)": not always necessary, but it helps keep track of **functional dependencies** (http://stackoverflow.com/questions/881092/how-to-merge-a-specific-commit-in-git/881112#881112) and **semantic conflicts** (http://stackoverflow.com/questions/2514502/better-simpler-example-of-semantic-conflict/2515163#2515163)

VonC 2010-04-12 13:39:11

@VonC: Additional silly question regarding trimming commits and whether to achieve sequential history or not - we have been talking about the individual developer before publishing their changes so far, what about Linus' "generals" as I understand they're called, do they straighten things out or on the contrary keep the merges to clearly display what baselines and merges their "minions" have used? Is it even possible to communicate this type of tangled branches via patches?

UncleCJ 2010-04-13 08:14:17

@UncleCJ: "Is it even possible to communicate this type of tangled branches via patches?": yes, this is actually the whole principle of the GitHub Fork Queue (http://github.com/blog/270-the-fork-queue).

VonC 2010-04-13 11:04:18

@UncleCJ: In a "general" configuration, if the commits don't apply cleanly, they are rejected, and the end user is asked to rebase his/her work first (on top of the destination branch fetched into the developer's repo) before pushing it to the general again. Once that is done, no straightening (i.e. no rewriting of history) should be done if those "general" branches are pulled again. The end users wouldn't be able to easily pull those public branches if any straightening had taken place.

VonC 2010-04-13 11:05:37

+4 A:

I think, and I might be wrong, that one of the things that's most misunderstood about git is its distributed nature. This makes it very different from, say, Subversion in the ways you can work, although you can mimic SVN behaviour should you want. The problem is that pretty much any workflow will do, which is great but also misleading.

If I have my understanding of kernel development (I'll focus on that) right, everyone has their own git repository for developing the kernel. There is one repository, linux-2.6.git, looked after by Torvalds, that acts as the release repository. People clone from here if they wish to start developing a feature against the "release" branch.

Other repositories do some development. The idea is to clone from linux-2.6, branch out as many times as you like until such a point as you've got a working "new" feature. Then, when this is ready, you may make it available to someone considered trusted, who will pull this branch from your repository into theirs and merge it into the mainstream. In the linux kernel this happens on several levels (trusted lieutenants) until it reaches linux-2.6.git at which point it becomes "the kernel".

Now here's where it gets confusing. Branch names don't need to be consistent across repositories at all. So I can git pull origin master:vanilla-code and get a branch from the origin's master in a branch in my repository called vanilla-code. Providing I know what's going on, it really doesn't matter - it is distributed in the sense that all repositories are peers to each other and not just shared across several computers like SVN.
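
To illustrate the point about branch names, here is a small sketch with two throwaway repositories (upstream and mine are made-up names). It uses git fetch with the same master:vanilla-code refspec rather than git pull, since pull would additionally merge what it fetched into whatever branch is checked out; fetch only creates or updates the local vanilla-code branch.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical upstream repository with a master branch.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo kernel > kernel.txt
git add kernel.txt
git commit -q -m "upstream work"
upstream_master=$(git rev-parse master)
cd ..

# My own repository, which calls the same history something else.
git init -q mine
cd mine
git remote add origin "$work/upstream"
git fetch -q origin master:vanilla-code

local_vanilla=$(git rev-parse vanilla-code)

# There is no local branch called master here, and nothing requires one.
if git show-ref --verify -q refs/heads/master; then has_master=yes; else has_master=no; fi
```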

So, with all of this in mind:

I think it is up to each programmer how they do their branching. All you need is a central repository for managing releases etc. Trunk could be head. Releases could be tags or branches and hotfixes are probably branches in themselves. In fact, I'd probably do releases as branches so you can keep patching them.
I would merge and not rebase. If for example you take a repository, clone it, branch and do some dev, then pull from your origin you should, in your repository, probably make another branch and merge the latest master into yourbranch so that someone else can pull your changes with as little effort as possible. There is very rarely a need to truly rebase, in my experience.
I think it's a case of understanding the way Git works and what it can do. It does take a while and a lot of good communication - I only truly started to understand what's going on when I began using git with other developers and even now, some things I'm not sure about.
Merge conflicts are useful. I know, I know, you want it all to work, but, the fact is code changes and you do need to merge the results into something that works. Merge conflicts are in fact just more programming. I've never found an easy explanation for what to do about them, so here it is: note the files that have merge conflicts, go and change them to what they should be, git add . and then git commit.
However it suits. As I've said, each user's git repository is their own to play with and branch names don't need to be the same. If you had a staging repository, for example, you could enforce a naming schema, but you don't need to for each developer, only in the release repo.
This is the merge stage. You only merge into release branches etc when you consider code to be reviewed/pass quality testing.
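
The conflict recipe above (note the conflicted files, fix them, add, commit) can be sketched like this in a throwaway repository. The file name and contents are invented, and the "resolution" is simply written out by the script where a human would normally edit the conflict markers. It adds the specific file rather than using git add . so that nothing unrelated is staged by accident.

```shell
#!/bin/sh
set -eu
# Throwaway identity so the commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

printf 'greeting=hello\n' > config.txt
git add config.txt
git commit -q -m "base"

# Two branches change the same line differently.
git checkout -q -b topic
printf 'greeting=hi\n' > config.txt
git commit -q -am "topic: say hi"

git checkout -q master
printf 'greeting=hey\n' > config.txt
git commit -q -am "master: say hey"

# The merge stops with a conflict; capture that instead of aborting the script.
if git merge topic >/dev/null 2>&1; then conflicted=no; else conflicted=yes; fi

# Step 1: note the files that have conflicts.
files=$(git diff --name-only --diff-filter=U)

# Step 2: change them to what they should be (normally done in an editor).
printf 'greeting=hey there\n' > config.txt

# Step 3: stage the resolution and conclude the merge.
git add config.txt
git commit -q -m "Merge topic, resolve greeting conflict"
```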

I hope that helps. I realise VonC has just posted a very similar explanation... I can't type fast enough!

Edit: some further thoughts on how to use git in a commercial setting, as this seems relevant to the OP from the comments:

The release repository, we'll call it product.git, is accessible by a number of senior programmers / technical people responsible for actually looking after the product itself. They are analogous to the role of maintainers in OSS.
These programmers probably also in part lead development of new versions, so they might also code themselves and maintain various repositories. They might manage staging repositories for really new features and they might also have their own repositories.
Below them are programmers responsible for developing individual bits. For example, someone might be responsible for the UI work. They therefore manage the UI.git repository.
Below them are the actual programmers who develop the features as their full day to day job.

So what happens? Well, everyone pulls at the start of each day from the "upstream" source, i.e. the release repository (which will also probably contain the latest material from the previous day's development). Everyone does this, directly. This will go on a branch in their repository, probably called "master" or maybe, if you're me, called "latest". The programmer will then do some work. This work might be something they're not sure about, so they make a branch and do the work. If it doesn't work, they can delete the branch and go back. If it does, they will have to merge into the main branch they're currently working on.

We'll say this is a UI programmer working on latest-ui, so he does git checkout latest-ui followed by git merge abc-ui-mywhizzynewfeature. He then tells his technical lead (the UI lead): hey, I've completed such a task, pull from me. So the UI lead does git pull user-repo latest-ui:latest-ui-suchafeature-abc. The UI lead then looks at it on that branch and says, actually, that's very good, I'll merge it into ui-latest. He might then tell everyone below him to pull from him on their ui-latest branches, or whatever name they've given them, and so the feature gets explored by the devs.

If the team is happy, the UI lead might ask the testing lead to pull from him and merge the changes. This propagates out to everyone (downstream of the change) who tests it and submits bug reports etc. Finally, if the feature passes testing etc., one of the top technical leads might merge it into the current working copy of the program, at which point all the changes are then propagated back down. And so on.

It's not a "traditional" way of working and is designed to be "peer driven" rather than "hierarchical" like SVN/CVS. In essence, everyone has commit access, but only locally. It is access to the repository and which repository you designate as the release repo that allows you to use hierarchy.

Ninefingers 2010-04-12 12:07:31

+1 for the "distributed" aspect clarification.

VonC 2010-04-12 12:19:22

Thanks a bunch for your extensive answer (and votes), I will read it a couple more times to wring useful information out of it. However, we're a company, not an OSS development committee ;-), and I have to help my developers with clearer guidelines than "fiddle around as you want in your own repository". Let's see where this post leads, I feel a good momentum, keep it coming!

UncleCJ 2010-04-12 12:41:19

@VonC Thanks. @UncleCJ true, but you do, I'm sure, have release-managers etc. Anyone with access to the repository can do these things. As for development, why not give developers the freedom, within reason, to branch away? Providing you have some protocol for agreeing merges and your central repository(ies) are named as you like, there isn't a problem. Having said that, a common naming schema isn't a bad idea. I tend to use initials-version-feature-subbranches for personal branches and version for branches.

Ninefingers 2010-04-12 13:07:53

@UncleCJ I've added an example for how it might work in a company. It's essentially the OSS roles replaced with managers, but you get the idea. It has the added benefit over SVN that your devs can work offline too (they only need the net to pull/push) and I think makes it easier to test features, if you implement it well.

Ninefingers 2010-04-12 13:30:05

Wow, that's actually a great example, we may start to use something like that for the graduation. I didn't mean so much that since we're not doing OSS everyone has to be regulated, we're actually a pretty small and flat team, but we have to try to collaborate efficiently on a tight schedule while also learning as a team. That's why I'm here asking these stupid questions so I can help the rest of the team later :-) . I also realized from #git that a poorly defined baseline combined with pressure to shorten lead times is making us trip over our feet... will be back later.

UncleCJ 2010-04-12 13:43:44

That's fair enough - I've been there recently which is exactly how I picked up that example, by trying it and failing a lot... and also adapting to some OSS project's ways of working. I guess the real biggie is it doesn't matter how you branch and where your repositories are... you can define these in any way you want which was a real shocker for me anyway! But it allows you to do some interesting things. Anyway, good luck and have fun!

Ninefingers 2010-04-12 14:02:07
