Should checkins be small steps or complete features?

+22 Q:

Should checkins be small steps or complete features?

Two uses of version control seem to dictate different checkin styles.

distribution centric: changesets will generally reflect a complete feature. In general these checkins will be larger. This style is more user/maintainer friendly.
rollback centric: changesets will be individual small steps so the history can function like an incredibly powerful undo. In general these checkins will be smaller. This style is more developer friendly.

I like to use my version control as a really powerful undo while I'm banging away at some stubborn code/bug. That way I'm not afraid to make drastic changes just to try out a possible solution. However, this seems to give me a fragmented file history with lots of "well that didn't work" checkins.

If instead I try to have my changesets reflect complete features, I lose the use of my version control software for experimentation. However, it is much easier for users/maintainers to figure out how the code is evolving, which has great advantages for code reviews, managing multiple branches, etc.

So what's a developer to do? Checkin small steps or complete features?

+3 A:

My recommendation would be to create a branch or even a separate repository for experimentation purposes. Then, once the feature is complete, you could merge the code from the branch back into the main trunk. Hopefully, that would allow you to have the best of both worlds.

btreat 2010-06-14 22:48:12

I think a new repository for experimentation is a bit extreme, but pulling a branch for this type of work is an excellent use of branches. Don't have any continuous integration on the branch so breaking check-ins won't affect anyone else, and feel free to pound away. The one caveat is to regularly pull up changes from the line if others are making changes so you don't get too far afield.

Eric 2010-06-14 22:51:28

A new repository is the standard way to do this type of thing in mercurial (or at least was a few years ago).

Xiong Chiamiov 2010-06-14 22:55:35

The problem is that if you *merge* the changes back, Mercurial will merge the revision history as well. So even if the code all comes into the main repo at once, it's still broken up into small step-like commits instead of feature commits.

caspin 2010-06-14 23:15:34

+18 A:

So what's a developer to do? checkin small steps or complete features?

It's possible to get the best of both worlds, especially with git and other DVCSs that let you be selective about which history to publish. Here's a simple workflow that illustrates this.

Your project has master and release branches. Developers each maintain their own develop branches that they don't push.
You use develop to do your day-to-day work. Bite-sized commits appear here, representing incremental advances in the state of the project over time. You might make topic-* branches for working on longer features that span more than a few days or major refactorings. You commit to develop very frequently, perhaps several times an hour. It's like hitting "Save" in a document that you're editing.
When you have some commits that are suitable for the next release, you merge the relevant commits to release. release now has a bunch of individual commits that have selectively been taken from your develop branch. You commit to release whenever you reach a good stopping point. That's usually a few times a day.
When the release is ready to go, your lead developer squashes all the commits since the last merge to master into a single merge commit that appears on master. Then you tag this commit with the release identifier (e.g., v.1.0.4). This happens infrequently, perhaps once an iteration or every few weeks.

Here, you get to have your cake and eat it too. Prior to releasing, you can rollback changes that shouldn't have happened or that you don't want to go into the release, and you can do it one at a time. Developer-friendly! But users get what they want, too: big, globby commits on master that represent what's changed since the last release.
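
A minimal sketch of the workflow above, run in a throwaway repository. The file names, commit messages, and the previous release number are made up for illustration, and the choice of `git cherry-pick` for selective merging and `git merge --squash` for the final step is an assumption about how one might implement it, not the only way:

```shell
set -e

# Throwaway repository so the sketch does not touch real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# master starts at the previous (hypothetical) release.
echo "v1" > app.txt
git add app.txt
git commit -q -m "Release 1.0.3"

# release and a private develop branch both start from master.
git branch release
git checkout -q -b develop

# Bite-sized commits on develop, including one that should not ship.
echo "step 1" >> app.txt
git commit -q -am "Start feature"
echo "oops" > scratch.txt
git add scratch.txt
git commit -q -m "Experiment that didn't work"
echo "step 2" >> app.txt
git commit -q -am "Finish feature"

# Take only the suitable commits over to release, one at a time.
git checkout -q release
git cherry-pick develop~2 >/dev/null
git cherry-pick develop >/dev/null

# At release time, squash everything on release into one commit on master.
git checkout -q master
git merge --squash release >/dev/null
git commit -q -m "Release 1.0.4"
git tag v.1.0.4

# master shows one commit per release; release and develop keep the steps.
git log --oneline master
git log --oneline release
```

Note that `git merge --squash` records an ordinary single-parent commit rather than a true merge commit, so the link back to `release` lives only in the commit message and the tag.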

John Feminella 2010-06-14 22:50:58

+1, this is what we need to be working towards as an industry.

Craig Trader 2010-06-14 22:53:53

More on topic branches on Pro Git ( http://progit.org/book/ch3-4.html ) and SO ( http://stackoverflow.com/questions/284514/what-is-a-git-topic-branch ).

Xiong Chiamiov 2010-06-14 22:57:01

But there's a drawback to this approach too: you still have the issue of long periods between pushes. So from the users' (or other developers') perspective, it feels as though you were in the dark for a long time only to push a couple of big commits. The reality is different in your repository, but that's how it looks on the central hub.

wilhelmtell 2010-06-14 23:02:19

wilhelmtell: I think you might be misunderstanding. There's definitely a long time between changes to `master`, but pushes to `release` should happen anytime you think your commits on develop are significant enough to represent some work you want to share.

John Feminella 2010-06-15 00:43:36

My two cents: in a lot of projects, the squashing happens between develop and release, not between release and master. The developer, not the integrator, knows how best to squash things into the "how it should have been done" history. Then they can push the clean result on into the integration pipeline (next -> master).

Jefromi 2010-06-15 15:08:58

@Jefromi: The two approaches aren't mutually exclusive. As a developer, you're free to decide that a bunch of commits should have been rolled up into one thing. In this (extremely simple) workflow above, though, the integrator _always_ squashes everything to a single commit on `master` when a release is ready. It's like a packaging of the entire release. (The other commits stay intact on `release`, though, if you want to see the full history.)

John Feminella 2010-06-15 15:20:04

@John Feminella: Ah, okay, I see what you mean, I think. Do you really mean a squash merge, though? It seems like a merge commit is ample to collect things, and you can use `git log --first-parent master` to see only the history of merge commits, while still preserving the history instead of squashing. (It's a lot harder to go find the commit you want to see if it's been squashed, even if it does exist elsewhere.)
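
A small sketch of that alternative, assuming a scratch repository with made-up file names and commit messages: a real merge commit (`--no-ff`) keeps the small steps reachable from `master`, and `--first-parent` hides them when you only want the release-level view.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "Initial release"

# Small-step commits on the release branch.
git checkout -q -b release
echo one >> file.txt
git commit -q -am "Step one"
echo two >> file.txt
git commit -q -am "Step two"

# --no-ff forces a real merge commit even though a fast-forward is possible.
git checkout -q master
git merge -q --no-ff -m "Release 1.0.4" release

# Full history: the merge commit plus every small step.
git log --oneline master

# First-parent history: only the commits made directly on master.
git log --oneline --first-parent master
```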

Jefromi 2010-06-15 16:12:36

Do you always keep your complete true history on `develop`? Or do you start `develop` anew from `release` every now and then -- perhaps when you squash-merge to `release`?

wilhelmtell 2010-07-16 12:57:21

Also: if `master` has (mostly) big commits, doesn't it reduce the effectiveness of `git-bisect`? Each developer can only easily find bugs in their own `develop` branch, and even that only so long as they keep that branch whole and never reset-hard on master.

wilhelmtell 2010-07-20 15:25:23

I think I can answer my two questions now. I think it's best to reset `develop` to `release` when done merging into `release`. It makes life easier later; otherwise I "pull the rug from under my own feet".

wilhelmtell 2010-08-03 19:47:35

As for my second question: indeed this workflow reduces the effectiveness of bisecting, but only from the moment I reset `develop` to `release`. There's a balance to strike between merging code that is as stable as possible into `release`, and merging into `release` often so new branches get as recent a tip as possible.

wilhelmtell 2010-08-03 19:50:16

Now, I encounter another issue. Is it possible to effectively branch off from a topic branch? I feel not, because I squash my topics into release, and this pulls the rug from under the subbranches' feet. So really, I feel it's only reasonable to branch off from release, and even that only assuming release never rebases itself. The issue with that, of course, is that sometimes I really do want to branch off from a topic, before the topic is ready to be squashed and merged into `release`. I suppose I need some more time and experience before I report back. :s

wilhelmtell 2010-08-03 19:54:12

+14 A:

The beauty of DVCS systems is that you can have both, because in a DVCS, unlike a CVCS, publishing is orthogonal to committing. In a CVCS, every commit is automatically published, but in a DVCS, commits are only published when they are pushed.

So, commit small steps, but only publish working features.

If you are worried about polluting your history, then you can rewrite it. You might have heard that rewriting history is evil, but that is not true: only rewriting published history is evil, but again, since publishing and committing are different, you can rewrite your unpublished history before publishing it.
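
As a rough sketch of what that can look like in Git (the local bare repository standing in for the shared hub, the file name, and the commit messages are all assumptions for the example): everything committed since the last push is collapsed into one commit with `git reset --soft`, and only then published. `git rebase -i` is the more flexible interactive route to the same result.

```shell
set -e

work=$(mktemp -d)

# A local bare repository plays the role of the published hub.
git init -q --bare "$work/hub.git"
git init -q "$work/dev"
cd "$work/dev"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$work/hub.git"

# Published history: one commit.
echo base > file.txt
git add file.txt
git commit -q -m "Initial commit"
git push -q origin master

# Private small steps, never pushed.
echo try1 >> file.txt
git commit -q -am "Try approach A"
echo try2 >> file.txt
git commit -q -am "Well, that didn't work"
echo done >> file.txt
git commit -q -am "Approach B works"

# Move the branch back to the published commit but keep all changes staged,
# then record them as a single commit. Only unpublished commits are rewritten.
git reset -q --soft origin/master
git commit -q -m "Add feature X"

# Publish the cleaned-up history; this is an ordinary fast-forward push.
git push -q origin master
git log --oneline origin/master
```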

This is how Linux development works, for example. Linus Torvalds is very concerned with keeping the history clean. In one of the very early e-mails about Git, he said that the published history should look not like you actually developed it, but how you would have developed it, if you were omniscient, could see into the future and never made any mistakes.

Now, Linux is a little bit special: it has commits going in at a rate of 1 commit every 11 minutes for 24 hours a day, 7 days a week, 365 days a year, including nights, weekends, holidays and natural disasters. And that rate is still increasing. Just imagine how many more commits there would be if every single typo and brainfart resulted in a commit, too.

But the developers themselves in their private repositories commit however often they want.

Jörg W Mittag 2010-06-14 23:09:55

I've never really appreciated rewriting your commit history until your example with the kernel.

caspin 2010-06-16 14:49:58

+3 A:

One thing I really like about Git is that the repo in your dev environment is YOUR repo. It's a copy of the maintainer's repo. You're free to do whatever you want to that repo and you won't tick off the maintainer unless you push up some crazy histories.

To that point, use branching and merging to your advantage as much as you can to aid in your development and experimentation. Only push the changes you are most comfortable with upstream. Git even gives you the ability to squash your commit history into fewer change sets if needed so you can push up a series of commits you performed into a single commit.

The flexibility is extremely empowering for your personal workflow as well as for the policies your colleagues have in place.

Bryce 2010-06-14 23:18:01

+5 A:

Small steps. There's a reason it's called revision control, and not release control :)

Commit as often as you like. Don't hold back. There should never be negative consequences to committing code on an "in progress" branch. Development shops that expect commits not to "break the build" are misusing the RCS. Likewise, ascribing any meaning whatsoever to a commit is dangerous policy, simply because it conflicts with the purpose of revision control. Meaning should instead be ascribed to tags, branches, clones, stashes, or whatever your RCS calls them. These things have metadata (perhaps as minimal as a name) designed to convey the purpose. Revisions are simply a history of what you modified.

The last thing you want to do is institute a policy to discourage developers from committing their code, for any reason.

John 2010-06-14 23:43:44

Most of the time I can get away with the following rule of thumb -- check in the smallest amount at a time that makes sense (and is still useful or an improvement). I find this helps me better plan out my work, which has several benefits including (but not limited to) ...

Better development estimates.
Better testing estimates.
Faster development time.
Fewer overall bugs.
Less coupling between modules.
Finding out sooner if my code unintentionally broke something else.
many more

There are times, however, when it is necessary to create a branch and then, when the work is done, merge that back into the mainline. Even when operating on the branch, I still try to follow the rule, since working on a branch does not automagically waive all those benefits.

Hope this helps.

Sparky 2010-06-15 01:10:06

+3 A:

Small steps are really great. You can always bundle them into larger steps in another repo. To do the opposite you have to "rewrite history" which can be done in some systems (notably git), but it's not as well supported as you might like.

Another reason I like small steps is so my colleagues can easily see what I've done. If I work for three or four hours it's often much more sensible for me to reel off half a dozen commits so that my colleagues can see the relevant diffs. (And I appreciate it that they extend me the same courtesy.)

Finally, small steps make it less likely that you'll have conflicts, or that when you do, they'll be smaller.

I use small steps even when working alone, on multiple branches.

Summary: For daily workflow, small steps have many advantages. If you want a distribution-centric workflow, create a repo and a branch just for distribution, and you can set up your big steps there exactly the way you want them.

Norman Ramsey 2010-06-15 01:16:40
