I have a fairly infrequent problem occurring with source control. In the example here the problem was occurring with Perforce, but I suspect the same problem will occur with many SCMs, especially distributed SCMs.

Perforce supports changelists (or changesets if you prefer). Changelists support two common usages:

  1. When you commit a changelist, the commit is atomic so that all the files are committed or none are. This is the headline feature that most people talk about when referring to changelists.

  2. Perforce supports multiple changelists. Basically, when you check out a file you tell it which changelist it belongs to. So, if you are working on the fancy new email feature, which is going to take months of work and make millions of dollars, and somebody from tech support comes to you with a bug that must be fixed yesterday, you don't have to start with a new branch of the whole project. You can just check out the buggy file into a new changelist, fix the problem, check in the new changelist and get back to the real work of the new email feature, as though nothing had happened.

For the most part everything works great. However, when you are implementing the email feature you are making zillions of changes all over the place, especially in main.h, and it just so happens that when you go to work on the bug fix you discover that the tiny change you have to make is also in main.h. The changelist for the new feature already has main.h checked out, so you can't easily put it in the changelist for the bug fix.

Now what do you do? You have several choices:

  1. Create a new clientspec. A clientspec in Perforce is a list of files/directories in the depot and a local destination where everything is to be copied. So you can create a second copy of the project without any of the changes for the email feature.

  2. Do a fudge. Back up your modified copy of main.h and revert this file. You are then free to check out main.h into the bugfix changelist. You fix the bug, check in the bugfix changelist, then check out main.h into the email feature changelist. Finally you merge all your changes from the backup you made at the start.

  3. You determine that all the changes you have made to main.h have no side effects or dependencies, so you just move main.h into the bugfix changelist, make the change and check it in. You then check it out again into the email feature changelist. Obviously there are two problems with this approach: firstly, there may in fact be side effects that you hadn't considered, and secondly, you have corrupted your version history.

Option 1 is probably the cleanest, but not always practical. A project I was working on had millions of lines of code and a really complicated build process. It would take a day to set up a new environment, so it was not really practical for a 5-minute bug fix.

Option 3 is a bad option, but it is the quickest, so it can be very seductive.

That leaves Option 2, which is the one I would generally use.
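For reference, Option 2 can be written down as an ordered plan. The following is only a sketch: the changelist numbers, the backup file name and the use of `cp` are assumptions made for illustration, and the function just returns the steps as text rather than running anything. The final merge is still a manual step.

```python
from typing import List


def option2_plan(path: str, bugfix_cl: int, feature_cl: int) -> List[str]:
    """Return the ordered steps for the back-up-and-revert approach.

    Nothing is executed here; the plan is meant to be reviewed and run by hand.
    """
    backup = path + ".email-backup"  # hypothetical backup file name
    return [
        f"cp {path} {backup}",              # keep the in-progress feature edits
        f"p4 revert {path}",                # restore the depot version of the file
        f"p4 edit -c {bugfix_cl} {path}",   # open the file in the bugfix changelist
        "# ... make the bug fix by hand ...",
        f"p4 submit -c {bugfix_cl}",        # check in only the bugfix changelist
        f"p4 edit -c {feature_cl} {path}",  # open the file again for the feature
        f"# ... merge {backup} back into {path} by hand ...",
    ]


if __name__ == "__main__":
    # Changelist numbers 1002 and 1001 are made up for the example.
    for step in option2_plan("main.h", 1002, 1001):
        print(step)
```

Keeping the plan as data makes it easy to check the order before touching the workspace: the backup must exist before the revert, and the bugfix changelist must be submitted before the file is reopened for the feature.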

Does anybody have a better solution?

My apologies for the lengthy question, but I have discovered on StackOverflow that fully thought out questions elicit better answers.

+5  A: 

This exact problem has been called the "Tangled Working Copy Problem". Ryan Tomayko has a blog entry titled The Thing About Git that talks about this problem in detail and how Git addresses it.

This is one of the best things about Git. I use git add -p at least daily, to help commit individual chunks of code that make sense independently of one another. The fact that two logically different changes are in the same source file has become irrelevant.
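To illustrate the idea of committing individual chunks, here is a small Python model of hunk selection. It is not how Git implements `git add -p`; it only uses the standard `difflib` module to show that two unrelated edits in the same file can be separated. The file contents and function names are made up for the example.

```python
import difflib

# Hypothetical contents of main.h: the depot version and the working copy.
BASE = ["#define MAX_USERS 10", "int parse(char *s);", "void log_msg();"]
WORKING = [
    "#define MAX_USERS 100",   # hunk 0: the urgent bug fix
    "int parse(char *s);",
    "void log_msg();",
    "int send_mail();",        # hunk 1: unfinished email feature work
]


def apply_selected(base, working, selected):
    """Return base with only the chosen hunks (by index) applied."""
    matcher = difflib.SequenceMatcher(a=base, b=working, autojunk=False)
    result = []
    hunk_index = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(base[i1:i2])
            continue
        # A changed region: take the new lines only if this hunk was selected.
        if hunk_index in selected:
            result.extend(working[j1:j2])
        else:
            result.extend(base[i1:i2])
        hunk_index += 1
    return result


if __name__ == "__main__":
    # "Stage" only the bug fix; the email feature line stays out of the commit.
    for line in apply_selected(BASE, WORKING, {0}):
        print(line)
```

Selecting hunk 0 alone gives a version of the file with the bug fix and without the email feature line, which is roughly what ends up in the commit after staging a single hunk.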

Greg Hewgill
I only just found out about add -p. I could have used it 50000 times in the last 5 years of programming, first with svn and now with git.
1800 INFORMATION
+2  A: 

ClearCase also supports changelists (called "activities" in its UCM flavor), and presents a similar challenge.

Option 1 ("kind of branch") makes sense only when you determine that the "debug effort" is not compatible with the current development effort (email feature) and is best kept in a separate branch. Then you can retrofit whatever correction was done in the "patch" branch to the main branch (since not every bug you fix has to be present in both: the current development may have rendered some fixes obsolete).
See also "What is a branch", and what is your merge workflow.

Option 3 is an illustration of the limit of the notion of changeset: a single revision (or "version") of a file can only be part of one changeset at a time.

The git add -p (add patch) mentioned by Greg is an alternative to Options 1 and 3, since it takes advantage of the staging feature of the "Index" (Staging Area), the zone in which you decide what will actually be committed and what will remain in your private space.
That is nice, but in my experience also quite difficult to sustain over a long period of time, especially on a common set of files to which you apply two different evolutions. A branch is cleaner and simpler to unit-test. However, for a small fix like the one you mention, it could be a nice way out.

Option 2 is the practical solution when you realize you have two changes for two different efforts (which are still compatible and do not "break" one another).
But maybe an even simpler solution would be to just:

  • checkin the current state of main.h in email,
  • checkout in bug, fix it, checkin in bug
  • and then checkout in email to resume the email feature development.

Again, if the two development efforts (email and bug) are compatible, you can have a revision history with mixed activities.
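The check-in/check-out sequence above can be modelled with a toy workspace. This is a simplified sketch, not the behaviour of any real tool: the `Workspace` class and its method names are hypothetical, and the only rule it encodes is that an open file belongs to one pending changeset at a time.

```python
class Workspace:
    """Toy model: each open file belongs to exactly one pending changeset."""

    def __init__(self):
        self.opened = {}   # file name -> pending changeset name
        self.history = []  # submitted (file, changeset) pairs, oldest first

    def checkout(self, path, changeset):
        if path in self.opened:
            raise ValueError(f"{path} is already open in {self.opened[path]}")
        self.opened[path] = changeset

    def checkin(self, changeset):
        """Submit every file open in the changeset and return their names."""
        files = sorted(p for p, cs in self.opened.items() if cs == changeset)
        for path in files:
            del self.opened[path]
            self.history.append((path, changeset))
        return files


def mixed_history():
    """Replay the three steps from the list above and return the history."""
    ws = Workspace()
    ws.checkout("main.h", "email")
    ws.checkin("email")             # check in the current state in email
    ws.checkout("main.h", "bug")
    ws.checkin("bug")               # fix and check in under bug
    ws.checkout("main.h", "email")  # resume the email feature
    return ws
```

After replaying the steps, the history of main.h alternates between the two efforts, which is the "mixed activities" history described above, and the file is open under email again.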

VonC
+1  A: 

We use jobs so that a single 'task' can span multiple committed changesets.

Therefore:

  1. Check main.h changes are independent of other changes
  2. Check-in current state of main.h - under long-term Email job
  3. Do bug fix to main.h
  4. Check-in bug fix changeset
  5. Edit main.h under Email job if required
Douglas Leeder
+1  A: 

For Perforce, you can use a tool like p4 tar:

http://public.perforce.com/wiki/P4tar

It lets you save and revert your current changelist, make the fix, and then restore your work. You'd still need to integrate your changes to main.h, but it makes the task much easier.

Mark James
+2  A: 

I manage this with Perforce by maintaining multiple workspaces from the beginning. My primary workspace is on the mainline (where new development occurs), while another points at a branch of released code. If I need to fix a bug, I go to the release branch.

I'm not sure if this would work for you, but at least you wouldn't need to create a new workspace each time you fix a bug (since it would already be there).

John Stauffer
+1  A: 

You didn't mention option 4, which is to create branches.

You can have the main code line to which no individual changes get made - just integrations from other branches.

Then you have the main development line which is where you are creating your fancy new e-mail feature. This is where you are doing most of your work.

Finally you have your bug fix branch. This is where you do all your minor edits and urgent bug fixes. Once these have been tested they get integrated into the main code line for QA and release (which should be on a separate branch). These edits can then be integrated from the main line into your development line so that you are always working on the latest code. This integration can happen at the time you choose - so that you can be confident that it's not going to cause any problems in your new code.
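The direction of these integrations can be sketched with plain lists. This is a toy model only: the branch contents and change names are invented, and `integrate` simply copies the changes the target is missing, which is far simpler than a real merge.

```python
def integrate(source, target):
    """Copy every change that target is missing from source, keeping order."""
    for change in source:
        if change not in target:
            target.append(change)


def branch_flow():
    """Replay the flow: bug fix -> main code line -> development line."""
    main = ["release-1.0"]          # hypothetical starting point
    dev = list(main)                # main development line
    bugfix = list(main)             # bug fix branch

    dev.append("email-feature-wip")    # long-running feature work
    bugfix.append("fix-main.h-bug")    # urgent fix made on its own branch

    integrate(bugfix, main)  # tested fix goes into the main code line
    integrate(main, dev)     # later, pull the fix into the development line
    return main, dev, bugfix
```

In this model the main code line receives the fix without ever seeing the unfinished email work, and the development line picks the fix up whenever the second `integrate` call is made.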

This is (IMO) the best solution.

ChrisF
Branching would also be the more natural solution for me!
pablo
A: 

I agree with ChrisF: branching would be the most natural solution for this.

I've used Perforce for a while, and it's true it is not as strong in branching as other SCMs out there, but it can be done.

The trick is really simple: create a branch for each task you're working on (the good ol' branch-per-task pattern) and switch to it. What if you need to fix something else? Easy: just switch to a different branch after checking in everything (with some SCMs you don't even need to check in), fix it, and come back later to your original "email" branch.

pablo