I just made the move to version control the other day, and after a bad experience with Subversion, I switched to Mercurial, and so far am happy with it.

Although I understand and appreciate the idea of version control, I don't really have any practical experience with it.

Right now, I am using it for a couple websites I am working on, and a couple questions have come to mind:

  • When/how often should I commit? After any major change, whether it works or not? When I'm done for the night? Only when it reaches its next stable iteration? After any bugfixes?
  • Would I branch off when I wanted to, say, change the layout of a menu, then merge back in?
  • Should I branch? What is the difference (for just me, a lone developer) between branching, then merging back in, and cloning the repository and pulling it back in?

Any other advice for a version control newbie?


So far, everyone has given me good advice, but very team-oriented. I would like to clarify:

At the moment, I am just using VC on some websites I do on the side. Not quite full-out freelance work, but for the purposes of VC, I am the only one that really touches the website code.

Also, since I am using PHP on the sites, there is no compiling to be done.

Does this change your answers significantly?

+2  A: 

I commit when I have finished a piece of work, and only if it is working. It's bad practice to commit broken code to somewhere other people use the code.

Branching is something that people will argue about. Some people say never branch and just use switches to turn a piece of work on or off. Do what you feel more comfortable with, but don't branch just because you can. I do use branching: I branch when I am working on a major bit of work, so that if I commit broken code by accident it's not going to affect everyone else.

AutomatedTester
+8  A: 

These are the practices that I follow:

  • Each commit should make sense: one bug fix (or a set of bugs related to each other), one (small) new feature, etc. The idea is that if you need to roll back, your rollbacks fall on well-defined "boundaries".

  • Every commit should have a good message explaining what you are committing. Really get into this habit, you will thank yourself later. Doesn't have to be verbose, a few sentences can do. If you are using a bug tracking system, associating a bug number with your commit is also extremely helpful

  • Now that I use git and branching is so incredibly fast and cheap, I tend to make a new branch for each new feature I'm about to implement. I'd never even consider doing this for many other VCSes. So branching depends on the system you are using, your codebase, your team, etc, there are no hard rules there.

  • I prefer to always use the command line and get to know my VCS's commands directly. The disconnect that a GUI based frontend can cause can be a pain, and even damaging. Controlling your source code is very important, it's worth getting in there and doing it directly. But that's just my preference.

  • Back up your VCS. I back up my local repository with Time Machine, and then I push out to a remote repository on my server, and that server is backed up as well. VCS alone is not really a "backup", it can go down too just like anything else.

Matt Greer
+1 for one bug/feature per commit. This saves time tracking things down if something comes back, and also lets the rest of the team easily keep apprised of *how* you're fixing your issues.
Peter Bernier
+1  A: 

When/how often should I commit?

You'll probably get lots of contradictory answers on this one. My view is that you should commit changes when they are working, and each commit (or checkin) should contain exactly one "edit". An "edit" is an atomic set of changes that go together to fix a bug or implement a new feature.

There is a theory that you should check in code every few hours even if it's not working, but in that case you will need to be working on your own branch - you don't want to be checking in broken code to your main line, or onto a shared branch.

The advantage of checking in every night is that you have backups (assuming that your repository is on a different machine).

As for branching:

  • you should have a main line that always contains working code.
  • you should have a current development branch that contains the latest code. When you are happy with this (and it's passed all its tests) you should merge it back into the main line.
  • you might want a branch that contains the last released version. This can be used for testing/debugging bugs and releasing patches (in extremis).
ChrisF
+2  A: 
  • update before each commit
  • provide commit comments
  • commit as soon as you have something finished
  • don't commit anything that leaves the code in the repository buggy or unable to compile
  • update every morning
  • sometimes verbally communicate with colleagues if there is something important to update
  • commit code relevant to exactly one thing (i.e. fixing a bug, implementing a feature)
  • don't worry about making very small commits, as long as they conform to the previous rule

Btw, what's the bad experience with Subversion?

Bozho
Bad experience? While moving a directory up a level, it threw a fit and stopped half-way, deleting a folder containing most of the site's content and somehow generating a dozen revisions for just that move. It took me the better part of a day just trying to reverse that and recover the data.
Austin Hyde
Perhaps you were using a bad client? Subversion itself has a few drawbacks that can be observed after quite some time of work.
Bozho
Moving directories in Subversion is a two-step process. You have to remove the old files and add the new ones. Sometimes this won't work; one such situation is when several devs are working on files that are slated for a move but forget to commit. Their changes will then be magically removed. This is a big grief for agile practitioners who refactor their code a lot, i.e. move their code files around at times.
Spoike
I've been refactoring a lot of code and have rarely faced any problems
Bozho
I had problems with Subversion when I tried moving classes in C# to new namespaces; AnkhSVN corrupted the SVN repository. Corrupted as in the latest revision in the repo could not be retrieved or repaired in any way; the old revisions were still working, but we couldn't commit anything new to the repo. This was several years ago, and AnkhSVN and later versions of SVN have probably fixed it, but it damaged the confidence I had in SVN as version control back then.
Spoike
+1  A: 

Q: When/how often should I commit? After any major change, whether it works or not? When I'm done for the night? Only when it reaches its next stable iteration? After any bugfixes?

A: Whenever you are feeling comfortable. I commit as soon as a unit of work is finished and working (which does not mean that the complete task has to be finished). But you should not commit something that does not compile (it might hold up other people in the team, if any). Also, you should not commit incomplete stuff to the trunk if there is any possibility that you will have to implement a quick fix or small change before completing it.

Q: Would I branch off when I wanted to, say, change the layout of a menu, then merge back in?

A: Only if there is a possibility that you have to implement a quick fix or small change before completing your task.

The nice thing about branching is that all commits you are doing in the branch will still be available for future reference (if necessary). Also it is much simpler and faster than cloning the repo, I think ;-)

peter p
+1  A: 

I agree with others on commit times.

Regarding branching, I generally branch only when working on something that breaks what others are doing or when a patch needs to be rolled to production in a file that already has changes that should not go to production. If you're only one developer, then the first scenario doesn't really apply.

I use tags to manage releases - the "production" tag is always associated with the current prod code, and each release is tagged with "release-YYYYMMDD". This allows you to roll back if necessary.
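As a rough illustration of the naming scheme above, here is a small Python sketch that builds a "release-YYYYMMDD" tag name from a date. The function name `release_tag` is made up for this example; the resulting string is what you would hand to your VCS's tag command (`hg tag` in Mercurial).

```python
from datetime import date


def release_tag(day: date) -> str:
    """Build a tag name in the release-YYYYMMDD form described above."""
    return "release-" + day.strftime("%Y%m%d")


if __name__ == "__main__":
    # Hypothetical usage: print the tag name for a release made today.
    print(release_tag(date.today()))
```

A side benefit of the zero-padded date is that the tag names sort chronologically when listed alphabetically.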

Matt
+16  A: 

Most of what you're asking about depends on who you are working with. If you're a lone developer it shouldn't matter a lot, since you can do whatever you'd like. But if you're in a team where you have to share your code, then you should discuss with your team members what the code of conduct should be, since sharing changes with one another can become tricky at times.

The discussion regarding code of conduct doesn't need to be lengthy; it can be very brief, as long as everyone is on the same page about how to use the repository shared between the programmers in the team. If you want to use the more advanced features in Mercurial, such as cherry picking or patch queues, use them in a way that doesn't affect your team members negatively (rebasing on a public repository, for example, would).

Remember version control has to be easy to use for everyone in the team, or else it won't be used.

When/how often should I commit? After any major change, whether it works or not? When I'm done for the night? Only when it reaches its next stable iteration? After any bugfixes?

While working with a team there are several approaches, but the common rule is to commit early and often. The main reason to commit often is that it makes merge conflicts easier to handle.

Simply put, a merge conflict happens when merging a file that has been changed by at least two people doesn't work because they have edited the same lines. If you hold on to a commit that involves a very large change, with many changed lines across several files, it becomes very difficult for the receiver to manage the conflicts that may occur. The merge conflict becomes even more difficult to handle if that set of changes is held back for too long.
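To make the "same lines" idea concrete, here is a toy Python model. It is only a sketch under a big simplifying assumption: each person's work is reduced to a mapping from line number to new text, whereas real tools such as Mercurial compare whole regions of a file against a common ancestor. The function name and the sample data are invented for illustration.

```python
from typing import Dict, List


def conflicting_lines(mine: Dict[int, str], theirs: Dict[int, str]) -> List[int]:
    """Return the line numbers that both sides changed to different text.

    Each argument maps a line number in the shared original file to the
    new text one person wrote for that line.
    """
    return sorted(
        line
        for line in mine.keys() & theirs.keys()
        if mine[line] != theirs[line]
    )


if __name__ == "__main__":
    # Two people edit the same file; only line 12 was touched by both.
    alice = {3: "<h1>Home</h1>", 12: "<li>About us</li>"}
    bob = {12: "<li>Contact</li>", 40: "</footer>"}
    print(conflicting_lines(alice, bob))  # [12]
```

The longer a change is held back, the more entries each mapping collects, and the more likely the two sets of line numbers are to overlap.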

There are some exceptions to the rule of committing often, and one is whenever you have a breaking change. However, if you can commit locally (which you inherently do in Mercurial and git), you can commit breaking changes there. Just fix whatever broke first, and only then push upstream to the shared repository.

Would I branch off when I wanted to, say, change the layout of a menu, then merge back in? Should I branch?

There are many branching strategies to choose from (the Streamed Lines paper from 1998 has an exhaustive list of branching patterns), and when you're working by yourself anything goes. However, when working in a team you'd better discuss openly with the team whether you need to branch or not. Whenever you have the urge to branch, though, ask yourself the following questions:

  • Will my future changes be breaking the work of others?

  • Will my team have a direct negative impact from the changes I'll be doing until I'm done?

  • Is my code throwaway code?

If the answer to any of the questions above is yes, you should probably branch publicly, or keep the branch to yourself (you can do that in Mercurial in several ways). First discuss with your team how to carry out the whole endeavour, both to see whether there is another way of doing it and to agree on whether you're going to merge your changes back in. Sometimes other factors mean there is no need to branch at all (this mostly comes down to how modular the code is).

When you decide to branch, be prepared to handle a merge conflict. It is reasonable to expect that whoever created the branch and made the commits is the one who merges it back into the "main branch". At these times it would be great if everyone in the team had written relevant commit comments.

As a side note: You do write good commit comments, right? RIGHT!? A good commit comment usually tells why that particular change was made or what feature the committer was working on instead of a nondescript "I made a commit" kind of comment. This makes it easier for the one who is handling the big merge conflict to figure out what line changes can be overwritten and which ones to keep while going through the revision history.

Compile times, or build times rather, sometimes play into the branch discussion you may have. If your project has a slow build time then it might be a good idea to use a staging strategy in your branches. This strategy takes into account that all developers should integrate to a "main line" and changes that are approved are elevated (or "promoted") to the next stage, such as testing or release lines. It is classically illustrated with tag names for open source software like this:

main -o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-> ...
         \           \              \
test      o-----------o--------------o---------> ...
           1.0 RC1     \ 1.0 RC2      2.0 RC1
release                 o----------------------> ...
                          1.0

The point with this is that testers can work without being interrupted by the programmers and that there is a known baseline for those who are in release management. In distributed version control, the different lines could be cloned repositories and it may look a bit different since repositories share the versioning graph. The principle however is the same.

Regarding web development, there are virtually no build times. But by branching in stages (or by tagging your release revisions) it becomes easier to roll back if you want to check a difficult-to-track-down bug.

However, a whole other thing comes into play, and that is the time it takes to deploy the site. Version control tools are, in my experience, really bad at asset management. Art assets that add up to several GB are usually a huge pain in the butt to handle in Subversion (more so in Mercurial). Assets may need to be handled in another, less time-consuming way, such as putting them in a shared space that is synced and backed up in a traditional manner (art assets are usually not worked on concurrently the way source code files are).

What is the difference (for just me, a lone developer) between branching, then merging back in, and cloning the repository and pulling it back in?

The concepts of branching and keeping remote repositories are closer now than with centralized version control tools. You could almost consider them being the same thing. In Mercurial (and in git) you can "branch" either by:

  • Cloning a repository

  • Creating a named branch

Creating a named branch means that you're making a new path in the versioning graph for the repository you're creating it on. Creating a cloned repository means you're copying the source repository into a new location, and making a new path in the cloned repository's versioning graph. They are two different implementations of branching as a general concept in version control.

In practice, the only difference between the two methods that you should care about is in usage. You clone a repository to have a copy of the source code and a place to store your own changes, and you create named branches whenever you want to do small experiments for yourself.
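For comparison, the two routes might look roughly like this on the Mercurial command line. The Python below is only a sketch that builds the command lists so they can be read side by side. The helper names, the branch name and the repository paths are all hypothetical, and the clone route assumes the original repository gained no commits of its own in the meantime (otherwise a merge would be needed there too).

```python
import subprocess
from typing import List

Command = List[str]


def named_branch_workflow(branch: str, message: str) -> List[Command]:
    """Commands for experimenting on a named branch inside one repository."""
    return [
        ["hg", "branch", branch],           # start a new named branch
        ["hg", "commit", "-m", message],    # commit the experiment on it
        ["hg", "update", "default"],        # go back to the default branch
        ["hg", "merge", branch],            # merge the experiment in
        ["hg", "commit", "-m", "Merge " + branch],
    ]


def clone_workflow(source: str, clone: str, message: str) -> List[Command]:
    """Commands for experimenting in a separate clone, then pulling back."""
    return [
        ["hg", "clone", source, clone],                 # copy the repository
        ["hg", "-R", clone, "commit", "-m", message],   # commit inside the clone
        ["hg", "-R", source, "pull", clone],            # pull the changes back
        ["hg", "-R", source, "update"],                 # update the working copy
    ]


def run_all(commands: List[Command]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print rather than run, so nothing touches a real repository.
    for command in named_branch_workflow("menu-layout", "Rework menu layout"):
        print(" ".join(command))
```

Either way the experiment ends up back in the original repository; the named branch keeps its name in the history, while the clone can simply be deleted afterwards.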

Since browsing through branches is a bit quirky for those who are accustomed to a straight line of commits, advanced users know how to manipulate their versions so the version history is "clean", e.g. with cherry picking or rebase. At the moment the git docs actually explain rebase rather well.

Spoike