As a subversion user, git's index is the most challenging new concept I'm facing as I consider using it for new projects. I read many people's comments saying that they don't use the Index (always commit -a) but I'm thinking there might be a killer reason out there as to why I would want to make use of it. (I'm sharing code with around 5 other developers, working in a mature development environment where we merge code to test and stable branches and use branching for experimental or significant new features.)
The reason I appreciate Git's index is for staging of local changes. One thing you can do with the index is roughly the same as Subversion's "changelist" support, except it's more convenient. I often stage just one or two files out of several possibly modified ones, to construct a single commit containing just those files. With Subversion, I would have to think of a name for that changelist (even if it's just "work" or "temp"), and repeat typing that name several times during construction and committing of the changelist.
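As a rough sketch of that "one or two files out of several" workflow, the script below builds a throwaway repository and commits two of three modified files. The file names, commit messages and identity settings are invented for illustration.

```shell
# Sketch only: run as a script; it works in a temporary directory.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'a\n' > parser.c
printf 'b\n' > lexer.c
printf 'c\n' > notes.txt
git add .
git commit -q -m "Initial import"

# Modify all three files.
printf 'a2\n' >> parser.c
printf 'b2\n' >> lexer.c
printf 'c2\n' >> notes.txt

# Stage only the two files that belong together, then commit the index.
git add parser.c lexer.c
git commit -q -m "Fix tokenizer edge case"

# notes.txt is still modified in the working tree but was not committed.
git status --short
```

No changelist name is needed at any point; the index itself plays that role.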
The index also supports the git add -p feature, which I think is one of Git's killer features. See Ryan Tomayko's The Thing About Git, which describes how Git solves the "tangled working copy problem". You can stage just portions of modified files without having to mess around with temporary copies or play tricks with Undo in your editor.
The index doesn't really participate much in your interaction with other developers. However, it can have a significant impact on how you interact with Git.
You know that the index lets you only commit parts of the files that you want to add to the repository, of course. In general, I find it useful for this reason. I can make changes to files that sort of work, check in the parts that work, and then complete and check in the rest.
For a really killer demonstration, try interactive add or patch add (git add -i or git add -p). This runs through all your changes and lets you selectively add them to the index, so you can make a whole load of changes to your files and still split them into separate commits. Useful for those 'aha' fixes that we all make from time to time.
Have a look at this screencast to see how it's done. Not till you try it yourself will you see how useful it is.
Aside from interactive staging, the other important use of the index is during a merge conflict: Git stages up to three versions of the file (the common ancestor, yours, and theirs), so it knows the file isn't resolved yet, and each version is on hand without being littered with conflict markers. Third-party tools could use the index here to provide a nice merging interface.
That's not to say this feature fundamentally requires the index (I'm sure Mercurial handles merge conflicts without having one), but the way Git approaches this seems nice to me.
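To look at those staged versions yourself, something like the following sketch should do. It manufactures a conflict in a scratch repository and then reads the unmerged index entries; the branch name, file name and contents are made up.

```shell
# Sketch only: run as a script; it works in a temporary directory.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo 'original' > greeting.txt
git add greeting.txt
git commit -q -m "Base version"
main_branch=$(git symbolic-ref --short HEAD)   # default branch name varies

git checkout -q -b feature
echo 'feature change' > greeting.txt
git commit -q -am "Change on feature"

git checkout -q "$main_branch"
echo 'mainline change' > greeting.txt
git commit -q -am "Change on mainline"

# The merge stops with a conflict; "|| true" keeps the script going.
git merge feature >/dev/null 2>&1 || true

# Unmerged entries: stage 1 = common ancestor, 2 = ours, 3 = theirs.
git ls-files -u

# Each version can be read back without conflict markers.
git show :1:greeting.txt
git show :2:greeting.txt
git show :3:greeting.txt
```

The working-tree copy of greeting.txt contains the conflict markers, while the three index entries stay clean.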
I find the index really useful, and very rarely commit -a.
Since you're not always pushing to a remote repository when you commit, Git users typically make smaller, more frequent commits, and push to a shared repo when a 'group' of changes is complete. This gives you the flexibility to revert or cherry-pick individual commits later on. Say I make 3 changes and, using Subversion, commit them all at once, then want to revert one of those changes, or apply just one of them to another branch: it's a very fiddly process. With Git, you might add each file you've changed to the staging area and commit it separately. Obviously you need to make sure each commit is internally consistent and each change set is 'atomic'.
You may also have local changes to a file under version control that you do not want to commit, such as a customised configuration file (or something). The staging area allows you to exclude that file from the set of changes that are committed.
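Here is a minimal sketch of both points, assuming three changed files of which one is a local configuration tweak. All file names and messages are hypothetical. Each real change gets its own commit, the config file is never staged, and one commit is later reverted on its own.

```shell
# Sketch only: run as a script; it works in a temporary directory.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'v1\n' > api.py
printf 'v1\n' > ui.py
printf 'debug=false\n' > settings.ini
git add .
git commit -q -m "Initial import"

printf 'v2\n' > api.py
printf 'v2\n' > ui.py
printf 'debug=true\n' > settings.ini   # local tweak, not meant to be shared

# One commit per change; settings.ini is simply never staged.
git add api.py
git commit -q -m "Rework API"
git add ui.py
git commit -q -m "Rework UI"

# Later: undo only the API change, leaving the UI change in place.
git revert --no-edit HEAD~1 >/dev/null

git log --oneline
git status --short
```

The revert touches only api.py, so the uncommitted edit to settings.ini stays where it is.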
Several people have already mentioned git add -p, but if you've never used it you may not appreciate its utility. Suppose you have the following line of source code which contains 3 errors:
distance= rate * deltaT; /* compute tax rate */
(The three errors are: misnamed variable deltaT, whitespace error before the '=', and an invalid comment.)
You've already edited the file, but you want to make 3 distinct commits with an appropriate log message for each. With git, it's fairly trivial, since add --patch actually allows you to drop into an editor and edit the patch directly.
I find staging changes extremely useful for three reasons.
- I don't accidentally commit changes as much, since there is that extra step to stage the file.
- After making changes to a bunch of files at once via code generation or pattern substitution, I like to step through the diff of each file before committing. Being able to stage files one by one is a nice way of bookmarking my progress.
- I might be working through a feature and find an outdated comment or bad formatting along the way in some unrelated section. I can easily stage and commit a tiny change like that, keeping my feature commit pure and focused.
If you want to make sure every commit will build and pass your test suite(1), then ignore the index as much as possible.
When you use the index (in the non-trivial way where you're checking in some changes but not others) you're checking in a state of the code that you probably haven't built or run the test suite on.
Sure, for some things (a change to some documentation, for example) this probably doesn't matter and it's perfectly safe to use the index. But it's good to get out of the habit of doing it the error-prone way and into the habit of doing it the right way:
- Use git stash to stash away everything you don't want to commit.
- Build what's left.
- Run the test suite on what's left.
- Commit (all of) what's left.
- Unstash the other changes, repeat if necessary.
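The steps above can be sketched roughly as follows, assuming the unfinished work lives in its own file so it can be stashed by path. The run_tests function is a placeholder for your real build and test commands, and the file names are invented.

```shell
# Sketch only: run as a script; it works in a temporary directory.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'feature v1\n' > feature.txt
printf 'experiment v1\n' > experiment.txt
git add .
git commit -q -m "Initial import"

printf 'feature v2\n' > feature.txt         # ready to commit
printf 'experiment v2\n' > experiment.txt   # not ready yet

# 1. Stash away what should not be committed yet.
git stash push -q -m "unfinished experiment" -- experiment.txt

# 2-3. Build and test exactly what is about to be committed.
#      Placeholder check; substitute your own build and test suite.
run_tests() {
    grep -q 'feature v2' feature.txt && grep -q 'experiment v1' experiment.txt
}
run_tests

# 4. Commit all of what's left.
git commit -q -am "Finish feature"

# 5. Bring the stashed changes back and carry on.
git stash pop -q
git status --short
```

When the changes you want to hold back share a file with the ones you want to commit, git stash push -p lets you pick hunks to stash in the same way git add -p picks hunks to stage.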
(1): Not everyone cares about each commit being a buildable, working state of code.
Some people do because it means any version someone checks out will at least build and run. This is important for open source projects (where someone might clone your project at any time), and helps when bisecting to find where a bug was introduced (you don't need to waste time skipping over non-working, test-case-failing states).
If you don't care about each commit being a whole, working state of the code, then this doesn't really matter.