A few basic version control questions

+7 Q:

A few basic version control questions

... which I didn't feel like splitting into several posts since, being basic, most people here will know how to answer them.

I've been developing for several years now, and I've never had the time to learn about version control. Renaming directories with different version names always seemed enough. Now, I've finally decided to learn it, but some basic terminology and working principles still confuse me.

My projects are relatively small, up to 10 or so files (although the files are relatively big), and are done in a non-OO way. I often take one approach, follow it to some point, decide that it will not do, and then (reusing old code) completely rewrite the whole project with a completely different file organization and internal code organization. Files disappearing and new files appearing between those "versions" is not uncommon.

So here are my "confusions":
1) For example, as described, I've put the first version into vc. Then I delete all files and rewrite them anew. If I understood correctly, that would be a new "branch", right?
2) If I continue developing that version, would I just keep committing to that "branch"?
3) When saving a save-point in a branch, does vc save all the files in full, or does it just save the differences between them?
4) Can I easily get all (whole project) files from a certain save-point in a branch, or do I have to follow it through the diffs from the beginning? (I just want to be able to say, "here, this is the save-point - copy all files you need so it looks like this")
5) What does it mean, in very simple terms, to "push" and "pull"? I don't understand the difference between "push"/"pull" and "commit".

If it matters, I'm using VS 2008 and am thinking of using git-extensions, since I've heard good things about it. Is it a good combination, or would SVN (for example VisualSVN or Ankh) be a better option for me, considering the above?

-- with regards, Peter

+1 A:

You don't really need a branch unless you need to work separately on two different sets of code for the same project. Branching is much more common in a team setting where most developers would be working on longer term features in the trunk, while one or two create a branch to release a specific feature earlier. They would then merge that branch back into the trunk, so all of the code was in one place again. This is just one example...

My recommendation is just to set up a Subversion repository (ideally on a different server, but if it has to be on your machine, it's better than nothing), and commit (push) your changes there. You can always update your local working copy (pull) from the repository.

Subversion does a binary diff on files, so it only saves the difference, not a whole copy. This can be a problem if you have a lot of renames/deletes (esp. directories) in branches, but should be okay if you stay in the trunk.

If you are worried about those issues, then I recommend Mercurial or Git, where every clone is an entire copy of the repository, history included, and each commit behaves like a snapshot of the whole tree (both still use deltas and compression internally to save space). Mercurial has a slightly better Windows UI right now, if you are a GUI guy. However, Mercurial and Git don't yet have the same integrated tooling that svn does (with Ankh, for example).

I use svn with Ankh for VS2008, and in general, I love it (especially Ankh!), though I've been burned by the directory delete/rename issue in svn a few times...

Good luck!

Noah

Noah Heldman 2009-09-24 03:39:09

+1 A:

1) Incorrect, that would simply be a new revision. Branching is where your revision flow "forks" in two or more directions. This is generally a special operation distinct from adding and removing files.
2) As before, you haven't branched at this point. Generally you select which branch you're on using your version control tool. I'd highly recommend looking at the documentation around branching for whichever tool you wish to use.
3) Most version control tools only save the differences between versions of a file. Some (notably CVS) will only do this for text files, and changes to binary files will result in a full copy of the new version being stored. Note that this doesn't preclude you from getting the full file at any given revision; it's purely a matter of how the data is stored.
4) You can get the full state at any selected revision at any time.
5) With a centralised version control system (CVS, Subversion) you commit to the repository and check out from the repository. With a decentralised version control system (git, mercurial, darcs, bazaar) you clone a remote repository (or create your own local repository) and commit changes to that repository. You can then push changesets to a different remote repository or pull changesets from a remote repository. The distinction is that each developer has their own copy of the repository and generates changesets in that copy. For other people to see them, these changesets need to be pushed out to a common place from where the other people can pull them. IBM has a decent introduction to the distributed version control concept.
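To make point 5 concrete, here is a minimal Python sketch of the commit/push/pull distinction in a decentralised system. It is only a toy model under stated assumptions: the `Repo` class, its methods, and the names "shared", "alice", and "bob" are all made up for illustration, and real tools track far more than a list of changeset ids.

```python
class Repo:
    """Toy repository: an ordered list of changeset ids (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.changesets = []  # committed changesets, oldest first

    def clone(self, name):
        # A clone starts out with a full copy of the history.
        copy = Repo(name)
        copy.changesets = list(self.changesets)
        return copy

    def commit(self, changeset):
        # Commit only touches this repository; nobody else sees it yet.
        self.changesets.append(changeset)

    def pull(self, other):
        # Fetch the changesets this repository is missing from `other`.
        for cs in other.changesets:
            if cs not in self.changesets:
                self.changesets.append(cs)

    def push(self, other):
        # Pushing is the same transfer in the opposite direction.
        other.pull(self)


shared = Repo("shared")          # the "common place"
alice = shared.clone("alice")
bob = shared.clone("bob")

alice.commit("a1")               # recorded only in alice's repository
assert shared.changesets == []   # nothing has been published yet
alice.push(shared)               # now the shared repository has it
bob.pull(shared)                 # and bob can pick it up
```

The point of the sketch is that `commit` never leaves the local repository; only `push` and `pull` move changesets between copies.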

Benno 2009-09-24 03:40:37

+5 A:

Part of your confusion probably comes from different version control systems (VCSs) that use terminology differently.

I usually think of code in "lines". I start with the original version of a file, and save it to my version control system. Putting it into the VCS is called a "check-in". The version control system tags it with some number, such as revision 1.0. Now I compile my software. It breaks, so I have to edit it. To do that, I "check it out" of the version control system and edit it. Now that it's fixed, I check it back in and the version control system stores it as revision 1.1. My boss wants a new feature, so I check it out, edit it, and check it back in again, and it's stored as revision 1.2.

That's the "main line" or "trunk" of code.

A version control system will let you get any old version of a file by specifying the revision number. Let's say I get a bug report from software based on revision 1.1. I can use "diff" or any comparison tool to compare 1.1 with 1.0 and see what changed. It doesn't matter how the version control system stores it internally, I just ask for it by revision number and I get the whole file.
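A rough Python sketch of why the internal storage doesn't matter to you: the toy store below keeps the first revision in full and every later one as a delta, yet a check-out by revision number always hands back the whole file. The `DeltaStore` class, its helper functions, and the integer revision numbers are assumptions made for this illustration, not how any particular VCS does it.

```python
import difflib


def make_delta(old, new):
    """Describe how to rebuild the list `new` from the list `old`."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse old[i1:i2]
        else:
            delta.append(("insert", new[j1:j2]))  # new lines (empty for a delete)
    return delta


def apply_delta(old, delta):
    """Rebuild the newer revision from the older one plus a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaStore:
    """Toy single-file store: full first revision, deltas afterwards."""

    def __init__(self):
        self.base = None
        self.deltas = []

    def check_in(self, lines):
        if self.base is None:
            self.base = list(lines)
        else:
            latest = self.check_out(len(self.deltas) + 1)
            self.deltas.append(make_delta(latest, list(lines)))
        return len(self.deltas) + 1  # revision number, starting at 1

    def check_out(self, revision):
        # Replay deltas from the base up to the requested revision.
        lines = list(self.base)
        for delta in self.deltas[:revision - 1]:
            lines = apply_delta(lines, delta)
        return lines


store = DeltaStore()
r1 = store.check_in(["int main() {", "}"])
r2 = store.check_in(["int main() {", "  return 0;", "}"])
r3 = store.check_in(["int main(void) {", "  return 0;", "}"])
print(store.check_out(r2))  # the complete file as it was at revision 2
```

The replaying of deltas happens inside `check_out`; the caller only ever names a revision.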

The next thing to understand is that a group of files makes up your project or solution. When you're going to compile your software to release it to the world, you want to associate a "label" with all of those files so you can treat them all as a group. Most people use a numeric label, such as Windows 3.0, Windows 3.51, etc., but that's just convention. You could label a version "hardy heron" or "gutsy gibbon" if you want.

Now, this is all fine if you're one guy who just keeps updating things as you go along. But let's say you keep working on your software, and release version 7, then 8, then 9, and now you're working on version 10. But today you get a serious bug report on version 7 that you just have to fix. So you go to your VCS and request all the source files with the label "version 7". You get those into a separate folder on your disk, and fix the bug. But when you go to check those files in, you need them to be a part of version 7 because you've already added features in versions 8 and 9. This is when you create a "branch".

An example might be clearer. Let's say you checked out "version 7" of the package, and the file to fix was at revision 1.23. In version 10 (which you're working on in a different folder) you're working with revision 1.40. You don't want the changes for version 7 to go into 1.41, because that would overwrite and destroy all the neat features you added in revisions 1.24 thru 1.40. So you create a branch, and check in your changed file as revision 1.23.0.1. You compile it, and now the bug is fixed. And now you have to release it to your customers. When you release, you create a new label. I'd label this something like "version 7.1" so that I could tell the difference between the broken software and the fixed software. And I'd know that it didn't have all the features of versions 8+.

If you plot those software versions on a line, you'd think of a number line going straight from 1 to 10. Where does 7.1 fit on to this line? It sticks out the side, like a branch sticks out from the trunk of a tree. That's where we get the names of "branch" and "trunk" from.
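The trunk-and-branch picture can be sketched in a few lines of Python. This is a toy model only: the `History` class, the branch name "v7-fixes", and revision ids like "r7" are invented for the illustration, and the numbering deliberately does not imitate any real tool's scheme.

```python
class History:
    """Toy revision graph: each revision remembers its parent."""

    def __init__(self):
        self.parent = {}   # revision id -> parent revision id (None for the first)
        self.labels = {}   # label -> revision id
        self.heads = {}    # branch name -> newest revision id on that branch

    def commit(self, branch, rev):
        # The new revision's parent is whatever the branch pointed at before.
        self.parent[rev] = self.heads.get(branch)
        self.heads[branch] = rev

    def label(self, name, rev):
        self.labels[name] = rev

    def branch_from_label(self, branch, label):
        # A new branch starts at the labelled revision, not at the newest one.
        self.heads[branch] = self.labels[label]

    def lineage(self, rev):
        # Walk back through the parents to the very first revision.
        chain = []
        while rev is not None:
            chain.append(rev)
            rev = self.parent[rev]
        return list(reversed(chain))


h = History()
for n in range(1, 8):
    h.commit("trunk", f"r{n}")
h.label("version 7", "r7")
for n in range(8, 11):
    h.commit("trunk", f"r{n}")                # work continues towards version 10

h.branch_from_label("v7-fixes", "version 7")  # go back to what was shipped
h.commit("v7-fixes", "r7-fix1")               # the bug fix
h.label("version 7.1", "r7-fix1")

print(h.lineage(h.labels["version 7.1"]))     # history of the fixed release
print(h.lineage(h.heads["trunk"]))            # history of the main line
```

The fix revision has "r7" as its parent, so it sticks out sideways from the trunk: it contains none of r8 through r10, and the trunk does not contain the fix until someone merges it.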

John Deters 2009-09-24 03:50:04

Very good writeup.

Robert Fraser 2009-09-24 04:10:57

Good summation, but very RCS/CVS oriented. In contrast to CVS, systems like Subversion and git give a unique ID to each commit. This means you aren't checking out version 1.10 of foo.c, you're checking out foo.c from revision 12345 of the repository. Same idea, but it removes the need to tag anything "version 7" for any other reason than convenience.

David Dombrowsky 2009-09-25 03:32:13

1) For example, as described, I've put the first version into vc. Then I delete all files and rewrite them anew. If I understood correctly, that would be a new "branch", right?

You are correct in thinking that a rewrite should be treated as a new branch, but you will have to tell your version control system that you are creating a new branch.

2) If I continue developing that version, would I just keep committing to that "branch"?

Yes.

3) When saving a save-point in a branch, does vc save all the files in full, or does it just save the differences between them?

I believe most version control systems store deltas between revisions instead of the entire files in order to conserve disk space.

4) Can I easily get all (whole project) files from a certain save-point in a branch, or do I have to follow it through the diffs from the beginning? (I just want to be able to say, "here, this is the save-point - copy all files you need so it looks like this")

Most version control systems will allow you to check out any revision.

5) What does it mean, in very simple terms, to "push" and "pull"? I don't understand the difference between "push"/"pull" and "commit".

Distributed version control systems have "push" and "pull" commands to upload or download changes to a remote server and the "commit" command to save changes to the local repository. This contrasts with SVN, which only stores revision information on the server side, so with SVN you "commit" to the remote server.

las3rjock 2009-09-24 03:50:25

+1 A:

I believe you'll find this insightful: An Illustrated Guide to Git on Windows.

Nick D 2009-09-24 03:51:16

Hi Peter, I'll try to answer your questions in turn.

1) For example, as described, I've put the first version into vc. Then I delete all files and rewrite them anew. If I understood correctly, that would be a new "branch", right?

Deleting the files and rewriting from scratch is not the hallmark of a branch. Think of a trunk and a branch. A common pattern is to keep the code for ongoing (latest and greatest) development in the trunk. Then at some point you'll want to release your code, at which time you'll take a snapshot of the trunk. This snapshot is the branch.

2) If I continue developing that version, would I just keep committing to that "branch"?

Yes, you can keep developing the code in the trunk and code in the branch independently.

3) When saving a save-point in a branch, does vc save all the files in full, or does it just save the differences between them?

From your point of view, just think of the branch as a snapshot, not as a sequence of diffs. Internally, different products implement branches in different ways, but you need not concern yourself with this.

4) Can I easily get all (whole project) files from a certain save-point in a branch, or do I have to follow it through the diffs from the beginning? (I just want to be able to say, "here, this is the save-point - copy all files you need so it looks like this")

To reiterate, think of a branch as a snapshot of files.

5) What does it mean, in very simple terms, to "push" and "pull"? I don't understand the difference between "push"/"pull" and "commit".

Systems such as sccs, cvs, svn, perforce, clearcase maintain a central repository for the assets. Users push their local copies into the repository. In svn, this is done using 'svn commit'. Systems such as git and mercurial maintain a distributed repository of assets, and the user pulls files from other repositories to update their own repository.

Phillip Ngan 2009-09-24 04:01:50

+1 A:

1) Branches are entirely under your control; the source control system won't surprise you with them (except for multiple heads in distributed systems, but your example should not cause this). Now, if you are doing a major reorganization, you might choose to create a new branch so you can easily keep the old and new code separate, but it's your choice.

2) If you choose to use branches, you can continue to develop whichever one you like. So long as your changes are committed, you have your checkpoint and can move between branches freely. In some cases you can even share changes between them, but I wouldn't get into this right off.

3&4) Most systems will save differences as a matter of space economy, but it is an implementation detail. You can retrieve any version at any time, exactly as it was committed.

5) Commit is setting a checkpoint which you can return to later. Push and pull refer to sharing of changes between you and another server or repository; push is you sending changes somewhere, pull is downloading someone else's work. Centralized systems like Subversion imply push with commit, and require you to be up to date with 'pulls'. Distributed systems allow (and require) you to do each part separately.
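For the centralized case in point 5, here is a small Python sketch of "commit implies push, and you must be up to date first". It is a deliberately crude model under loud assumptions: `CentralServer` and `WorkingCopy` are invented names, the up-to-date check is done for the whole repository (real tools such as Subversion check per file), and `update` simply overwrites local files where a real tool would merge.

```python
class CentralServer:
    """Toy central repository: one revision counter and one set of files."""

    def __init__(self):
        self.revision = 0
        self.files = {}


class WorkingCopy:
    """Toy working copy that talks to a single central server."""

    def __init__(self, server):
        self.server = server
        self.base_revision = 0   # server revision this copy was last synced to
        self.files = {}

    def update(self):
        # The "pull": bring the working copy in line with the server.
        # Simplification: overwrites local edits instead of merging them.
        self.files = dict(self.server.files)
        self.base_revision = self.server.revision

    def commit(self):
        # Commit goes straight to the server, so it doubles as the "push".
        if self.base_revision != self.server.revision:
            raise RuntimeError("out of date: update first")
        self.server.files = dict(self.files)
        self.server.revision += 1
        self.base_revision = self.server.revision
        return self.server.revision


server = CentralServer()
peter, colleague = WorkingCopy(server), WorkingCopy(server)
peter.update()
colleague.update()

peter.files["main.c"] = "first version"
first = peter.commit()             # accepted: peter is up to date

colleague.files["util.c"] = "helper"
try:
    colleague.commit()             # rejected: the server has moved on
    rejected = False
except RuntimeError:
    rejected = True

colleague.update()                 # toy update discards the local edit...
colleague.files["util.c"] = "helper"   # ...so redo it here
second = colleague.commit()
```

In a distributed system the same colleague could have committed locally without any complaint; the question of being up to date would only come up at push time.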

Justin Love 2009-09-24 04:02:36

+2 A:

I think you (and anyone who doesn't quite understand version control) should read Eric Sink's articles on the subject.

Jeanne Pindar 2009-09-24 04:58:11

These articles are far and away the best introduction to version control I've seen, highly recommended.

Jim T 2009-09-24 07:48:26
