I see a lot of sites referring to git, github, svn, subversion etc, but I never really knew what all of those things are. I also hear a lot of terms like 'svn repo', 'commit', and 'push' - I tried googling but it seems that I have so little knowledge about the subject that I don't even know where to get started.

Could someone give me the initial push so I can continue doing research on my own? What are these things all about?

Thanks!

guys: thank you so much for all the really long and encompassing explanations. I wish I could choose more than one answer, but unfortunately SO doesn't allow that (they should have a vote 1st, 2nd, and 3rd place feature or something). thank you all very much!

+9  A: 

They are all different versions of source control:

http://en.wikipedia.org/wiki/Revision_control

metismo
+1  A: 

Source code repositories.

Basically a way to share code, between a team, with the ability to see who "committed" (added) what code at what time, and who changed what at what time, etc.

bobobobo
+2  A: 

Have a look at chapter one of the (free, online) Subversion book. It describes what version control systems (such as Subversion) are about.

M4N
could you link to it?
yuval
thank you very much.
yuval
+7  A: 

Git, Subversion and the like are all about version control. If you use such technologies for a project, all your source files are stored in a so-called repository (a.k.a. "repo") - except for files that don't need versioning (big files, user-specific files, ...).

Some advantages of version control are:

  • Branches. You can create a new branch for each bug you're working on, for example, without tampering with other developers' code. Most version control systems make cheap copies, i.e. a new branch takes up (almost) no extra space.
  • Versioning. You can always go back to old versions, update to new versions or look at the commit log to see what has happened to the code. GUI tools like TortoiseSVN even provide diff utilities which show you the differences graphically. The term "commit" basically means putting new versions of files into the repository (or adding/deleting files). Version control systems also support "merging", that is, automatically combining changes to a file that was changed by several people (often line-based).
  • Simultaneous development. Multiple developers can each have their own "working copy" (also called "checkout"). This means that - even if you don't use branches - your local copy of the code will compile even if others are currently working on the project (because they have their own working copies). When you feel your current code can be useful for others, you can commit your changes, and others can update their copies.
  • Central storage and backup. This applies to CVS/Subversion/..., not to Git. It's an advantage because there's a central place to commit changes to, and to get other developers' changes from.
  • Distribution. This applies to Git (not to Subversion). It means that there can be multiple repositories for a project, independent of each other. The Linux kernel, for example, works this way. People can "pull" down their own repository to work on - it acts like a full repository, i.e. commits are made locally and not to a server. If you want to include patches from other people's repositories (or from public repos like kernel.org), you just "pull" those changes into your local repo. If you want to give somebody else your patch, you "push" your changes to a remote repo (if you have the rights).

Hope that explained the terms you mentioned. I think a good start to get going with version control is Subversion, using TortoiseSVN for Windows if possible. There's even a free book about it - Version Control with Subversion.

AndiDog
thank you so much for taking the time to write this!
yuval
+17  A: 

Version control (a.k.a. revision control).

Consider the following problem. You're working on a project with someone else and you're sharing files. You both need to work on, say, "WhateverController.java". It's a huge file and you both need to edit it.

The most primitive way to deal with this is to not edit the file at the same time, but then both of you have to be on the same page. When you've got a team, especially one with dozens, hundreds or thousands of members (typical for open-source projects), this becomes completely impossible.

An old, primitive "solution" to this problem was to have a checkout/checkin mechanism. When you need to edit a file, you "check it out", and the file is locked so no one else can edit it until you unlock it by "checking it in". This is done through the appropriate software, for example Microsoft's breathtakingly stupid piece of crap SourceSafe. But when people forget to "check the file in", then no one else can edit that file while it's in use. Then someone goes on vacation or leaves the project for some other reason and the result is unending chaos, confusion and usually quite a bit of lost code. This adds tremendous management work.

Then came CVS, and subsequently Subversion, which its authors call "CVS done right", so CVS and Subversion are essentially the same idea. With those, there is no locking check-out. You just edit the files you need and check them in. Note that the actual files are stored on a central server, and each user runs the client software on their own workstation as well. This location on the server is called a repository.

Now, what happens if two people are working on the same file in CVS/Subversion? Their changes are merged, typically using GNU diff and patch. 'diff' is a utility that extracts the differences between two files. 'patch' uses such 'diff' files to patch other files.
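To make the idea of a 'diff' concrete, here is a minimal sketch using Python's standard difflib module. This is not GNU diff itself, just a tool that produces the same kind of output, and the file contents are made up for illustration.

```python
import difflib

# Hypothetical file contents, invented for this example.
original = ["def greet():\n", "    print('hello')\n"]
edited = ["def greet():\n", "    print('hello, world')\n"]

# unified_diff yields the header lines plus the changed lines,
# prefixed with "-" (removed) and "+" (added).
diff_lines = list(difflib.unified_diff(original, edited, fromfile="A", tofile="B"))
print("".join(diff_lines))
```

The lines starting with "-" and "+" are what a patch tool would later use to turn the original into the edited version.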

So if you're working on WhateverController.java in one function, and I'm working on the same file in a different function, then when you're done with your stuff, you simply check it in, and the changes are applied to the file on the server. Meanwhile, my local copy has no idea of your changes so your changes do not affect my code at all. When I'm done with my changes, I check the file in as well. But now we have this seemingly complicated scenario.

Let's call the original WhateverController.java, file A. You edit the file, and the result is file B. I edit the same file at a different location, without your changes, and this file is file C.

Now we seemingly have a problem. The changes in files B and C are both changes to file A. Ridiculously backwards junk like SourceSafe or Dreamweaver will usually end up overwriting the changes in file B (because it got checked in first).

CVS/Subversion and presumably Git (which I know almost nothing about) create patches instead of just overwriting files.

The difference between file A and C is produced and becomes patch X. The difference between A and B is produced and becomes patch Y.

Then patches X and Y are both applied to file A, so the end result is file A + the changes made to B and C on our respective workstations.
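The A/B/C scenario can be sketched roughly like this in Python. This is a simplified illustration, not how CVS or Subversion are actually implemented: the helper names (changes, merge) and the file contents are hypothetical, and it only handles edits that touch different parts of the file.

```python
import difflib

def changes(base, edited):
    """Return the edits that turn base into edited as (start, end, new_lines)."""
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    return [(i1, i2, edited[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def merge(base, yours, mine):
    """Apply both sets of edits to base; raise if they touch the same region."""
    edits = sorted(changes(base, yours) + changes(base, mine),
                   key=lambda edit: (edit[0], edit[1]))
    merged, position, previous_start = [], 0, None
    for start, end, new_lines in edits:
        if start < position or start == previous_start:
            raise ValueError("conflict: both edits touch the same lines")
        merged.extend(base[position:start])  # untouched lines before the edit
        merged.extend(new_lines)             # the edited lines
        position, previous_start = end, start
    merged.extend(base[position:])           # untouched tail of the file
    return merged

# Hypothetical contents for files A, B and C.
file_a = ["def first():", "    return 1", "", "def second():", "    return 2"]
file_b = ["def first():", "    return 10", "", "def second():", "    return 2"]
file_c = ["def first():", "    return 1", "", "def second():", "    return 20"]

merged = merge(file_a, file_b, file_c)
```

Here merged contains both the change from B and the change from C. If both edits had touched the same line, merge would raise an error instead of silently picking one.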

Usually this works flawlessly. Sometimes we might be working on the same function in the same code, in which case CVS/Subversion will notify the programmer of a conflict, and mark the conflict within the file itself. Those conflicts are usually easily fixed, at least I've never had any problem solving them. Graphical utilities such as Visual Studio, Project Builder (Mac OS X) and the like usually show you both files and the conflicts, so you can choose which lines you want to keep and which to throw away... and you can also edit the file by hand if you want to resolve the conflict manually.

So in essence, source control is a solution to the problem of multiple people working on the same files. That's basically it.

I hope this explains it.

EDIT: There are many other benefits with decent source control systems like Subversion and presumably Git. If there's a problem, you can go back to other versions so you don't have to keep manual backups of everything. In fact, at least with Subversion, if I mess something up or want to take a look at an old version of the code, I can do so without interfering with anyone else's work.

Helgi Hrafn Gunnarsson
this is by far the clearest, most straight-forward explanation i've ever heard of revision control!
yuval
is it possible to work with subversion or git locally with my own application (as a sole developer)? This could be helpful for keeping track of changes in my own software
yuval
@yuval: Yes, it's absolutely possible to use svn or git by yourself, and very useful as well.
ebneter
+1 for bashing SourceSafe
Lucas
Merging is done (usually) via three-way merge, not via applying patches. This means that if there is a conflict (you both edit the same area of the same file) you get conflict markers showing your version and their version in the region of overlap (and in some VCSs you can choose to also show the common/base version).
Jakub Narębski
@yuval: It is very easy to use git (or Mercurial, or Bazaar) locally for your own application. It is slightly more complicated with CVS or Subversion (although your editor/IDE/graphical tool may help with automating that).
Jakub Narębski
+4  A: 

Git and Subversion (also known as svn) are both source control or version control or revision control systems. They help you manage source code and track a history of the changes to each file managed by the system. The Wikipedia article metismo links to might be helpful.

GitHub is a service to host and manage Git repositories. It basically puts the repository online to make it easy for multiple people to interact with the repository.

The commit command generally stores a set of changes into the source control repository. This creates a new revision in the repository.

The push command only applies to distributed version control systems like Git or Mercurial (also known as hg). Push allows changes to be moved from one repository to another. The notion of distributed version control systems is that each user has their own repository. As a user completes changes, the user pushes them to other repositories (perhaps a central project repository, or as a patch for another user's repository).
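The difference between commit and push can be sketched with a toy model in Python. The Repository class below is purely hypothetical and far simpler than any real system: a repository is just a list of commit messages, and a push only succeeds when the other repository has nothing you are missing.

```python
class Repository:
    """Toy repository: the history is a list of commit messages, oldest first."""

    def __init__(self):
        self.history = []

    def commit(self, message):
        # A commit only changes this repository, nobody else sees it yet.
        self.history.append(message)

    def push(self, remote):
        # Only push if the remote history is a prefix of ours; otherwise the
        # remote has commits we have not seen and we would overwrite them.
        shared = len(remote.history)
        if self.history[:shared] != remote.history:
            raise ValueError("remote has changes you do not have; pull first")
        remote.history.extend(self.history[shared:])

central = Repository()
local = Repository()
local.commit("add login form")
local.commit("fix typo in login form")
history_before_push = list(central.history)
local.push(central)
```

Before the push, central knows nothing about the two commits. After it, both repositories share the same history.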

The point of these systems is to

  • store a history of the development process
  • enhance collaboration between multiple developers
  • allow old versions of code to be restored and fixed
  • link source code changes to specific features or bugs (see FogBugz and Kiln)
  • create variants of code (branches) for experiments or parallel development
John M. P. Knox
thank you very much!!!
yuval
+5  A: 

"The Git Parable" by Tom Preston-Werner (mojombo), one of the people behind GitHub, describes how a version control system such as Git might have been made... while at the same time describing why one would want and need a (distributed) version control system.

See also the article "A Visual Guide to Version Control" at Better Explained.


There are many advantages to using a version control system. Let's list them roughly in order of increasing complexity: increasing number of developers, increasing project size / project history size, more complex workflows, etc.

Single developer, single branch

Even if you are the single (only) developer of your project, and (at least for the time being) you do not plan to change that, a version control system is still useful. It allows you to:

  • Go back to some working version. If you are working on your project, and you realize that you completely screwed up, the approach you tried doesn't work and you don't know how to make it work, it is nice to be able to simply go back to the last working version and start anew.

    This means that you should commit, i.e. make a snapshot of your changes, when you have a working version (well, there are exceptions, see below). To avoid losing too much work you should commit fairly often, ideally (see below) when you have completed a single feature, a single issue, or a single part of a feature or issue.

    You would also want to know what you did, and what you were working on lately. This means that you should describe each changeset (each commit).

  • Annotate file / browse history. Unless you have perfect memory, sometimes you will want to know why (and when, and if there are multiple developers, also who) you wrote a given set of lines. Comments are not always enough. For that you can use (if your version control system provides it) line-wise file history annotations (scm annotate or scm blame), or other similar tools like the so-called "pickaxe" search in Git, where you search/browse history for commits that introduced or deleted a given string.

    For this to be useful you need to write good commit messages, describing the change and the intent of the change, so you would know why the change was made.

  • Bisect history to find errors. Modern version control systems offer an alternative (to inserting print statements or using a debugger) way of finding bugs... at least in some cases. When you notice a bug, or get a bug report, and the bug is not the result of the last change, you can use the version control system (scm bisect) to automatically find the commit that introduced the bug (the first commit that has the given bug). The version control system finds such a commit by bisecting the project history, checking out versions which you mark as good (without the bug) or bad, until it finds the commit that introduced the bug.

    For that you should always ensure that a version works (or at least compiles) before committing it, otherwise you won't be able to decide whether a commit has the bug or not. You should keep commits small (with not many changes), so when you find the commit that introduced the bug you only have to check a small number of lines affected by the change. You also need good commit messages, so you know why the change was made (and can decide whether the change is correct or not).
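The bisection described above is essentially a binary search over the history. Here is a minimal sketch in Python, not the actual implementation in any version control system. The function name find_first_bad, the commit names and the is_bad check are all hypothetical, and the sketch assumes that the oldest commit is good, the newest is bad, and that once the bug appears it stays.

```python
def find_first_bad(commits, is_bad):
    """Return the first bad commit, assuming commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad."""
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(commits[middle]):
            bad = middle    # the bug was introduced at or before middle
        else:
            good = middle   # the bug was introduced after middle
    return commits[bad]

# Hypothetical history of eight commits where the bug first appears in "c5".
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
tested = []

def is_bad(commit):
    tested.append(commit)   # record which versions had to be checked
    return int(commit[1:]) >= 5

first_bad = find_first_bad(commits, is_bad)
```

In this example only three versions need to be tested to pin down the culprit among eight commits, which is why this scales well to long histories.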

Multiple branches

Later on you will need another feature of a version control system: the ability to work in parallel on different lines of development (flavors) of your project, so-called branches. This includes but is not limited to:

  • Tagging releases. When you release a new version of your project to a larger public, you will want to tag (mark) the released version. This way, when somebody tells you that version X.Y of your project has a bug, you can check out this version and see whether you can reproduce the bug (and perhaps find it via bisection, see above). This might be of use even if you are not releasing your project, if you have possibly different versions deployed in different places.

    For this tags need to be immutable (of course).

  • Long-lived branches. Let's assume that you released your project, and somebody found a bug. You would probably want to be able to put out (release) a fixed version without stopping work on new features, and without shipping a version from development which might be unstable and contain multiple other bugs. You would also want the bugfix to be included in the version you are working on (if it was not fixed there independently).

    For this you would use long-lived branches: a maintenance branch where you commit only bugfixes, and a development branch (or trunk) where you do new work, introduce new features etc. There might be more branches with varying stability. For example, the Git project has four such branches: 'maint' for bugfixes, 'master' for changes that are quite stable, 'next' for development work, and 'pu' or "proposed updates" branch. In other workflows you have a separate maintenance (bugfix) branch for each release.

    To quote Joel Spolsky: "Keeping stable and dev code separate is precisely what source code control is supposed to let you do."

  • Topic (feature) branches. When you want to work on multiple issues in parallel, where each feature takes multiple commits to finish, you will probably want to develop each feature (each topic) in a separate branch. This way you can switch from working on one feature to working on another feature (another topic).

    This workflow is especially important if you are working with multiple developers, see below.

Multiple developers

One of the most important features of a version control system is that it enables collaboration between different developers, allowing multiple people to work on the same project without stomping on each other's changes. This feature is well described in other responses, so I won't elaborate on it.

See also "Understanding Version Control", a work in progress by Eric S. Raymond (author of, among others, "The Cathedral and the Bazaar" and "The Art of Unix Programming"), for a description of the various methods that version control systems use to allow collaboration.

Jakub Narębski