Let me clear up your misconceptions:
Could anyone give me some idea to why git developers made a design decision to store contents of files (blobs), so when the content changes a new blob needs to be created?
Quite good explanation of the (initial) Git design can be found in Tom Preston-Werner's The Git Parable essay (in addition to the one linked to in Greg Hewgill answer).
The idea behind it is that usually (in large enough project) in a new revision only a few files out of large number of files in a project change, so storing only different versions of the file contents saves space. This is the same idea that Subversion uses in its 'cheap copy' technique (it uses hardlinking, IIRC).
Also the contents of the file is zlib (deflate) compressed (or to be more exact each object in git repository database is compressed, including comit objects).
I believe Subversion stores revisions rather than contents, so when the content changes, it simply keeps track of the differences between the two. Couldn't git have done it like this as well? What's the benefit of storing contents rather than revisions?
I don't understand what you wanted to say here.
If it was that storing differences saves space, then I'd like to tell you that in addition to the 'loose' format (where each blob, i.e. each (different) contents of a file is stored in separate file inside .git
) has also 'packed' format, where many objects are stored in deltaified form, using binary delta from LibXDiff library.
This format was created for network transfer (large disk space might be cheap, but bandwidth isn't), and was adapted as also on-disk format. This format is very efficient, one of more efficient if not most efficient version control systems formats, making git repositories smalles or one of smallest among different version control systems. Depending on circumstances full clone of git repository (which contains full history) might be smaller than equivalent Subversion checkout (which contains extra copy of pristine changes so that svn diff
and svn status
work without need for network transfer, with reasonable speed).
This design ('loose' and 'packed' format) has the advantage of very efficient packing, but had the disadvantage that you had to repack manually using "git gc
" (not for disk space, but for performance - disk I/O); nowadays most git commands repack repository (safely) when needed.