With Git or Mercurial, if the working directory is 1GB, the local repository will be at least another 1GB, normally on the same hard drive. Once it is pushed to a central repository, there will be yet another 1GB.

Can Git or Mercurial be set to use only a working directory and then a central repository, without having 3 copies of this 1GB data?

(Actually, when the central repository is also updated, there are 4 copies of the same data... can that be reduced? In the SVN scenario with 5 users, there will be 6GB of data in total. With distributed version control, will there be 12GB?)

Update: this is strange. I just looked at a project I cloned using Mercurial: the working directory, not including the .hg folder, is 126MB, but the .hg folder is 239MB. And it is a new clone... is it because my new repository actually contains all the history / revisions, and that is why it is about double the size of the working directory?

+4  A: 

Git and Mercurial are distributed version control systems. This means that every clone contains the whole history of the project. Bypassing this would defeat the whole purpose of using a DVCS (every operation can be done offline).

But in general Mercurial and Git achieve a very high compression ratio, often better than svn's, even though they store the whole history.

tonfa
A: 

hg clone creates hard links on Unix file systems, so only the changes introduced by new changesets use additional storage. If you don't want a working copy, you can update the repo to the 'null' revision, which leaves only the repository without a working copy.

Git also has the option of bare repositories and shared repositories, but I never tried them.
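For what it's worth, here is a rough sketch of what the bare and shared options look like on the Git side. The directory names (origin-repo, central.git, shared-clone) and the demo identity are made up for illustration, and it assumes a POSIX shell with git on the PATH.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"

# A throwaway source repository with a single commit.
git init -q origin-repo
cd origin-repo
echo "hello" > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
cd "$demo"

# Bare clone: history only, no working tree is checked out.
git clone -q --bare origin-repo central.git

# Shared clone: borrows objects from origin-repo instead of copying them.
git clone -q --shared origin-repo shared-clone

# The shared clone records where the borrowed objects live.
cat shared-clone/.git/objects/info/alternates
```

A shared clone depends on the original repository staying in place, so it only saves space when both live on storage the same machine can reach. The Mercurial counterpart of "repository without working copy" is the hg update null mentioned above.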

Rudi
git clone is able to use hard links as well; it's designed for cloning local repositories on Unix. See the manual. N.B.: hard links don't cross file system boundaries (e.g. slices/partitions).
TerryP
Hm, so I tried on Windows 7, and Git and Mercurial both seem to duplicate the files instead of using any links... so if I clone a repository containing only one 10MB file with only the initial version, the new directory will be 20MB. The original repository is also 20MB.
動靜能量
git clone with the "-s" switch will just keep a reference to the original repository location, but that still only works if the repository is locally accessible.
araqnid
Windows does not support hard links the way Linux/UNIX does so Git and Mercurial both have to copy the files under Windows.
Bombe
Bombe: Wrong. Windows DOES support hard links at the filesystem level (as long as you're using NTFS), but most Windows software appears not to be aware of this. Read up on `mklink` on Windows - it's included by default in Vista and Windows 7.
clee
A: 

You can do what you're asking as long as you have the filesystem with the "central" repository mounted and accessible locally.

From cmd.exe:

git --git-dir=Z:/path/to/git_repo_dir --work-tree=C:/path/to/checkout/root checkout master

And you can do this for as many checkouts as you want, but it's not really ideal. It's true that git doesn't work quite as well on Windows as it does on Linux. The ideal solution is for each clone to have hard links to the objects, so they're only physically stored on disk once; each clone can then be checked out to a different branch, so you could track development/testing/production all at once, for example.
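The same idea can be tried out in a POSIX shell with throwaway directories. This is only a sketch: the paths under $demo, the file app.txt and the demo identity are invented for the example, and a local bare clone stands in for the mounted "central" repository.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)

# Build a small repository with one commit on master.
git init -q "$demo/src"
git -C "$demo/src" symbolic-ref HEAD refs/heads/master
echo "v1" > "$demo/src/app.txt"
git -C "$demo/src" add app.txt
git -C "$demo/src" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Stand-in for the central repository: a bare clone with no working tree.
git clone -q --bare "$demo/src" "$demo/central.git"

# Check files out into a separate directory that holds no repository data.
mkdir "$demo/checkout"
git --git-dir="$demo/central.git" --work-tree="$demo/checkout" checkout -q -f master
ls "$demo/checkout"
```

Keep in mind that every checkout made this way uses the single index and HEAD stored in the repository directory, which is one reason the approach gets awkward with several work trees.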

Also, as far as your concerns about disk usage go: try running git gc --aggressive --prune on one of your repositories and see if it still takes up a huge amount of space. In my experience, git is very good about storing only binary deltas. I tested this by adding a directory full of MP3 files to a repository and committing them, then changing the ID3 tags and committing the changes. Before I ran git gc there were clearly two copies of each MP3 in the .git folder, but after git gc the size went back down to just slightly larger than the original working directory.
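If you want to see the effect for yourself, something along these lines should work. It is a sketch with a generated text file standing in for real data (data.txt and the demo identity are made up), and it uses --prune=now so the result does not depend on the default grace period.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"

# Commit a file, then commit a slightly changed version of it.
awk 'BEGIN { for (i = 1; i <= 20000; i++) print i }' > data.txt
git add data.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"
echo "one more line" >> data.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -am "second"

# Before gc every object is stored loose, so both versions are kept in full.
echo "before gc:"
git count-objects -v

# Repack everything into a delta-compressed pack and drop loose objects.
git gc -q --aggressive --prune=now

echo "after gc:"
git count-objects -v
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')
```

Compare the size and size-pack lines in the two reports. How much you save depends on how similar the versions are, so your numbers will differ.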

clee