I've been considering using a version control system like SVN as a general-purpose backup and synchronisation tool between the few PCs I use. This would be for all sorts of data, including MP3s and ripped DVDs: a LOT of data (120 GB+).

My main issue is that SVN creates a copy of each versioned file in the .svn directory. While I can see that this is very useful in most cases, it's entirely unnecessary for my purposes, and a massive waste of disk space.

Is there a VCS which doesn't create a duplicate of the files in your working copy?

Edit to clarify: I'm just talking about the size of the required files on each computer. For SVN, that'd mean the size of the working copy and its meta files - for a DVCS, that'd be the size of the WC and the repository.

A: 

Actually, I believe that, at least for text files, SVN stores only the differences between revisions of a file rather than the entire file for each change. Also, each revision records only the files that were changed; unchanged files take up nothing extra. Unless the actual MP3 files are constantly changing (probably not), this would be a decent system for tracking the files. However, for files like this, you'd probably be better off just using rsync to synchronize them and not worrying about tracking their history.

Kibbee
I don't think he is talking about the size of the repository. I believe he means the size of the checkout.
nlaq
@Nelson, yes that's correct.
nickf
Also, for each checked-out file, the checkout contains at least both the current file (modified or not) *and* a full copy of the base file. So usually checkouts are at least twice as big as the size of the working files alone. Export would help, but then you'd lose the version control abilities.
squelart
@squelart: yeah, that's exactly the problem I'm trying to avoid.
nickf
+3  A: 

Git is extremely thrifty when it comes to disk space.

The Git vs SVN Comparison Wiki states:

Git's repository and working directory sizes are extremely small when compared to SVN.

For example the Mozilla repository is reported to be almost 12 GiB when stored in SVN using the fsfs backend. The fsfs backend also requires over 240,000 files in one directory to record all 240,000 commits made over the 10 year project history. The exact same history is stored in Git by only two files totaling just over 420 MiB. SVN requires 30x the disk space to store the same history.

An SVN working directory always contains two copies of each file: one for the user to actually work with and another hidden in .svn/ to aid operations such as status, diff and commit. In contrast a Git working directory requires only one small index file that stores about 100 bytes of data per tracked file. On projects with a large number of files this can be a substantial difference in the disk space required per working copy.
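To put rough numbers on the figures quoted above, here is a minimal sketch of the arithmetic. The 120 GB comes from the question; the file count is a made-up assumption, and the function names are hypothetical. Note that the Git figure covers only the index file in the working directory, not the repository stored in .git/, which a DVCS clone also carries.

```python
# Back-of-the-envelope estimate based on the figures quoted above.
# The file count below is an assumption for illustration only.

GIB = 1024 ** 3
MIB = 1024 ** 2


def svn_pristine_overhead(data_bytes: int) -> int:
    """SVN keeps one pristine copy of each file under .svn/,
    so the extra space is roughly the size of the data again."""
    return data_bytes


def git_index_overhead(file_count: int, bytes_per_entry: int = 100) -> int:
    """Size of Git's index at about 100 bytes per tracked file (quoted figure).
    This excludes the object store inside .git/."""
    return file_count * bytes_per_entry


def history_ratio(svn_bytes: int, git_bytes: int) -> float:
    """How many times larger one repository is than the other."""
    return svn_bytes / git_bytes


if __name__ == "__main__":
    data = 120 * 10 ** 9   # 120 GB of media, from the question
    files = 30_000         # hypothetical number of tracked files

    print("SVN pristine copies:", svn_pristine_overhead(data), "bytes")
    print("Git index:", git_index_overhead(files), "bytes")
    print("Mozilla history ratio:", history_ratio(12 * GIB, 420 * MIB))
```

Under these assumptions the SVN working copy needs the full collection size a second time, while the Git index stays in the low megabytes. The repository size on each machine is a separate question.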

Simucal
Being a DVCS, doesn't that mean that each computer has the complete version history, though? Won't that make it keep expanding?
nickf
SVN only keeps the last checked out copy hidden in .svn. So no, only the SVN repo has the complete history.
Malfist
The difference is (correct me if I'm wrong) that Git relies on the repo to create the diff files, which requires more bandwidth, whereas SVN relies on the local machine, which requires more disk space on the local computer. Don't quote me on that, though.
Malfist
The number of files in a single directory can be reduced by enabling sharding. See the Subversion book for more details.
Bert Huijben
+3  A: 

I think you need to ask a more specific question to get the proper answer for what you are trying to do. You don't actually want a Version Control system, but a digital asset management system.

http://en.wikipedia.org/wiki/Digital_asset_management

Does that sound better?

Xedecimal
The person with 1 rep point is saying another doesn't know how to use SO?
Malfist
hahaha we all have to start somewhere, Malfist. :)
nickf
But that's like a baby telling its parents that they don't know how to invest in the stock market properly.
Malfist
@Malfist, no, it's more like a stranger you don't know anything about making a suggestion. It's foolish to dismiss the suggestion based on your lack of background information instead of on its merits. That's an ad hominem argument or a snobbish response.
Dave C
@Dave C, that is exactly right. Rep means nothing. A famous computer scientist could create an account and log onto SO, but he would only have 1 rep point, so I guess you would dismiss an answer by him? The argument makes no sense.
Simucal
@Malfist the rep points do not dictate anything
Ric Tokyo
+1  A: 

Version control doesn't work very well with binary files. I would recommend backing up using rsync and not worrying about the history; you're probably not going to be changing the files if you're only ripping and storing.

If you don't want files removed from the backup when they're deleted on the source, just don't add the --delete option to rsync.
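As a rough sketch of how that could look when scripted, the helper below builds an rsync command line with --delete left off unless you ask for it. The function names and the example paths are hypothetical; -a (archive mode) and --delete are standard rsync options, and rsync itself is assumed to be installed.

```python
import subprocess
from typing import List


def build_rsync_command(source: str, dest: str, delete: bool = False) -> List[str]:
    """Build the argument list for a one-way rsync from source to dest.

    -a (archive) recurses and preserves timestamps and permissions.
    --delete removes files from dest that no longer exist in source;
    it is omitted by default so the backup keeps everything it has.
    """
    cmd = ["rsync", "-a"]
    if delete:
        cmd.append("--delete")
    # A trailing slash on source means "copy the directory's contents".
    cmd += [source, dest]
    return cmd


def run_sync(source: str, dest: str, delete: bool = False) -> None:
    """Run the sync; raises CalledProcessError if rsync exits with an error."""
    subprocess.run(build_rsync_command(source, dest, delete), check=True)


if __name__ == "__main__":
    # Hypothetical paths; printing the command rather than running it.
    print(build_rsync_command("/data/media/", "/mnt/backup/media/"))
```

Keeping the command construction separate from the call that runs it makes it easy to print and check the command before pointing it at 120 GB of data.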

Malfist
Yes, I'd agree with this ... for non-changing BLOBs like MP3s and movies, use an asset tracker or even just a backup and multi-machine sync process. SVN and other VCSes are great for text, not as useful for BLOBs.
Ian Varley