views:

240

answers:

8

In our project files, if there are binary files, such as .doc, .xls, .jpg, and we choose to not keep their past revisions (just keeping a latest version is ok), is there a way to tell SVN, Git, or Mercurial or some other tool to skip the revisions for these files or for a particular folder?

Say there is a 4MB .doc file that I need to check in hundreds of times, but I don't really care about its past versions. If the system keeps 100 revisions of it, that's already 400MB, and checking in 300 times means 1.2GB for one file, which is not good. Only the latest version is needed so that everybody can sync to it. Also, I don't want other people to check out the project and have to check out 20GB of stuff. (Will Git and Mercurial keep all revisions in each person's local repository?)

+1  A: 

The primary responsibility of version control systems is to keep a history of changes, so I don't think this is possible. Why use a version control when you only want the latest version?

John
A: 

In general, no: a VCS is intended to keep the entire history. However, all is not lost on the space front; all the systems you named will store binary diffs for each revision, not a complete copy of the entire file. This means that the space required will often be much less.

Andrew Aylett
+12  A: 

Note that this is not quite an answer.

If I forgo the discussion around not keeping the correct version of the file for posterity, I will at least comment on one part of your question, that might make you reconsider not keeping all the revisions of the file in the repository.

Version control systems typically don't store the entire file on each new revision; they store changes. Depending on the system, you might occasionally have a full copy of the file, but most of the changesets will be changes only.

For instance, in Mercurial, I tried this: First I downloaded the C# 3.0 language specification as a word file from this url: http://download.microsoft.com/download/3/8/8/388e7205-bc10-4226-b2a8-75351c669b09/CSharp%20Language%20Specification.doc

Then I committed this to a fresh Mercurial repository. Size before the commit (empty repository) was 80 bytes, size of file on disk was 2.387.968 bytes, and repository after commit was 2.973.696 bytes. Note that the file is now effectively stored twice, once in my working copy (the one I can edit), and once in my repository as part of my initial commit.

Then I opened the file, replaced all occurrences of 3.0 with 4.0 (without the quotes) and all occurrences of C# with VB, and saved. Then I committed the new version with a single-letter comment. Size of repository after commit is now 3.497.984 bytes. Difference is 512KB (there's some chunking involved in the repository, hence the size being an exact 512KB value.)

If I now open up the file again, change only the title page VB back to C#, save, and commit again, the size of the repository grows by 276KB, up to 3.780.608 bytes.

As you can see, a change does not commit an entire copy of the file, but granted, the differences aren't in the "10KB" range either.

Let's assume that the average size of each diff, for this file alone, will be somewhere in between those, say halfway between the two values. This means that 300 commits of changes to this file, averaging 394KB each, total about 115MB. This is not a lot.
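For anyone who wants to redo the estimate with their own numbers, here is a minimal sketch of the arithmetic in Python. The byte counts are the ones measured above; the variable and function names are made up for illustration, and the assumption that every future commit costs the average of the two observed diffs is only a rough guess, as noted.

```python
# Repository sizes in bytes, copied from the measurements above.
SIZE_AFTER_COMMIT_1 = 2_973_696
SIZE_AFTER_COMMIT_2 = 3_497_984
SIZE_AFTER_COMMIT_3 = 3_780_608

KB = 1024


def growth_kb(before: int, after: int) -> float:
    """Repository growth between two commits, in KB."""
    return (after - before) / KB


def estimate_total_mb(commits: int, average_diff_kb: float) -> float:
    """Estimated total growth in MB, assuming every commit costs the average diff."""
    return commits * average_diff_kb / KB


growth_2 = growth_kb(SIZE_AFTER_COMMIT_1, SIZE_AFTER_COMMIT_2)
growth_3 = growth_kb(SIZE_AFTER_COMMIT_2, SIZE_AFTER_COMMIT_3)
average_diff_kb = (growth_2 + growth_3) / 2
total_mb = estimate_total_mb(300, average_diff_kb)

print(f"second commit grew the repository by {growth_2:.0f} KB")
print(f"third commit grew the repository by {growth_3:.0f} KB")
print(f"300 commits at {average_diff_kb:.0f} KB each: about {total_mb:.0f} MB")
```

Compare that with the 1.2GB the question expects for 300 full copies of a 4MB file.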

My suggestion is as follows:

  • Stop being cheapskates. Disk space is cheap compared to the headache you will have when someone says "I really wish I knew what that file looked like last week before you corrupted it".
Lasse V. Karlsen
+1 for conclusion!
Chris Kaminski
At work, where hard disk space is seldom used up, I think it doesn't matter. On a home computer, I really don't want to waste 20GB, 30GB or 60GB as time goes by, on each of the home computers. If the computer has a 300GB hard drive, I am wasting 10% of it just by not caring about it.
動靜能量
Also, Lasse was looking at a text file, but I am talking about binary files.
動靜能量
Most VCSs store binary diffs only; text is just a binary file with an interpretation.
Lasse V. Karlsen
+3  A: 

A quick check of hard drive prices puts 1 terabyte (TB) internal drives around $75 USD each. Using your math, that's 250,000 copies of your 4MB file, or $0.0003 per copy. Typical overhead for a programmer for an hour is around $100.

What costs more: keeping all of the versions of that file, or paying a programmer to recreate an older version if you ever need that copy again?
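The same back-of-the-envelope math as a small Python sketch. The $75 drive price, the 4MB file size and the $100 hourly overhead come from this answer and the 300 check-ins from the question; treating 1 TB as 1,000,000 MB and every revision as a full copy (the worst case, with no diffs at all) are assumptions made here for simplicity.

```python
DRIVE_PRICE_USD = 75.0
DRIVE_CAPACITY_MB = 1_000_000  # 1 TB in decimal megabytes (assumption)
FILE_SIZE_MB = 4
PROGRAMMER_USD_PER_HOUR = 100.0
REVISIONS = 300  # number of check-ins mentioned in the question

# How many full copies of the file fit on one drive.
copies_per_drive = DRIVE_CAPACITY_MB // FILE_SIZE_MB

# Disk cost of one full copy, and of keeping every revision as a full copy.
cost_per_copy = DRIVE_PRICE_USD / copies_per_drive
cost_all_revisions = REVISIONS * cost_per_copy

# The same amount expressed as seconds of programmer time.
programmer_seconds = cost_all_revisions / PROGRAMMER_USD_PER_HOUR * 3600

print(f"copies per drive: {copies_per_drive}")
print(f"cost per copy: ${cost_per_copy:.4f}")
print(f"cost of {REVISIONS} full copies: ${cost_all_revisions:.2f}")
print(f"equivalent programmer time: {programmer_seconds:.1f} seconds")
```

Even in this worst case, the whole history of the file costs about as much as a few seconds of a programmer's time.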

Craig Trader
I second your opinion, but: The main cost is not the hard drive but the (tape) backups.
ur
That's even easier: back up to external hard drives. They're faster and more reliable than tape, and cheaper once you factor in the price of the tape changer and all of the media.
Craig Trader
Keep in mind that *THAT* $75 USD is for a "consumer" hard drive. If you are talking about "SAN" hard drives, I've heard they weigh in at about $1k USD per TB. (This info might be old, but you get the idea.)
Pharaun
Oh, enterprise drives do cost a bit more than consumer drives, but they don't cost 10 times as much. Even if they did, that's still only $0.003 to store a 4MB file.
Craig Trader
And even supposing something outlandish like version control costing $5 PER MONTH in disk space, that's nothing at all compared to the time saved and the hassles avoided in the development process. Even if an intern paid $25 an hour is able to fix a mistake that would otherwise have taken half an hour of his time, you've already more than made back your investment.
dimo414
@dimo414, my point exactly.
Craig Trader
+1  A: 

For your specific need, where you can remove past versions whenever you want, a VCS (a Version Control System, made to never lose a version) is not well suited.

A repository manager (which is a more advanced solution than a simple shared path on a filesystem) is what you are looking for.
(E.g. Sonatype Nexus, to mention only one)

VonC
+2  A: 

This is not a job for VCS, but for the filesystem, like Ken said.

However, if you really need such a 'feature', you may use the hooks mechanism to delete previous versions of the file (let's say, those older than 3 commits) from the history.

takeshin
+1  A: 

I do know one that does this, but you're not going to like the answer.

It's Visual SourceSafe. Check the flag 'store only latest version' on a file and it stops keeping history.

If you want this feature with a decent SCM, I would recommend not putting the file in the SCM at all, but storing it elsewhere, such as in a document management solution or even just a filesystem share.

gbjbaanb
A: 

If all you want is to sync files across computers, use Dropbox.

If you are using version control, then see what Lasse V. Karlsen wrote: disk space is cheap.

Jaanus