I need to keep some large files (several gigabytes each) under version control.

I don't need to keep every version of these files, and I can't afford to. I want to be able to remove old versions of large files from my VCS at some point.

Which version control system could I use?

EDIT: The files I want to keep under version control are big .zip files or ISO images. They may contain executable software or data (seismic data, SAR images, GNSS data), and they are provided by my company's software supplier.

+3  A: 

I don’t think there’s any version control system that allows you to do that regularly, because it goes against everything version control systems stand for.

Bombe
I don't. Sometimes you make mistakes. You may accidentally check in a huge binary that doubles the size of your repo and that you just don't want anymore.
Nick Pierpoint
That’s what I mean by “regularly.” Of course almost all systems allow you to remove a file from a repository somehow but that is by no means an everyday operation encouraged by a VCS.
Bombe
+8  A: 

In CVS you can do that by removing the files from the repository. Subversion allows it by dumping the repository content and filtering the dump to remove the files (which is a bit cumbersome). Perforce has an obliterate command for that. Many of the newer distributed VCSs make it rather difficult because they use hashes all over the place, and the fact that your repository may have been replicated elsewhere complicates things further. Hg has a strip command (part of the MQ extension); I think Git can do that as well.

Keltia
Good answer, and the only answer so far that answers the original question :). The link for handling this in Subversion is http://subversion.tigris.org/faq.html#removal. Here they also talk about having "obliterate" on the TODO list.
Nick Pierpoint
A: 

Many version control systems can be configured to store only the differences between successive versions of a file, which saves space.

For example, if you commit a 1 GB file, change part of it, and commit it again, only the changed part is stored in the version control system.
There won't be 2 GB used (initial and new file) but only 1 GB + sizeOfChanges.

There's just one downside: if you store files whose whole content changes from revision to revision, this can be counterproductive, because the changes take almost as much space as the original version. Archive files are an example: a small change in the (real) content can lead to completely different content in the archive file.
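The effect on archives can be checked with a toy experiment. The sketch below is not the delta algorithm of any particular VCS; it just counts how many fixed-size blocks differ between two versions of some made-up data, first raw and then after zlib compression. The block size, the sample data, and the function names are assumptions chosen for illustration.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size for this toy comparison


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Count the fixed-size blocks that differ between two byte strings."""
    longest = max(len(old), len(new))
    changed = 0
    for start in range(0, longest, block_size):
        if old[start:start + block_size] != new[start:start + block_size]:
            changed += 1
    return changed


def total_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size blocks needed to hold the data."""
    return -(-len(data) // block_size)  # ceiling division


# Made-up sample data standing in for a large file (about 1.5 MB).
original = b"".join(b"record %06d: value %08d\n" % (i, i * 7) for i in range(50_000))

# Overwrite a few bytes in place near the start of the file.
edited = bytearray(original)
edited[10:14] = b"XXXX"
modified = bytes(edited)

packed_original = zlib.compress(original)
packed_modified = zlib.compress(modified)

raw_changed = changed_blocks(original, modified)
packed_changed = changed_blocks(packed_original, packed_modified)

print("raw:", raw_changed, "of", total_blocks(original), "blocks changed")
print("compressed:", packed_changed, "of", total_blocks(packed_original), "blocks changed")
```

For the raw data, the in-place edit touches a single block. How much of the compressed output changes depends on the data and on the compressor, so it is worth running this against samples of your own files before drawing conclusions.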

I'd suggest testing several version control systems yourself, with your specific needs and environment, and monitoring on the server side how the storage requirements of each system change.

Kosi2801
+1  A: 

Some distributed version control systems allow you to create "checkpoints", which let you use a given version as a kind of base revision and save you from pulling all the history before the checkpoint on every checkout. So you can remove the big files, create a checkpoint, and check out/clone the repository from that checkpoint into a new directory. You then have a new, small repository there, but without the history before the checkpoint. If you don't need that history, you can burn the old repository to CD and use the new, partial one from now on.

I've only tested it in darcs, and there it works, but YMMV depending on the version control system and use case.

sth
+3  A: 

Hi there

TFS has a destroy command that you can use to permanently delete files or revisions as you see fit.

There is more information at this MSDN article.

Ray Booysen
+4  A: 

Perforce generally allows files to be stored in two ways: head revision only (so you'd only ever have one copy) or all revisions. Perforce also has the admin-level obliterate command, which can be used to delete revisions. It's up to you to query for a list of files, possibly by date or number of revisions, and to pass the revisions to the obliterate command. As the name suggests, obliterate deletes the revisions permanently from the database, so I always generate scripts to do this and review them before running them. If the obliterate command is run without the -y flag, it only lists what would be obliterated, which is also very useful.
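A review-first workflow along these lines might be sketched as follows. The depot paths and revision ranges are hypothetical, and the helper only builds command lines; it does not run p4. It relies on `p4 obliterate` reporting what it would remove by default and deleting only when `-y` is given.

```python
import shlex


def obliterate_command(depot_path, first_rev, last_rev, execute=False):
    """Build a `p4 obliterate` argument list for a range of revisions."""
    cmd = ["p4", "obliterate"]
    if execute:
        cmd.append("-y")  # without -y, p4 only reports what it would remove
    cmd.append(f"{depot_path}#{first_rev},{last_rev}")
    return cmd


# Hypothetical files and revision ranges selected for removal.
targets = [
    ("//depot/vendor/data.iso", 1, 3),
    ("//depot/vendor/tools.zip", 1, 5),
]

# First pass: a preview script to run and review.
preview_script = [shlex.join(obliterate_command(p, a, b)) for p, a, b in targets]

# Second pass: the same commands with -y, to run only after the review.
final_script = [shlex.join(obliterate_command(p, a, b, execute=True)) for p, a, b in targets]

print("\n".join(preview_script))
print("\n".join(final_script))
```

Keeping the preview and the destructive script as two separate outputs makes it harder to run the permanent deletion by accident.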

Matt Shaw
P4 now allows you to specify the number of revisions to keep as part of the file type (+S).
Tony Lee
+3  A: 

Somehow I get the impression that you should not use a version control system at all. As said before, what you're trying to do goes against everything you would need a version control system for in the first place.

I suggest you create a file system directory structure that makes sense for what you're trying to accomplish and lets you organize your data. Then just make backups of those files.

Luke
Maybe, you're right, thanks.
Andrea Francia
For the types of files the OP is talking about, this is the best answer.
Chris Lively
+1  A: 

It sounds to me like you need an intelligent backup system, rather than version control.

I use SyncBackSE; it allows you to keep a number of previous versions, and can also do things like "ignore all files changed more than 30 days ago".

It's one of the few bits of paid-for software I use. I think it's worth checking out.

Brent.Longborough
+1  A: 

I think you're talking about something like AlienBrain's "bucket" system, aren't you? That is, the ability to remove some revisions from version control. If you want to destroy an item entirely, that is normally called "obliterate", and it's supported by a number of systems out there. Buckets, AFAIK, are supported by:

pablo
A: 

I would save such files under a unique name (datestamped, perhaps), and perhaps additionally make a textual reference to the external file in the version control system.
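One way to picture this is the minimal sketch below. It copies the large file to a storage directory outside the repository under a datestamped name, then writes a small text file (name, size, SHA-256) that can be committed in its place. The `.ref` extension, the field names, the directory layout, and the date stamp are all assumptions made up for this example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte files never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_with_reference(big_file: Path, store_dir: Path, repo_dir: Path, stamp: str) -> Path:
    """Copy big_file outside the repository under a datestamped name and
    write a small text reference to commit in its place."""
    stored = store_dir / f"{big_file.stem}-{stamp}{big_file.suffix}"
    shutil.copy2(big_file, stored)
    reference = repo_dir / f"{big_file.name}.ref"
    reference.write_text(
        f"stored_as: {stored.name}\n"
        f"size: {stored.stat().st_size}\n"
        f"sha256: {sha256_of(stored)}\n"
    )
    return reference


# Demo with a tiny stand-in file in temporary directories.
workdir = Path(tempfile.mkdtemp())
store_dir = workdir / "store"
repo_dir = workdir / "repo"
store_dir.mkdir()
repo_dir.mkdir()
sample = workdir / "supplier.iso"
sample.write_bytes(b"pretend this is a few gigabytes")

ref_path = store_with_reference(sample, store_dir, repo_dir, stamp="2024-01-31")
print(ref_path.read_text())
```

Only the small reference file goes under version control, so old copies of the large file can be deleted from the storage directory without rewriting any repository history.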

Arafangion
A: 

Fossil allows you to do this via the "shun" mechanism. Fossil being a distributed SCM, however, means that this does not affect all repositories (for obvious reasons).

JUST MY correct OPINION