For some years now, I've been waiting for Subversion to feature a "delete permanently" (obliterate) function. I hesitate to make the transition to Subversion (coming from Visual SourceSafe :p), because I think this is an essential feature, as otherwise I'd expect the repository to grow unstoppably. However, for one reason or another, the feature gets postponed over and over again. So I'm beginning to wonder whether there is some other feature or workaround that makes the obliterate function dispensable.

What do you do when you want to shrink the SVN central repository?

Example 1: I check in a large third party library, and after a few weeks I realize it is not suited to my needs. I don't want to store and back up that large amount of data forever.

Example 2: I have 10 versions of 10 big third party libraries in the repository, but I only use the latest versions.

Example 3: I accidentally checked in sensitive information (as suggested by John).

Example 4: I accidentally checked in some big files that were never meant to be put in the repository.

A: 

What I do - not use subversion. Sorry.

They (the developers) obviously don't agree with your assessment of that being a critical feature. That did not stop the company I currently work at from using it ;) I personally rule out Subversion for this exact reason.

TomTom
Is your employer now regretting the decision because it has to spend so much of its budget on disks now? I mean, how big an effect does this feature's absence really have?
Rob Kennedy
Maybe a stupid question, but which system != VSS supports obliterate? I wouldn't be surprised that the "wisdom of the crowds" shows, that this is no critical feature...
Marc Wittke
Been using SourceGear Vault for years now ;) It really depends on what you do - if your source control contains a lot of binary data... as ours sometimes does... things get nasty fast (missing delta). Not everyone deals only with code. I know that at one time we added around 300 MB per day to the source control system.
TomTom
@Marc Wittke: CVS does, IIRC as an admin feature.
David Thornley
@TomTom: As far as I know Subversion supports binary diffs. Is obliterate, in the case of SourceGear, a method to avoid problems caused by the inferiority of the software?
Mnementh
Hardly, given that SourceGear has a top-end data store behind it.
TomTom
+15  A: 

It violates the meaning of source control.
Source control is all about being able to restore a previous state. If you delete a file permanently you won't be able to.

OTOH I do not know VSS, so I might have misunderstood "delete permanently".

dbemerlin
Eventually, you reach a point where you don't want to go back that far anymore; at that point, the previous data is pure chaff and can (arguably should) be tossed. Agreed you don't reach that point quickly, or take the decision lightly.
T.J. Crowder
What if you accidentally commit some personal data, like letting all the devs see your evaluation comments on each other, or salaries? That can be illegal in some cases. What do you do?
John
+1 Delete permanently is not something source control should do. I'd go further and argue that if system X allows permanent deletion then system X is not a source control system. The increase in disk space used by the repository is, furthermore, one of the weakest arguments in favour of permanent deletion
High Performance Mark
@John: Ask the admins to remove it (which is possible and not that hard). This should be the exception, not the normal case. If you fail to use source control correctly it's not the fault of the software.
dbemerlin
How do they do that? From what I've heard this is a horrible task.
John
SVN is a tool. It doesn't tell me how to work, at least it shouldn't. You don't like people forcing you to do things their way in real life, do you, when they think they know best?
John
It's more effort than simply doing "right-click => obliterate", but making mistakes like that _has to_ hurt so you won't make them again (and next time maybe not even notice it, or only delete half of the data). Still, the last time I had to do that, about 2 years ago, it took me... about 3.5 minutes.
dbemerlin
@John: It's a simple question of "does it make sense". Does it make sense to spend all the developer hours for a feature that solves a problem that shouldn't exist?
dbemerlin
@dbemerlin: So you're saying that in effect, the feature exists. It's admin-level (as it should be), but it's there.
T.J. Crowder
No. From what I was told, you have to manually rip the repo apart.
John
@dbemerlin... if you design software that deliberately hurts people to force them to use the software as you want, you've got problems.
John
@John: fair to say, but if a user doesn't like a piece of software because it deliberately hurts him, then he is free not to use it. Which is exactly what the OP does, and I think it's only fair to let the software makers decide all by themselves whether or not they've got problems by not having someone as a user.
RegDwight
Another approach toward sensitive data is to ACL the file out of visibility.
Yuliy
+7  A: 

The obvious reason against it is that the developers think it will, on balance, make SVN worse - the happiness you feel at being able to prune un-needed stuff will be vastly dwarfed by your anger when you accidentally obliterate something and your /trunk goes missing.

FogBugz has exactly the same behavior, and in their case it's entirely by design I believe, protecting users from themselves.

John
+7  A: 

Obliterate violates the version control principles that you'd want to have. Either you wouldn't save any space, or previous tags would become broken. You would not be able to go back to a true previous version if you had obliterated any files.

As for your comment about the repository growing... Any repository will grow linearly with the size of changes over time. That's the whole point of a source control system. If you don't need to be able to track prior versions, then why not just stick to a shared folder somewhere?

Yuliy
oh yes, return to the golden days of .old, .older, .oldest, .bak, .backup, .deleted, .obsolete and ~. Those were the days... feels like yesterday. I still have nightmares...
dbemerlin
+6  A: 

Because removing data from the repository breaks the basic premise of source control, that being that it is possible to reproduce all previous states and changes to the source tree. If you want to obliterate something from version control, you're probably "Doing It Wrong", as they say.

Sparr
+5  A: 

The entire point of source control is to have a complete history of what your repository looks like. The obliterate command defeats this purpose of source control, and it's a misfeature in all version control systems that have it.

SVN has cheap copying and cheap branching that doesn't require a full copy of the file--just the changed bits. Its central repository is usually very manageable in size, making this misfeature unnecessary.

JSBangs
On the other hand, what about the argument that "It's my damn repository, I should be able to do what I want"? _Should_ the software be able to decide for you?
John
@John: but it most likely isn't your own repository, you are sharing it with other people who rely on the fact that revision X is really revision X. Nothing stops you from forking SVN and giving it another name, but people will probably prefer the "safe" version.
Otto Allmendinger
The other people will do what I tell them ;)
John
You can always obliterate a file by logging in as an admin on the svn server and doing shenanigans in the repository itself.
JSBangs
Hardly a good approach though. I know open-source people don't like good UI, but having to take down the repo and rip it apart is taking it a bit far :)
John
+6  A: 

I have used various version control systems for about 15 years now and have never needed a feature like this.

I wonder what the reasons are that you want that feature:

  • disc space? Hard to believe considering the price of disc space
  • committed a password to version control? Well, that will teach you. Go and change the password
  • speed of the repository? Doesn't sound like it, but if so, I would consider a completely different system with supposedly better performance.
Jens Schauder
Committed your entire financial records to an open-source system?
John
@John: "Oops" wouldn't begin to say it... ;-)
T.J. Crowder
"Moron!" would be a start.
Matthew Whited
@Matthew: People do make mistakes.
Dimitri C.
Especially when a non-programmer uses SVN. They often struggle to grasp the concepts even using visual tools, and commit all kinds of rubbish.
John
I know people make mistakes. That wouldn't stop me from using my right to free speech in the form of sarcasm and calling them a moron either. (I would even call myself a moron if I did something like this. And I do every time I accidentally check in a password.)
Matthew Whited
You _don't get_ free speech on SO. My comment got deleted on another post because I said it was dumb someone assumed I was running Linux.
John
+3  A: 

It is possible to reduce the size of a SVN repository by doing a dump and load. Essentially if you say that you never want to revert to something more than a couple years old it is possible to dump the repository, filter based on time, then reload the dump. Wanting to get rid of a single file due to size is probably an indication that the file didn't really belong in a source control system in the first place.
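
A minimal sketch of the dump-and-load idea, assuming the standard `svnadmin` command-line tool is installed on the server. The repository paths, the cut-off revision and the helper names `truncate_history_commands` and `run_truncate` are hypothetical. `svnadmin dump` works with revision numbers, so a cut-off date has to be mapped to a revision first (for example by reading `svn log`).

```python
import subprocess


def truncate_history_commands(old_repo, new_repo, first_rev):
    """Build the svnadmin commands that copy revisions first_rev..HEAD
    from old_repo into a freshly created new_repo.

    All arguments are illustrative placeholders.
    """
    # Without --incremental, the first dumped revision contains the full
    # tree as it existed at first_rev, so everything older is left behind.
    dump_cmd = ["svnadmin", "dump", old_repo, "-r", f"{first_rev}:HEAD"]
    create_cmd = ["svnadmin", "create", new_repo]
    # Loading into an empty repository renumbers revisions starting at 1.
    load_cmd = ["svnadmin", "load", new_repo]
    return dump_cmd, create_cmd, load_cmd


def run_truncate(old_repo, new_repo, first_rev, dump_file):
    """Run the three steps, using dump_file as intermediate storage.

    Needs svnadmin on the PATH and nobody committing to old_repo meanwhile.
    """
    dump_cmd, create_cmd, load_cmd = truncate_history_commands(
        old_repo, new_repo, first_rev)
    with open(dump_file, "wb") as out:
        subprocess.run(dump_cmd, stdout=out, check=True)
    subprocess.run(create_cmd, check=True)
    with open(dump_file, "rb") as src:
        subprocess.run(load_cmd, stdin=src, check=True)
```

The old repository is left untouched, so it can be archived or deleted once the new one has been checked. Existing working copies refer to the old revision numbers and would need a fresh checkout.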

tloach
On that note, why are you checking third-party libraries into your repository? If you absolutely must keep them in your system, have a separate repository for third-party libs and use `externals` to link them into your source tree.
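
As a rough sketch of the `externals` idea, assuming the `svn` client is available: the `svn:externals` property on a directory maps a local subdirectory to a URL in another repository. The URL, the directory names and the helpers `externals_definition` and `externals_propset_command` are made up for illustration, and the `URL local_dir` ordering is the one used by Subversion 1.5 and later.

```python
def externals_definition(entries):
    """Format (url, local_dir) pairs as an svn:externals property value,
    one definition per line, in the Subversion 1.5+ 'URL local_dir' order.
    """
    return "\n".join(f"{url} {local_dir}" for url, local_dir in entries)


def externals_propset_command(entries, target_dir="."):
    """Build the 'svn propset' command that sets svn:externals on
    target_dir (a directory inside a working copy).
    """
    return ["svn", "propset", "svn:externals",
            externals_definition(entries), target_dir]


# Hypothetical example: pull one tagged library version into the tree.
example = externals_propset_command(
    [("http://svn.example.com/libs/libfoo/tags/1.2", "third_party/libfoo")],
    "trunk",
)
```

The command list can be handed to `subprocess.run(example, check=True)` inside a working copy; the property change then has to be committed like any other change. Pointing the external at a tag keeps old library versions out of the main repository.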
bta
+4  A: 

There is some scripting which helps you obliterate data. Follow this mailing list thread for more info.

It's a hard way to do it as the essence of version control is not losing data, as opposed to deleting it permanently. But if you prune once a year or something like that it can be done.

extraneon
Can you give an estimation about how long such an operation typically takes?
Dimitri C.
Not really, but it involves modifying the tables, and then dumping and reading the repository. So it won't be fast but might possibly be automated.
extraneon
+7  A: 

There is a fair amount of discussion of svn obliterate on the problem ticket at the Apache Subversion site, most of it ending about 2008. There seems to be general agreement that it's a good capability to have, although its use should be rare.

There are two main reasons to want it.

First, checking in confidential information can be a problem. Leaving it in there, deleted, is not necessarily an option, depending on the level of confidentiality and exposure of the repository.

Second, checking in a large amount of stuff that shouldn't be checked in can drastically increase the size of the repository. Disk space is generally cheap nowadays, but it isn't unlimited, and there are other ways file space can matter. If it's necessary to send a repository over a net connection, that's extra time which may or may not be important. There can be real advantages to being able to burn a CD-ROM or DVD-ROM that contains the whole repository.

Therefore, it's a useful capability which is currently done by dumping, filtering, and reloading the repository. This is error-prone according to reports I've seen, can be slow, and requires shutting down the repository.
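
The dump, filter and reload sequence can be sketched roughly as follows, assuming the stock `svnadmin` and `svndumpfilter` tools. The paths and the helper names `obliterate_commands` and `run_obliterate` are hypothetical. `svndumpfilter` can fail on some histories (for example when a kept path was copied from an excluded one), so this is best tried on a copy of the repository first.

```python
import subprocess


def obliterate_commands(old_repo, new_repo, unwanted_paths):
    """Build the commands for dump -> filter -> load.

    unwanted_paths are path prefixes inside the repository whose whole
    history should be left out of new_repo. All values are placeholders.
    """
    dump_cmd = ["svnadmin", "dump", old_repo]
    filter_cmd = ["svndumpfilter", "exclude", *unwanted_paths]
    create_cmd = ["svnadmin", "create", new_repo]
    load_cmd = ["svnadmin", "load", new_repo]
    return dump_cmd, filter_cmd, create_cmd, load_cmd


def run_obliterate(old_repo, new_repo, unwanted_paths):
    """Pipe the dump through the filter straight into the new repository.

    Requires the Subversion admin tools and a repository that is not
    receiving commits while this runs.
    """
    dump_cmd, filter_cmd, create_cmd, load_cmd = obliterate_commands(
        old_repo, new_repo, unwanted_paths)
    subprocess.run(create_cmd, check=True)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    filt = subprocess.Popen(filter_cmd, stdin=dump.stdout,
                            stdout=subprocess.PIPE)
    dump.stdout.close()  # let dump see a broken pipe if the filter exits
    load = subprocess.Popen(load_cmd, stdin=filt.stdout)
    filt.stdout.close()
    for name, proc in (("dump", dump), ("filter", filt), ("load", load)):
        if proc.wait() != 0:
            raise RuntimeError(
                f"{name} step failed with exit code {proc.returncode}")
```

A call such as `run_obliterate("/srv/svn/project", "/srv/svn/project-clean", ["trunk/vendor/biglib"])` would build a second repository without that path. Swapping it in for the original is a separate, manual step.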

Obviously, it's not a high-priority feature for the Subversion team, given that what it has needed for quite a few years is somebody to do the work of coming up with a design and implementing it. After all, it should be done very rarely, and there is a workaround. However, anybody who wants to do a whole lot of work on Subversion could provide a patch that would (if of good enough quality) probably be accepted.

David Thornley
+2  A: 

Obliterate is not an essential feature of Subversion, because it actually breaks the basic principles of version control (which is: to record all history).

And it isn't an essential feature because there are workarounds to get this done anyway (using svnadmin and filtering).

Also, the feature is currently heavily worked on. See this post for details.

Stefan