I've long been using Subversion (and before that CVS) to store not only source files but also the LaTeX files for my research, and eventually some Word files and other materials.

I like the fact that I can work with multiple computers and synchronize the latest things from each, while still being able to maintain some hierarchy of my backups and projects.

I'm sure I can't be the only one doing it.

I am now thinking of using CVS or Subversion as the primary backup mechanism for a family computer that holds many frequently changing office documents. Is this a good or a bad idea? The main issue I can think of is that the files are treated as binary, so the repository on the server will bloat a little.

However, I would like to hear about other things I should be aware of or careful about.

In addition, where can I find good examples of scripts that automate check-ins?

+1  A: 

Actually this is not a bad idea. Subversion uses xdelta to store differences even in binary documents, so the server is not so overloaded with different versions.

As for the scripts, just run svn update when you sit down at a machine, and don't forget to run svn commit when you leave. I've been doing this with my own data for several years with no problems at all.
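
A minimal sketch of a check to run before leaving a machine: it reads the output of svn status and flags anything a plain commit would leave behind. The function names and the sample output are hypothetical. The only thing assumed about Subversion is that the first column of each svn status line is the item's status code (M modified, A added, D deleted, ? unversioned, ! missing, C conflicted).

```python
from collections import Counter

# First-column codes printed by `svn status` that matter before leaving a machine.
CODES = {
    "M": "modified",
    "A": "added",
    "D": "deleted",
    "?": "unversioned",
    "!": "missing",
    "C": "conflicted",
}


def summarize_status(status_output):
    """Count working-copy items by the status code in column one."""
    counts = Counter()
    for line in status_output.splitlines():
        if not line.strip():
            continue
        label = CODES.get(line[0])
        if label is not None:
            counts[label] += 1
    return dict(counts)


def safe_to_leave(summary):
    """True when a plain `svn commit` would capture everything."""
    return not any(summary.get(k) for k in ("unversioned", "missing", "conflicted"))


# Hypothetical output, as `svn status` might print it in a documents folder.
sample = "M       budget.xls\n?       letters/new-letter.doc\n!       old-notes.doc\n"
summary = summarize_status(sample)
print(summary)
print(safe_to_leave(summary))
```

In real use the text would come from something like subprocess.run(["svn", "status"], capture_output=True, text=True).stdout, run inside the working copy.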

Diego Sevilla
A: 

Why not just use a backup tool for your documents? If you don't need revisions and only need the latest copy, then a backup tool with scheduled backups is probably best.

If you want to get at revisions, then go ahead with your plan. The only thing I would suggest in that case is a scheduled task that does a commit at the root of all your files. The commit comment could be "automated commit [date/time]", and I would use a separate user name for the scheduled task.
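
A rough sketch of what such a scheduled task could run, assuming the svn command-line client is installed and the documents already live in a working copy. The path and the user name backup-task are made up for illustration, and the sketch does not handle files that were deleted outside of svn.

```python
import subprocess
from datetime import datetime


def commit_message(now):
    """Build the comment suggested above: 'automated commit [date/time]'."""
    return "automated commit " + now.strftime("%Y-%m-%d %H:%M")


def build_commands(root, username, now):
    """Return the svn invocations for one automated check-in of `root`."""
    return [
        # Schedule any new, unversioned files under the root for addition.
        ["svn", "add", "--force", "--quiet", root],
        # Commit everything under the root as the dedicated task user.
        ["svn", "commit", "--non-interactive", "--username", username,
         "-m", commit_message(now), root],
    ]


def run_checkin(root, username):
    """Run the check-in; raises CalledProcessError if svn reports a failure."""
    for command in build_commands(root, username, datetime.now()):
        subprocess.run(command, check=True)


# Show the commands for a hypothetical working copy and task user.
# A real scheduled job would call run_checkin(...) instead of printing.
for command in build_commands("/home/family/documents", "backup-task",
                              datetime(2009, 6, 1, 22, 30)):
    print(" ".join(command))
```

The script could then be registered with cron or the Windows Task Scheduler to run once a day.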

Tim
+5  A: 

Actually, I don't really recommend doing this, having been down this path before. First of all, I think that it goes without saying that if you use an SCM repository for such a task, use SVN instead of CVS! This goes double for a situation like this where it is almost guaranteed that you'd be storing binary data, which is a huge pain with CVS.

Anyway, I used to store a lot of non-programming-related stuff in SVN repositories myself, but now I only use Time Machine to back up the files that I care about, and a small web-based repo for my dotfiles and such. I think the key thing that gets in the way is that you don't really have the same attitude towards normal data files as you do towards source code. It's very unlikely, in most cases, that you're interested in diff'ing two versions of a report you wrote, or reverting your working copy to some draft you wrote two weeks ago. With such documents, you generally only care about the latest version, and the tools and security that SCM provides tend to be more annoying than helpful in this regard, especially when it comes to check-in comments, merging, and so on.

Also, I highly un-recommend (is that a word? ;) ) making non-programmers use SCM. The amount of explanation needed is too great for the tool to be of benefit for them, especially when applied to a task which the tool was not originally intended for. I've done this in a few environments where we thought it wouldn't be a problem, since the individuals in question were not stupid, and they were dealing with artifacts related to the software. But inevitably merge conflicts and other SCM "gotchas" resulted in confusion, and ultimately, phone calls to me during the evening hours.

I'd say you should look into document sharing portals like SharePoint for collaborating on office documents and such. They are better designed for dealing with these types of things without causing a lot of headaches for non-technical folks, and can gracefully deal with version history, binary data, etc. This might be overkill for your family, but setting up a little portal to hold important data shouldn't be much of a problem -- you just need to look around a bit and find something that fits your needs.

Nik Reiman
A: 

I tried using TortoiseSVN on a domestic NAS drive, but it didn't work - I think because the disk was FAT32. I use SyncBack to maintain multiple copies of my files at home, since I'm not bothered about maintaining a revision history.

Ian Hopkinson
+1  A: 

Yes and no.

Assuming that you use version control (CVS or any other system) properly, it is a good way to keep track of old versions of files: you are far more likely to find the version of a file from just before or just after a particular change.

However, version control does not protect you against disasters such as actual loss of your data (a power surge eats your machine along with all its disks). So you need to back up your repository regularly to some safe media. Depending on how critical it is, just writing it to CDs/DVDs and to another portable disk stored somewhere else may be enough.

Also, bad things can happen to your version control repository. Due to bugs in the version control software, crashes, collisions, or the like, part of it may become corrupted. Worse, you may not actually notice until you find out one day that some versions cannot be recovered. So have some procedure for checking the consistency of your repository before each backup. And don't overwrite all your backups at once.
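
As one possible shape for that procedure, here is a sketch that assumes the repository is a Subversion one and that the svnadmin tool is available. It runs svnadmin verify first, makes a date-stamped copy with svnadmin hotcopy, and only then removes the oldest copies beyond a chosen number. The directory layout, the repo- prefix, and the default of three kept copies are assumptions made for illustration.

```python
import shutil
import subprocess
from datetime import date
from pathlib import Path


def backup_name(day):
    """Date-stamped directory name, so each backup is a separate copy."""
    return "repo-" + day.isoformat()


def backups_to_prune(names, keep):
    """Return the oldest backup names beyond the `keep` most recent ones."""
    ordered = sorted(names)  # ISO dates sort chronologically as text
    return ordered[:-keep] if keep > 0 else ordered


def verify_and_backup(repo, backup_dir, keep=3):
    """Verify the repository, copy it, then drop only the oldest copies."""
    # Stop before touching any backup if the repository fails verification.
    subprocess.run(["svnadmin", "verify", repo], check=True)
    target = Path(backup_dir) / backup_name(date.today())
    subprocess.run(["svnadmin", "hotcopy", repo, str(target)], check=True)
    existing = [p.name for p in Path(backup_dir).iterdir()
                if p.name.startswith("repo-")]
    for name in backups_to_prune(existing, keep):
        shutil.rmtree(Path(backup_dir) / name)
```

Because verification runs first and pruning runs last, a corrupted repository never replaces a good backup. Running the sketch twice on the same day would target the same directory name, which it does not handle.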

+3  A: 

You should investigate Subversion's autoversioning option (a feature of its Apache/WebDAV server module, mod_dav_svn). It allows you to, for example, set up a network share that any computer can see and write to, and which automatically performs the necessary commits whenever data is written. If someone accidentally deletes their document, it can be recovered using the usual Subversion commands.

Whatsit
+2  A: 

I use Mercurial for saving office documents. It does everything I need -

  1. Keep track of old revisions.
  2. Protect against disaster. Since Mercurial is a distributed version control system (DVCS), I have a repository on my work machine, laptop, other laptop, and home machine.
  3. Allow commits "offline" because it's a DVCS.
  4. Simple web interface to show changes

... and much more. Most of what it does is useful to me as a programmer, but it also keeps things simple enough for document management.

Oh, and one major benefit - on the Mac, many "documents" are actually folders that have a bunch of files in them. With SVN, you would quickly clutter these folders with files that the application deleted but SVN wants to keep around. With Mercurial, when you delete a file, it gets "deleted" for that revision, but then if you check out a previous revision, it comes back! The perfect solution for Mac apps!
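
A small sketch of how that could be scripted, assuming the hg command-line client is installed. It relies on hg addremove, which records new files as added and vanished files as removed before the commit, and then pushes to each other machine. The peer URL and the function names are hypothetical.

```python
import subprocess


def build_hg_commands(message, peers):
    """Return the hg invocations for one snapshot plus a push to each peer."""
    commands = [
        # Record new files as added and vanished files as removed.
        ["hg", "addremove"],
        ["hg", "commit", "-m", message],
    ]
    # Pushing to other machines is what spreads copies of the history around.
    commands += [["hg", "push", peer] for peer in peers]
    return commands


def snapshot(repo_dir, message, peers):
    """Run the commands inside the repository at `repo_dir`."""
    for command in build_hg_commands(message, peers):
        result = subprocess.run(command, cwd=repo_dir)
        # hg exits with 1 when there is nothing to commit or push;
        # treat that as "no changes" rather than as a failure.
        if result.returncode not in (0, 1):
            raise RuntimeError("command failed: " + " ".join(command))


# Show the commands for a hypothetical peer repository on another machine.
for command in build_hg_commands("update documents", ["ssh://laptop//home/me/docs"]):
    print(" ".join(command))
```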

Dave
I use Mercurial and SVN for programming, but on the office side I'm a very heavy user of Microsoft Office Groove (a distributed peer-to-peer collaboration repository) and LOVE what it's done for my group's productivity. But it's breaking: my repositories are exceeding 2GB and things are going haywire, and there's no search or history. These may be fixed in SharePoint Workspace (Groove's successor). However, an open-source, platform-independent, Groove-like tool that uses Mercurial as the backend... that would be sweet.
David Cuccia