Hello,

This might not be a hardcore programming question, but I suspect it relates to some of the tools programmers use.

So we're a group of people, each with a lot of documents, working on several different computers running two operating systems (Linux and Windows). Ideally these documents would be available offline (the laptops are not always online) but also synchronized between all the machines. Having a server with extra-reliable storage act as a "base repository" seems like a good idea to me.

An SCM came to mind, so I've tried Subversion. Its centralized repository seems like a good fit, but:

  • When checking out, the working copy takes up roughly double the size of the original files.
  • Big files or big repositories seem to slow it down.

I've also tried rsync, which might work, but it's a bit rough when it comes to handling potential conflicts.

Finally, I've tried Unison (which I think is built on rsync), and while it works, it becomes horribly slow for the big directories we have here, since it has to scan everything.

So the question is: is there an SCM tool out there that is actually practical to use for a large collection of both small and big files? If that's a no, does anyone know other tools that do this job?

Thanks for reading :)

+2  A: 

You can try one of the distributed version control systems, like Mercurial, Git, or Bazaar. One of those seems perfect for what you are trying to accomplish.

Joel Spolsky has a great little Mercurial tutorial here: hginit.com. Thanks, camainc.

vladv
Mercurial is a great tool, and it has the distributed part down solid. For Windows users, TortoiseHg is a great GUI for it: http://tortoisehg.bitbucket.org/
camainc
Joel Spolsky has a great little tutorial: http://hginit.com/
camainc
+1. At work we changed from ClearCase to Mercurial, and the tests were very good. It's a lot faster for code, but I don't know about documents (where I guess diffs/changesets can't always be applied easily...)
Sebastien Lorber
@jalf: added Bazaar to the answer, thanks.
vladv
Thank you for suggesting alternatives. I'm going to check out Mercurial!
tsunade
A: 

A few more details would allow us to provide a more meaningful answer. For instance:

What types of documents? Are you dealing with images, Word documents, text files? All or none of the above?

Subversion (and any source control system worth its salt) works by saving only the deltas for check-ins. That is, when you check in a file, only the differences between that file and the previous version are saved, which saves space. Checking in a 1 MB Photoshop file with a few pixels changed will take up less repository space than an entirely new document. This is typically file-type agnostic (i.e., it works for binaries as well as text).
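To illustrate why a small edit costs little repository space, here is a minimal sketch of a fixed-block binary delta: only the blocks that differ from the previous version are stored. This is not Subversion's actual algorithm (Subversion uses its own svndiff delta format, which also copes with inserted or shifted bytes that this fixed-block version would treat as changes); `make_delta`, `apply_delta`, and the 4096-byte block size are hypothetical names and values chosen for the example.

```python
BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def make_delta(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[tuple[int, bytes]]:
    """Return (block_index, block_bytes) for every block of `new` that differs from `old`."""
    delta = []
    n_blocks = (len(new) + block_size - 1) // block_size
    for i in range(n_blocks):
        start = i * block_size
        new_block = new[start:start + block_size]
        old_block = old[start:start + block_size]
        if new_block != old_block:
            delta.append((i, new_block))  # store only the changed block
    return delta


def apply_delta(old: bytes, delta: list[tuple[int, bytes]], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new version from the old one plus the stored changed blocks."""
    # Start from the old content, truncated or zero-padded to the new length.
    buf = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for i, block in delta:
        start = i * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)


# A 1 MB "image" in which ten bytes ("a few pixels") change.
old = bytes(1_000_000)
edited = bytearray(old)
edited[500_000:500_010] = b"\xff" * 10
new = bytes(edited)

delta = make_delta(old, new)
print(len(delta), sum(len(b) for _, b in delta))  # 1 4096
assert apply_delta(old, delta, len(new)) == new
```

Only one 4 KB block is stored instead of the whole 1 MB file.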

If your checkouts result in files that are larger than what was checked in, I'd say you have some sort of configuration or process problem. If you check in a 200 KB file, you will receive a 200 KB file on checkout. Could you describe your checkout/modify/check-in process?

SVN, TFS, and others are used at very large scale in many different environments, and SVN in particular is an easy, free, and very reliable solution. However, if your audience is predominantly non-programmers, a more user-friendly SCM may be a better choice.

David Lively
Thank you for your insight! It's a bit of everything, actually. Some of our people take a lot of photos, and they're the primary concern when it comes to the doubled size I was writing about. To add detail: with SVN, I had checked out, say, two JPEGs totaling 4 MB, but my .svn directory was also 4 MB. It might very well be a configuration issue if that sounds impossible, but I should have been running a mostly default configuration...
tsunade
Subversion keeps a copy of all the content in the .svn/text-base directory so that it can detect changes and produce diffs without contacting the repository. This means Subversion checkouts by default take twice the size of the repository content. I don't know if it's possible to turn this off.
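To confirm the doubling on your own machine, here is a minimal sketch that walks a checkout and totals file sizes outside versus inside `.svn` directories. It counts everything under any `.svn` directory, so it should cover both the older per-directory layout with `.svn/text-base` and the single root-level `.svn` used by newer clients. The function name `split_sizes` is hypothetical.

```python
import os
from pathlib import Path


def split_sizes(root) -> tuple[int, int]:
    """Return (working_bytes, svn_metadata_bytes) for the checkout at `root`.

    Files located anywhere below a `.svn` directory count as metadata
    (this includes the pristine copies Subversion keeps for diffing);
    everything else counts as working files.
    """
    root = Path(root)
    working = metadata = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        in_svn = ".svn" in Path(dirpath).relative_to(root).parts
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            if in_svn:
                metadata += size
            else:
                working += size
    return working, metadata
```

Calling `split_sizes` on a checkout of photos should report a metadata total close to the working-file total if the pristine copies explain the doubled size.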
MatsT
I'd forgotten about the shadow copy.
David Lively