Hi guys,

I'm a big Subversion fan and am just about to take over a big site (approx. 200MB). I've already trimmed the main site down from an original size of 500MB!

I'm about to check this site into a new subversion repository. The problem is, my subversion repository is remotely hosted so that another colleague can also work on the site.

I'm concerned about having to check in and out 200MB every time I have to make updates to the site.

Development is quite active so there will be lots of things changing on an ongoing basis.

Assuming I get everything checked in OK, will Subversion ensure it only downloads new/amended files and folders each time I do a new checkout, or will I be waiting for 200MB to download every time?

+12  A: 

Unless I'm mistaken, after the first check-in/checkout you only transfer diffs, so you only have to upload/download the changes between the files on the client and server (not the whole file, just the lines that have changed, as long as the file is ASCII).

The first commit/update will be horrendous, though.

Chad Moran
200MB shouldn't be too bad for the initial commit, provided you have a reasonable internet connection. You're right, svn will only send the changes from then on.
Ferruccio
For most residential connections being half-duplex uploading that much will render your connection almost useless until you're done.
Chad Moran
@Chad: I have to disagree here. I have no problem over cable or DSL.
Geoffrey Chetwood
@Rich: I didn't say everyone; I said most residential connections. If they're half-duplex and you're uploading 200MB at your full upload potential, your download is rendered almost useless. Though this isn't the topic of the OP.
Chad Moran
@Chad: And I am just adding that I have two very mediocre residential connections I use regularly and this wouldn't be a problem on either one.
Geoffrey Chetwood
Alternatively, you could commit it in stages, e.g.: "svn add --non-recursive projectbase; cd projectbase; svn add [a-e]*; svn commit", then "svn add [f-m]*; svn commit", and so on. It will take the same total time, but you won't risk failing 99% of the way into a 200MB commit.
Just Some Guy
Almost no connections are half-duplex. Sometimes heavy uploads can "starve out" TCP acknowledgements on other, downloading connections, but that's not at all the same thing. It is true that most connections are asymmetric, with a much higher download rate.
wnoise
And even binary files are updated with just a diff, not only text files. Text is just a specialized subset of binary, and Subversion handles everything that way.
cdeszaq
A: 

If lots of changes are being made frequently, why not have a cron entry that does a subversion update to keep your local copy up to date, say every 6 hours?

That way you're getting recent diffs (or none, if nothing has changed in the last few hours) rather than the whole shebang.
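As a sketch, such a cron entry might look like this (the working-copy path and schedule are assumptions; adjust for your setup):

```shell
# Update the local working copy every 6 hours, quietly and without prompting
0 */6 * * * cd /var/www/mysite && /usr/bin/svn update --non-interactive -q
```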

EDIT: for clarification, if lots of changes are happening, but only on a few pages at a time, any given commit/update will be small; if they're being made to all/most of the site, then frequently keeping up to date will be important.

warren
A: 

It will only send the changes when you update or commit. You should be fine.

Bob Dizzle
A: 

Subversion only transfers diffs/updates after the initial checkout, so you pay for the full download only the first time. Later updates fetch just the changes.

To assist in merges, it might be good to have two working copies - one pointed to the main codeline, one pointed at your task branch. That way you don't have to switch your working copy from one Subversion codeline to another - that can be expensive, like checking out the code to begin with.
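A sketch of that two-working-copy layout, with hypothetical repository URLs:

```shell
# One working copy per codeline avoids expensive 'svn switch' operations
svn checkout http://server/repos/project/trunk wc-trunk
svn checkout http://server/repos/project/branches/my-task wc-task

# Work in the task branch; pull in trunk changes without re-checking-out
cd wc-task
svn merge http://server/repos/project/trunk
```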

Travis Illig
A: 

It will only download the files that have changed since your last update. However, if you are going to be branching (as you should be), then you might be waiting a long time.

How much of the project is actually needed? I doubt there is 200MB of source. If a lot of the data consists of resources that change very infrequently (e.g. images), then you might think about splitting the repository into smaller projects.
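If you stay with a single repository, sparse checkouts (Subversion 1.5+) are one way to skip the heavy resource folders; a sketch with hypothetical directory names:

```shell
# Check out only the top level, then deepen just the directories you need
svn checkout --depth immediates http://server/repos/site wc-site
cd wc-site
svn update --set-depth infinity src       # pull the full source tree
svn update --set-depth empty images       # leave the heavy image folder out
```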

graham.reeds
+1  A: 

As said before, commit/update transfers diffs only and is quite fast. Checkouts are more time consuming - use svn switch to jump between branches quickly.

Also, the HTTP/WEBDAV transport protocol is not very efficient, especially when dealing with lots of small files (e.g. source code :) ) - you could consider using svnserve instead.
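A minimal svnserve setup might look like this (the paths and hostnames are assumptions):

```shell
# On the server: run svnserve as a daemon, rooted at the repositories' parent dir
svnserve -d -r /var/svn/repos

# On the client: check out over the lighter-weight svn:// protocol
svn checkout svn://server/mysite wc-mysite
```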

200 MB of data should not be too much trouble for Subversion - but if disk space and efficient data transfer are really a problem you could also look into git or mercurial. git in particular is much more efficient, but you'll probably need a little more time to wrap your head around the concepts of distributed source control, and you'll have to live without fancy GUI tools for now (though the command-line tools have become much more usable lately).

This link might be interesting, too: Website Auto Update

VolkA
the transport protocol efficiency is a very important point
Jean
+1  A: 

I run sites that total around 5 GB or more (and a build system that changes many, many files on each build), so the delta can easily be around 200MB, pushed to a remote site. SVN handles it perfectly fine. It also depends on how well your Apache server holds up (if you are using Apache).

Ram Prasad
+5  A: 

Another thing to bear in mind is that you can make copies of your checked out folders and they will still be valid working copies:

svn checkout http://server/path/to/repos my_working_copy
cp -a my_working_copy another_working_copy
svn status another_working_copy

That can save a lot of time/bandwidth if you need multiple working copies. It also makes branching and switching a lot faster:

svn checkout http://server/path/to/trunk my_trunk
cp -a my_trunk my_branch
cd my_branch
svn switch http://server/path/to/branches/stable

As has been pointed out in other replies, you'll only have to download the differences between the trunk and branch.

Ken