Hi guys,

I'm a big Subversion fan and am just about to take over a big site (approx. 200MB). I've already trimmed the main site down from an original size of 500MB!

I'm about to check this site into a new subversion repository. The problem is, my subversion repository is remotely hosted so that another colleague can also work on the site.

I'm concerned about having to check in and out 200MB every time I have to make updates to the site.

Development is quite active so there will be lots of things changing on an ongoing basis.

Assuming I get everything checked in OK, will Subversion ensure it only downloads new/amended files and folders each time I do a new checkout, or will I be waiting for 200MB to download every time?

+12  A: 

Unless I'm mistaken, after the first check-in/checkout you only transfer diffs, so you only have to upload/download the changes between the files on the client and server (not the whole file, just the lines that have changed, as long as the file is ASCII).

The first commit/update will be horrendous, though.

Chad Moran
200MB shouldn't be too bad for the initial commit, provided you have a reasonable internet connection. You're right, svn will only send the changes from then on.
Ferruccio
For most residential connections being half-duplex uploading that much will render your connection almost useless until you're done.
Chad Moran
@Chad: I have to disagree here. I have no problem over cable or DSL.
Geoffrey Chetwood
@Rich: I didn't say everyone; I said most residential connections. If they're half-duplex and you're uploading 200MB at your full upload potential, your download is rendered almost useless. Though this isn't the topic of the OP.
Chad Moran
@Chad: And I am just adding that I have two very mediocre residential connections I use regularly and this wouldn't be a problem on either one.
Geoffrey Chetwood
Alternatively, you could commit it in stages, e.g.: "svn add --non-recursive projectbase; cd projectbase; svn add [a-e]*; svn commit", then "svn add [f-m]*; svn commit", and so on. It will take the same total time, but you won't risk failing 99% of the way into a 200MB commit.
Just Some Guy
Almost no connections are half-duplex. Sometimes heavy uploads can "starve out" TCP acknowledgements on other, downloading connections, but that's not at all the same thing. It is true that most connections are asymmetric, with a much higher download rate.
wnoise
And even binary files are updated with just a diff, not only text files. Text is just a specialized subset of binary, and Subversion handles everything that way.
cdeszaq
A: 

If lots of changes are being made frequently, why not have a cron entry that does a subversion update to keep your local copy up to date, say every 6 hours?

That way you're getting recent diffs (or none, if nothing has changed in the last few hours) rather than the whole shebang.
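As a sketch, such a cron entry might look like this (the working-copy path and schedule are assumptions; adjust for your setup):

```shell
# Update the local working copy every 6 hours, quietly and without prompting
0 */6 * * * cd /var/www/mysite && /usr/bin/svn update --non-interactive -q
```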

EDIT: for clarification, if lots of changes are happening, but only on a few pages at a time, any given commit/update will be small; if they're being made to all/most of the site, then frequently keeping up to date will be important.

warren
A: 

It will only send the changes when you update or commit. You should be fine.

Bob Dizzle
A: 

Subversion only transfers diffs/updates after the initial checkout, so you pay for the full download only the first time. Later updates fetch just the changes.

To assist in merges, it might be good to have two working copies - one pointed to the main codeline, one pointed at your task branch. That way you don't have to switch your working copy from one Subversion codeline to another - that can be expensive, like checking out the code to begin with.
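A sketch of that two-working-copy layout, with hypothetical repository URLs:

```shell
# One working copy per codeline avoids expensive 'svn switch' operations
svn checkout http://server/repos/project/trunk wc-trunk
svn checkout http://server/repos/project/branches/my-task wc-task

# Work in the task branch; pull in trunk changes without re-checking-out
cd wc-task
svn merge http://server/repos/project/trunk
```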

Travis Illig
A: 

It will only download the files that have changed since your last update. However, if you are going to be branching (as you should be), then you might be waiting a long time.

How much of the project is actually needed? I doubt there is 200MB of source. If a lot of the data consists of resources that change very infrequently (e.g. images), then you might think about splitting the repository into smaller projects.
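If you stay with a single repository, sparse checkouts (Subversion 1.5+) are one way to skip the heavy resource folders; a sketch with hypothetical directory names:

```shell
# Check out only the top level, then deepen just the directories you need
svn checkout --depth immediates http://server/repos/site wc-site
cd wc-site
svn update --set-depth infinity src       # pull the full source tree
svn update --set-depth empty images       # leave the heavy image folder out
```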

graham.reeds
+1  A: 

As said before, commit/update transfers diffs only and is quite fast. Checkouts are more time consuming - use svn switch to jump between branches quickly.

Also, the HTTP/WEBDAV transport protocol is not very efficient, especially when dealing with lots of small files (e.g. source code :) ) - you could consider using svnserve instead.
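A minimal svnserve setup might look like this (the paths and hostnames are assumptions):

```shell
# On the server: run svnserve as a daemon, rooted at the repositories' parent dir
svnserve -d -r /var/svn/repos

# On the client: check out over the lighter-weight svn:// protocol
svn checkout svn://server/mysite wc-mysite
```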

200 MB of data should not be too much trouble for Subversion - but if disk space and efficient data transfer are really a problem you could also look into git or mercurial. git in particular is much more efficient, but you'll probably need a little more time to wrap your head around the concepts of distributed source control, and you'll have to live without fancy GUI tools for now (though the command-line tools have become much more usable lately).

This link might be interesting, too: Website Auto Update

VolkA
the transport protocol efficiency is a very important point
Jean
+1  A: 

I run sites that total around 5 GB or more (and a build system that changes many, many files on each build), so the delta can easily be around 200MB, pushed to a remote site. SVN handles it perfectly fine. It also depends on how well your Apache server holds up (if you are using Apache).

Ram Prasad
+5  A: 

Another thing to bear in mind is that you can make copies of your checked out folders and they will still be valid working copies:

svn checkout http://server/path/to/repos my_working_copy
cp -a my_working_copy another_working_copy
svn status another_working_copy

That can save a lot of time/bandwidth if you need multiple working copies. It also makes branching and switching a lot faster:

svn checkout http://server/path/to/trunk my_trunk
cp -a my_trunk my_branch
cd my_branch
svn switch http://server/path/to/branches/stable

As has been pointed out in other replies, you'll only have to download the differences between the trunk and branch.

Ken