My project is currently using an svn repository which gains several hundred new revisions per day. The repository resides on a Win2k3 server and is served through Apache/mod_dav_svn.

I now fear that over time the performance will degrade due to too many revisions.
Is this fear reasonable?
We are already planning to upgrade to 1.5, so having thousands of files in one directory will not be a problem in the long term.

Subversion only stores the delta (differences) between 2 revisions, so this helps save a LOT of space, especially if you only commit code (text) and no binaries (images and docs).

Does that mean that in order to check out revision 10 of the file foo.baz, svn will take revision 1 and then apply the deltas 2-10?

+3  A: 

Subversion only stores the delta (differences) between 2 revisions, so this helps save a LOT of space, especially if you only commit code (text) and no binaries (images and docs).

Additionally, I've seen a lot of very big projects using svn and have never heard complaints about performance.

Maybe you are worried about checkout times? Then I guess this would really be a networking problem.

Oh, and I've worked on CVS repositories with 2GB+ of stuff (code, imgs, docs) and never had a performance problem. Since svn is a great improvement on cvs, I don't think you should worry about it.

Hope it helps ease your mind a little ;)

Decio Lira
+2  A: 

The only operations which are likely to slow down are things which read information from multiple revisions (e.g. SVN Blame).

RB
+15  A: 

Subversion stores the most current version as full text, with backward-looking diffs. This means that updates to head are always fast, and what you incrementally pay for is looking farther and farther back in history.
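
As a toy illustration of that trade-off (not Subversion's actual code): if head is stored as full text and each older revision is a backward delta against the next newer one, then reading head costs one lookup, while reading an old revision costs one extra lookup per step back:

    def reads_needed(head_rev, wanted_rev):
        # Head is full text; every older rev is a backward delta
        # against the revision just after it.
        return 1 + (head_rev - wanted_rev)

    print(reads_needed(1000, 1000))  # 1   -- checking out head stays cheap
    print(reads_needed(1000, 10))    # 991 -- deep history is what you pay for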

Brad Wilson
+5  A: 

I personally haven't dealt with Subversion repositories with codebases bigger than 80K LOC for the actual project. The biggest repository I've actually had was about 1.2 gigs, but this included all of the libraries and utilities that the project uses.

I don't think the day to day usage will be affected that much, but anything that needs to look through the different revisions might slow down a tad. It may not even be noticeable.

Now, from a sys admin point of view, there are a few things that can help you minimize performance bottlenecks. Since Subversion is mostly a file-based system, you can do this:

  • Put the actual repositories on a separate drive
  • Make sure that no file-locking apps, other than svn, are working on that drive
  • Make the drives at least 7,500 RPM. You could try getting 10,000 RPM, but it may be overkill
  • Update the LAN to gigabit, if everybody is in the same office.

This may be overkill for your situation, but that's what I've usually done for other file-intensive applications.

If you ever "outgrow" Subversion, then Perforce will be your next step up. It's hands down the fastest source control app for very large projects.

hectorsosajr
+2  A: 

We're running a subversion server with gigabytes worth of code and binaries, and it's up to over twenty thousand revisions. No slowdowns yet.

Hans Sjunnesson
+19  A: 

What type of repo do you have? FSFS or BDB?

(Let's assume FSFS for now, since that's the default.)

In the case of FSFS, each revision is stored as a diff against the previous. So, you would think that yes, after many revisions, it would be very slow.

However, this isn't the case. FSFS uses what are called "skip deltas" to avoid having to do too many lookups on previous revs.

(So, if you are using an FSFS repo, Brad Wilson's answer is wrong.)

In the case of a BDB repo, the HEAD (latest) revision is full-text, but the earlier revisions are built as a series of diffs against the head. This means the previous revs have to be re-calculated after each commit.

For more info: http://svn.apache.org/repos/asf/subversion/trunk/notes/skip-deltas
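
Roughly, the skip-delta trick is to pick the delta base for change number n by clearing the lowest set bit of n, so rebuilding any version needs O(log n) delta applications instead of n. Here is a toy Python sketch of that idea (simplified; real FSFS applies it to a node's predecessor count and also limits delta chain lengths):

    def delta_base(n):
        # Skip-delta style base: clear the lowest set bit of n.
        return n & (n - 1)

    def delta_chain(n):
        # Change numbers visited when reconstructing version n.
        chain = [n]
        while n > 0:
            n = delta_base(n)
            chain.append(n)
        return chain

    print(delta_chain(54))    # [54, 52, 48, 32, 0]
    print(delta_chain(1000))  # [1000, 992, 960, 896, 768, 512, 0]

So even after hundreds of thousands of commits, the chain stays only a handful of deltas long.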

P.S. Our repo is about 20GB, with about 35,000 revisions, and we have not noticed any performance degradation.

msemack
In your repo of 20GB, is it stored as FSFS or BDB?
Scott Markwell
It's FSFS (at least it is now). For the 1st year or so of our repo's lifespan it was BDB (FSFS didn't exist yet). At some point we did a dump/load cycle to convert to FSFS. We weren't having any specific problems with BDB, but FSFS seems architecturally better (hence FSFS is now the default).
msemack
That's an interesting piece of information. I have a repository with 73,000 files (roughly 350 MB) and it's unbelievably slow. I have to inquire what they are using.
Till
As a side-note, the PHP repository is stored on Subversion with (at time of writing) 295,197 revisions. http://svn.php.net/repository/php/php-src/trunk/
jevon
A: 

I am not sure. I am using SVN with Apache on CentOS 5.2 and it works OK. The revision number was around 8230, and on all client machines a commit was so slow that we had to wait at least 2 minutes, even for a single 1 KB file.

Then I made a new repository with svnadmin create xxxxxx, starting again from rev. 1. Now it works OK and is fast. I did not check whether it is FSFS or BDB.
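
If you want to check the backend after the fact: repositories created with Subversion 1.1 or newer record it in a db/fs-type file inside the repository (old BDB-only repos may lack the file). A small Python sketch, using a placeholder path:

    import os

    def repo_fs_type(repo_path):
        # db/fs-type holds "fsfs" or "bdb" for repos created
        # with Subversion 1.1 or later.
        fs_type_file = os.path.join(repo_path, "db", "fs-type")
        try:
            with open(fs_type_file) as f:
                return f.read().strip()
        except IOError:
            return "unknown (pre-1.1 repos without the file are BDB)"

    print(repo_fs_type("/var/svn/xxxxxx"))  # placeholder repo path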

A: 

Maybe you should consider improving your workflow.

I don't know if a repository will have performance issues under these conditions, but your ability to go back to a sane revision will.

In your case, you may want to include a validation process: each team commits to a team leader's repo, the team leaders commit to the team manager's repo, and the manager commits to the read-only, clean company repo. At each stage you make a clean selection of which commits must go to the top.

This way, anybody can go back to a clean copy with an easy-to-browse history. Merges are much easier, and devs can still commit their mess as much as they want.

e-satis