I am working with an SVN repository that is over 3 years old, contains over 6,100 commits and is over 1.5 GB in size. I want to reduce the size of the SVN repository (I'm not talking about the size of a full SVN export - I mean the full repository as it would exist on the server) before moving it to a new server.

The current repository contains the source code for all of our software projects but it also contains relatively large binary files of no significance such as:

  • Full installers for a number of 3rd party tools.
  • .jpg & .png files (which are unmodified exports of PSDs that live in the same folder).
  • Bin and Obj folders (which were then 'svn ignored' in the next commit).
  • ReSharper directories.

A number of these large files have been 'SVN deleted' since they were added, which creates the further problem of identifying the biggest offenders.

I want to either:

  • Create a new SVN repository that contains only the code for all of the software projects - it is really important that the copied files maintain their SVN history from the old repository.
  • Remove the large binary commits and files from the existing repository.

Is either of these possible?

A: 

Isn't this just a different problem, with an extra step? I.e. you need to locate files that you consider to be large and binary, and then check if they are indeed managed by SVN or have been built locally (or imported from the parallel asset system, if it's already in place).

So, just find the files, then do svn info on them to find out if they're part of the repository.

unwind
The SVN repository has been alive for over 3 years and during that time a large percentage of the files I'm referring to have been 'SVN deleted'. There is also the problem of large binary files that were in flux during development (like large PSDs) that have since solidified and will no longer change, so there may be 20 MB in deltas across various commits for such a file (which I'm not sure how to find).
CuriousCoder
I have substantially updated the question based on your answer to make sure I'm communicating the situation correctly. I hope it helps clarify a number of points. Thanks for the initial answer.
CuriousCoder
A: 

Could you give an explicit example, something like:

dirA
 +- txtfileA [rev1] [rev2] [rev3]
 +- txtfileB [rev1] [rev2]
 +- binfileC [rev1]
dirB
 +- binfileD [rev1] [svndelete]
dirC
 +- dirD
    +- binfileE [rev1] [svnmove]
...

For files like binfileC, you should be able to use svn propget svn:mime-type to check if the type is not "text/".
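
A minimal sketch of that check, run over a whole working copy instead of one file at a time. It assumes that `svn propget -R svn:mime-type` prints one `path - type` line per file that has the property set; `list_non_text` is a hypothetical helper name, not part of Subversion.

```shell
# list_non_text: read "path - mime/type" lines on stdin (the form that
# `svn propget -R svn:mime-type .` is assumed to print) and print
# "type<TAB>path" for every entry whose type does not start with "text/".
list_non_text() {
    awk '{
        n = split($0, parts, " - ")
        if (n < 2) next                       # not a "path - type" line
        type = parts[n]                       # the type is the last field
        # Everything before the final " - " is the path (it may itself contain " - ").
        path = substr($0, 1, length($0) - length(type) - 3)
        if (type !~ /^text\//) print type "\t" path
    }'
}

# Hypothetical usage from the root of a working copy:
#   svn propget -R svn:mime-type . | list_non_text
```

Files that have no svn:mime-type property at all do not show up in this listing, and a working copy only contains files that still exist in the checked-out revision, so this will not find binaries that were 'SVN deleted' earlier.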

What other specific files do you need to find?

martin jakubik
Does SVN automatically set svn:mime-type? If so I think that's a great starting point. Trying to remove all non 'text' could work.
CuriousCoder
I just found this: "Subversion also helps users by running a binary-detection algorithm in the svn import and svn add commands. These commands will make a good guess and then (possibly) set a binary svn:mime-type property on the file being added." (from: http://svnbook.red-bean.com/en/1.5/svn.forcvs.binary-and-trans.html) So, with luck... it could work.
martin jakubik
+3  A: 

You will have to use svnadmin dump to get a dump file of your current repository and possibly svndumpfilter to process the dump file. You can also manually modify the dump file as long as you're careful.

It's probably not going to be a quick and easy job, but it can be done. I've done something similar, only to a much smaller repository. I had a repo with about 150 revisions that took about 600MB.

Make a dump from your current repository, make the necessary changes and try to load the modified dumpfile in a new repository. Then check the new repository to make sure everything is still making sense (History is still correct, no weird changes in paths, ...).
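
The dump, filter and load steps could be sketched roughly like this. It is only an outline under assumptions: `rebuild_without` is a hypothetical helper name, the repository paths and the excluded path prefixes are placeholders, and the excluded paths are given the way they appear inside the repository (for example tools/installers).

```shell
# rebuild_without OLD_REPO NEW_REPO PATH_PREFIX...
# Dump OLD_REPO, drop every node under the given path prefixes, and load
# the result into a freshly created NEW_REPO. All names are placeholders.
rebuild_without() {
    old_repo=$1
    new_repo=$2
    shift 2
    dump_file=${TMPDIR:-/tmp}/filtered.$$.dump

    # Stream the full history through the filter so the unfiltered dump
    # never has to be stored on disk. Revisions that end up empty are
    # dropped and the remaining ones are renumbered.
    svnadmin dump --quiet "$old_repo" |
        svndumpfilter exclude --drop-empty-revs --renumber-revs "$@" \
        > "$dump_file" || return 1

    # Load the filtered history into a new, empty repository.
    # The filtered dump file is kept so it can be inspected afterwards.
    svnadmin create "$new_repo" || return 1
    svnadmin load --quiet "$new_repo" < "$dump_file"
}

# Hypothetical usage:
#   rebuild_without /srv/svn/old /srv/svn/new tools/installers art/exports
```

One thing to watch for: svndumpfilter stops with an error when a path you keep was copied from a path you exclude, so expect a few iterations on the list of prefixes before the load goes through cleanly.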

Otherside
A: 

If you deleted files from the repository using "SVN Delete", you didn't actually delete the files. That is the beauty of SVN: once a file is added to the repository, it is there forever (unless you use dump & load). "Deleting" a file actually creates a new revision that marks the deletion, but the file continues to exist in previous revisions.

I've done a dump & load myself, on a much bigger repository: around 60,000 (!!!) revisions. It took time, but in the end, after careful loading, the repository was rebuilt.

Your only way is to list the revisions in which the files were added, modified and deleted. Then dump the revisions in between, and load them in the right order. BE AWARE, there is no room for mistakes. If you make a mistake, you will have to start over and dump & load from the start.
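
For the listing step, something along these lines might help. It is a sketch that assumes the usual layout of `svn log -vq` output (an `rN | author | date` header followed by indented `A`/`M`/`D`/`R` path lines); `path_history` is a hypothetical helper name.

```shell
# path_history REGEX
# Read `svn log -vq` output on stdin and print "revision action path" for
# every changed path that matches REGEX (an awk extended regular expression).
path_history() {
    SVN_PATH_PATTERN=$1 awk '
        # Header line of a revision, e.g. "r42 | alice | 2009-01-02 ..."
        /^r[0-9]+ \|/ { rev = substr($1, 2); next }
        # Changed-path line: three spaces, action letter, space, path
        /^   [AMDR] / {
            path = substr($0, 6)
            sub(/ \(from .*\)$/, "", path)    # drop copy-source annotation
            if (path ~ ENVIRON["SVN_PATH_PATTERN"]) print rev, $1, path
        }'
}

# Hypothetical usage against a repository URL or working copy:
#   svn log -vq | path_history '\.(exe|msi|psd)$'
```

The output gives you, per file, the revision where it was added (A), each modification (M) and the deletion (D), which is the list you need before deciding which revision ranges to dump.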

My suggestion: if the large files are such a problem, consider creating a fresh repository with no history. Keep the old one for history comparison, and start afresh.

Good Luck.

Oded
A: 

Just a small thought, you say that the current state of the repository (the current HEAD) is good, i.e. the large binary files have been svn delete'ed in the past. Therefore your issue is purely the size of the repository?

I know you said you would like to keep all the commit history, but as an option, you could do two dumps, one for the whole revision history, and one for the current HEAD revision.

If you put the full dump onto a DVD, for example, you would have the data available if you ever needed it. You could then delete the whole repository and load the HEAD-revision dump with svnadmin load, leaving you with a small, clean repository.

It is also possible to dump from a specific revision onwards, rather than just the head, so for example you could keep the last 3 months of revisions and dump everything older onto a DVD.
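
A rough sketch of the two dumps, assuming you have shell access to the server that hosts the repository. `archive_and_trim` is a hypothetical helper name, and the repository path, cut-off revision and output directory are placeholders.

```shell
# archive_and_trim REPO KEEP_FROM_REV OUT_DIR
# Write two dump files into OUT_DIR: full.dump with the complete history
# (for the archive copy) and recent.dump with KEEP_FROM_REV up to the
# youngest revision (for the new, smaller repository).
archive_and_trim() {
    repo=$1
    keep_from=$2
    out_dir=$3

    # Ask the repository for its youngest (HEAD) revision number.
    youngest=$(svnlook youngest "$repo") || return 1

    # Complete history, to be burned to DVD or stored elsewhere.
    svnadmin dump --quiet "$repo" > "$out_dir/full.dump" || return 1

    # Recent history only. Without --incremental, the first revision in
    # this dump is a full snapshot of the tree at KEEP_FROM_REV.
    svnadmin dump --quiet -r "$keep_from:$youngest" "$repo" \
        > "$out_dir/recent.dump"
}

# Hypothetical usage:
#   archive_and_trim /srv/svn/old 5800 /backup/svn
#   svnadmin create /srv/svn/new
#   svnadmin load /srv/svn/new < /backup/svn/recent.dump
```

Because the first dumped revision is a full snapshot, recent.dump can be loaded into an empty repository. The loaded revisions are numbered from 1 again, so old revision numbers mentioned in commit messages or bug trackers will no longer line up.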

BParker
+2  A: 

Otherside is right about svnadmin dump, etc. Something like this will get you a rough pointer to revisions that added lots of data to your repo, and are candidates for svndumpfilter:

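# Print the size of each revision's diff as a rough measure of how much it added.
# Caveat: svn diff does not print the contents of files marked as binary.
for r in $(svn log -q | grep '^r[0-9]' | cut -d ' ' -f 1 | tr -d r); do
   echo "revision $r is $(svn diff -c "$r" | wc -c) bytes"
done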
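
With 6,100 revisions the raw list is long, so it may help to rank it. This small sketch assumes the loop's output was saved to a file and keeps the `revision N is B bytes` line format; `top_revisions` and `sizes.txt` are hypothetical names.

```shell
# top_revisions [COUNT]
# Read "revision N is B bytes" lines on stdin and print the COUNT largest
# revisions (default 10), biggest first, sorted on the byte count in field 4.
top_revisions() {
    sort -k4,4nr | head -n "${1:-10}"
}

# Hypothetical usage, with the loop's output saved to sizes.txt:
#   top_revisions 20 < sizes.txt
```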

You could also try something like this to find revisions that added files with a particular extension (here, .jpg):

svn log -vq | egrep "^r|\.jpg$" | grep -B 1 "\.jpg$"
Matt McHenry