tags:
views: 68
answers: 4

I have a big repository, 100,000+ revisions with a very high branching factor. The initial fetch of the full SVN repository using git-svn has been running for around 2 months and it's only up to revision 60,000. Is there any way to speed this thing up?

I'm already regularly killing and restarting the fetch due to git-svn leaking memory like a sieve. The transfer is occurring over the local LAN, so link speed shouldn't be an issue. The repository is on a dedicated machine backed by dedicated fiber channel arrays, so the server should have plenty of oomph. The only other thing I can think of is doing the clone from a local copy of the SVN repository.

What have other people done in similar circumstances?

+1  A: 

I think you are on the right track.

Local file access could give you a one- to two-order-of-magnitude speedup.

I'm not sure whether running git svn against a BDB- or FSFS-backed SVN repository would be faster.
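
For example, you could build a local mirror with svnsync and clone over file:// (a rough sketch only; the paths, URL, and --stdlayout option are placeholders for whatever your setup actually uses):

    # Create an empty local repository to mirror into.
    svnadmin create /var/svn/mirror

    # svnsync needs a pre-revprop-change hook that allows its property writes.
    printf '#!/bin/sh\nexit 0\n' > /var/svn/mirror/hooks/pre-revprop-change
    chmod +x /var/svn/mirror/hooks/pre-revprop-change

    # Point the mirror at the real server and pull everything down.
    svnsync init file:///var/svn/mirror http://svn.example.com/repo
    svnsync sync file:///var/svn/mirror

    # Then clone from the local mirror instead of over the network.
    git svn clone --stdlayout file:///var/svn/mirror repo-git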

kevpie
A: 

I've not dumped anything that large via git-svn before.

Importing locally certainly sounds worth a try.

If there were some way to import the repo in smaller chunks (say, 1,000 revisions at a time), that might keep git-svn's memory leaks from thrashing the box.
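
Something like this untested sketch might do it (REPO_DIR and LAST_REV are placeholders):

    # Fetch in 1000-revision chunks; each chunk runs in a fresh git-svn
    # process, so leaked memory is returned to the OS between chunks.
    cd "$REPO_DIR" || exit 1
    step=1000
    from=0
    while [ "$from" -le "$LAST_REV" ]; do
        to=$((from + step - 1))
        [ "$to" -gt "$LAST_REV" ] && to="$LAST_REV"
        git svn fetch -r "$from:$to" || exit 1
        from=$((to + 1))
    done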

ldav1s
I've got a script which kills and relaunches git-svn every hour.
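
Roughly this, though not the exact script (it assumes GNU coreutils' timeout as the kill mechanism):

    # Keep restarting until a fetch finishes within the hour; timeout
    # kills the process (and its leaked memory) after 3600 seconds.
    while ! timeout 3600 git svn fetch; do
        sleep 5
    done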
MrEvil
A: 

I've downloaded a close-to-100,000-revision SVN repository using git-svn before. It took around 48 hours and was not over a local LAN. Admittedly, you did say that your repository has a high branching factor, while the repository I downloaded did not (although it did have several dozen branches).

I'd suggest working on figuring out where the bottleneck lies. Are git-svn and its subprocesses using 100% CPU? Are the disk lights on the client or the SVN server constantly lit? How much bandwidth is being used? Once you know what the limiting factor is, you can work on figuring out how to fix it.
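
A few stock tools can help answer those questions (this assumes a Linux client; iostat and iftop come from the sysstat and iftop packages):

    top                 # is git-svn (perl) or svn pegging a core?
    iostat -x 5         # disk utilization and queue depth, on the client or the server
    iftop -i eth0       # actual bandwidth on the link (eth0 is a placeholder)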

Daniel Stutzbach
We have at least several hundred branches, and whenever git-svn encounters a branch it wants to replay the entire history r0-rwhatever.
MrEvil
@MrEvil: After some digging with Google, it sounds like that was a problem in older versions of Git, but it shouldn't replay the entire history for each branch in the latest version. I haven't verified that myself. Which version are you running?
Daniel Stutzbach
1.7.0.3. I'm making a local mirror of my SVN repository right now using svnsync. I've only been at it for about 4 hours and I'm already at the 60k mark. I'm going to try: http://github.com/barrbrain/svn-dump-fast-export
MrEvil
+1  A: 

At work I use git-svn against a ~170000 revision SVN repo. What I did was use git-svn init + fetch -r... to limit my initial fetch to a reasonable number of revisions. You must be careful to choose a revision that is actually in the branch you want. Everything is fully functional even with truncated history except git-blame, which obviously attributes all the lines older than your starting rev to the first rev.
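
Concretely, the shallow start might look something like this (the URL, the --stdlayout option, and the starting revision are placeholders, not values from my repo):

    mkdir repo-git && cd repo-git
    git svn init --stdlayout http://svn.example.com/repo
    # Start history at r150000; pick a revision that actually exists
    # on the trunk/branch you care about.
    git svn fetch -r 150000:HEAD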

You can further speed this up with ignore-paths to prune out subtrees that you don't want.
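
For example (the regex is only an illustration):

    # Prune big subtrees up front; the config key is honored by every
    # subsequent git svn fetch.
    git config svn-remote.svn.ignore-paths '^(vendor|third_party)/'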

You can add more revisions later, but it will be painful. You will have to reset the rev-map (sadly, even though I wrote git-svn reset, I can't say offhand whether it will remove all the revisions, so it may have to be done by hand). Then git-svn fetch more revisions and use git-filter-branch to reparent your old root onto the new tree. That will rewrite every commit, but it won't affect the source blobs themselves. You have to do similar surgery when people undertake big reorgs of the SVN repo.
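
Very roughly, and treat this as a sketch rather than a recipe: the rev-map cleanup is the fragile part, OLD_ROOT and NEW_PARENT are placeholder SHAs, and grafts + filter-branch is just one way to do the reparenting.

    # After rewinding or clearing the rev-map, fetch the older revisions.
    git svn fetch -r 100000:149999

    # Graft the old truncated root onto the commit that now corresponds
    # to the last of the newly fetched revisions.
    echo "$OLD_ROOT $NEW_PARENT" >> .git/info/grafts

    # Bake the graft into real parent pointers; this rewrites every commit.
    git filter-branch -- --all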

If you actually need all of the revisions (for example for a migration) then you should be looking at some flavor of svn-fast-export + git-fast-import. There may be one that adds rev tags to match git-svn, in which case you could fast-import and then just graft in the svn remote. Even if the existing svn-fast-export options don't have that feature, you can probably add it before your original clone completes!
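
The general shape of that pipeline is something like the following; the converter's exact name and interface differ between forks, so treat the middle step as an assumption:

    # Needs filesystem access to the repo, or run it against an svnsync mirror.
    svnadmin dump /var/svn/repo > repo.dump

    mkdir repo-git && cd repo-git && git init
    # Assumes the converter reads an svnadmin dump on stdin and writes
    # a git fast-import stream on stdout.
    svn-dump-fast-export < ../repo.dump | git fast-import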

Ben Jackson