I am trying to use git as a frontend to an SVN repository so that I can use git's nice features like simple branching, stashing, etc.

The problem is that the SVN repository is quite large (8,000 revs) and contains lots of branches and tags (old as well as new).

It's a near-standard layout, with a config containing fetch, branches and tags directives.

Since the oldest branch and tag refer to revision 10, every "git svn fetch" reads the entire repository history from revision 10 onward, which can take hours on the slow connection.

If I only track trunk, then it's fine, but I still want to make git aware of new branches and tags.

I usually look at git log -1 on the branch I'm on and get the SVN revision from the commit message, so I can run "git svn fetch -r7915:HEAD" or similar. I guess that's what "git svn fetch --parent" does. But why do I need to do this?
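To make that manual step concrete, here is a minimal sketch of pulling the revision out of a commit message. It assumes the usual git-svn trailer of the form "git-svn-id: <url>@<rev> <uuid>"; the function name svn_rev_from_message and the example URL in the comment are made up for illustration.

```shell
# Read a commit message on stdin and print the SVN revision found in its
# git-svn-id trailer. Leading whitespace is allowed so that the indented
# output of plain `git log -1` also works.
svn_rev_from_message() {
    sed -n 's/^[[:space:]]*git-svn-id: .*@\([0-9][0-9]*\) .*$/\1/p' | tail -n 1
}

# Usage inside a git-svn clone (not executed here):
#   rev=$(git log -1 | svn_rev_from_message)
#   git svn fetch -r "$rev:HEAD"
#
# Example trailer (hypothetical URL):
#   git-svn-id: https://svn.example.com/repo/trunk@7915 <uuid>
```

As far as I know, "git svn find-rev HEAD" should print the same number without any text parsing.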

I'm on Windows and use TortoiseGit, which has quite nice support for git-svn, but since TortoiseGit only runs "git svn fetch", I'm kind of stuck.

Am I doing something wrong? I expect "git svn fetch" to be a fast operation once the initial "git svn clone -s" is complete.

+1  A: 

You're using it correctly: the initial import of a Subversion repository with lots of history will be very slow.

The bad news is that, because Subversion's branches and tags are only directories, git-svn is forced to take the pessimistic route of reading each branch from its head all the way back to the first revision. Yes, if you've been disciplined in your use of Subversion, this will result in many fetches of the same data, but real-world usage patterns make this an unlikely case.

Start the clone in the evening and come back to a nice git repo the next morning!

Once you've cloned, git svn fetch even warns you:

This may take a while on large repositories

Subversion is simple and stupid, so git has to take things slowly.

Greg Bacon
Thanks for answering. I have no problem with the initial clone taking time, but it seems wrong that every fetch operation after that must go through almost all revisions.
Henrik Steensland
+1  A: 

If you do not need to have full history in the git repository, I recommend you take a look at the "git + svn" approach, detailed in the link below, instead of the standard git-svn integration. Your initial import into git should be very quick, since you will not be importing history.

Make sure to read the section entitled "Benefits, Drawbacks, and Lessons Learned".

http://www.lostechies.com/blogs/derickbailey/archive/2010/02/03/branch-per-feature-how-i-manage-subversion-with-git-branches.aspx

Jordan
+2  A: 

Thanks for the answers. They did not really help me, though.

This command is the best solution so far:

git svn log --all -1 | sed -n '2s/r\([0-9]*\).*/\1/p' | xargs --replace=from git svn fetch -r from:HEAD

It uses git svn log --all to find the highest SVN revision number fetched so far, and fetches everything from that point onwards. I wish git svn fetch had an option to behave like this. Unless the SVN revisions are changed, there is no reason git svn should fetch the same revisions over and over each time.
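For anyone without GNU xargs (--replace is a GNU extension), here is a rough sketch of the same idea using command substitution. It assumes that the second line of the git svn log output is the header line starting with "r<revision> |", as in svn log; the function name latest_fetched_rev is just something I made up.

```shell
# Print the revision number from the second line of `git svn log` output,
# which is expected to look like "r7915 | author | date | 1 line".
latest_fetched_rev() {
    sed -n '2s/^r\([0-9][0-9]*\).*/\1/p'
}

# Usage inside a git-svn clone (not executed here):
#   from=$(git svn log --all -1 | latest_fetched_rev)
#   [ -n "$from" ] && git svn fetch -r "$from:HEAD"
```

The check for an empty value is there so that nothing is fetched with a broken range if the log output does not have the expected shape.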

Henrik Steensland
Thanks for putting this out here. A lot of people are looking for ways to use Git with other source control systems.
Jordan
A: 

Do you have symlinks in the SVN repo? If not, have you tried this setting:

svn.brokenSymlinkWorkaround

This disables potentially expensive checks to workaround broken symlinks checked into SVN by broken clients. Set this option to "false" if you track a SVN repository with many empty blobs that are not symlinks. This option may be changed while git svn is running and take effect on the next revision fetched. If unset, git svn assumes this option to be "true".
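If you want to try it, a small sketch of setting the option for one clone might look like the following. The helper name set_broken_symlink_workaround is hypothetical, and whether it actually speeds up your fetches depends on the repository.

```shell
# Set svn.brokenSymlinkWorkaround in the local config of the given clone.
#   $1: path to the git-svn clone
#   $2: "true" or "false"
set_broken_symlink_workaround() {
    repo_dir=$1
    value=$2
    git -C "$repo_dir" config svn.brokenSymlinkWorkaround "$value"
}

# Usage (not executed here):
#   set_broken_symlink_workaround /path/to/clone false
```

Running "git config svn.brokenSymlinkWorkaround false" directly inside the clone does the same thing.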

inger