I want to split a directory from a large Subversion repository to a repository of its own, and keep the history of the files in that directory.

I tried the regular way of doing it first:

svnadmin dump /path/to/repo > largerepo.dump
cat largerepo.dump | svndumpfilter include my/directory >mydir.dump

but that does not work, since the directory has been moved and copied over the years and files have been moved into and out of it to other parts of the repository. The result is a lot of these:

svndumpfilter: Invalid copy source path '/some/old/path'

Next, I tried including each /some/old/path as it was reported. After a long, long list of files and directories had been included, svndumpfilter completed, BUT importing the resulting dump doesn't produce the same files as the current directory has.
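Rather than discovering the outside paths one error at a time, it may help to scan the dump for the `Node-copyfrom-path` headers that svndumpfilter is complaining about. The following is only a rough sketch: the function name `list_copy_sources` is made up for illustration, it assumes the dump headers store paths without a leading slash, and file contents that happen to contain header-like lines (or binary data that upsets a particular awk) could produce false results.

```shell
# list_copy_sources PREFIX < dumpfile
# Hypothetical helper: prints each distinct copy source that lies outside
# PREFIX but was copied to a node at or below PREFIX.
list_copy_sources() {
  prefix=$1
  LC_ALL=C awk -v prefix="$prefix" '
    # true if path p is PREFIX itself or somewhere below it
    function under(p) { return p == prefix || index(p, prefix "/") == 1 }
    # remember the node the following headers belong to
    /^Node-path: / { node = substr($0, 12) }
    # a copy into the included tree from somewhere outside it
    /^Node-copyfrom-path: / {
      src = substr($0, 21)
      if (under(node) && !under(src)) print src
    }
  ' | sort -u
}
```

Usage would look like `list_copy_sources my/directory < largerepo.dump`. A single pass is not transitive: the reported sources may themselves have been copied from elsewhere, so the scan would need repeating for each path it reports.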

So, how do I properly split the directory from that repository while keeping the history?

EDIT: I specifically want trunk/myproj to be the trunk in a new repository, PLUS the new repository should include none of the other old stuff, i.e. there should be no possibility for anyone to update to an old revision from before the split and get/see the files.

The svndumpfilter solution I tried would achieve exactly that; sadly it's not doable since the paths/files have been moved around. The solution by ng isn't acceptable since it's basically a clone plus removal of extras, which keeps ALL the history, not just the relevant myproj history.

BUMP: C'mon, there must be someone who definitely knows whether this is doable or not, and how!

+2  A: 

Why not replicate the entire repository by dumping it into a new one? Then branch out the trunk, delete the head, and merge the portions you want back into the trunk from the branch. Then you have kept the history and split out the parts you want into a new repository.

  1. Dump to /trunk
  2. Branch /trunk to /branches/trunk
  3. Delete /trunk
  4. Merge /branches/trunk/whatever back in to /trunk or /trunk/whatever

This way you have kept all the history, and selectively picked the parts you want.

ng
I can't seem to get it to work; can you add more specific commands to do that? It just skips the non-existent files, so I'm probably doing it wrong. Btw, how is that different from replicating the repo and deleting everything else besides my dir? I also want to get rid of non-related history etc.
Tuminoid
There is no difference from just removing what you don't want. However, if you want the new repository's /trunk to be the old repository's /trunk/whatever, then you need to copy the full /trunk of the dump to /branches, then copy back only what you want to /trunk. I'll add another answer with specifics.
ng
A: 

The specific commands are as follows. I am going to assume the repository is hosted on an http(s):// server, although the same commands will work for svn:// or file://.

svnadmin dump /path/to/repository > dumpfile
svnadmin create /path/to/new_repository
svnadmin load /path/to/new_repository < dumpfile
svn co https://localhost/svn/new_repository_url new_repository_checkout
cd new_repository_checkout
svn move https://localhost/svn/new_repository_url/trunk https://localhost/svn/new_repository_url/branches/head -m "Moving HEAD to branches"
svn move https://localhost/svn/new_repository_url/branches/head/whatever https://localhost/svn/new_repository_url/trunk -m "Creating new trunk"
svn update
cd branches
svn remove head
svn commit -m "Removing old HEAD"

You should now have the part you want from the old repository as the trunk of the new one.

ng
This is still the "keep history of everything" solution. I need a solution that replicates the spirit of the svndumpfilter solution :/
Tuminoid
+2  A: 

This could potentially help you. A quote from http://svnbook.red-bean.com/en/1.5/svn.reposadmin.maint.html#svn.reposadmin.maint.replication:

In Subversion 1.5, svnsync grew the ability to also mirror a subset of a repository rather than the whole thing. The process of setting up and maintaining such a mirror is exactly the same as when mirroring a whole repository, except that instead of specifying the source repository's root URL when running svnsync init, you specify the URL of some subdirectory within that repository. Synchronization to that mirror will now copy only the bits that changed under that source repository subdirectory. There are some limitations to this support, though. First, you can't mirror multiple disjoint subdirectories of the source repository into a single mirror repository—you'd need to instead mirror some parent directory that is common to both. Second, the filtering logic is entirely path-based, so if the subdirectory you are mirroring was renamed at some point in the past, your mirror would contain only the revisions since the directory appeared at the URL you specified. And likewise, if the source subdirectory is renamed in the future, your synchronization processes will stop mirroring data at the point that the source URL you specified is no longer valid.

The problem, of course, is losing the pre-rename history...
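For reference, the subdirectory mirror described in the quote could be set up roughly as follows. This is a sketch, not a tested recipe: the function name `mirror_subdir` is hypothetical, the destination must be an absolute path so the `file://` URL is valid, and the always-succeeding hook is the simplest possible choice (svnsync needs to set revision properties on the mirror).

```shell
# mirror_subdir DEST_REPO_PATH SOURCE_SUBDIR_URL
# Hypothetical helper: creates an empty repository at DEST_REPO_PATH and
# mirrors only the given subdirectory of the source repository into it.
mirror_subdir() {
  dest=$1
  src_url=$2
  svnadmin create "$dest" || return 1
  # svnsync keeps its bookkeeping in revision properties, so the mirror
  # must allow revprop changes; this hook simply permits all of them
  printf '#!/bin/sh\nexit 0\n' > "$dest/hooks/pre-revprop-change"
  chmod +x "$dest/hooks/pre-revprop-change"
  # point the mirror at the subdirectory URL instead of the repository root
  svnsync init "file://$dest" "$src_url" || return 1
  svnsync sync "file://$dest"
}
```

A call might look like `mirror_subdir /path/to/mirror https://localhost/svn/repo/trunk/myproj`, where the URL is only an example. As the quote says, the mirror would contain only the revisions since the directory appeared at that URL.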

Alphager
A: 

I see this is quite old now, but does adding "--skip-missing-merge-sources" help any? It seems like it might...

Sorry, but no. I end up with either an empty dump or 'invalid copy source path' errors like before :(
Tuminoid
A: 

If you don't need the entire history you can pick it up from just after the error. If your error was at revision 412 then you can try picking it up right after with:

svnadmin dump /path/to/repo -r 413:HEAD > largerepo.dump

I realize this may not be a perfect solution either but it may be good enough in your case.

You may also want to do this all in one step:

svnadmin dump /path/to/repo -r 413:HEAD | svndumpfilter include my/directory > mydir.dump
Brawndo
+2  A: 

I had a similar problem splitting a repository:

svndumpfilter: Invalid copy source path /dir/old_dir

What I did to get around the problem was to include the additional old directories that it was requesting, or that you know you moved. In my case I had moved 3 directories into another directory.

E.g. I moved folders A, B, and C into folder D:

cat project.dump | svndumpfilter include A B C D > new.dump

This seemed to solve my problem. I was able to separate folder D from the rest of the repo. On the flip side, when excluding D I did not get the error, I would guess because removing D didn't require the links/history to A, B, and C.

This solved my problem as well, and I hope others see it. We had an issue where our main work folder was renamed from "Abc_Fun" to "AbcFun" to "Fun[Abc]" and so on, so including the extra paths wasn't an issue.
DonaldRay
+6  A: 

This problem occurs when one of the directories/files included by svndumpfilter originally was copied or moved from a section of the tree that is not being included.

To solve the problem use this script: svndumpfilter3

auriarte
That script got me past the described problem in creating the new dump file, but I see a different problem when I try to load the dump into a new repository:
<<< Started new transaction, based on original revision 1868
svnadmin: File not found: transaction '1867-1fv', path 'dm/dm_trunk'
     * adding path : dm/dm_trunk ...
Allan Anderson
A: 

Some more info about svndumpfilter gotchas and how to fix them: http://blog.rlucas.net/uncategorized/some-gotchas-with-using-svndumpfilter/

Or you can try a replacement script for svndumpfilter, now called svndumpfilter2: http://cogo.wordpress.com/2009/03/10/problems-with-svndumpfilter/

I haven't tried that script yet, because I need some time to make a repo backup to test it on (I have a backup dump to play with, but on Windows, and it is a Linux script).

Alex
That new script really helped me; the dump was as it should be, with no errors and no warnings. svnadmin load went OK, too. Our programmer said the new repo is as it should be. So 5*.
Alex
It helped, yes: it filtered the folder out of the dump with no errors, and the result even loaded into an empty repo. But beware: a new repo built from this kind of dump is not OK. Some of your data is lost, which can be a huge problem when using a build server (Hudson or CruiseControl, for example). You'll probably get errors such as:
Could not access revision times. [500, #0][client 10.0.0.71]
or
Unable to deliver content. [409, #0][client 10.0.0.229]
So think twice and test it three times before going to production.
Alex
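Given the warning above, one cautious way to test a split repository before relying on it is to run `svnadmin verify` on it and then compare an export of its HEAD against an export of the original directory. This is a minimal sketch under assumptions: the function name `check_split` is invented for illustration, and it only checks the latest revision, not the history.

```shell
# check_split NEW_REPO_PATH OLD_DIR_URL NEW_DIR_URL
# Hypothetical helper: verifies the new repository, then diffs a fresh
# export of the original directory against an export from the new one.
check_split() {
  new_repo=$1 old_url=$2 new_url=$3
  # basic integrity check of the loaded repository
  svnadmin verify "$new_repo" || return 1
  work=$(mktemp -d) || return 1
  # export both trees without working-copy metadata and compare them
  svn export -q "$old_url" "$work/old" &&
    svn export -q "$new_url" "$work/new" &&
    diff -r "$work/old" "$work/new"
  status=$?
  rm -rf "$work"
  return $status
}
```

For example: `check_split /path/to/new_repository https://localhost/svn/repo/trunk/myproj file:///path/to/new_repository/trunk` (both URLs are placeholders). Files that use keyword expansion such as `$Revision$` may show differences even when the split is otherwise fine.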
+1  A: 

I'm also looking for an answer to this question (having to deal with it myself). Based on Alex's answer, I found http://furius.ca/pubcode/pub/conf/common/bin/svndumpfilter3.html which claims to fix some of the svndumpfilter2 issues. I believe it is a partial solution.

The good:

A rewrite of Subversion's svndumpfilter in pure Python, that allows you to untangle move/copy operations between excluded and included sets of files/dirs, by converting them into additions. If you use this option, it fetches the original files from a given repository.

Concern:

Important

Some people have been reporting a bug with this script, that it will create an empty file on a large repository. It worked great for the split that I had to do on my repository, but I have no time to fix the problem that occurs for some other people's repositories

Adriaan
+1  A: 

This is a wild and crazy stab in the over-complicating-things dark but what about importing the SVN repo into git using git-svn/tailor, splitting off the directory using git-split, then exporting it back to svn with git-svn/tailor?

rjp
+1 for the ingenuity
jalexiou
+1  A: 

I encountered this problem and ended up using svndumpfilter2: http://svn.tartarus.org/sgt/svn-tools/svndumpfilter2?view=markup

Specifically, this command:

sudo svnadmin dump /home/setup/svn/repos/main_repl | sudo ./svndumpfilter2.py /home/setup/svn/repos/main_repl Development QA compliance > ~/main_repl_dump.trim

I did get the out-of-memory error mentioned; however, since I was running svn on a VM, I just bumped the memory up to 2G. While I realize that this may not be an option for everyone, I noticed that it ran much faster than it had with 512M. (2G probably wasn't necessary.)

Currently, it is processing revision 18,631.

In case anyone wonders, the reason I needed to break out part of the repo was that we were creating tags/copies, for distribution to implementation, of files in another path of the repo. For some reason, this process was causing the repo to balloon to huge proportions. (We're at 17G now.)

I'm doing this on a replication repo of svn, version 1.5.6, on Debian Lenny, 5.0.4.

cognitiaclaeves
A: 

Hi Guys,

I just ran into this problem and wrote a little script that retries the dump filtering until all invalid source paths are resolved.

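#!/usr/bin/env ruby
# Re-run svndumpfilter, adding each reported copy source path to the
# include list, until no "Invalid copy source path" error remains.

require 'open3'

paths = ["/your/path"]

loop do
  puts "Processing, please wait ..."
  # stdout goes to svn.result.dump via the shell redirect; only stderr is used here
  _out, err, _status = Open3.capture3(
    "svndumpfilter include #{paths.join(' ')} < svn.original.dump > svn.result.dump"
  )

  # pick the copy source path that svndumpfilter complained about, if any
  new_path = err[/Invalid copy source path '(.*)'/, 1]
  # stop when there is no error left, or when including the path did not help
  break if new_path.nil? || paths.include?(new_path)

  puts "Adding #{new_path}"
  paths << new_path
end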
next2you
Just a comment: the dumping was successful, but the reimport did not succeed. So, no luck there. (I switched to git last week with git svn clone.)
next2you