How do I find the tree that is closest to another tree?

Here's the scenario: I have a local git repository that mirrors the contents of another source control system (a proprietary one). I've written a script that periodically syncs my git branch with that system's latest copy of the same branch (called by another term in the other system but conceptually similar).

Now, suppose that in the other system, someone creates a branch from the branch I'm currently syncing and starts hacking on it. What I'd like to do is pull down the first version of that other branch, then find the commit on my git version of the main branch whose tree is closest to the new branch. If I can do this, I'll know which commit from the main branch to use as the parent of this new branch.

This sounds to me like a problem of computing "tree distances". But since SHA1 hashes don't have a distance metric, is there another way to do this besides the obvious manual deep search of each commit to find out which one has the largest number of identical blobs?
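For reference, the "obvious manual deep search" could be sketched roughly like this in Python. It is only a sketch under assumptions: the function names are made up for illustration, it shells out to `git ls-tree` and `git rev-list`, and it only counts files whose path and blob hash match exactly, so a file that differs by one line counts as a complete miss.

```python
import subprocess


def tree_entries(treeish, repo="."):
    """Return {path: blob_sha} for every file under a commit or tree."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "-z", treeish],
        check=True, capture_output=True, text=True,
    ).stdout
    entries = {}
    # With -z each record is "<mode> <type> <sha>\t<path>" terminated by NUL.
    for record in out.split("\0"):
        if not record:
            continue
        meta, path = record.split("\t", 1)
        _mode, _type, sha = meta.split()
        entries[path] = sha
    return entries


def shared_blobs(a, b):
    """Count paths whose blob hash is identical in both trees."""
    return sum(1 for path, sha in a.items() if b.get(path) == sha)


def closest_commit(target, branch, repo="."):
    """Return the commit on `branch` sharing the most blobs with `target`."""
    target_entries = tree_entries(target, repo)
    commits = subprocess.run(
        ["git", "-C", repo, "rev-list", branch],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    return max(
        commits,
        key=lambda c: shared_blobs(tree_entries(c, repo), target_entries),
    )
```

This reads one full tree listing per commit, so it is linear in the number of commits on the branch and will be slow on a long history.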

UPDATE: See below, found a domain-specific way to do it.

+1 A:

It's worse than that; in the general case you'll have to count edit distance on the blobs to see how similar they are.

Hoping this is a rare event, I would clone the git repository and start rolling back versions to locate the commit that is closest to the tree you wish to duplicate. It would be nice to think of using git bisect for this, but since there's no total ordering and no absolute concept of good or bad, I don't see how to avoid trying every commit.

Minimum edit distance between unordered trees is NP-hard as well, so you have a real pain in the ass here.

If you are lucky, the other system lets you recover the date and time the new branch was created. Then maybe you can just grab the last commit before that timestamp?
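A minimal sketch of the timestamp idea, assuming the branch creation time can be obtained as a date string that git understands. The helper names are hypothetical. Note that `--before` filters on the commit date recorded in git, which in a mirrored repository is most likely the time the sync script ran rather than the time of the original change, so the result is only as good as the sync frequency.

```python
import subprocess


def rev_list_command(branch, timestamp, repo="."):
    """Build the git command that finds the newest commit before `timestamp`."""
    return [
        "git", "-C", repo, "rev-list", "-n", "1",
        f"--before={timestamp}", branch,
    ]


def last_commit_before(branch, timestamp, repo="."):
    """Return the hash of the last commit on `branch` before `timestamp`, or None."""
    out = subprocess.run(
        rev_list_command(branch, timestamp, repo),
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return out or None
```

Something like `last_commit_before("main", "2009-04-10 12:00:00")` would then give the candidate parent, where the branch name and date are placeholders.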

Norman Ramsey 2009-04-11 01:39:00

+1 A:

Why not just work in your own branch and merge with the trunk when you need to make commits?

Sounds like you may need a Vendor Branch for the solution.

Chris Ballance 2009-04-11 01:39:14

I would, but this is going to be done in a "master git repository" for the company, and we have to do frequent merges between branches, which are hard in the proprietary system and easy in git. Thus, the best way to do it is to "mirror" the branch topology of the target system on git.

tophat02 2009-04-11 18:28:57

+2 A:

One total ass-brained way to do it is to create patch files against each of the candidate branches and see which one is smallest.
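A rough sketch of that in Python, using `git diff --numstat` as a stand-in for writing out patch files and measuring them. The function names are invented for illustration, and binary files (which numstat reports as "-") are simply skipped, which is one more reason the result is approximate.

```python
import subprocess


def changed_lines(numstat_output):
    """Sum added and deleted line counts from `git diff --numstat` output."""
    total = 0
    for line in numstat_output.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        if added == "-":  # binary file: no line counts available
            continue
        total += int(added) + int(deleted)
    return total


def diff_size(candidate, target, repo="."):
    """Number of changed lines between two commits or trees."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--numstat", candidate, target],
        check=True, capture_output=True, text=True,
    ).stdout
    return changed_lines(out)


def smallest_diff(candidates, target, repo="."):
    """Return the candidate whose diff against `target` is smallest."""
    return min(candidates, key=lambda c: diff_size(c, target, repo))
```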

ojblass 2009-04-11 01:44:00

Be warned, it's not an exact science...

ojblass 2009-04-11 01:52:55

+1 A:

Thanks for the answers!

It turns out I'm in luck with my particular application. The target system drops a description file that lists the files and version numbers that make up the current state of the branch. I commit these, so I can find all such files and use a simple scoring system to measure how "close" two of them are to each other: a positive score means yours is newer, a negative score means the branch is newer. The pairing with the score closest to zero identifies the commit that's most similar to the new branch.
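To illustrate the kind of scoring meant here, a small sketch. The description file format is specific to the proprietary system, so this assumes it has already been parsed into a dict mapping file name to an integer version number, and it treats a file missing on one side as version 0. Both of those, and the function names, are assumptions made for the example.

```python
def score(mine, theirs):
    """Signed closeness score: positive if `mine` is newer overall, negative if `theirs` is."""
    files = set(mine) | set(theirs)
    # A file absent from one side is treated as version 0 (an assumption).
    return sum(mine.get(f, 0) - theirs.get(f, 0) for f in files)


def closest(candidates, branch):
    """Return the commit id whose description scores closest to zero against `branch`.

    `candidates` maps commit id -> parsed description dict.
    """
    return min(candidates, key=lambda c: abs(score(candidates[c], branch)))
```

A signed sum lets a newer file cancel out an older one, so summing absolute differences instead may be the safer choice when files move in both directions.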

I'm not going to mark this answer as the best one though, because it only applies to my situation.

For everyone else: I was browsing around the git source code and found match-trees.c. It is currently used by the subtree merge strategy, but it has a nifty score_trees() function that could be surfaced to the user for this purpose.

tophat02 2009-04-11 18:34:36
