tags:

views:

1011

answers:

4

I need to calculate the edit distance between trees for a personal project of mine. This paper describes an algorithm, but I can't make heads or tails out of it. Are you aware of any resources that describe an applicable algorithm in a more approachable way? Pseudocode or code would be helpful, too.

A: 

I think you're just looking for the shortest path between two verticies. If so, you can use Dijkstra's algorithm. (The wikipedia page has pseudocode on it for you to refer to.)

Kevin
Tree Edit Distance is the cost associated with the "edit script" (series of edit operations) that turns one tree into another. I don't think you can directly use Dijkstra's algorithm for that.
Naaff
@Naaff: In fact you can use Dijkstra's algorithm for that (I wouldn't recommend it though). The nodes of the graph will be the OP's problem's trees, and they will have edges to trees with edit distance 1. This graph is infinite and therefore you won't store it in memory, but rather will compute it as you go. For trees that are not very near this algorithm will have an utterly horrible performance and memory consumption.
yairchu
@yairchu: Thanks for the insight. Interesting to see how one *could* use Dijkstra's algorithm, even if one *shouldn't*. :)
Naaff
+1  A: 

Here's some java source code (gzipped tarball at the bottom) for a Tree Edit Distance algorithm that might be useful to you.

The page includes references and some slides that go through the "Zhang and Shasha" algorithm step-by-step and other useful links to get you up to speed.

Naaff
A: 

There is a journal version of the ICALP2007 paper you refer to at http://www.wisdom.weizmann.ac.il/~oweimann/Publications/TEDinTALG.pdf This version also has a pseudocode.

A: 

There are many variations of tree edit distance. If you can go with top-down tree edit distance, which limits insertions and deletes to the leaves, I suggest trying the following paper: http://portal.acm.org/citation.cfm?id=671669&dl=GUIDE&coll=GUIDE&CFID=68408627&CFTOKEN=54202965. The implementation is a straightforward dynamic programming matrix with O(n2) cost.

Davi