I've been playing around with natural language parse trees and manipulating them in various ways. I've been using Stanford's Tregex and Tsurgeon tools, but the code is a mess and doesn't fit well with my mostly Python environment (those tools are Java and aren't ideal for tweaking). I'd like a toolset that allows easy hacking when I need more functionality. Are there other tools that are well suited to pattern matching on trees and then manipulating the matched branches?
For example, I'd like to take the following tree as input:
(ROOT
  (S
    (NP
      (NP (NNP Bank))
      (PP (IN of)
        (NP (NNP America))))
    (VP (VBD used)
      (S
        (VP (TO to)
          (VP (VB be)
            (VP (VBN called)
              (NP
                (NP (NNP Bank))
                (PP (IN of)
                  (NP (NNP Italy)))))))))))
and (this is a simplified example):
- Find any node with the label NP that has a first child with the label NP and some descendant named "Bank", and a second child with the label PP.
- If that matches, then take all of the children of the PP node and move them to the end of the matched NP's children.
For example, take this part of the tree:
(NP
  (NP (NNP Bank))
  (PP (IN of)
    (NP (NNP America))))
and turn it into this:
(NP
  (NP (NNP Bank) (IN of) (NP (NNP America))))
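To make the request concrete, here is a minimal sketch in plain Python of the match and rewrite described above. It is not Tregex or Tsurgeon, and it is not meant as a general solution: the helper names (`parse`, `matches`, `rewrite`, `show`) and the nested-list tree representation are assumptions invented for illustration, and the pattern is hard-coded rather than described declaratively. It reproduces the example output above, so the PP's children end up appended to the first NP child.

```python
import re

def parse(text):
    """Parse a bracketed tree into nested lists: [label, child, ...]; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok              # a label or a leaf word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                    # consume the closing ")"
        return node

    return read()

def label(node):
    """Label of an internal node, or None for a leaf string."""
    return node[0] if isinstance(node, list) else None

def has_leaf(node, word):
    """True if `word` appears as a leaf somewhere under `node`."""
    if isinstance(node, str):
        return node == word
    return any(has_leaf(child, word) for child in node[1:])

def matches(node):
    """NP whose first child is an NP dominating 'Bank' and whose second child is a PP."""
    kids = node[1:] if isinstance(node, list) else []
    return (label(node) == "NP" and len(kids) >= 2
            and label(kids[0]) == "NP" and has_leaf(kids[0], "Bank")
            and label(kids[1]) == "PP")

def rewrite(node):
    """Return a new tree, rewriting children first and then the node itself."""
    if isinstance(node, str):
        return node
    node = [node[0]] + [rewrite(child) for child in node[1:]]
    if matches(node):
        first, pp = node[1], node[2]
        # Drop the PP and append its children to the first NP child.
        node = [node[0], first + pp[1:]] + node[3:]
    return node

def show(node):
    """Serialize a nested-list tree back to a one-line bracketed string."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(show(child) for child in node) + ")"

tree = parse("(NP (NP (NNP Bank)) (PP (IN of) (NP (NNP America))))")
print(show(rewrite(tree)))
```

Writing each pattern and each manipulation by hand like this is exactly what I'd like to avoid, which is why I'm looking for a better way to express them.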
Since my input trees are S-expressions, I've considered using Lisp (embedded into my Python program), but it's been so long since I've written anything significant in Lisp that I have no idea where to even start.
What would be a good way to describe the patterns? What would be a good way to describe the manipulations? What's a good way to think about this problem?