views:

353

answers:

6

Are there regular expression equivalents for searching and modifying tree structures? Concise mini-languages (like perl regex) are what I am looking for.

Here is an example that might clarify what I am looking for.

<root>
  <node name="1">
    subtrees ....
  </node>
  <node name="2">
    <node name="2.1">
     data
    </node>
    other subtrees...
  </node>
</root>

An operation that would be possible on the above tree is "move subtree at node 2.1 into the subtree at node 1." The result of the operation might look something like..

<root>
  <node name="1">
    subtrees ....
    <node name="2.1">
     data
    </node>
  </node>
  <node name="2">
    other subtrees...
  </node>
</root>

Search and replace operations like find all nodes with atleast 2 children, find all nodes whose data starts with "a" and replace it with "b" if the subtrees have atleast 2 other siblings, etc. should be supported.

For strings, where the only dimension is across the length of the string, we can do many of above operations (or their 1D equivalents) using regular expressions. I wonder if there are equivalents for trees. (instead of a single regex, you might need to write a set of transformation rules, but that is ok).

I would like to know if there is some simple mini language (not regex per.se, but something that is as accessible as regex via libraries, etc..). to perform these operations? Preferably, as a python library.

A: 

Navigating through a binary search tree requires state (in which node I am?) and comparisons (is that value less or greater than that?), things that cannot be done by a finite state automaton.

Sure, you can search for the node with a given value, but how then you could, for example, remove a node that isn't a leaf if you don't know its parent?

And even if you know the parent via the information supplied by the node, how do you determine the minimum of the left subtree, remove it and place it in the node?

I think you're asking too much to FSA.

akappa
The automaton could work if each node contains the relevant data (and states relating to that) for all data that might be matched such as ancestry and parent-state?
Aiden Bell
-- continuation -- Then subexpressions relating to other nodes can invoke a sub-engine to return a state or boolean mapped to a transition.
Aiden Bell
But, on removal, you have to "refresh" the relevant data to each node...
akappa
+3  A: 

I don't know a general purpose langugae that can do that, but it seems to me that you are looking for something like XPath.

Daniel Brückner
I've looked at XPath. It seems promising but it doesn't seem to handle expressions over sets of nodes (eg, find all nodes who have atleast 2 siblings). It has limited functionality.
JSN
+2  A: 

There is TXL for pattern based tree rewriting.

Tree rewriting with patterns is also done with parser toolkits such as ANTLR

Code generation with bottom up tree rewriting, google BURS or BURG.

Doug Currie
TXL seems very promising, however both ANTLR and TXL assume a context free grammar, which is important when you also need to perform parsing. However, for the purposes of transformation and regex like behavior on trees it should explicitly be context dependent. See my clarification of the question above for some use cases that I would like (eg: search with conditions on siblings).
JSN
A: 

This article gives some tasty hints about recursive Perl regular expressions, but honestly it's rare to see a tree structure approached this way.

More typically, one would write a state machine style parser, that might use regexes to parse each particular node in the tree.

Expat is probably a good example to look at.

Sean Cavanagh
A: 

Pattern Matching, provided by languages such as Scala, F#, Erlang and Haskell (I'm sure there's more) is designed to succinctly manipulate data structures like trees, esp when used with recursion.

here is a very high level view of what pattren matching can do in Scala. The examples shown really don't do pattern matching justice.

Wikipedia has a couple of references to pattern matching, too. Here and here.

Barry Carr
A: 

I'm somewhat surprised that XSLT hasn't come up as an answer. Granted, I don't think it's a particularly elegant language, and most existent solutions tend to favour procedural approaches rather than pattern matching, and it's gotten a mighty bad rep from being blindly applied just because it's XML being applied to XML -- but otherwise it fits the bill. Pity its canonical representation is so verbose, though...

Pontus Gagge
Right now, XSLT seems to be the closest to what I want, but writing context sensitive queries seems convoluted, my question was to find something better than xslt.
JSN