I am working on a tree library, and part of the required functionality, is to be able to search a node for child nodes that match a pattern.
A 'pattern' is a specification (or criteria) that lays out the structure, as well as attributes of nodes in the subtree(s) to be matched.
For example, suppose a tree represents data regarding a particular species of bird. Further assume that the nodes of such a tree have the following attributes:
- location
- sex
- wingspan
- weight
- brood_size
Given a parent node, I would like to issue a search in plain English thus:
"Fetch me all male birds that are descendants of this bird, and live in XXX city and have a weight > 100g. Any such bird found should also have at least 2 brothers and one sister, and must itself have at least one child"
< note >
Just to clarify, I do not expect to be able to query using plain English as I have done above. I only used the "plain English query" to illustrate the type of matching I would like to be performing on the tree. I fully expect to use symbols for the matching (as opposed to plain text) in practice.
< /note >
I am thinking of possibly using a regex type pattern matching to match trees. One way would be to have a string representation of each node, so I could use a normal regex - but this is likely of be quite inefficient, as there will be a lot of repeated data - i.e. string representation of child nodes will be supersets of their parent representation, which will be supersets of their parents representational string, and so on, recursively, up the tree - this could very easily become unwieldy for event modestly sized trees - there has to be a better way.
Is anyone aware of an algorithm that will allow me to select nodes (subtrees) in a node, based on a pattern?
Although I asked for a general algorithm, I am implementing this in Python. any snippets that further illustrate such an algorithm (if one can indeed be written), would be immensely useful.