I want to parse a search string similar to that provided by Gmail using Perl. An example input would be "tag:thing by:{user1 user2} {-tag:a by:user3}". I want to put it into a tree structure, such as
{and => [
"tag:thing",
{or => [
"by:user1",
"by:user2",
]},
{or => [
{not => "tag:a"},
"by:user3",
]},
}
The general rules are:
- Tokens separated by space default to the AND operator.
- Tokens in braces are alternative options (OR). The braces can go before or after the field specifier. i.e. "by:{user1 user2}" and "{by:user1 by:user2}" are equivalent.
- Tokens prefixed with a hyphen are excluded.
These elements can also be combined and nested: e.g. "{by:user5 -{tag:k by:user3}} etc".
I'm thinking of writing a context-free grammar to represent these rules, and then parsing it into the tree. Is this unnecessary? (Is this possible using simple regexps?)
What modules are recommended for doing parsing context-free grammars?
(Eventually this will be used to generate an database query with DBIx::Class.)