views:

42

answers:

4

I'm creating a search form that allows boolean expressions, like: "foo AND bar" or "foo AND NOT bar".

Is there a library for PHP, Ruby or Java that can transform boolean expressions to a concrete syntax tree?

(I could write my own lexer/parser, but I rather use something tried and tested)

EDIT: To clarify, I'm not parsing arrhythmic expressions. It's going to be used for parsing full text queries that allow boolean operators.

A: 

just stumbled over this while looking for a sat solver... http://jboolexpr.sourceforge.net/ may be just what you need.

tarrasch
A: 
$patterns=array(
   '/\band\b/i',
   '/\bor\b/i',
   '/\bfoo\b/i',
   '/\bbar\b/i',
   '/\bnot\b/i');
$replace=array(
   '*',  // value for 'and'
   '+',  // value for 'or'
   '1',  // value for 'foo'
   '0',  // value for 'bar'
   '!'); // value for 'not'
$expression="foo AND NOT bar";
$result=eval(preg_replace($pattern, $replace, $expression));

You can easily add your own facts, it will automatically handle brackets and precedence. It might be a good idea to think about how you'll handle unexpected inputs in $expression (hint the replaced version should only contain 1, 0, +, *, (, ) and !)

C.

symcbean
+2  A: 

For most parser framework, one of the standard introductory grammars is usually a mathematical expression parser. It parses things like 3+4*5-(6/-7), etc. Some tutorials would go even further and show you how to build the syntax tree and evaluate the expression.

A typical boolean expression grammar is usually directly analogous to mathematical expression grammar in this infix notation.

  • AND and OR are binary operators (i.e. they take 2 operands), just like + and *
  • NOT is a unary operator (it takes 1 operand), just like the unary -
  • Some operators have higher precedence than others
  • You may use (…) to group subexpressions

You should therefore be able to go through the calculator tutorial of most parser package and adapt it to your problem. In fact, in many ways boolean expression parsing is simpler because e.g.:

  • In mathematical expressions, the - operator is overloaded (may be binary or unary)
  • Some operator, e.g. exponentiation, is right-associative

In boolean expression grammar, usually the operators aren't overloaded, and everything is left-associative.

Links

polygenelubricants
+1  A: 

I recommend Treetop. It's a minilanguage that generates a parser for your PEG (Parsing Expression Grammar). It's easier to work with than a LALR grammar and more powerful than a recursive descent parser (i.e. it can do some lookaheads more than one symbol).

Although Treetop isn't that complex, it helps to have some simple examples to to start with. You will find them at threedaymonk's treetop examples.

Wayne Conrad