views:

412

answers:

7

Do you know a way to organize boolean expressions in a database while allowing infinite nesting of the expressions?

Example:

a = 1 AND (b = 1 OR b = 2)

The expression as a whole shouldn't be stored as varchar to preserve data integrity.

+3  A: 

An expression is a treelike structure. So you need a way to present the tree in a table.

You can for example use the fields:

  • ID
  • TypeExpression (and, or etc...)
  • FirstChildID
  • SecondChildID

In this case, you have the following types:

  1. AND, Children point to other expression.
  2. OR, Children point to other expression.
  3. Equal, Children point to other expression.
  4. Literal, FirstChild points to an entry in a literal table.
  5. VariableLookup, FirstChild points to an entry in a varable table.

But I think there are better ways to organise expression. I once made a simple expression evaluator that accepts a string and produces a numeric result.

Gamecat
+3  A: 

Option 1 would be to use a nested table (a tree with id / parent_id structure), like Gamecat suggested. This is relatively expensive to do, and requires issuing SQL queries repetitively to build the equivalent of a single nested expression.

Option 2 would be to use a serialized object and store it into a varchar column. For example, JSON would be a good choice. It is not white-space sensitive, can be created and parsed in a vast number of languages, and it retains data integrity.

As soon as you have parsed your expression string into a tree object in memory, you can serialize it and store it. If there was no need to manipulate the expression on the database level, I guess I would go that route.

Tomalak
You are making a very good point here: one should not be using option 1 just because it can be used; there must be certain requirements in place to justify option 1. If such requirements are not imminent, I'd probably start from option 2.
Yarik
+2  A: 

This is going to be difficult to represent relationally, because by its nature it is both hierarchical and polymorphic (the leaves of your tree can be either variables or constants).

Ned Batchelder
+2  A: 

This type of expression is most usually expressed as a tree (a hierarchy), which are notoriously annoying to query in SQL.

We'll assume that a and b are numeric for the moment and that literals ('1', '2') are distinguished from variables.

Table Nodes
id
type (Variable|Literal)
name (nullable for literal)
value

Table Operators
id
name (=, AND, OR, NOT)
leftNodeId
rightNodeId

This structure is very flexible, but querying it to retrieve a complex expression is going to be "fun" (read that "challenging").

And you still have to parse the structure to begin with and evaluate the expression after it has been reconstructed.

Ken Gentle
+1  A: 

The traditional way to model Boolean functions is to use Binary Decision Diagrams, especially Reduced Order Binary Decision Diagrams. It's possible you can find an extension for your DBMS that provides good support for the concept.

UPDATE: Alternately, if you don't need to query on the Boolean logic, you could use a BDD library and just serialize the BDD into a BLOB or equivalent. It beats using a varchar field because the BDD library would ensure the data was valid.

Hank Gay
A: 

Adding to @Gamechat answer

I think it should be like this

ID

TypeExpression (and, or etc...)

FirstChildID --This can be a leaf node or a pointer to another row in same table

SecondChildID --This can be a leaf node or a pointer to another row in same table

isFirstChildLeaf

isSecondChildLeaf

Varun Mahajan
A: 

I would store the expression in polish form, in a varchar/text column. An expression in polish form (operand before operands, no brackets) is much easier to parse with a recursive function (or a stack of course).

a = 1 AND (b = 1 OR b = 2)

in polish form shows like this:

AND = a 1 OR = b 1 = b 2

cipak