I can't seem to find a pointer in the right direction. I am not even sure which terms I should be researching, and countless hours of googling have only sent me in circles, so hopefully the collective intelligence of Stack Overflow can help.

The problem is this: I need a way to filter data in what I can only call a compound logic tree. Currently the system implements a simple AND filtering system. For example, let's say we have a dataset of people. You add a bunch of filters so that it shows all the people where (Sex = Female) AND (Age > 23) AND (Age < 30) AND (Status = Single). Easy enough: iterate through each item and add it to a valid-items collection only if every condition is true.

The problem I'm encountering is how to handle the user being able to build complex queries involving ANDs and ORs. I'm thinking of something like a tree where each node represents an expression and evaluates its children to true or false. A simplistic example would be: filter down to ((Sex == Male AND Age == 25) OR (Sex == Female AND Status == Single)) AND IQ > 120. Sorry, I can't think of a better example at the moment. But how would you go about representing this type of expression tree and evaluating the items in a collection against these filters? What are some references that would help? Hell, what are some damn Google searches that might lead in a positive direction?!
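For reference, the existing AND-only filtering described above can be sketched roughly as follows. This is a minimal illustration, not the actual system: the sample records, the field names, and the `filter_all` helper are all assumptions made up for the example.

```python
# Hypothetical sample data; the field names are assumptions for illustration.
people = [
    {"name": "Ann", "sex": "Female", "age": 25, "status": "Single"},
    {"name": "Bea", "sex": "Female", "age": 31, "status": "Single"},
    {"name": "Cal", "sex": "Male", "age": 27, "status": "Single"},
]

# Each filter is a predicate: it takes one record and returns a bool.
filters = [
    lambda p: p["sex"] == "Female",
    lambda p: p["age"] > 23,
    lambda p: p["age"] < 30,
    lambda p: p["status"] == "Single",
]


def filter_all(items, predicates):
    """Keep only the items for which every predicate is true (pure AND)."""
    return [item for item in items if all(pred(item) for pred in predicates)]


matches = filter_all(people, filters)
```

With this data only Ann passes all four conditions. A flat list like this cannot express an OR, which is what the rest of the question is about.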

Thanks to anyone that can provide any help.

Here is an example of a compound query in tree form, using a dataset of people.

  • Query - Show me all people where sex is male and eyes are green, or sex is female and either eyes are blue or status is single. In paren form: (Sex == Male && Eyes == Green) || (Sex == Female && (Eyes == Blue || Status == Single))

So in tree form I'm thinking:

o-Root Node
  - And - Sex = Male
     - And - Eyes = Green
  - Or - Sex = Female
     - And Eyes = Blue
     - Or Status = Single

I believe the solution is to represent each node in a data structure like:

Node
{
   OpType - AND or OR
   ExpressionField - The field to evaluate
   ExpressionOp -   =, !=, >, >=, <, <=
   ExpressionValue - the value to compare the field's value against

   Function Evaluate() - returns a bool
}

So for a given node, evaluate the children: if you are an AND node, then return true if your expression results in true and all your AND children evaluate to true or any OR child evaluates to true, and recurse up.

It seems to satisfy every conceptual condition I can throw at it, but we will see once I implement it. I will post the real code later when it's working, along with pictures to help describe this problem better for others.

A: 

I have to say that this is why database engines are built. You can do all that you require with set logic, and you may even arrive at the result you are looking for, but these are standard problems solved by databases and SQL. You can also look at LINQ for an in-code solution.

rerun
I think he means how he can dynamically build this expression within SQL.
Sem Dendoncker
Yes, this is about presenting a solid UI to the user that allows them to easily build a query. I think I have figured it out with a fairly simple tree structure and a few simple recursive functions, but I know others have studied this problem and I would like to learn from their thoughts and experiments; I just have not yet figured out how to find them. Also, in this case all the data is in memory.
JTtheGeek
An option would be to let the user create a set of AND criteria and store that set. So Set 1 would be Female AND IQ > 120. Then allow users to specify multiple sets for the ORs. This could be done graphically in an interesting fashion, allowing users to drag and drop sets. Perhaps you could place sets inside of sets to create an intersection or a join of sets. Just an idea. Sorry I didn't get the crux of your question the first go-around.
rerun
Thanks for the help, I believe I have come up with a good solution to this, and will post more later.
JTtheGeek
+1  A: 

Sounds like you need to create a user interface that allows the creation of a simple parse tree. When the user presses GO you can then walk the tree and create a LINQ expression tree from that user interface structure. Execute the LINQ query and then process the results as needed. I would therefore recommend you read up on LINQ expression trees.

Phil Wright
+1  A: 

Your parsing of the expression ((Sex == Male AND Age == 25) OR (Sex == Female AND Status == Single)) AND IQ > 120 looks odd. I would parse it as:

* And
    * Or
        * And
            * ==
                * Sex
                * Male
            * ==
                * Age
                * 25
        * And
            * ==
                * Sex
                * Female
            * ==
                * Status
                * Single
    * >
        * IQ
        * 120

The tree type would be:

Node
{
    bool evaluate ()
}

AndNode : Node
{
    Node left
    Node right

    bool evaluate ()
    {
        return left.evaluate () && right.evaluate ()
    }
}

// OrNode is similar

EqualsNode : Node
{
    Field field
    Value value

    bool evaluate ()
    {
        return field.value () == value
    }
}

// Likewise for <, >, etc
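
As a rough illustration, the node types above could be written in Python along these lines. This is only a sketch under a few assumptions: each item is a plain dict, `evaluate` takes that item as an argument so the field can be looked up per row, and a single hypothetical `Compare` node parameterized by an operator stands in for the separate `==`, `<`, `>` node classes.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable


class Node:
    """Base type: every node can be evaluated against one row (a dict)."""

    def evaluate(self, row: dict) -> bool:
        raise NotImplementedError


@dataclass
class And(Node):
    left: Node
    right: Node

    def evaluate(self, row: dict) -> bool:
        return self.left.evaluate(row) and self.right.evaluate(row)


@dataclass
class Or(Node):
    left: Node
    right: Node

    def evaluate(self, row: dict) -> bool:
        return self.left.evaluate(row) or self.right.evaluate(row)


@dataclass
class Compare(Node):
    """Leaf node: compares one field of the row against a fixed value."""

    field: str
    op: Callable[[Any, Any], bool]  # e.g. operator.eq, operator.gt
    value: Any

    def evaluate(self, row: dict) -> bool:
        return self.op(row[self.field], self.value)


# ((Sex == Male AND Age == 25) OR (Sex == Female AND Status == Single)) AND IQ > 120
query = And(
    Or(
        And(Compare("sex", operator.eq, "Male"), Compare("age", operator.eq, 25)),
        And(Compare("sex", operator.eq, "Female"), Compare("status", operator.eq, "Single")),
    ),
    Compare("iq", operator.gt, 120),
)

# Hypothetical sample rows, made up for the example.
people = [
    {"name": "Dan", "sex": "Male", "age": 25, "status": "Married", "iq": 130},
    {"name": "Eve", "sex": "Female", "age": 40, "status": "Single", "iq": 125},
    {"name": "Fay", "sex": "Female", "age": 22, "status": "Single", "iq": 110},
    {"name": "Gus", "sex": "Male", "age": 30, "status": "Single", "iq": 140},
]

matches = [p for p in people if query.evaluate(p)]
```

Here Dan and Eve match; Fay fails the IQ test and Gus fails both branches of the OR. Because `and` and `or` short-circuit, the right subtree is only evaluated when it can still change the result.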
jon hanson
JTtheGeek
Not sure I understand. Strictly, evaluate() should take as an argument some sort of context object (maybe a row of a table), which would yield a value for the Field (which might be different for each row).
jon hanson
+1  A: 

These kinds of queries are often presented as an ORed array of ANDed clauses. That is, a tabular format in which you read across multiple conditions ANDed together, and then read down to OR them. That leads to some repetition of conditions, but is easy for users to read, write, and understand. Your sample ((Sex == Male AND Age == 25) OR (Sex == Female AND Status == Single)) AND IQ > 120 would look like

Sex == Male   & Age == 25        & IQ > 120 
Sex == Female & Status == Single & IQ > 120
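
In code, a table like this can be held as a list of rows, where each row is a list of conditions. The sketch below is only one possible shape: the dict-based records, the lambdas, and the `matches_any_row` name are assumptions for illustration.

```python
# Each inner list is one row of the table: its conditions are ANDed together.
# The outer list ORs the rows together.
or_rows = [
    [lambda p: p["sex"] == "Male", lambda p: p["age"] == 25, lambda p: p["iq"] > 120],
    [lambda p: p["sex"] == "Female", lambda p: p["status"] == "Single", lambda p: p["iq"] > 120],
]


def matches_any_row(person, rows):
    """True if at least one row has all of its conditions satisfied."""
    return any(all(cond(person) for cond in row) for row in rows)


# Hypothetical records, made up for the example.
dan = {"sex": "Male", "age": 25, "status": "Married", "iq": 130}
fay = {"sex": "Female", "age": 22, "status": "Single", "iq": 110}

dan_matches = matches_any_row(dan, or_rows)
fay_matches = matches_any_row(fay, or_rows)
```

Dan satisfies the first row, so he matches; Fay fails `IQ > 120` in both rows, so she does not.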
Hyman Rosen
+1  A: 

Hi

You might want to Google for terms such as 'predicate calculus' and 'conjunctive normal form'.
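
A closely related term is 'disjunctive normal form', which is the "OR of ANDs" shape. As a small, hedged sketch of what those normal forms buy you, the function below flattens an arbitrary AND/OR tree into that shape by distributing AND over OR. The nested-tuple representation and the `to_dnf` name are assumptions made up for the example, and negation is not handled.

```python
def to_dnf(expr):
    """Return expr as a list of AND-rows; the rows are implicitly ORed.

    expr is either a leaf (a string such as "IQ > 120") or a tuple
    (op, left, right) where op is "and" or "or".
    """
    if isinstance(expr, str):
        return [[expr]]
    op, left, right = expr
    left_rows, right_rows = to_dnf(left), to_dnf(right)
    if op == "or":
        # OR just concatenates the alternatives.
        return left_rows + right_rows
    if op == "and":
        # AND pairs every alternative on the left with every one on the right.
        return [l + r for l in left_rows for r in right_rows]
    raise ValueError(f"unknown operator: {op}")


expr = (
    "and",
    (
        "or",
        ("and", "Sex == Male", "Age == 25"),
        ("and", "Sex == Female", "Status == Single"),
    ),
    "IQ > 120",
)

rows = to_dnf(expr)
# rows == [["Sex == Male", "Age == 25", "IQ > 120"],
#          ["Sex == Female", "Status == Single", "IQ > 120"]]
```

Note that the number of rows can grow quickly when many ORs sit under an AND, since each AND multiplies the alternatives of its two sides.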

Regards

Mark

High Performance Mark