I'm looking for a way to parse some user input. The input specifies which searches have to be performed and how their results have to be combined, for example:

  • 1 AND 2
  • (3 AND 2) OR 1
  • (3 AND 2) OR (1 AND 4)
  • ( (3 OR 4) AND 1) OR 2
  • etc.

The first example should combine the results of searches 1 and 2 in an AND fashion. The second example should combine the results of searches 3 and 2 in an AND fashion, and then combine that intermediate result with the results of search 1 in an OR fashion. And so on.

Any ideas on how to do this?

A:

Just some inspiration on how to create parsers for generic keyword search ...

Even though you tagged the question java, here's an example of a search parser in Python. It uses pyparsing, a parsing library that lets you define a grammar in Python and then run it to parse user input.

http://pyparsing.wikispaces.com/file/view/searchparser.py

293 lines of code, including a test suite. Maybe it helps you as a starting point ...

The MYYN
Although it's 'not helpful' for the question, the answer is too good for you to lose reputation over it. I suggest marking the answer Community Wiki, so it doesn't count. (BTW, it wasn't me who downvoted the answer.)
Andreas_D
A:

Think of your 'result' as an object that offers and() and or() methods, as in the following interface:

```java
public interface AndOrCapable<T> {
  T and(T anOtherResult); // keep only what both results have in common
  T or(T anOtherResult);  // combine everything from both results
}
```

Then you can translate your user input into something like:

```java
Result total = r2.or(r1.and(r3.or(r4))); // your fourth example
```
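To make the concept concrete, here is a minimal sketch of one possible implementation, assuming each search result is simply a set of matching item IDs. The Result class, its ids() accessor and the sample data are hypothetical, and Set.of needs Java 9+. Under that assumption, and() becomes set intersection and or() becomes set union:

```java
import java.util.Set;
import java.util.TreeSet;

public class ResultDemo {

    // The interface from the answer, repeated so this file compiles on its own.
    interface AndOrCapable<T> {
        T and(T anOtherResult);
        T or(T anOtherResult);
    }

    // Hypothetical result type: wraps the IDs of the items a search matched.
    static final class Result implements AndOrCapable<Result> {
        private final Set<Integer> ids;

        Result(Set<Integer> ids) {
            this.ids = new TreeSet<>(ids); // sorted defensive copy
        }

        @Override
        public Result and(Result other) {
            Set<Integer> both = new TreeSet<>(ids);
            both.retainAll(other.ids); // AND = set intersection
            return new Result(both);
        }

        @Override
        public Result or(Result other) {
            Set<Integer> either = new TreeSet<>(ids);
            either.addAll(other.ids); // OR = set union
            return new Result(either);
        }

        Set<Integer> ids() {
            return ids;
        }
    }

    public static void main(String[] args) {
        // Made-up search results; in practice these come from the real searches.
        Result r1 = new Result(Set.of(1, 2, 3));
        Result r2 = new Result(Set.of(9));
        Result r3 = new Result(Set.of(2, 5));
        Result r4 = new Result(Set.of(3, 7));

        // ( (3 OR 4) AND 1) OR 2
        Result total = r2.or(r1.and(r3.or(r4)));
        System.out.println(total.ids()); // prints [2, 3, 9]
    }
}
```

Because every operation returns a new Result, the original search results are never modified and can safely be reused in several sub-expressions.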

This is just to clarify the concept. In your case you need a dynamic evaluator, because the expression comes from user input.

So you still need a validator/parser to transform the user input into a syntax tree, which will be the model you use to calculate the total.
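A minimal sketch of such a tree in Java 16+, assuming the searches have already been run and their results are available as sets of item IDs keyed by search number. The node types Expr, Search, And and Or are hypothetical names; the tree is built by hand here, whereas a real parser would build it from the input string:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ExprTreeDemo {

    // Hypothetical node type for the parsed expression.
    interface Expr {
        Set<Integer> eval(Map<Integer, Set<Integer>> resultsBySearch);
    }

    // Leaf: the result of search number `number`.
    record Search(int number) implements Expr {
        public Set<Integer> eval(Map<Integer, Set<Integer>> resultsBySearch) {
            Set<Integer> hits = resultsBySearch.get(number);
            if (hits == null) {
                throw new IllegalArgumentException("No results for search " + number);
            }
            return hits;
        }
    }

    record And(Expr left, Expr right) implements Expr {
        public Set<Integer> eval(Map<Integer, Set<Integer>> resultsBySearch) {
            Set<Integer> out = new HashSet<>(left.eval(resultsBySearch));
            out.retainAll(right.eval(resultsBySearch)); // intersection
            return out;
        }
    }

    record Or(Expr left, Expr right) implements Expr {
        public Set<Integer> eval(Map<Integer, Set<Integer>> resultsBySearch) {
            Set<Integer> out = new HashSet<>(left.eval(resultsBySearch));
            out.addAll(right.eval(resultsBySearch)); // union
            return out;
        }
    }

    public static void main(String[] args) {
        // Made-up results for searches 1 to 4.
        Map<Integer, Set<Integer>> results = Map.of(
                1, Set.of(1, 2, 3),
                2, Set.of(9),
                3, Set.of(2, 5),
                4, Set.of(3, 7));

        // (3 AND 2) OR (1 AND 4), built by hand instead of by a parser
        Expr tree = new Or(new And(new Search(3), new Search(2)),
                           new And(new Search(1), new Search(4)));
        System.out.println(tree.eval(results)); // prints [3]
    }
}
```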

Hope it helped a bit!

Andreas_D
A:

The clean solution would be to write an infix parser; there are quite a few code examples online. In your case, however, a simpler algorithm might suffice, since you do not need operator precedence etc.
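A minimal sketch of such a simpler parser: a hand-written recursive-descent parser for the assumed grammar expr := operand (("AND" | "OR") operand)* and operand := NUMBER | "(" expr ")". Operators are applied strictly left to right with no precedence, so "1 OR 2 AND 3" is read as "(1 OR 2) AND 3", and the user's parentheses decide everything else. To keep it short, the sketch returns the parsed tree as a prefix string instead of building node objects; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class InfixParserDemo {

    private final List<String> tokens;
    private int pos = 0;

    private InfixParserDemo(List<String> tokens) {
        this.tokens = tokens;
    }

    // Splits "( (3 OR 4) AND 1) OR 2" into "(", "(", "3", "OR", ...
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c == '(' || c == ')' || Character.isWhitespace(c)) {
                if (word.length() > 0) {
                    out.add(word.toString());
                    word.setLength(0);
                }
                if (!Character.isWhitespace(c)) {
                    out.add(String.valueOf(c));
                }
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) {
            out.add(word.toString());
        }
        return out;
    }

    // Parses the whole input and returns it in prefix form, e.g. "(AND 1 2)".
    static String parse(String input) {
        InfixParserDemo p = new InfixParserDemo(tokenize(input));
        String result = p.expr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return result;
    }

    // expr := operand (("AND" | "OR") operand)*   (left to right, no precedence)
    private String expr() {
        String left = operand();
        while (pos < tokens.size()
                && (tokens.get(pos).equalsIgnoreCase("AND") || tokens.get(pos).equalsIgnoreCase("OR"))) {
            String op = tokens.get(pos++).toUpperCase();
            String right = operand();
            left = "(" + op + " " + left + " " + right + ")";
        }
        return left;
    }

    // operand := NUMBER | "(" expr ")"
    private String operand() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String t = tokens.get(pos++);
        if (t.equals("(")) {
            String inner = expr();
            if (pos >= tokens.size() || !tokens.get(pos).equals(")")) {
                throw new IllegalArgumentException("Missing ')'");
            }
            pos++; // consume ")"
            return inner;
        }
        if (t.chars().allMatch(Character::isDigit)) {
            return t; // a search number
        }
        throw new IllegalArgumentException("Expected a search number or '(' but got: " + t);
    }

    public static void main(String[] args) {
        System.out.println(parse("( (3 OR 4) AND 1) OR 2")); // prints (OR (AND (OR 3 4) 1) 2)
    }
}
```

Replacing the string concatenation in expr() with node constructors would give you a real syntax tree to evaluate.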

As a coding remark: the java.io.StreamTokenizer class might help you split the input string into tokens.
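A short sketch of what that could look like; the tokenize helper is hypothetical. With its default syntax, StreamTokenizer reports letters as TT_WORD (text in sval), digits as TT_NUMBER (value in nval, a double), and characters such as '(' and ')' as themselves in ttype:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TokenizerDemo {

    // Turns the user input into a flat list of tokens: search numbers, AND/OR and parentheses.
    static List<String> tokenize(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_NUMBER:
                        tokens.add(String.valueOf((int) st.nval)); // nval is a double
                        break;
                    case StreamTokenizer.TT_WORD:
                        tokens.add(st.sval.toUpperCase()); // accept "and"/"or" as well
                        break;
                    case '(':
                    case ')':
                        tokens.add(String.valueOf((char) st.ttype));
                        break;
                    default:
                        throw new IllegalArgumentException("Unexpected character: " + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            // A StringReader will not fail, but nextToken() declares IOException.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [(, (, 3, OR, 4, ), AND, 1, ), OR, 2]
        System.out.println(tokenize("( (3 OR 4) AND 1) OR 2"));
    }
}
```

The resulting token list can then be fed to whatever parser you write.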

Heinzi
A:

On the implementation side (organizing and performing the searches once you have a parser):

  1. What about creating a Condition tree, where Condition objects may be simple conditions or compound conditions that join two child conditions with a boolean operator (e.g. an ANDCondition parent node with the children RangeCondition and EqualsCondition)?

    Then you evaluate the root of the tree against each item. This solution is O(mn), where m is the number of conditions and n is the number of items to search, but you can optimize it by removing redundant conditions. It is much faster if the first condition eliminates most items.

  2. Version 2: assign a unique key to each item (say, an array index), and perform the search for each condition, building a HashSet<Key> for each one. Then, starting with the smallest set of required keys, remove or add keys for each condition until you have the final result. Depending on the data, this may be faster than the first approach.
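A minimal sketch of the first approach (Java 16+), assuming items are simple records; the Item fields and the sample data are made up. ANDCondition, RangeCondition and EqualsCondition follow the names used above, and ORCondition is added for symmetry. Java's && short-circuits, so when the left condition rejects an item the right one is never evaluated, which is why it pays to put the most selective condition first:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ConditionTreeDemo {

    // Hypothetical item being searched.
    record Item(String name, String category, int price) {}

    // Every node, simple or compound, can test a single item.
    interface Condition {
        boolean matches(Item item);
    }

    record EqualsCondition(String category) implements Condition {
        public boolean matches(Item item) {
            return item.category().equals(category);
        }
    }

    record RangeCondition(int minPrice, int maxPrice) implements Condition {
        public boolean matches(Item item) {
            return item.price() >= minPrice && item.price() <= maxPrice;
        }
    }

    record ANDCondition(Condition left, Condition right) implements Condition {
        public boolean matches(Item item) {
            return left.matches(item) && right.matches(item); // right is skipped if left fails
        }
    }

    record ORCondition(Condition left, Condition right) implements Condition {
        public boolean matches(Item item) {
            return left.matches(item) || right.matches(item); // right is skipped if left succeeds
        }
    }

    // Evaluates the root of the tree against each of the n items: O(m * n) in the worst case.
    static List<String> search(List<Item> items, Condition root) {
        return items.stream()
                .filter(root::matches)
                .map(Item::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("pen", "office", 2),
                new Item("lamp", "home", 30),
                new Item("stapler", "office", 15),
                new Item("rug", "home", 80));

        // category = office AND 10 <= price <= 50
        Condition root = new ANDCondition(new EqualsCondition("office"), new RangeCondition(10, 50));
        System.out.println(search(items, root)); // prints [stapler]
    }
}
```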

Note: these approaches mimic how an SQL database operates. If your system is sufficiently large or complex, you should probably investigate using a database instead of writing your own code to do the same thing.
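For the second approach (one HashSet of keys per condition), here is a minimal sketch, assuming each search has already produced the set of keys it matched. The helper names andAll and orAll and the sample keys are hypothetical. AND starts from the smallest set and removes keys with retainAll; OR adds keys with addAll:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class KeySetDemo {

    // AND of several conditions: start from the smallest key set, then
    // remove every key that is missing from one of the other sets.
    static Set<Integer> andAll(List<Set<Integer>> keySets) {
        if (keySets.isEmpty()) {
            throw new IllegalArgumentException("need at least one key set");
        }
        List<Set<Integer>> bySize = new ArrayList<>(keySets);
        bySize.sort(Comparator.comparingInt(Set::size));
        Set<Integer> result = new HashSet<>(bySize.get(0));
        for (int i = 1; i < bySize.size() && !result.isEmpty(); i++) {
            result.retainAll(bySize.get(i)); // drop keys this condition did not match
        }
        return result;
    }

    // OR of several conditions: add the keys of every set.
    static Set<Integer> orAll(List<Set<Integer>> keySets) {
        Set<Integer> result = new HashSet<>();
        for (Set<Integer> keys : keySets) {
            result.addAll(keys);
        }
        return result;
    }

    public static void main(String[] args) {
        // Keys (e.g. array indexes) matched by searches 1 to 4; made-up data.
        Set<Integer> s1 = Set.of(0, 1, 2, 3);
        Set<Integer> s2 = Set.of(8);
        Set<Integer> s3 = Set.of(1, 4);
        Set<Integer> s4 = Set.of(2, 6);

        // ( (3 OR 4) AND 1) OR 2
        Set<Integer> total = orAll(List.of(andAll(List.of(orAll(List.of(s3, s4)), s1)), s2));
        System.out.println(new TreeSet<>(total)); // prints [1, 2, 8]
    }
}
```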

BobMcGee
A:

JavaCC is a good tool for generating a parser for this. Alternatively, if you can change the syntax a little, you may be able to use Java's scripting facilities with a Scheme interpreter, e.g.

( (3 OR 4) AND 1) OR 2

becomes

(OR (AND (OR 3 4) 1) 2)

Then you just need to implement AND and OR.
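If you would rather not embed a Scheme interpreter, the prefix form is also easy to evaluate by hand. The sketch below is a small hand-written evaluator, not a Scheme interpreter; it assumes the search results are sets of item IDs keyed by search number and, as in Scheme, lets AND and OR take two or more operands. The class name and sample data are hypothetical:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PrefixEvalDemo {

    private final List<String> tokens;
    private final Map<Integer, Set<Integer>> results;
    private int pos = 0;

    private PrefixEvalDemo(List<String> tokens, Map<Integer, Set<Integer>> results) {
        this.tokens = tokens;
        this.results = results;
    }

    static Set<Integer> evaluate(String input, Map<Integer, Set<Integer>> results) {
        // Pad the parentheses with spaces so that a plain split yields the tokens.
        String spaced = input.replace("(", " ( ").replace(")", " ) ").trim();
        List<String> tokens = List.of(spaced.split("\\s+"));
        PrefixEvalDemo e = new PrefixEvalDemo(tokens, results);
        Set<Integer> value = e.expr();
        if (e.pos != tokens.size()) {
            throw new IllegalArgumentException("Trailing input after expression");
        }
        return value;
    }

    // expr := NUMBER | "(" ("AND" | "OR") expr expr+ ")"
    private Set<Integer> expr() {
        String t = next();
        if (!t.equals("(")) {
            Set<Integer> hits = results.get(Integer.parseInt(t));
            if (hits == null) {
                throw new IllegalArgumentException("Unknown search: " + t);
            }
            return hits;
        }
        String op = next().toUpperCase();
        if (!op.equals("AND") && !op.equals("OR")) {
            throw new IllegalArgumentException("Unknown operator: " + op);
        }
        Set<Integer> acc = new HashSet<>(expr()); // copy, so the stored results stay untouched
        int operands = 1;
        while (!peek().equals(")")) {
            Set<Integer> operand = expr();
            if (op.equals("AND")) {
                acc.retainAll(operand); // intersection
            } else {
                acc.addAll(operand); // union
            }
            operands++;
        }
        next(); // consume ")"
        if (operands < 2) {
            throw new IllegalArgumentException(op + " needs at least two operands");
        }
        return acc;
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private String peek() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Missing ')'");
        }
        return tokens.get(pos);
    }

    public static void main(String[] args) {
        // Made-up results for searches 1 to 4.
        Map<Integer, Set<Integer>> results = Map.of(
                1, Set.of(1, 2, 3), 2, Set.of(9), 3, Set.of(2, 5), 4, Set.of(3, 7));
        System.out.println(new TreeSet<>(evaluate("(OR (AND (OR 3 4) 1) 2)", results))); // prints [2, 3, 9]
    }
}
```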

vickirk