views:

61

answers:

3

I have some search queries like so:

George AND NOT Washington OR Abraham

Dog OR cat AND NOT Wolf

for these searches I would want to get back results for George or Abraham but not Washington, etc.

basically I want to take the string and be able to submit a contextual search to my full-text catalog stored procedure search.

I am assuming I should use Regex but I am very unfamiliar with Regex in C#.

I found this article: http://support.microsoft.com/kb/246800 which I think is what I need to do, but I was hoping that I could have some help with the implementation.

Assuming you take a string as parameter and would like to return a string:

string input = 'George Washington AND NOT Martha OR Dog';

private string interpretSearchQuery(input)
{
     // HALP!

        /* replace ' AND ' | ' AND NOT ' with
         * " AND "
         * " AND NOT "
         * 
         * replace ' OR ' | ' OR NOT ' with
         * " OR "
         * " OR NOT "
         * 
         * add " to beginning of string and " to end of string
         */

     return '"George Washington" AND NOT "Martha" OR "Dog"';
}
+1  A: 

I would parse your string using Postfix notation (or Polish notation).

**Postfix algorithm**
The algorithm for evaluating any postfix expression is fairly straightforward:

While there are input tokens left    

  Read the next token from input.

  If the token is a value
    Push it onto the stack.

  Otherwise, the token is an operator (operator here includes both operators, and functions). 
   It is known a priori that the operator takes n arguments. 

   If there are fewer than n values on the stack 
     (Error) The user has not input sufficient values in the expression. 
   Else, Pop the top n values from the stack. 

   Evaluate the operator, with the values as arguments. 
   Push the returned results, if any, back onto the stack. 

If there is only one value in the stack 
  That value is the result of the calculation. 

If there are more values in the stack 
  (Error) The user input has too many values.

So taking your input string:

'George Washington AND NOT Martha OR Dog'

And simplifing it to:

A = George 
B = Washington
C = Martha
D = Dog
& = AND
! = NOT
| = OR

We would get the postfix notation of

AB&C!D|

Which means:

  1. Push value A (George)
  2. Push value B (Washington)
  3. AND by popping previous two values and pushing the result (George AND Washington)
  4. Push value C (Martha)
  5. NOT by popping previous two values and pushing the result (George AND Washington) NOT (Martha)
  6. Push value D (Dog)
  7. OR by popping previous two values and pushing the result ((George AND Washington) NOT (Martha)) OR (Dog)
GalacticJello
this is what I was initially thinking, I just was hoping there would be a good way to do it with regex instead.
samandmoore
Once you write a quick parser to take your query string and return a postfix array, doing the query is pretty simple.
GalacticJello
I'm going to use this method, very intelligent solution.
samandmoore
+2  A: 

This might get you started... I would refactor the crap out of this to make it more robust.

string input = "George Washington AND NOT Martha OR Dog";

private string interpretSearchQuery(string input)
{
    StringBuilder builder = new StringBuilder();
    var tokens = input.Split( ' ' );

    bool quoteOpen = false;
    foreach( string token in tokens )
    {
        if( !quoteOpen && !IsSpecial( token ) )
        {
            builder.AppendFormat( " \"{0}", token );
            quoteOpen = true;
        }
        else if( quoteOpen && IsSpecial( token ))
        {
            builder.AppendFormat( "\" {0}", token );
            quoteOpen = false;
        }
        else
        {
            builder.AppendFormat( " {0}", token );
        }
    }

    if( quoteOpen )
    {
        builder.Append( "\"" );
    }

    return "'" + builder.ToString().Trim() + "'";
}

public static bool IsSpecial( string token )
{
    return string.Compare( token, "AND", true ) == 0 ||
        string.Compare( token, "OR", true ) == 0 ||
        string.Compare( token, "NOT", true ) == 0;
}
Jerod Houghtelling
Your concept inspired me. My solution is not perfect as using postfix would be, but it gets the job done.
samandmoore
@samandmoore I would have choosen the postfix answer as well! It is a much better 'general' solution than this hack.
Jerod Houghtelling
A: 

Here is a solution that I came up with. The only issue is that malformed search queries will not parse properly and fail:

private string interpretSearchTerm(string searchTerm)
        {
            string term = "";
            /* replace ' AND ' | ' AND NOT ' with
             * " AND "
             * " AND NOT "
             * 
             * replace ' OR ' | ' OR NOT ' with
             * " OR "
             * " OR NOT "
             * 
             * add " to beginning of string and " to end of string
             */
            if (searchTerm.IndexOf("AND") > -1
                || searchTerm.IndexOf("OR") > -1
                || searchTerm.IndexOf("AND NOT") > -1
                || searchTerm.IndexOf("OR NOT") > -1)
            {
                term = searchTerm.Replace(" AND NOT ", "\"AND NOT\"")
                        .Replace(" AND ", "\"AND\"")
                        .Replace(" OR NOT", "\"OR NOT\"")
                        .Replace(" OR ", "\"OR\"");
                term = "\"" + term + "\"";
                return term;
            }
            else if (searchTerm.IndexOf("\"") > -1) return searchTerm;
            else return "\"" + searchTerm + "\"";
        }

I will now go about implementing the postfix algorithm that GalacticJello suggested. I will post it up when I get it working.

samandmoore