I'm working with the full-text search engine of MSSQL 2008, which expects a search string like this:

("keyword1" AND "keyword2*" OR "keyword3")

My users are entering things like this:

engine 2009
"san francisco"     hotel december xyz
stuff* "in miami"   1234
something or "something else"

I'm trying to transform these into fulltext engine compatible strings like these:

("engine" AND "2009")
("san francisco" AND "hotel" AND "december" AND "xyz")
("stuff*" AND "in miami" AND "1234")
("something" OR "something else")

I'm having a really difficult time with this. I tried counting quotation marks and spaces and inserting the operators by hand, but my code looks like horrible for-and-if vomit.

Can someone help?

+2  A: 

Here you go:

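using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

class Program {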
    static void Main(string[] args) {
        // setup some test expressions
        List<string> searchExpressions = new List<string>(new string[] { 
            "engine 2009", 
            "\"san francisco\"     hotel december xyz", 
            "stuff* \"in miami\"   1234 ", 
            "something or \"something else\""
        });

        // display and parse each expression
        foreach (string searchExpression in searchExpressions) {
            Console.WriteLine(string.Concat(
                "User Input: ", searchExpression, 
                "\r\n\tSql Expression: ", ParseSearchExpression(searchExpression), 
                "\r\n"));
        }

        Console.ReadLine();

    }

    private static string ParseSearchExpression(string searchExpression) {
        // replace every whitespace character inside a quoted phrase with the NUL character (\x00)
        string temp = Regex.Replace(searchExpression, @"""[^""]+""", (MatchEvaluator)delegate(Match m) {
            return Regex.Replace(m.Value, @"[\s]", "\x00");
        });

        // split on runs of whitespace and quote characters (quoted phrases stay in one piece,
        // because their whitespace was replaced with \x00 above)
        string[] tokens = Regex.Split(temp, @"[""\s]+", RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

        // generate result
        StringBuilder result = new StringBuilder();
        string tokenLast = string.Empty;
        foreach (string token in tokens) {
            if (token.Length > 0) {
                if (!token.Equals("OR", StringComparison.OrdinalIgnoreCase)) {
                    if (result.Length > 0) {
                        result.Append(tokenLast.Equals("OR", StringComparison.OrdinalIgnoreCase) ? " OR " : " AND ");
                    }
                    result.Append("\"").Append(token.Replace("\"", "\"\"").Replace("\x00", " ")).Append("\"");
                }
                tokenLast = token;
            }
        }
        if (result.Length > 0) {
            result.Insert(0, "(").Append(")");
        }

        return result.ToString();
    }
}
Fredrik Johansson
Almost perfect, except for the case where "and" is in the search term. So this: something and "something else" will turn into "something" AND "and" AND "something else". The "and" shouldn't be searched for, just like the "or" isn't searched for.
Alex
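
One possible way to address the comment above is to treat an unquoted "and" exactly like an unquoted "or": as an operator rather than a search term. The following is only a minimal sketch under that assumption, not the answer author's code. The class name SearchExpressionSketch and the method Parse are hypothetical names made up for illustration. It walks the input with Regex.Matches instead of the replace-then-split approach, so quoted phrases and bare words come out as separate tokens.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
static class SearchExpressionSketch {
    public static string Parse(string input) {
        StringBuilder result = new StringBuilder();
        string nextOperator = "AND"; // operator to place before the next term

        // Each match is either a quoted phrase (group 1) or a bare word (group 2).
        foreach (Match m in Regex.Matches(input, @"""([^""]*)""|([^\s""]+)")) {
            bool quoted = m.Groups[1].Success;
            string term = quoted ? m.Groups[1].Value : m.Groups[2].Value;
            if (term.Trim().Length == 0) {
                continue; // skip empty phrases such as ""
            }

            // An unquoted "and" or "or" is an operator, not a search term.
            if (!quoted &&
                (term.Equals("AND", StringComparison.OrdinalIgnoreCase) ||
                 term.Equals("OR", StringComparison.OrdinalIgnoreCase))) {
                nextOperator = term.ToUpperInvariant();
                continue;
            }

            if (result.Length > 0) {
                result.Append(' ').Append(nextOperator).Append(' ');
            }
            result.Append('"').Append(term).Append('"');
            nextOperator = "AND"; // fall back to AND until another operator is seen
        }

        return result.Length > 0 ? "(" + result + ")" : string.Empty;
    }
}
```

With this sketch, something and "something else" should come out as ("something" AND "something else"), and the "or" case keeps working as before. A quoted "and" is still passed through as a literal search term, which is a design choice of the sketch, and an unmatched quotation mark is silently dropped.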