views:

497

answers:

5

I have a search method that takes in a user-entered string, splits it at each space character and then proceeds to find matches based on the list of separated terms:

string[] terms = searchTerms.ToLower().Trim().Split( ' ' );

Now I have been given a further requirement: to be able to search for phrases via double quote delimiters a la Google. So if the search terms provided were:

"a line of" text

The search would match occurrences of "a line of" and "text" rather than the four separate terms [the open and closing double quotes would also need to be removed before searching].

How can I achieve this in C#? I would assume regular expressions would be the way to go, but haven't dabbled in them much so don't know if they are the best solution.

If you need any more info, please ask. Thanks in advance for the help.

+1  A: 

Regular expressions would definitely be the way to go...

You should check this MSDN link out for some info on the Regex class: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx

and here is an excellent link to learn some regular expression syntax: http://www.radsoftware.com.au/articles/regexlearnsyntax.aspx

Then to add some code examples, you could be doing it something along these lines:

string searchString = "a line of";

Match m = Regex.Match(textToSearch, searchString);

or if you just want to find out if the string contains a match or not:

bool success = Regex.Match(textToSearch, searchString).Success;
Robban
+2  A: 

Here's a regex pattern that would return matches in groups named 'term':

("(?<term>[^"]+)"\s*|(?<term>[^ ]+)\s*)+

So for the input:

"a line" of text

The output items identified by the 'term' group would be:

a line
of
text
Drew Noakes
+1  A: 

use the regular expression builder here

http://gskinner.com/RegExr/

and you will be able to manipulate the regular expression to how you need it displayed

Craig Angus
+1  A: 

Use Regexs....

string textToSearchIn = ""a line of" text";
string result = Regex.Match(textToSearchIn, "(?<=").*?(?=")").Value;

or if more then one, put this into a match collection...

MatchCollection allPhrases = Regex.Matches(textToSearchIn, "(?<=").*?(?=")");

xoxo
A: 

The Knuth-Morris-Pratt (KMP algorithm)is recognised as the fastest algorithm for finding substrings in strings (well, technically not strings but byte-arrays).

using System.Collections.Generic;

namespace KMPSearch
{
    public class KMPSearch
    {
        public static int NORESULT = -1;

        private string _needle;
        private string _haystack;
        private int[] _jumpTable;

        public KMPSearch(string haystack, string needle)
        {
            Haystack = haystack;
            Needle = needle;
        }

        public void ComputeJumpTable()
        {
            //Fix if we are looking for just one character...
            if (Needle.Length == 1)
            {
                JumpTable = new int[1] { -1 };
            }
            else
            {
                int needleLength = Needle.Length;
                int i = 2;
                int k = 0;

                JumpTable = new int[needleLength];
                JumpTable[0] = -1;
                JumpTable[1] = 0;

                while (i <= needleLength)
                {
                    if (i == needleLength)
                    {
                        JumpTable[needleLength - 1] = k;
                    }
                    else if (Needle[k] == Needle[i])
                    {
                        k++;
                        JumpTable[i] = k;
                    }
                    else if (k > 0)
                    {
                        JumpTable[i - 1] = k;
                        k = 0;
                    }

                    i++;
                }
            }
        }

        public int[] MatchAll()
        {
            List<int> matches = new List<int>();
            int offset = 0;
            int needleLength = Needle.Length;
            int m = Match(offset);

            while (m != NORESULT)
            {
                matches.Add(m);
                offset = m + needleLength;
                m = Match(offset);
            }

            return matches.ToArray();
        }

        public int Match()
        {
            return Match(0);
        }

        public int Match(int offset)
        {
            ComputeJumpTable();

            int haystackLength = Haystack.Length;
            int needleLength = Needle.Length;

            if ((offset >= haystackLength) || (needleLength > ( haystackLength - offset))) 
                return NORESULT;

            int haystackIndex = offset;
            int needleIndex = 0;

            while (haystackIndex < haystackLength)
            {
                if (needleIndex >= needleLength)
                    return haystackIndex;

                if (haystackIndex + needleIndex >= haystackLength)
                    return NORESULT;

                if (Haystack[haystackIndex + needleIndex] == Needle[needleIndex])
                {
                    needleIndex++;
                } 
                    else
                {
                    //Naive solution
                    haystackIndex += needleIndex;

                    //Go back
                    if (needleIndex > 1)
                    {
                        //Index of the last matching character is needleIndex - 1!
                        haystackIndex -= JumpTable[needleIndex - 1];
                        needleIndex = JumpTable[needleIndex - 1];
                    }
                    else
                        haystackIndex -= JumpTable[needleIndex];


                }
            }

            return NORESULT;
        }

        public string Needle
        {
            get { return _needle; }
            set { _needle = value; }
        }

        public string Haystack
        {
            get { return _haystack; }
            set { _haystack = value; }
        }

        public int[] JumpTable
        {
            get { return _jumpTable; }
            set { _jumpTable = value; }
        }
    }
}

Usage :-

using System;
using System.Collections.Generic;
using System.Text;
using System.Reflection;
namespace KMPSearch
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.WriteLine("Usage: " + Environment.GetCommandLineArgs()[0] + " haystack needle");
            }
            else
            {
                KMPSearch search = new KMPSearch(args[0], args[1]);
                int[] matches = search.MatchAll();
                foreach (int i in matches)
                    Console.WriteLine("Match found at position " + i+1);
            }
        }

    }
}
Rob Cowell