I want to implement incremental search on a list of strings. Suppose I have an array that contains the strings store, state, stamp, crawl, crow. My application has a text box in which the user enters the search string. As the user types, I need to highlight all the matches. For example, when the user enters "st" I need to highlight "store, state, stamp"; when he then types "a", I need to remove "store" from the list. I am developing the application in C# with the .NET Framework. My plan is to run a search in the background whenever the text-changed event fires and then show the results. Is there any other way to solve this?
Below is a function that will incrementally search a string for a substring to match.
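public IEnumerable<int> FindAllMatches(string toMatch, string source) {
    // An empty pattern matches at every index and would never advance, so stop early.
    if (string.IsNullOrEmpty(toMatch)) {
        yield break;
    }
    var last = 0;
    do {
        // Look for the next occurrence at or after the end of the previous match.
        var cur = source.IndexOf(toMatch, last);
        if (cur < 0) {
            break;
        }
        yield return cur;
        last = cur + toMatch.Length;
    } while (true);
}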
Instead of an array of strings you could use a generic collection. This way you can use the FindAll method with a delegate to search through the items.
string searchString = "s";
List<string> sl = new List<string>();
sl.Add("store");
sl.Add("state");
sl.Add("stamp");
sl.Add("crawl");
sl.Add("crow");
List<string> searchResults = sl.FindAll(delegate(string match)
{
return match.StartsWith(searchString, StringComparison.CurrentCultureIgnoreCase);
});
A trie data structure would scale well if your list can grow to a significant length (more than a few hundred entries). Check out e.g. this example implementation.
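To make the trie idea concrete, here is a minimal sketch of a prefix lookup. It is not the linked implementation; the names PrefixTrie, Add and StartingWith are made up for illustration, and it assumes case-insensitive matching by storing every word in lower case (so results come back lower-cased, in no particular order).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal prefix trie; the class and method names are illustrative only.
public sealed class PrefixTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node root = new Node();

    // Inserts a word, one node per character (lower-cased, an assumption of this sketch).
    public void Add(string word)
    {
        var node = root;
        foreach (var ch in word)
        {
            var key = char.ToLowerInvariant(ch);
            Node next;
            if (!node.Children.TryGetValue(key, out next))
            {
                next = new Node();
                node.Children[key] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    // Returns every stored word that begins with the given prefix.
    public List<string> StartingWith(string prefix)
    {
        var results = new List<string>();
        var node = root;
        var path = "";
        foreach (var ch in prefix)
        {
            var key = char.ToLowerInvariant(ch);
            Node next;
            if (!node.Children.TryGetValue(key, out next))
            {
                return results; // no stored word has this prefix
            }
            node = next;
            path += key;
        }
        Collect(node, path, results);
        return results;
    }

    // Depth-first walk of the subtree below the prefix node.
    private static void Collect(Node node, string path, List<string> results)
    {
        if (node.IsWord)
        {
            results.Add(path);
        }
        foreach (var pair in node.Children)
        {
            Collect(pair.Value, path + pair.Key, results);
        }
    }
}
```

Walking down to the prefix node costs one step per typed character, regardless of how many words are stored; only collecting the matches depends on how many there are.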
You could just look at the newly entered letter; if the new third letter is an 'a', just throw out all elements without 'a' at position three. If the user deletes a letter, you have to rescan the whole original list and bring back all previously removed items.
But what if the user pastes multiple letters from the clipboard, deletes multiple letters by selecting them, inserts or deletes a single or multiple letters somewhere in the middle?
You have just too many cases to watch for. You could use the method with the newly entered letter and fall back to a complete rescan if the search text changed in any way other than by adding a single letter, but even this simple method is probably not worth the effort just to avoid a few tens or hundreds of string comparisons. As already mentioned, a trie or Patricia trie is the way to go if you have really large data sets or want to be really quick.
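For what it's worth, the "narrow on append, otherwise rescan" idea can be sketched in a few lines. This is only an illustration under some assumptions: IncrementalFilter and Update are hypothetical names, matching is prefix-based and uses ordinal case-insensitive comparison, and any change that does not simply extend the previous search text (paste, delete, edit in the middle) triggers the full rescan.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: narrows the previous result set when the query was only
// extended, and falls back to scanning the full list for any other kind of edit.
public sealed class IncrementalFilter
{
    private readonly List<string> all;
    private List<string> current;
    private string lastQuery = "";

    public IncrementalFilter(IEnumerable<string> items)
    {
        all = new List<string>(items);
        current = new List<string>(all);
    }

    // Call this from the text-changed handler with the full text of the box.
    public IReadOnlyList<string> Update(string query)
    {
        // If the new query starts with the old one, every new match was already
        // a match before, so only the previous results need to be re-checked.
        bool extended = query.StartsWith(lastQuery, StringComparison.OrdinalIgnoreCase);
        List<string> source = extended ? current : all;

        current = source.FindAll(
            s => s.StartsWith(query, StringComparison.OrdinalIgnoreCase));
        lastQuery = query;
        return current;
    }
}
```

Because the check is "does the new text start with the old text", pasting several letters at the end also takes the cheap path, and everything else goes through the rescan without needing a special case.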
I've had to do something similar in the past, using a collection that contained approximately 500,000 words. I found that a directed acyclic word graph worked well. A DAWG has roughly the same performance as a trie, but will be more space efficient. It is, however, slightly more complex to implement.
Unfortunately, my work was in C, and I don't have a good reference for a DAWG implementation in C#.