I have built a little ASP.NET form that searches for something and displays the results. I want to highlight the search string within the search results. Example:

Query: "p"
Results: a<b>p</b>ple, banana, <b>p</b>lum

The code that I have goes like this:

public static string HighlightSubstring(string text, string substring)
{
    var index = text.IndexOf(substring, StringComparison.CurrentCultureIgnoreCase);
    if (index == -1) return HttpUtility.HtmlEncode(text);
    string p0, p1, p2;
    // SplitAt is not a built-in String method; presumably a custom extension
    // method (not shown) that cuts text at the two given offsets.
    text.SplitAt(index, index + substring.Length, out p0, out p1, out p2);
    return HttpUtility.HtmlEncode(p0) + "<b>" + HttpUtility.HtmlEncode(p1) + "</b>" + HttpUtility.HtmlEncode(p2);
}

It mostly works, but try it with, for example, HighlightSubstring("ß", "ss"). This crashes because, with German culture settings, the IndexOf method considers "ß" and "ss" to be equal, yet they have different lengths!

That would be fine if there were a way to find out how long the match in "text" is. Remember that this length can differ from substring.Length.

So how do I find out the length of the match that IndexOf produces in the presence of ligatures and other language-specific characters?
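
For later readers: newer runtimes appear to expose exactly the value asked for here. The following is a minimal sketch, assuming .NET 5 or later, where CompareInfo.IndexOf has a span-based overload with an out parameter for the matched length. The class name SubstringHighlighter is hypothetical, and System.Net.WebUtility is swapped in for HttpUtility only to keep the sample self-contained. It highlights the first match only, like the code above.

```csharp
using System;
using System.Globalization;
using System.Net;

// Hypothetical helper; assumes .NET 5 or later for the matchLength overload.
public static class SubstringHighlighter
{
    public static string HighlightSubstring(string text, string substring)
    {
        // An empty search string would match at index 0 with length 0,
        // so there is nothing to highlight.
        if (string.IsNullOrEmpty(substring)) return WebUtility.HtmlEncode(text);

        CompareInfo compareInfo = CultureInfo.CurrentCulture.CompareInfo;

        // matchLength is the length of the match inside text,
        // which may differ from substring.Length.
        int index = compareInfo.IndexOf(
            text.AsSpan(), substring.AsSpan(),
            CompareOptions.IgnoreCase, out int matchLength);
        if (index < 0) return WebUtility.HtmlEncode(text);

        string before = text.Substring(0, index);
        string match = text.Substring(index, matchLength);
        string after = text.Substring(index + matchLength);

        // Encode each piece separately so the <b> tags stay literal.
        return WebUtility.HtmlEncode(before)
             + "<b>" + WebUtility.HtmlEncode(match) + "</b>"
             + WebUtility.HtmlEncode(after);
    }
}
```

Whether "ß" and "ss" actually compare as equal depends on the culture and on the globalization backend the runtime uses, so no output is claimed for that input; the point is only that the slicing uses the reported match length rather than substring.Length.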

+2  A: 

This may not directly answer your question but perhaps will solve your actual problem.

Why not substitute instead?

using System.Text.RegularExpressions;

public static string HighlightString(string text, string substring)
{
    Regex r = new Regex(Regex.Escape(HttpUtility.HtmlEncode(substring)),
                        RegexOptions.IgnoreCase);
    return r.Replace(HttpUtility.HtmlEncode(text), @"<b>$&</b>");
}

What about culture? If you specify a Regex as case-insensitive, its case comparisons are culture-sensitive by default, according to http://msdn.microsoft.com/en-us/library/z0sbec17.aspx.
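
To make the culture point concrete, here is a small sketch of opting out of culture-sensitive casing with RegexOptions.CultureInvariant. The class and method names (CultureDemo, Matches) are made up for illustration. Note also that a regex compares character by character, so it does not treat "ß" and "ss" as equal the way the culture-aware IndexOf can; that avoids the crash from the question, but it also means such linguistic equivalents are simply not highlighted.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the answer's code.
public static class CultureDemo
{
    // Returns true if substring occurs in text, ignoring case.
    // invariant == true uses invariant-culture casing rules instead of
    // those of the current thread culture.
    public static bool Matches(string text, string substring, bool invariant)
    {
        RegexOptions options = RegexOptions.IgnoreCase;
        if (invariant) options |= RegexOptions.CultureInvariant;

        // Regex.Escape makes the search string a literal pattern.
        return Regex.IsMatch(text, Regex.Escape(substring), options);
    }
}
```

For example, CultureDemo.Matches("APPLE", "p", true) finds a match, while CultureDemo.Matches("ß", "ss", true) does not, because the one-character text cannot match a two-character literal pattern.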

Andrew
I upvoted this, because it is a solution, but I will not use it because the performance of creating a fresh regex about 100 times per request will be too much for my purposes.
usr
It is quite possible to solve the recompilation issue if you have to use the Regex multiple times per request (I take it there are multiple text strings to check?). I can think of two ways. First, you could use the static Regex.Replace method rather than creating a Regex instance as I do in the code. Using the static method causes .NET to cache the regex (see http://msdn.microsoft.com/en-us/library/8zbs0h2f.aspx). Alternatively, create the regex outside the HighlightString method and reuse it for each Replace. Finally, if the issue is multiple substrings, create a single regex that combines them.
Andrew
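
The reuse and combining ideas from the last comment could be sketched roughly as follows. This is only an illustration under stated assumptions: the class TermHighlighter is hypothetical, System.Net.WebUtility stands in for HttpUtility, and, unlike the answer's code, it matches against the raw text and encodes each piece afterwards so that a search term cannot match inside an HTML entity such as &amp;.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: builds one regex for all terms and reuses it.
public sealed class TermHighlighter
{
    private readonly Regex _regex;

    public TermHighlighter(IEnumerable<string> terms)
    {
        // Longer terms first, so "plum" is preferred over "p" at the same position.
        List<string> alternatives = terms
            .Where(t => !string.IsNullOrEmpty(t))
            .OrderByDescending(t => t.Length)
            .Select(t => Regex.Escape(t))
            .ToList();

        // "(?!)" is an empty negative lookahead, which never matches.
        string pattern = alternatives.Count == 0
            ? "(?!)"
            : string.Join("|", alternatives);

        _regex = new Regex(pattern,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    public string Highlight(string text)
    {
        var sb = new StringBuilder();
        int pos = 0;
        foreach (Match m in _regex.Matches(text))
        {
            // Encode the unmatched text before this match, then the match itself.
            sb.Append(WebUtility.HtmlEncode(text.Substring(pos, m.Index - pos)));
            sb.Append("<b>").Append(WebUtility.HtmlEncode(m.Value)).Append("</b>");
            pos = m.Index + m.Length;
        }
        // Encode whatever follows the last match.
        sb.Append(WebUtility.HtmlEncode(text.Substring(pos)));
        return sb.ToString();
    }
}
```

Usage would be to construct one TermHighlighter per request (or per query) and call Highlight for each of the roughly 100 result strings, so the pattern is parsed once rather than once per string.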