I need to write some code that performs an HTML highlight on specific keywords in a string.

I have a comma-separated list of strings, and I would like to do a search and replace on another string for each entry in the list. What is the most efficient way of doing it?

I'm currently doing it with a split, then a foreach and a Regex.Match. For example:

string wordsToCheck = "this,the,and";
string[] listArray = wordsToCheck.Split(',');
string contentToReplace = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.";

foreach (string word in listArray)
{
    if (Regex.Match(contentToReplace, word + "\\s+", RegexOptions.IgnoreCase).Success)
    {
        return Regex.Replace(contentToReplace , word + "\\s+", String.Format("<span style=\"background-color:yellow;\">{0}</span> ", word), RegexOptions.IgnoreCase);
    }
}

I'm not sure this is the most efficient way because the list of words to check for could get long and the code above could be part of a loop to search and replace a bunch of content.

A: 

You could search for "(this|the|and)" and call Regex.Replace once with a MatchEvaluator, a method that takes the match and returns a replacement string.

You can build the match pattern by calling Regex.Escape on every element of your string array, then joining the results with String.Join using | as the separator.
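
The steps above could be sketched roughly as follows. This is only an illustration, not tested against your real data: the class name KeywordHighlighter and the method Highlight are made up for the example, and it assumes the word list arrives as a single comma-separated string as in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordHighlighter
{
    // Hypothetical helper: wraps every occurrence of any keyword in a highlight span.
    public static string Highlight(string content, string commaSeparatedWords)
    {
        // Split the list, trim each entry, drop empties, and escape regex metacharacters.
        string[] escaped = commaSeparatedWords
            .Split(',')
            .Select(w => w.Trim())
            .Where(w => w.Length > 0)
            .Select(w => Regex.Escape(w))
            .ToArray();

        if (escaped.Length == 0)
        {
            return content; // nothing to highlight
        }

        // Build one alternation, e.g. "(this|the|and)".
        string pattern = "(" + string.Join("|", escaped) + ")";

        // The evaluator runs once per match; match.Value keeps the casing found in the text.
        return Regex.Replace(
            content,
            pattern,
            match => "<span style=\"background-color:yellow;\">" + match.Value + "</span>",
            RegexOptions.IgnoreCase);
    }
}
```

Note that, like the code in the question, this matches keywords inside longer words too (for example "the" inside "other"). If that is not wanted, wrapping the group in \b anchors is one option.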

Simon Svensson
+1  A: 

Don't do that if the wordsToCheck can be modified by a user!

Your approach works perfectly without Regexes. Just do a normal String.Replace.
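
A minimal sketch of that regex-free variant, assuming the same comma-separated list as in the question (PlainHighlighter and Highlight are hypothetical names). Keep in mind that the two-argument String.Replace is case-sensitive, so unlike the regex version it will not catch "The" when the keyword is "the".

```csharp
using System;

public static class PlainHighlighter
{
    // Hypothetical helper: highlights each keyword using plain string replacement.
    public static string Highlight(string content, string commaSeparatedWords)
    {
        foreach (string raw in commaSeparatedWords.Split(','))
        {
            string word = raw.Trim();
            if (word.Length == 0)
            {
                continue; // skip empty entries such as a trailing comma
            }

            // Case-sensitive, literal replacement; no regex metacharacters to worry about.
            content = content.Replace(
                word,
                "<span style=\"background-color:yellow;\">" + word + "</span>");
        }

        return content;
    }
}
```

One caveat with this loop: a later keyword can also match text inside the markup inserted for an earlier one (for example a keyword like "span"), so it is only safe for word lists that cannot collide with the highlight tag.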

If the input is safe, you can also use one regex for all keywords, e.g.

return Regex.Replace(contentToReplace, "(this|the|and)", "<span style=\"background-color:yellow;\">$1</span>", RegexOptions.IgnoreCase);

where "this|the|and" is simply wordsToCheck with the commas replaced by pipes ("|").

BTW, you might want to take the list of keywords directly as a regex instead of as a comma-separated list. This will give you more flexibility.

vog
A: 

As for your performance concerns: the other answers suggest using a single regex, and they are right. For (theoretically) even better performance you could use the RegexOptions.Compiled flag, especially since you are unlikely to change your regex often. For more information you may read this.
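
As a rough sketch of the compiled-flag idea, assuming the keyword list is fixed at startup (the class name CompiledHighlighter is hypothetical, and the pattern is just the example list from the question): the Regex is constructed once in a static field so the compilation cost is paid a single time and the instance is reused for every piece of content.

```csharp
using System.Text.RegularExpressions;

public static class CompiledHighlighter
{
    // Built once and reused across calls; hard-coded here purely for illustration.
    private static readonly Regex Keywords = new Regex(
        "(this|the|and)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Highlight(string content)
    {
        // "$1" substitutes the text captured by group 1, preserving its casing.
        return Keywords.Replace(
            content,
            "<span style=\"background-color:yellow;\">$1</span>");
    }
}
```

Whether the compiled flag actually helps depends on how often the regex is reused, so it is worth measuring on your own content before relying on it.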

Ravadre