I am writing some C# code to parse RSS feeds and highlight specific whole words in the content. However, I need to highlight only words that are outside HTML tags. So far I have:
string contentToReplace = "This is <a href=\"test.aspx\" alt=\"This is test content\">test</a> content";
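string pattern = @"\b(this|the|test|content)\b"; // verbatim string, so \b reaches the regex engine as a word boundary rather than a backspace character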
string output = Regex.Replace(contentToReplace, pattern, "<span style=\"background:yellow;\">$1</span>", RegexOptions.Singleline | RegexOptions.IgnoreCase);
This works fine, except that it also highlights the word "test" in the alt attribute. I could easily write a function that strips the HTML and then does the replace, but I need to keep the HTML to display the content.
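One direction that might work, shown only as a minimal sketch: add a negative lookahead so a word is skipped whenever the next angle bracket after it is a closing `>`, which would mean the word sits inside a tag. This assumes the feed HTML never contains a literal `<` or `>` inside an attribute value, and it does not handle comments or script blocks; a real HTML parser would be more robust. The names `Highlighter`, `Highlight`, and `WordOutsideTags` are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Highlighter
{
    // Whole-word match, rejected when the text that follows reaches a '>'
    // before any '<' (i.e. the word is inside a tag such as <a href="...">).
    // Assumption: attribute values never contain a raw '<' or '>'.
    private static readonly Regex WordOutsideTags = new Regex(
        @"\b(this|the|test|content)\b(?![^<>]*>)",
        RegexOptions.IgnoreCase);

    public static string Highlight(string html)
    {
        // $1 re-inserts the matched word with its original casing.
        return WordOutsideTags.Replace(
            html,
            "<span style=\"background:yellow;\">$1</span>");
    }
}
```

Calling `Highlighter.Highlight(contentToReplace)` on the sample string should wrap the leading "This", the link text "test", and the trailing "content", while leaving the href and alt attribute values untouched.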