I am writing code for a search results page that needs to highlight search terms. The terms occur within table cells (the app iterates through GridView row cells), and these table cells may contain HTML.
Currently, my code looks like this (relevant hunks shown below):
const string highlightPattern = @"<span class=""Highlight"">$0</span>";
DataBoundLiteralControl litCustomerComments = (DataBoundLiteralControl)e.Row.Cells[CUSTOMERCOMMENTS_COLUMN].Controls[0];
// Turn "term1 term2" into "(term1|term2)"
string spaceDelimited = txtTextFilter.Text.Trim();
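// Escape each term so regex metacharacters typed by the user (".", "(", "*", ...) match literally (needs using System.Linq)
string pipeDelimited = string.Join("|", spaceDelimited.Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries).Select(t => Regex.Escape(t)).ToArray());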
string searchPattern = "(" + pipeDelimited + ")";
// Highlight search terms in Customer - Comments column
e.Row.Cells[CUSTOMERCOMMENTS_COLUMN].Text = Regex.Replace(litCustomerComments.Text, searchPattern, highlightPattern, RegexOptions.IgnoreCase);
Amazingly it works. BUT, sometimes the text I am matching on is HTML that looks like this:
<span class="CustomerName">Fred</span> was a classy individual.
And if you search for "class" I want the highlight code to wrap the "class" in "classy" but of course not the HTML attribute "class" that happens to be in there! If you search for "Fred", that should be highlighted.
So what's a good regex that will make sure matches happen only OUTSIDE the html tags? It doesn't have to be super hardcore. Simply making sure the match is not between < and > would work fine, I think.
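To make the "not between < and >" idea concrete, here is a minimal sketch of what I have in mind. It appends a negative lookahead, `(?![^<>]*>)`, which rejects a match when the next angle bracket after it is a closing `>` (meaning the match sits inside a tag). The `HighlightSketch` class and `HighlightOutsideTags` helper are hypothetical names made up for this example, and I am assuming the cell text never contains a raw, unencoded `>` outside of tags or inside attribute values; if it does, this will not be reliable.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class HighlightSketch
{
    const string HighlightPattern = @"<span class=""Highlight"">$0</span>";

    // Hypothetical helper (not part of the original code): wraps each search
    // term in a highlight span, skipping occurrences inside HTML tags.
    public static string HighlightOutsideTags(string html, string searchText)
    {
        // Turn "term1 term2" into escaped alternatives: term1|term2
        string[] terms = searchText
            .Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => Regex.Escape(t))
            .ToArray();

        // With no terms the pattern would match the empty string everywhere.
        if (terms.Length == 0)
        {
            return html;
        }

        // (?![^<>]*>) fails when only non-bracket characters separate the
        // match from a '>', i.e. when the match is inside a tag.
        string searchPattern = "(" + string.Join("|", terms) + ")(?![^<>]*>)";

        return Regex.Replace(html, searchPattern, HighlightPattern, RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        string html = @"<span class=""CustomerName"">Fred</span> was a classy individual.";
        Console.WriteLine(HighlightOutsideTags(html, "class"));
        Console.WriteLine(HighlightOutsideTags(html, "Fred"));
    }
}
```

With the sample text above, searching for "class" should leave the `class` attribute alone and wrap only the start of "classy", while searching for "Fred" should still wrap the name inside the span. Whether a lookahead like this is robust enough, or whether there is a cleaner pattern, is exactly what I am unsure about.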