Hello, I have the following code to first remove html tags and then highlight the search term within the resulting text:

protected void ListView1_ItemDataBound(object sender, ListViewItemEventArgs e)
{
    try
    {
        // get search value query string
        string searchText = Request.QueryString["search"].Trim();
        string encodedValue = Server.HtmlEncode(searchText);

        Literal Content = e.Item.FindControl("Content") as Literal;
        string contentText = Content.Text;
        Content.Text = Regex.Replace(contentText, @"<(.|\n)*?>", string.Empty).Replace(encodedValue, "<font class='highlight2'>" + encodedValue + "</font>");
    }
    catch
    {
        // do nothing
    }
}

This works to a degree, but the second replace is case sensitive. How can I do the second replace with Regex.Replace() as well, so that case is not an issue? Thank you!

+2  A: 

Use the Regex.Replace overload that takes a RegexOptions argument. You'll want the IgnoreCase value.
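
As a minimal sketch of that overload (the class and method names here are made up for illustration, not taken from the question), the static Regex.Replace(input, pattern, replacement, options) form might be used like this:

```csharp
using System.Text.RegularExpressions;

public static class IgnoreCaseDemo
{
    // Hypothetical helper: replaces every occurrence of a regex pattern
    // with a replacement string, ignoring case.
    public static string ReplaceIgnoringCase(string input, string pattern, string replacement)
    {
        // RegexOptions.IgnoreCase makes "world" also match "WORLD" and "World".
        return Regex.Replace(input, pattern, replacement, RegexOptions.IgnoreCase);
    }
}
```

Calling IgnoreCaseDemo.ReplaceIgnoringCase("Hello WORLD", "world", "planet") should return "Hello planet". Note that the pattern argument is treated as a regex, not as literal text.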

rchern
That'd be `String.Replace`, which doesn't have this overload.
Kobi
@Kobi, If it needs to be case insensitive, it has to use Regex.Replace. Per the [docs](http://msdn.microsoft.com/en-us/library/fk49wtc1.aspx), *This method performs an ordinal (case-sensitive and culture-insensitive) search to find oldValue.*
rchern
@rchern: Yes, but what Scott is *calling* is String.Replace. He's calling Replace on the value returned by Regex.Replace, which is a String.
Alan Moore
+1  A: 

First let's talk about the regex you're using to remove the tags, <(.|\n)*?>. If you want the dot to match anything including a newline, you should use Singleline mode. It's also known as DOTALL mode in some flavors, because that's what it does: allows the dot to match newlines. You can use the RegexOptions.Singleline flag for that, or embed it in the regex with an inline modifier:

`(?s)<.*?>`

This is still pretty fragile, but I'll leave it at that because there's no way to make it bulletproof; regexes and HTML are fundamentally incompatible.
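
To illustrate both points, here is a small sketch (the TagStripDemo class is a hypothetical name, not part of the original code) that uses the RegexOptions.Singleline flag instead of the inline (?s) modifier:

```csharp
using System.Text.RegularExpressions;

public static class TagStripDemo
{
    // Hypothetical helper: removes anything that looks like a tag,
    // including tags that are split across lines.
    public static string StripTags(string html)
    {
        // RegexOptions.Singleline lets '.' match '\n', the same effect as (?s).
        return Regex.Replace(html, "<.*?>", string.Empty, RegexOptions.Singleline);
    }
}
```

With this, StripTags("<a\nhref='x'>link</a> text") gives "link text". The fragility shows up as soon as a '>' appears inside an attribute value: StripTags("<a title=\"a > b\">x</a>") leaves the stray fragment " b\">x" behind, because the lazy match stops at the first '>'.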

As for the second replacement, the first thing you need to do is break up those chained method calls--in fact, I would say they never should have been chained. Feeding the result of a Regex.Replace directly to String.Replace is either an error or excessively clever. In either case, you have to split them up if you want to call Regex.Replace twice.

You also need to escape any regex metacharacters in the search expression, assuming you still want to do a literal search and not a regex search. You can use the Regex.Escape method for that.

string searchText = Request.QueryString["search"].Trim();
string encodedValue = Server.HtmlEncode(searchText);
string escapedValue = Regex.Escape(encodedValue);

string contentText = Content.Text;
contentText = Regex.Replace(contentText, @"(?s)<.*?>", string.Empty);
contentText = Regex.Replace(contentText, escapedValue, 
    "<font class='highlight2'>$&</font>", RegexOptions.IgnoreCase);
Content.Text = contentText;

There are a few other things in your code that don't seem right to me (like why you seem to be permanently removing all the tags), but I'm trying to stay focused on your actual question. To that end, I've tried to make the minimum necessary changes in the code to illustrate my answer. But there's one more thing I just have to comment on:

catch
{
    // do nothing
}

Don't do that. At the very least, send an error message to the console or rethrow the exception for the calling code to deal with, but never silently swallow exceptions.
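
One possible shape for that, as a sketch only: the SafeHighlight class below is hypothetical, and System.Diagnostics.Trace is just one way to record the error (a real page might use whatever logging the application already has).

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class SafeHighlight
{
    // Hypothetical helper: highlights a literal term, ignoring case.
    // On failure it records the error and rethrows instead of swallowing it.
    public static string Highlight(string text, string term)
    {
        try
        {
            return Regex.Replace(text, Regex.Escape(term),
                "<font class='highlight2'>$&</font>", RegexOptions.IgnoreCase);
        }
        catch (Exception ex)
        {
            // Record what went wrong (assumption: trace output is monitored somewhere).
            Trace.TraceError("Highlighting failed: {0}", ex);
            throw; // a bare 'throw' keeps the original stack trace
        }
    }
}
```

A null text or term (for example, a missing query string value) then surfaces as an ArgumentNullException in the caller rather than disappearing.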

Alan Moore
A: 

@Alan, you're the man! Thank you so much for taking the time to give such a thorough explanation.

Scott W.