I have a VB.NET class that cleans some HTML before emailing the results.

Here is a sample of the HTML I need to remove:

    <div class="RemoveThis">
      Blah blah blah<br /> 
      Blah blah blah<br /> 
      Blah blah blah<br /> 
      <br /> 
    </div>

I am already using Regex for most of this work. What regular expression would replace the block above with nothing?

I tried the following, but something is wrong:

    ' html holds all of my text
    html = Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "", RegexOptions.IgnoreCase)

Thanks.

+2  A: 

Add the Singleline option:

    html = Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "", RegexOptions.IgnoreCase Or RegexOptions.Singleline)

From MSDN:

Singleline: Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
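
To see the difference side by side, here is a minimal sketch. The module name, the `RemoveBlock` helper, and the sample string are made up for illustration; only the pattern and the options come from the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SinglelineDemo
    ' Pattern from the question; "" is an escaped quote inside a VB string literal.
    Public Const Pattern As String = "<div.*?class=""RemoveThis"">.*?</div>"

    ' Hypothetical helper: removes matching blocks using the given options.
    Public Function RemoveBlock(html As String, options As RegexOptions) As String
        Return Regex.Replace(html, Pattern, "", options)
    End Function

    Sub Main()
        Dim html As String = "<p>Keep</p>" & vbLf &
                             "<div class=""RemoveThis"">" & vbLf &
                             "  Blah blah blah<br />" & vbLf &
                             "</div>" & vbLf &
                             "<p>Also keep</p>"

        ' Without Singleline, "." never matches \n, so the pattern cannot
        ' reach </div> and the input comes back unchanged.
        Dim unchanged As String = RemoveBlock(html, RegexOptions.IgnoreCase)
        Console.WriteLine(unchanged = html)                ' True

        ' With Singleline, "." also matches \n and the whole block is removed.
        Dim cleaned As String = RemoveBlock(html, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Console.WriteLine(cleaned.Contains("RemoveThis"))  ' False
    End Sub
End Module
```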

PS: Parsing HTML with regular expressions is discouraged. The lazy `.*?</div>` stops at the first closing tag it finds, so your code will fail on nested divs like this:

    <div class="RemoveThis">
        <div>bla</div>
        <div>bla</div>
    </div>
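
To make the failure concrete, here is a small sketch that runs the pattern (with Singleline) on a nested block. The module name, the `RemoveBlock` helper, and the sample input are assumptions for illustration only.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NestedDivDemo
    ' Hypothetical helper wrapping the Regex.Replace call with both options set.
    Public Function RemoveBlock(html As String) As String
        Return Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "",
                             RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    End Function

    Sub Main()
        Dim html As String = "<div class=""RemoveThis"">" &
                             "<div>first</div>" &
                             "<div>second</div>" &
                             "</div>"

        ' The match ends at the first </div>, so only part of the block is removed.
        Console.WriteLine(RemoveBlock(html))   ' <div>second</div></div>
    End Sub
End Module
```

The leftover `<div>second</div></div>` is unbalanced markup, which is why the regex is only safe when the blocks contain no inner divs.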
Heinzi
Thanks, but it is not working. The things I want to remove only have text and <br /> in them.
Bobby Ortiz
Are you sure? I tried it here: http://regexlib.com/RETester.aspx and it seems to work fine...
Heinzi
:( Yes, I am sure. I think there is something different about the .NET version. Or it could be that my HTML string has a lot more text, at least 5K.
Bobby Ortiz
Strange. Unfortunately, I don't have Visual Studio available right now (I'm on a Linux machine at university), but I'll test it at home tonight (CET), unless someone else finds the solution earlier.
Heinzi
Well. I double checked, and it did work. Thanks.
Bobby Ortiz
That's good to hear, thanks for the feedback!
Heinzi
+2  A: 

Try:

    RegexOptions.IgnoreCase Or RegexOptions.Singleline

The RegexOptions.Singleline option changes the meaning of the dot from 'match anything except a newline' to 'match anything'.

Also, you should consider using an HTML parser instead of regular expressions if you need to parse HTML.
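
As a rough sketch of the parser route, and only under the assumption that the input is well-formed XHTML (every tag closed, a single root element), the XML classes that ship with .NET can stand in for an HTML parser. Real-world HTML is often not well-formed, and then a dedicated HTML parsing library would be needed instead. The module name and the `RemoveByClass` helper are made up for illustration.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module ParserDemo
    ' Hypothetical helper: removes every <div> whose class attribute equals className.
    ' Assumes xhtml is well-formed XML; XElement.Parse throws otherwise.
    Public Function RemoveByClass(xhtml As String, className As String) As String
        Dim root As XElement = XElement.Parse(xhtml)

        root.Descendants("div").
            Where(Function(d) d.Attribute("class") IsNot Nothing AndAlso
                              d.Attribute("class").Value = className).
            Remove()

        ' DisableFormatting keeps the serializer from adding indentation.
        Return root.ToString(SaveOptions.DisableFormatting)
    End Function

    Sub Main()
        Dim xhtml As String = "<body><p>Keep</p>" &
                              "<div class=""RemoveThis""><div>a</div><div>b</div></div>" &
                              "<p>Also keep</p></body>"

        ' The outer div is removed together with its nested divs.
        Console.WriteLine(RemoveByClass(xhtml, "RemoveThis"))
    End Sub
End Module
```

Because the parser works on elements rather than text, nested divs are removed along with their parent, which the regex cannot do.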

Mark Byers