ansaurus

Question

Replacing <p>, <div> tags within <td> tags?

Answer 1

+2 A:

Have you thought about looking into the HTML Agility Pack? It has a lot of built-in parsing options for manipulating tags.

Dillie-O 2009-07-23 17:50:18

I'd prefer not to use a library; see above.

NickAldwin 2009-07-23 18:03:14

Answer 2

A:

I don't have an answer for doing it with regular expressions, but I'd highly recommend the HTML Agility Pack for something like this. You should be able to find the nodes easily with a simple selector and just replace them with whatever you want.

Chris Doggett 2009-07-23 17:52:15

I'd prefer not to use a library; see above.

NickAldwin 2009-07-23 18:04:00

Answer 3

A:

So if you can't use the Agility Pack, what if you created a simple match that checks for the existence of the <td> block? If it exists, you can do all the proper replacements for tags within the block; otherwise, run a second set of replacements that works for tags not within the block.

There's no need to rewrite the existing replacements; just create one more simple one for your other condition. I guess this would depend on how much text is getting parsed in one "unit" of HTML stripping.
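A minimal sketch of this suggestion, assuming C# 9 or later (top-level statements). The names StripHtml, StripWithCells, and StripWithoutCells are hypothetical, and both rule sets are placeholders: the real ones would be the existing replacements from the question, which aren't shown here.

```csharp
using System;
using System.Text.RegularExpressions;

// Demo: one unit of text with a table cell, one without.
Console.WriteLine(StripHtml("<td><p>cell</p></td>"));
Console.WriteLine(StripHtml("<p>one</p><p>two</p>"));

// Entry point for one "unit" of HTML stripping (hypothetical name).
static string StripHtml(string html)
{
    // The simple match: does this unit contain a <td> block at all?
    bool hasCell = Regex.IsMatch(html, @"<td\b", RegexOptions.IgnoreCase);
    return hasCell ? StripWithCells(html) : StripWithoutCells(html);
}

// Hypothetical rule set for text containing cells: drop <p>/<div> tags
// (opening and closing) so the cell contents stay on one line.
static string StripWithCells(string html) =>
    Regex.Replace(html, @"</?(?:p|div)\b[^>]*>", "", RegexOptions.IgnoreCase);

// Hypothetical stand-in for the existing replacements: drop opening
// <p>/<div> tags and turn closing ones into line breaks.
static string StripWithoutCells(string html)
{
    string s = Regex.Replace(html, @"<(?:p|div)\b[^>]*>", "", RegexOptions.IgnoreCase);
    return Regex.Replace(s, @"</(?:p|div)\s*>", "\n", RegexOptions.IgnoreCase);
}
```

Because the check runs once per unit, a unit that mixes table cells and ordinary paragraphs gets the cell rules everywhere, which is why the unit size matters.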

Dillie-O 2009-07-23 21:22:15

It varies between one line and an entire document.

NickAldwin 2009-07-24 04:20:32

Answer 4

+2 A:

Found the answer:

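  using System.Text.RegularExpressions; // Regex, Match, MatchEvaluator

  // remove p/div/tr inside of td's (Singleline lets .*? cross line breaks inside a cell)
  result = Regex.Replace(result, @"<td\b(?:[^>""']|""[^""]*""|'[^']*')*>.*?</td\b(?:[^>""']|""[^""]*""|'[^']*')*>", new MatchEvaluator(RemoveTagsWithinTD), RegexOptions.Singleline | RegexOptions.IgnoreCase);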

This code calls this separate method for each match:

  // called once per <td>...</td> match; strips opening and closing div/tr/p tags
  private static string RemoveTagsWithinTD(Match matchResult) {
      return Regex.Replace(matchResult.Value, @"</?(div|tr|p)\b(?:[^>""']|""[^""]*""|'[^']*')*>", "", RegexOptions.IgnoreCase);
  }

This code was (again) based on another recipe from the Regular Expressions Cookbook (which was sitting in front of me the whole time, d'oh!). It's really a great book.
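For trying the technique outside an existing project, here is a self-contained sketch of the same two-pass idea, assuming C# 9 or later (top-level statements). It differs from the posted code in a few deliberate ways: it uses a lambda instead of the named RemoveTagsWithinTD method, adds RegexOptions.Singleline so a cell spanning several lines still matches, and strips closing </p>, </div>, </tr> tags as well as opening ones. The sample HTML is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up input: the cell spans two lines, and the <p> before the table
// is outside any cell, so it must be left alone.
string html = "<p>intro</p><table><tr><td class=\"c\"><p>first</p>\n<div>second</div></td></tr></table>";

// Same attribute pattern as the answer: any mix of unquoted characters,
// "double-quoted" and 'single-quoted' values, up to the closing >.
const string attrs = @"(?:[^>""']|""[^""]*""|'[^']*')*";

// Outer pass: find each <td ...>...</td> block. Singleline lets .*? cross
// the line break inside the cell; the lazy .*? stops at the first </td>.
Regex cell = new Regex(@"<td\b" + attrs + @">.*?</td\b" + attrs + ">",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);

// Inner pass, run once per matched cell: strip opening and closing
// <div>, <tr>, and <p> tags, keeping the text between them.
Regex inner = new Regex(@"</?(?:div|tr|p)\b" + attrs + ">", RegexOptions.IgnoreCase);

// The lambda plays the role of the MatchEvaluator.
string result = cell.Replace(html, m => inner.Replace(m.Value, ""));
Console.WriteLine(result);
```

One caveat of the lazy .*?: it is not safe with nested tables, because the outer match ends at the first </td> it meets.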

NickAldwin 2009-07-30 16:37:13

I'm glad to hear you like Regular Expressions Cookbook. If any of your friends don't have a copy yet, O'Reilly and I are doing a giveaway at regexguru.com in which anyone can participate until the end of the month (28 Feb 2010).

Jan Goyvaerts 2010-02-25 01:47:33
