I have a string that contains dynamic HTML content.

I want to be able to find all occurrences of specific HTML tags and replace them, but not the content within them.

The tags in question are the table tags: TABLE, TR, and TD. They may or may not contain attributes. How would one go about doing this in C#?

Thanks in advance for any help!

A: 

Don't use regexes. Use the Html Agility Pack.

See this question for why not.

John Gietzen
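The Html Agility Pack is a third-party NuGet package. As a dependency-free stand-in (swapped in here, not what this answer uses), the same parse-then-rename idea can be sketched with `System.Xml.Linq`, assuming the markup is well-formed XML (XHTML): every tag closed, no HTML-only entities such as `&nbsp;`. `XhtmlTagRenamer` and `RenameElements` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlTagRenamer
{
    // Renames every element whose name equals soughtTag (ignoring case) to replacementTag.
    // Only XElement.Name changes, so attributes and child content stay as they were.
    // Assumes the input is well-formed XML (XHTML); typical real-world HTML is not.
    public static string RenameElements(string xhtml, string soughtTag, string replacementTag)
    {
        // A temporary wrapper lets fragments with several top-level nodes parse;
        // PreserveWhitespace keeps whitespace-only text between tags.
        XElement root = XElement.Parse("<root>" + xhtml + "</root>", LoadOptions.PreserveWhitespace);

        // Collect the matches first, then rename them.
        var matches = root.Descendants()
            .Where(e => string.Equals(e.Name.LocalName, soughtTag, StringComparison.OrdinalIgnoreCase))
            .ToList();
        foreach (XElement element in matches)
            element.Name = replacementTag;

        // Serialize the wrapper's children without adding indentation.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

For example, `RenameElements("<table><tr><td>a</td></tr></table>", "td", "th")` should give `<table><tr><th>a</th></tr></table>`. Because the tree is edited rather than the text, a `<td>` inside a comment or an attribute value is never mistaken for a tag; the trade-off is that malformed HTML throws instead of parsing, which is the gap a lenient HTML parser fills.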
A: 
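  // requires: using System.Text.RegularExpressions;
  string e = "(< *?/*)div( +?|>)";
  string repl = "$1boo$2";   // .NET writes backreferences as $1/$2, not \1/\2
  string output = Regex.Replace(input, e, repl);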

Frankly, I am befuddled by the mantra, imposed on everyone, that regex should never be used on HTML.

Mark
Read the article: http://www.codinghorror.com/blog/archives/001311.html
TrueWill
I read it. The original post, at least, is nothing but diatribe, assertion, humor, and hyperbole. Knowing going in that HTML belongs to a different language class may clue you in to why your regex is getting unwieldy in a particular case. But that doesn't mean every operation you might need to perform on HTML is affected by HTML's language class. Admittedly, the solution I gave above is not complete: it will also rename tags that appear inside comments and inside quoted attribute values. But excluding comments, at least, would take only a simple addition.
Mark
Excluding quoted sections is not a problem either.
Mark
I inadvertently read only the quoted part of that Coding Horror post; I'll read the rest.
Mark
OK, this is my diatribe, I guess. Natural language sits in the highest language class of all, far above both regular languages and HTML. Does that mean regex should never be used to alter text written by a human? Maybe you should only ever use a completely accurate natural-language parser. In that case, be prepared to wait at least another decade until such a thing exists.
Mark
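The comments above say that skipping comments and quoted attribute values takes only a small addition. One way to sketch that in C#, assuming the hypothetical names `TableTagRegex` and `ReplaceTableTags` and a caller-supplied map from `table`, `tr` and `td` to new names: comments and all other tags are matched whole and returned unchanged, so a `<td>` inside a comment or inside another tag's attribute value is never renamed. It still knows nothing about `<script>`/`<style>` contents, CDATA, or badly malformed markup.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableTagRegex
{
    // Three alternatives, tried left to right at each position:
    //   1. an HTML comment;
    //   2. an opening or closing table/tr/td tag, captured in parts;
    //   3. any other tag that starts with a letter, '/', '!' or '?'.
    // Tag bodies are consumed as non-quote characters or whole quoted strings,
    // so a '>' or '<td>' inside a quoted attribute value cannot end or start a tag.
    private static readonly Regex TagPattern = new Regex(
        @"<!--.*?-->" +
        @"|(?<open></?)(?<name>table|tr|td)(?=[\s/>])(?<rest>(?:[^>""']|""[^""]*""|'[^']*')*)>" +
        @"|<[a-zA-Z/!?](?:[^>""']|""[^""]*""|'[^']*')*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Renames table, tr and td tags using the lowercase-keyed map; everything else is copied as-is.
    public static string ReplaceTableTags(string input, IReadOnlyDictionary<string, string> renames)
    {
        return TagPattern.Replace(input, m =>
        {
            Group name = m.Groups["name"];

            // Comments, other tags, and table tags missing from the map are returned unchanged.
            if (!name.Success || !renames.TryGetValue(name.Value.ToLowerInvariant(), out var newName))
                return m.Value;

            return m.Groups["open"].Value + newName + m.Groups["rest"].Value + ">";
        });
    }
}
```

Matching the "other" tags explicitly is what protects attribute values: once the regex engine has consumed `<a title="<td>">` as a whole, it never restarts inside the quotes.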
A: 

This function might be sufficient:

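using System.Text.RegularExpressions;

// Renames opening and closing soughtTag tags (attributes kept) to replacementTag; content is untouched.
// IgnoreCase matches TABLE and table alike; Singleline lets attributes span several lines.
public static string ReplaceTag(string input, string soughtTag, string replacementTag)
{
    return Regex.Replace(input, "(</?)" + Regex.Escape(soughtTag) + @"((?:\s+.*?)?>)",
        "${1}" + replacementTag + "${2}", RegexOptions.IgnoreCase | RegexOptions.Singleline);
}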
Nick Higgs
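Applied to the question's three tags, a function like this can simply be called once per tag. A sketch, assuming the hypothetical names `TableTagReplacer` and `ReplaceTags` and example target names; the function is repeated here, with the tag name escaped, case-insensitive matching, and `${1}`/`${2}` group references (so a replacement name starting with a digit cannot be read as `$12`), so the block compiles on its own.

```csharp
using System.Text.RegularExpressions;

public static class TableTagReplacer
{
    // Renames opening and closing soughtTag tags (attributes kept) to replacementTag.
    public static string ReplaceTag(string input, string soughtTag, string replacementTag)
    {
        return Regex.Replace(input, "(</?)" + Regex.Escape(soughtTag) + @"((?:\s+.*?)?>)",
            "${1}" + replacementTag + "${2}", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    // Applies ReplaceTag once per (sought, replacement) pair, in order.
    public static string ReplaceTags(string input, (string Sought, string Replacement)[] pairs)
    {
        foreach (var (sought, replacement) in pairs)
            input = ReplaceTag(input, sought, replacement);
        return input;
    }
}
```

For example, `ReplaceTags(html, new[] { ("table", "div"), ("tr", "p"), ("td", "span") })` rewrites all three tag types. Each pass works on the output of the previous one, so if a replacement name is also one of the sought names (say `tr` to `td`, then `td` to `span`), tags renamed earlier get renamed again; order the pairs with that in mind.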