Hi

I would like to parse out any HTML data that is returned wrapped in CDATA.

For example: <![CDATA[<table><tr><td>Approved</td></tr></table>]]>

Thanks!

+1  A: 

I know this might seem incredibly simple, but have you tried string.Replace()?

string x = "<![CDATA[<table><tr><td>Approved</td></tr></table>]]>";
// Remove the opening "<![CDATA[" marker (including its trailing '[') and the closing "]]>"
string y = x.Replace("<![CDATA[", string.Empty).Replace("]]>", string.Empty);

There are probably more robust ways to handle this, but you may only need something this simple.

Scott Anderson
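If you go the string route, a slightly safer variant removes the markers only when they actually wrap the whole string, and returns any other input unchanged. This is a minimal sketch that assumes the CDATA section is the entire string; `StripCData` is a hypothetical helper name, not a framework method:

```csharp
using System;

// Hypothetical helper: strips "<![CDATA[" and "]]>" only when they wrap the whole string.
static string StripCData(string s)
{
    const string open = "<![CDATA[";
    const string close = "]]>";
    if (s.Length >= open.Length + close.Length &&
        s.StartsWith(open, StringComparison.Ordinal) &&
        s.EndsWith(close, StringComparison.Ordinal))
    {
        // Keep only the text between the two markers
        return s.Substring(open.Length, s.Length - open.Length - close.Length);
    }
    return s; // not wrapped in CDATA: leave it alone
}

string x = "<![CDATA[<table><tr><td>Approved</td></tr></table>]]>";
Console.WriteLine(StripCData(x)); // prints <table><tr><td>Approved</td></tr></table>
```

Unlike chained Replace() calls, this never touches text that merely contains one of the markers somewhere in the middle.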
+1  A: 

There isn't much detail in the question, but a very simple regex should match it, provided there's no complexity you didn't describe:

/<!\[CDATA\[(.*?)\]\]>/
Chad Birch
Beat me to it. :) +1
Tomalak
Though I don't think escaping "<" is really necessary.
Tomalak
Escaping < and > is not necessary in C# regexes.
patjbs
Thanks, updated.
Chad Birch
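A minimal C# sketch of using that pattern (without the /.../ delimiters) to collect every CDATA section in a string. It assumes RegexOptions.Singleline is wanted so that '.' also matches newlines in multi-line HTML; `ExtractCData` and the sample input are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Returns the content of every <![CDATA[ ... ]]> section, in order.
static List<string> ExtractCData(string text)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(text, @"<!\[CDATA\[(.*?)\]\]>", RegexOptions.Singleline))
    {
        results.Add(m.Groups[1].Value); // group 1 = text between the markers
    }
    return results;
}

string doc = "<a><![CDATA[<td>Approved</td>]]></a><b><![CDATA[<td>\nDenied</td>]]></b>";
foreach (string html in ExtractCData(doc))
{
    Console.WriteLine(html);
}
```

The lazy .*? stops at the first "]]>", so two sections in one string come back as two separate results instead of one long match.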
+1  A: 

The regex to find CDATA sections would be:

(?:<!\[CDATA\[)(.*?)(?:\]\]>)
Tomalak
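Because the (?:...) groups don't capture, group 1 is the CDATA content, so the same pattern can unwrap sections in place with a "$1" substitution. A sketch, assuming you want to keep the surrounding markup; the `<status>` element and the `UnwrapAll` name are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

// Replace each whole CDATA section with just its content (group 1).
static string UnwrapAll(string text) =>
    Regex.Replace(text, @"(?:<!\[CDATA\[)(.*?)(?:\]\]>)", "$1", RegexOptions.Singleline);

string doc = "<status><![CDATA[<table><tr><td>Approved</td></tr></table>]]></status>";
Console.WriteLine(UnwrapAll(doc));
// prints <status><table><tr><td>Approved</td></tr></table></status>
```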
+3  A: 

The expression to handle your example would be:

\<\!\[CDATA\[(?<text>[^\]]*)\]\]\>

The named group "text" will contain your HTML.

The C# code you need is:

using System;
using System.Text.RegularExpressions;

RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"\<\!\[CDATA\[(?<text>[^\]]*)\]\]\>", options);
string input = @"<![CDATA[<table><tr><td>Approved</td></tr></table>]]>";

// Check for a match, then read the named group "text"
Match match = regex.Match(input);
if (match.Success)
{
    string htmlText = match.Groups["text"].Value;
    Console.WriteLine(htmlText);
}

The "input" variable is there only to hold the sample input you provided.

Ron Harlev
thank you! very helpful solution
Garrett
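One caveat with [^\]]*: it stops at the first ']', so it cannot match CDATA content that itself contains a ']' (which is legal; only "]]>" ends the section). A minimal sketch contrasting a simplified form of that pattern with a lazy .*?; the sample content is made up:

```csharp
using System;
using System.Text.RegularExpressions;

// CDATA content that legitimately contains ']'
string input = "<![CDATA[<td>items[0]</td>]]>";

// [^\]]* can't get past the ']' after "0", so "]]>" is never reached and the match fails.
var noBrackets = new Regex(@"<!\[CDATA\[(?<text>[^\]]*)\]\]>");

// A lazy .*? keeps going until the first "]]>", which is what ends a CDATA section.
var lazy = new Regex(@"<!\[CDATA\[(?<text>.*?)\]\]>", RegexOptions.Singleline);

Console.WriteLine(noBrackets.IsMatch(input));              // False
Console.WriteLine(lazy.Match(input).Groups["text"].Value); // <td>items[0]</td>
```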
A: 
Regex r = new Regex(@"(?<=<!\[CDATA\[).*?(?=\]\]>)");
patjbs
The regex is wrong. CDATA can contain "]".
Tomalak
Fixed! Sorry, didn't know that was valid in there :)
patjbs
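With lookbehind (?<=...) and lookahead (?=...), the markers are checked but not consumed, so Match.Value is already the CDATA content and no group is needed. A short sketch; it assumes the lookahead should test for the full "]]>" terminator (so a "]]" inside the content doesn't end the match early) and adds RegexOptions.Singleline for multi-line HTML:

```csharp
using System;
using System.Text.RegularExpressions;

// Zero-width assertions: the match contains only the text between the markers.
var r = new Regex(@"(?<=<!\[CDATA\[).*?(?=\]\]>)", RegexOptions.Singleline);

string input = "<![CDATA[<table><tr><td>Approved</td></tr></table>]]>";
Match m = r.Match(input);
if (m.Success)
{
    Console.WriteLine(m.Value); // prints <table><tr><td>Approved</td></tr></table>
}
```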