In a .NET web application I talk to a third-party CMS API that returns HTML. I need to convert it to well-formed XML, so I use a .NET wrapper around HTML Tidy. This generates a nice DOM, but things go wrong when named entities such as &nbsp; are used.

I need those converted to their numeric form, like &#160;, in order for an XmlDocument to accept them.

I can't set any options on the Tidy wrapper other than making it output XHTML. So I need to do some magic on the string it returns, and I think it will come down to regular expressions using a mapping of my own, right?
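To make the regex-plus-mapping idea concrete, here is a minimal sketch. The class name `EntityFixer`, the method `ToNumeric`, and the four-entry mapping are made up for illustration; a real table would need every named entity the CMS can emit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any Tidy wrapper.
static class EntityFixer
{
    // Partial mapping from HTML entity names to Unicode code points.
    // Assumption: only these four occur; extend the table as needed.
    static readonly Dictionary<string, int> Map = new Dictionary<string, int>
    {
        { "nbsp", 160 }, { "copy", 169 }, { "eacute", 233 }, { "euro", 8364 }
    };

    // Matches named entities such as &nbsp; but not numeric ones such as &#160;.
    static readonly Regex NamedEntity = new Regex(@"&([A-Za-z][A-Za-z0-9]*);");

    public static string ToNumeric(string xhtml)
    {
        return NamedEntity.Replace(xhtml, m =>
        {
            int code;
            // Names missing from the map (including the XML built-ins
            // amp, lt, gt, quot, apos) are left untouched.
            return Map.TryGetValue(m.Groups[1].Value, out code)
                ? "&#" + code + ";"
                : m.Value;
        });
    }
}
```

The result of `EntityFixer.ToNumeric(tidyOutput)` could then be passed to `XmlDocument.LoadXml`, where `tidyOutput` stands for whatever string the wrapper returns. Any entity missing from the table will still make the parser fail.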

A:

If your .NET wrapper lets you give tidy all the options possible on the command line and in its config file, you should get what you need by setting 'numeric-entities' and 'output-xml' both to 'true'.
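If the wrapper can load a Tidy configuration file, those two settings would look roughly like this. The file name `tidy.conf` is an assumption, and Tidy accepts `yes` as well as `true` for boolean options:

```
# tidy.conf (hypothetical file name)
numeric-entities: yes
output-xml: yes
```

With the command-line tool, the equivalent would be something like `tidy -config tidy.conf page.html`, where `page.html` is a placeholder for the input. Whether a given .NET wrapper exposes these options depends on the wrapper.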

ChuckB