I need an extensible class that parses a string and allows only certain XHTML tags and attributes. If the given string contains tags that are not allowed, they should simply be encoded so that they display on the page as entered. I need to be assured that no user input will be lost.

Thank you!

A: 

Since XHTML is valid XML, you can use any XML tool to process it. You could use an XmlReader instance to read the XML nodes, and when you come across a tag you don't want in the output, write it out inside a CDATA section instead.
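
The walk-and-filter idea can be sketched roughly as follows. This is only an illustration, written in Python with the standard library's `html.parser` rather than with XmlReader, and it encodes rejected tags as text instead of wrapping them in CDATA. The `ALLOWED_TAGS` whitelist, the `WhitelistFilter` class, and the `sanitize` function are hypothetical names made up for the example. A tag is kept as markup only if both the tag and all of its attributes are whitelisted; otherwise the whole tag is encoded, so nothing the user typed is dropped.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attribute names permitted on that tag.
ALLOWED_TAGS = {
    "a": {"href", "title"},
    "b": set(),
    "i": set(),
    "em": set(),
    "strong": set(),
}


class WhitelistFilter(HTMLParser):
    """Re-emit whitelisted tags as markup and everything else as encoded text."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []
        self.open_tags = []  # whitelisted tags emitted as markup, not yet closed

    def _is_allowed(self, tag, attrs):
        # The tag must be whitelisted and so must every attribute on it.
        return tag in self.allowed and all(
            name in self.allowed[tag] for name, _ in attrs
        )

    def _render(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + ">"

    def handle_starttag(self, tag, attrs):
        if self._is_allowed(tag, attrs):
            self.open_tags.append(tag)
            self.out.append(self._render(tag, attrs))
        else:
            # Encode the tag exactly as the user typed it.
            self.out.append(escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <foo/> are never whitelisted in this sketch.
        self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        # Emit a real end tag only if it closes the innermost emitted tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")
        else:
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(escape(f"<!--{data}-->"))


def sanitize(text, allowed=ALLOWED_TAGS):
    parser = WhitelistFilter(allowed)
    parser.feed(text)
    parser.close()
    # Close any whitelisted tags the user left open so the output stays balanced.
    closers = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closers


print(sanitize('<b>bold</b> <div>block</div> <a href="http://example.com">link</a>'))
```

A few caveats with this sketch: attribute values are not validated (an `href` could still carry an unwanted URL scheme), rejected end tags are encoded in lowercase normalized form rather than exactly as typed, and doctype declarations and processing instructions are dropped by the parser's default handlers unless `handle_decl` and `handle_pi` are overridden as well.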

You might also be able to use an XSLT transformation here, but I don't know whether it allows inserting nodes into a CDATA section.

casperOne
Users will be typing in comments and such. I want to let them add links and other tags to their messages. The group of users is intelligent and knows HTML. Any suggestions?
Josh Stodola
@Josh Stodola: Exactly what I said: you would walk through the XML document and filter out the tags that you don't want as you go. XHTML is XML, so you can use any XML tool you wish to process the fragment.
casperOne
Just make certain you test invalid markup with this setup. Most XML libraries throw exceptions when they encounter invalid XML (which may be completely acceptable, given your users).
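
To illustrate the failure mode, here is a minimal sketch in Python using `xml.etree.ElementTree` as a stand-in for a strict XML parser (it is not the XmlReader API). The function name `parse_or_encode` and the wrapping `<root>` element are assumptions made for the example. If the fragment is not well-formed, the parser raises, and the sketch falls back to encoding the entire input so that nothing is lost.

```python
from html import escape
import xml.etree.ElementTree as ET


def parse_or_encode(fragment):
    """Return (tree, None) if the fragment is well-formed XML,
    otherwise (None, fully encoded text)."""
    try:
        # Wrap in a hypothetical root so several top-level nodes still parse.
        return ET.fromstring(f"<root>{fragment}</root>"), None
    except ET.ParseError:
        # Strict parsers reject the whole input, so encode all of it.
        return None, escape(fragment)


tree, fallback = parse_or_encode("<b>unclosed")
print(fallback)  # the whole input, encoded for display
```

Whether an all-or-nothing fallback like this is acceptable depends on how often your users are expected to enter markup that is not well-formed.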
AaronSieb
+1  A: 

The HtmlAgilityPack works quite well and will handle poorly formed HTML too.

MichaelGG
What does "poorly formed HTML" have to do with my question? I want a white list of allowed tags. The rest should just be encoded. HtmlAgilityPack does not help.
Josh Stodola
A: 

The SGMLReader project is also a very useful library for dealing with user-generated markup (all schemas, not just XHTML).

For example, use it as a first-stage "cleaner" to parse markup entered by someone in a textbox and convert it to valid XML.

stephbu