ansaurus

Question

Convert special chars to HTML entities, without changing tags and parameters.

Answer 1

A:

I would suggest walking through each element using LINQ to XML and encoding the value of each element and attribute node. I'll try to come up with some code, but hey, it's 5pm on a Friday!
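A rough sketch of that idea, in Python's xml.etree.ElementTree instead of the C# the answer has in mind. It rests on several assumptions: the input is a well-formed XML fragment, "special chars" means non-ASCII characters, and numeric character references are acceptable in place of named entities. The function name and the <root> wrapper are made up for illustration.

```python
import xml.etree.ElementTree as ET

def encode_special_chars(fragment: str) -> str:
    """Re-serialise a well-formed fragment with non-ASCII characters
    written as numeric character references (hypothetical helper)."""
    if fragment == "":
        return ""
    # Wrap in a throwaway root so fragments with several top-level nodes parse.
    root = ET.fromstring(f"<root>{fragment}</root>")
    # Serialising to ASCII makes ElementTree emit &#NNN; for every character
    # in text and attribute values that ASCII cannot represent.
    out = ET.tostring(root, encoding="us-ascii").decode("ascii")
    # Strip the throwaway wrapper again.
    return out[len("<root>"):-len("</root>")]

print(encode_special_chars('<p title="café">naïve © 2009</p>'))
```

Tag names and attribute names come through unchanged, but the serialiser may normalise how tags are written (attribute quoting, for example), so the output is not guaranteed to be byte-identical outside the encoded values. Input that is not well-formed raises ParseError.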

David 2009-12-11 22:55:52

Answer 2

+1 A:

If you've got a mixture of < meaning start a tag and < meaning a literal less-than sign, you can't possibly tell which is ‘a tag’ to ignore and which isn't.

About all you could do would be to detect < usages that weren't a conventionally-formed start or end tag, using a nasty unreliable regex something like:

<(?!\w+(\s+\w+="[^"<]*")*\s*/?>|/\w+\s*>)

and replace them with &lt;. Similarly for & with &amp;:

&(?!\w+;|#\d+;|#x[0-9A-Fa-f]+;)

(> does not normally have to be escaped.)
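For illustration, here is how the two patterns above might be applied with Python's re module. The patterns are copied as written; the function name and the sample string are made up, and the caveats about unreliability apply in full.

```python
import re

# The two patterns from the answer, unchanged.
STRAY_LT = re.compile(r'<(?!\w+(\s+\w+="[^"<]*")*\s*/?>|/\w+\s*>)')
STRAY_AMP = re.compile(r'&(?!\w+;|#\d+;|#x[0-9A-Fa-f]+;)')

def escape_stray(markup: str) -> str:
    """Escape < and & that do not look like part of a tag or an entity."""
    # Bare ampersands first, then bare less-than signs. The &lt; added in the
    # second step already looks like an entity, so the order is not critical.
    markup = STRAY_AMP.sub("&amp;", markup)
    return STRAY_LT.sub("&lt;", markup)

sample = '<p class="x">1 < 2 & 3 &amp; <b>ok</b></p>'
print(escape_stray(sample))
# <p class="x">1 &lt; 2 &amp; 3 &amp; <b>ok</b></p>
```

The tags and the existing &amp; are left alone; only the bare < and & are escaped.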

This won't allow every possible valid way of constructing elements, and it will allow broken mis-nested elements, and non-existent entities, and would mess up non-element constructs like comments. Because regex can't parse HTML, let alone HTML with added crunchy broken bits.

So it's hardly foolproof. If you want proper markup that won't break your page when they accidentally leave a div open, the best first step is to parse it as XHTML and refuse it with an error if it's not well-formed XML.
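A minimal sketch of that well-formedness check, using Python's xml.etree.ElementTree as a stand-in for whatever XML parser the site already has. The function name and the <root> wrapper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root."""
    try:
        # The wrapper lets a fragment with several top-level nodes parse.
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        # Unclosed or mis-nested tags, bare < or &, and the like end up here.
        return False
    return True

print(is_well_formed("<div><p>fine</p></div>"))
print(is_well_formed("<div><p>left open"))
```

One thing to watch: a plain XML parser only knows the five predefined XML entities, so HTML named entities such as &nbsp; are likely to be rejected unless the parser is given the XHTML DTD or the input uses numeric references.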

If you have a rich text editor component that generates output where a literal < is not escaped, then it's time to replace that component with something less appalling. But in general it's not a good idea to let users create HTML, because they're really rubbish at it. Plus allowing anyone to input HTML gives them complete control over wrecking the site and its security with JavaScript. A simpler text-markup language is often a win.

bobince 2009-12-11 23:07:45

A literal < is precisely the only one that is escaped!

backslash17 2009-12-11 23:13:22

So your only problem is bare ampersands? The second regex should fix that.

bobince 2009-12-11 23:57:58

Answer 3

A:

After a lot of searching, I found that I was using the wrong property of the FreeTextBox component. The property I needed is ConvertHtmlSymbolsToHtmlCodes, which has to be set to true.

It also helps to use FormatHtmlTagsToXhtml if you need to insert the output into XHTML pages, because it strictly validates tags, their parameters, and the quotes surrounding the parameter values.

backslash17 2009-12-23 16:44:10
