I want to write an application that consumes RSS. I want to show some of the HTML in each feed item's description, such as images, links, and br tags. However, I don't want embedded scripts to run, unruly CSS to be applied, and so on. I don't want to reinvent the wheel either. Are there any libraries that strip out just the right level of HTML?

The issue that I am running into is that I'm generating an RSS feed from phpBB, so the posts do have br and a (link) tags already. However, a user can paste a script tag in a post and it will be encoded properly to display as text on the page.

However, when I look at the post in an RSS reader, all HTML in the post is encoded as &lt; and &gt;, etc. This blurs the distinction between the br tag and the escaped &lt;script&gt; tag, since both appear with &lt; and &gt;.

I feel like this should be easier, and I'm just missing something obvious...I hope.

A: 

Your question isn't perfectly clear, but typically when trying to clean up html for output you want to only allow a whitelist of tags.

Here's a JavaScript implementation of strip_tags that you could easily adapt to .NET:

http://kevin.vanzonneveld.net/techblog/article/javascript_equivalent_for_phps_strip_tags/
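
To make the whitelist idea concrete, here is a minimal sketch in Python using only the standard library. The allowed tags, the allowed attributes, and the names `WhitelistSanitizer` and `sanitize` are assumptions chosen for illustration, not part of the linked strip_tags code. It has not been hardened against every malformed-markup trick, so a maintained sanitizer library is probably the safer choice for real use.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist for illustration; adjust to what the feed should allow.
ALLOWED_TAGS = {"a", "br", "img", "p", "b", "i"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
VOID_TAGS = {"br", "img"}            # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}   # drop these tags and everything inside


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # drops onclick, style, and similar attributes
            value = value or ""
            # Only keep plain web URLs; this rejects javascript: links.
            if name in ("href", "src") and not value.lower().startswith(
                ("http://", "https://")
            ):
                continue
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so it can never turn back into markup.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Calling `sanitize()` on a feed item's description before rendering it keeps whitelisted tags such as links and line breaks, removes script elements together with their contents, and leaves text that was already escaped in escaped form.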

jayrdub
A: 

I figured it out. I was using an RSS script that caused the HTML-encoded angle brackets to be 'mixed in' with the real HTML in the RSS feed.

This is what the source looked like in phpBB:

<a href="link">link</a>
&lt;script&gt;alert("hack you");&lt;/script&gt;

But in my RSS feed, it was being generated as follows (notice there is no distinction between the escaped HTML and the non-escaped HTML):

&lt;a href="link"&gt;link&lt;/a&gt;
&lt;script&gt;alert("hack you");&lt;/script&gt;

I made a change to the rss.php file so it turned it into this:

&lt;a href="link"&gt;link&lt;/a&gt;
&amp;lt;script&amp;gt;alert("hack you");&amp;lt;/script&amp;gt;

That way it was displayed in the RSS feed properly.
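
The change amounts to escaping the stored post exactly once more, ampersands included, before it goes into the feed. The actual edit was made in rss.php and is not shown here; the following is only a rough Python sketch of the same idea, and the function name `encode_description` is hypothetical.

```python
import html


def encode_description(stored_html):
    # Escape &, < and > exactly once. Real tags become &lt;...&gt;, while
    # text that was already escaped (&lt;script&gt;) becomes
    # &amp;lt;script&amp;gt;, so the two stay distinguishable.
    return html.escape(stored_html, quote=False)


# The post as stored by phpBB, taken from the example above.
stored = (
    '<a href="link">link</a>\n'
    '&lt;script&gt;alert("hack you");&lt;/script&gt;'
)

feed_text = encode_description(stored)

# A reader that decodes the description once gets the stored post back:
# a real link, plus a script tag that is still only text.
decoded = html.unescape(feed_text)
```

Because the ampersand is escaped along with the angle brackets, a single decode by the RSS reader restores the original post rather than turning the quoted script text into a live tag.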

Thanks!

Joel