I want to write an application that consumes RSS. I want to be able to render some of the HTML in each item's description, such as images, links, br tags, etc. However, I don't want any embedded scripts to run, unruly CSS to be applied, and so on. I don't want to reinvent the wheel either. Are there any libraries that strip out just the right level of HTML?
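To make "the right level of HTML" concrete, here is a minimal sketch of whitelist-based filtering using only the Python standard library. The names ALLOWED, WhitelistSanitizer and sanitize are hypothetical, and the whitelist itself is an assumption about what a reader might want to keep. It is an illustration of the idea, not a vetted sanitizer; a maintained library is the safer choice for real use.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attributes allowed on that tag.
ALLOWED = {"a": {"href"}, "img": {"src", "alt"}, "br": set(), "p": set()}
VOID = {"br", "img"}                # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # drop these tags and everything inside them


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tags are dropped, their text content is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            # Assumption: only absolute http(s) URLs are acceptable.
            if name in ("href", "src") and not value.lower().startswith(
                ("http://", "https://")
            ):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag not in ALLOWED or tag in VOID:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is re-escaped so it can never turn into markup.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('Hi<br><script>alert(1)</script>'
                   '<a href="http://example.com" onclick="x()">link</a>'))
```

The design choice here is to list what is allowed rather than what is forbidden, so anything not named in the whitelist (event handler attributes, style attributes, unknown tags) is dropped by default.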
The issue I am running into is that I'm generating the RSS feed from phpBB, so the posts already contain br and a (link) tags. A user can also paste a script tag into a post, and it is encoded properly so that it displays as text on the page.
However, when I look at the post in an RSS reader, all HTML in the post is encoded as &lt; and &gt;, etc. This blurs the distinction between the br tag and the script tag that the user typed as text, since both appear with &lt; and &gt;.
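A small sketch of the distinction in question, under the assumption (not confirmed for phpBB's feed) that the feed generator escapes the rendered post exactly once. In that case the real br would carry one level of escaping and the user-typed script two, so decoding once would separate them again. The sample post_html string is made up for illustration.

```python
from html import escape, unescape

# Hypothetical rendered post: a real <br>, plus a script tag the user typed,
# which the forum has already escaped so it displays as text.
post_html = "Line one<br>&lt;script&gt;alert(1)&lt;/script&gt;"

# Assumption: the feed escapes the rendered post exactly once.
rss_description = escape(post_html, quote=False)

# Decoding once restores the original post: the br is markup again,
# while the user-typed script is still escaped text.
decoded = unescape(rss_description)

if __name__ == "__main__":
    print(rss_description)
    print(decoded)
```

If the feed instead shows both with a single level of escaping, as described above, the information needed to tell them apart has already been lost before the reader sees it.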
I feel like this should be easier, and I'm just missing something obvious...I hope.