I have data coming in from an RSS feed. I want to be safe and use htmlentities(), but if the feed contains HTML, the page ends up showing raw markup mixed in with the content. I don't mind the formatting the feed provides and would be glad to use it, as long as I can display it safely. I'm after the content of the feed, but I'd also like it to be formatted decently (break tags, paragraphs, divs). Does anyone know a way?

+1  A: 

Do you want to protect against XSS from the feed? If so, you'll need to run an HTML sanitizer on the HTML before displaying it:

  1. HTMLSanitizer
  2. HTMLPurifier

If you just want to escape whatever is there, call htmlspecialchars() on it. Any HTML will then appear as escaped text instead of being rendered.
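
As a minimal sketch of the escaping approach, assuming the feed text is UTF-8 and using a made-up $description value (not taken from any real feed):

```php
<?php
// Hypothetical feed fragment, used only for illustration.
$description = '<p>Hello <b>world</b></p><script>alert("xss")</script>';

// ENT_QUOTES escapes both double and single quotes; UTF-8 is an assumption
// about the page encoding.
$escaped = htmlspecialchars($description, ENT_QUOTES, 'UTF-8');

// The browser will show the markup as literal text, not render it.
echo $escaped, PHP_EOL;
```

This is safe to print, but the reader sees the tags themselves, which is the trade-off described above.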

ircmaxell
+1  A: 

You can use the strip_tags function and specify the allowed tags:

echo strip_tags($content, '<p><a>');

This way any tag not specified in allowed tags will be removed.
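
A slightly fuller sketch of the same call, with a hypothetical $content value made up for illustration. Note that strip_tags removes the tags themselves but leaves the text between them in place:

```php
<?php
// Hypothetical feed fragment, used only for illustration.
$content = '<div><p>Read <a href="https://example.com/">more</a></p><script>alert(1)</script></div>';

// Keep only <p> and <a>; every other tag is dropped, but its inner text stays.
$filtered = strip_tags($content, '<p><a>');

// The <div> and <script> tags are gone; the text "alert(1)" remains as plain text.
echo $filtered, PHP_EOL;
```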

Sarfraz
That doesn't take away other bad things such as javascript events or other potentially dangerous attributes...
ircmaxell
So if I don't have any allowed tags, is it still not safe? Don't you need a tag to define JavaScript, and wouldn't strip_tags remove the script tag? What do you mean by dangerous attributes?
keith
I currently have this: htmlentities(strip_tags($description)). Is that not safe?
keith
@keith: With no allowed tags, it removes every tag, including the script tag. But when you allow certain tags, those tags are left unchanged, so you need to list in the second parameter only the tags you expect in the HTML from the RSS feed. As for your second comment, that should be safe, but then you have no HTML at all, just plain text; that is why you can specify allowed tags.
Sarfraz
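
The two points from this comment thread can be illustrated with a small sketch. The $description value is a hypothetical input, and UTF-8 is assumed:

```php
<?php
// Hypothetical input: an allowed tag carrying script-bearing attributes.
$description = '<a href="javascript:alert(1)" onclick="alert(2)">click</a>';

// With <a> allowed, the tag and all of its attributes pass through untouched.
$allowed = strip_tags($description, '<a>');
var_dump($allowed === $description);

// With no allowed tags, only the text remains; escaping it afterwards
// yields plain text with no formatting.
$plain = htmlentities(strip_tags($description), ENT_QUOTES, 'UTF-8');
echo $plain, PHP_EOL;
```

So an allowed tag can still carry an onclick handler or a javascript: URL, while the no-allowed-tags version is plain text only.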
A: 

You can convert the HTML to Markdown and then back to HTML again using various libraries.

Allain Lalonde