Hello everybody. After I implemented my sanitize functions (according to the requested specification), my boss decided to change the accepted input. Now he wants to keep some specific tags and their attributes. I suggested implementing a BBCode-like language, which is safer IMHO, but he doesn't want to because it would be too much work.

This time I would like to keep it simple, so that I won't want to kill him the next time he asks me to change this again. And I know he will.

Is it enough to first call strip_tags, passing the tags to preserve as its second parameter, and then htmlentities?

A: 

You can keep specific tags using strip_tags with this syntax: `strip_tags($text, '<p><a>');`

That snippet would strip all tags except p and a. Attributes are kept for tags you have allowed (p and a in the above example).
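A quick illustration with a made-up input: the allowed `<p>` keeps its `onclick` handler untouched, while the disallowed `<script>` tag is removed but its text content stays.

```php
<?php
// strip_tags() drops disallowed tags but copies allowed tags verbatim,
// including every attribute on them.
$input  = '<p onclick="alert(1)">Hi</p><script>alert(2)</script>';
$output = strip_tags($input, '<p>');

echo $output, PHP_EOL;
// prints: <p onclick="alert(1)">Hi</p>alert(2)
```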

However, this doesn't mean that the attributes are safe. Does he want specific attributes, or does he want to keep all of them on allowed tags? In the first case, you would need to parse each tag, remove the unwanted attributes, and sanitize the values of the remaining ones. To keep all attributes on allowed tags, you still need to sanitize them. I would recommend running htmlentities on the attribute values to sanitize them (for display, I would assume).
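A rough sketch of the "parse each tag and remove unwanted attributes" approach, using PHP's DOM extension. The `$allowed` whitelist and the `filter_attributes()` name are made up for illustration. It only drops attributes that aren't whitelisted; it does not vet the values of the ones it keeps (e.g. a `javascript:` `href`), so treat it as a starting point, not a complete sanitizer.

```php
<?php
// Hypothetical whitelist: allowed tag => attributes it may keep.
$allowed = ['p' => [], 'a' => ['href', 'title']];

// Hypothetical helper: remove every attribute not listed for its tag.
function filter_attributes(string $html, array $allowed): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);           // tolerate sloppy user markup
    // The wrapper div lets us pull the fragment back out afterwards;
    // the meta tag is a charset hint so UTF-8 input isn't read as Latin-1.
    $doc->loadHTML('<meta charset="utf-8"><div>' . $html . '</div>');
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('*') as $el) {
        $keep = $allowed[$el->nodeName] ?? null;
        if ($keep === null) {
            continue;                           // html/body/wrapper etc. are left alone
        }
        // Collect names first so we don't modify the attribute map while iterating it.
        $names = [];
        foreach ($el->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array($name, $keep, true)) {
                $el->removeAttribute($name);
            }
        }
    }

    // Serialize only the children of our wrapper div (the first div in the document).
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);         // the serializer escapes attribute values
    }
    return $out;
}

$dirty = '<p onclick="alert(1)">Hi <a href="/x" style="color:red">link</a></p>';
$clean = filter_attributes(strip_tags($dirty, '<p><a>'), $allowed);
echo $clean, PHP_EOL;
```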

SimpleCoder
Nope, there's no specific request about which attributes to keep. Actually, I was going to use `htmlentities` when storing data in the DB as well.
dierre
@dierre: HTML-encoding is an output-stage concern only. Don't pollute your database with encoded data, it'll make it hard to search, use database string-manipulation, and use in non-HTML contexts. HTML-encode, using `htmlspecialchars()`, at the output stage, where you put text into HTML, and not before.
bobince
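A minimal sketch of "encode at the output stage": the stored value stays raw, and escaping happens only where it is written into HTML. The `html_escape()` helper name is hypothetical; it just wraps `htmlspecialchars()` with `ENT_QUOTES` so it is also safe inside quoted attribute values.

```php
<?php
// The raw text, exactly as the user typed it, is what goes into the database.
$stored = 'Tom & Jerry <3 "quotes"';

// Hypothetical helper: encode only when placing text into HTML.
function html_escape(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Same raw value, encoded for both an attribute and element content.
echo '<p title="' . html_escape($stored) . '">' . html_escape($stored) . '</p>', PHP_EOL;
```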
@dierre: don't use `htmlentities` for storage. Use `mysql_real_escape_string` or similar. Also, `htmlentities` may botch certain attributes, so play around with it. However, you still want to escape the attributes so some wiseguy can't do this: `<img src="javascript: alert ('pwned');"/>`. I would check out the link @bobince provided below.
SimpleCoder
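Note that escaping alone doesn't defuse that example: an HTML-escaped value still begins with `javascript:`. A sketch of one common extra check, assuming a whitelist of URL forms is acceptable; `is_safe_url()` is a hypothetical helper that accepts only absolute `http(s)` URLs and site-relative paths.

```php
<?php
$src = "javascript: alert ('pwned');";

// Escaping stops the value from breaking out of the quoted attribute...
$escaped = htmlspecialchars($src, ENT_QUOTES, 'UTF-8');
// ...but it still starts with "javascript:", so the browser would still run it.

// Hypothetical helper: whitelist http(s) URLs and site-relative paths only.
function is_safe_url(string $url): bool
{
    return preg_match('~^(https?://|/)~i', $url) === 1;
}

var_dump(is_safe_url($src));                    // bool(false)
var_dump(is_safe_url('https://example.com/a')); // bool(true)
var_dump(is_safe_url('/images/cat.png'));       // bool(true)
```

Whitelisting what is allowed, rather than blacklisting `javascript:`, also rejects variants like `java\tscript:` that browsers may still interpret as a script URL.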
A:

strip_tags does not necessarily result in safe content. strip_tags followed by htmlentities would be safe, in that anything HTML-encoded is safe, but it doesn't make any sense.
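A quick illustration with a made-up input: the `<p>` that strip_tags was told to keep is encoded straight back into literal text by htmlentities, so the whitelist has no visible effect.

```php
<?php
// Keep <p> with strip_tags()...
$kept = strip_tags('<p>Hello</p><b>world</b>', '<p>');
// ...then encode everything, including the <p> we just decided to keep.
$shown = htmlentities($kept, ENT_QUOTES, 'UTF-8');

echo $shown, PHP_EOL;
// prints: &lt;p&gt;Hello&lt;/p&gt;world
```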

Either the user is inputting plain text, in which case it should be output using htmlspecialchars (in preference to htmlentities), or they're inputting HTML markup, in which case you need to parse it properly, fixing broken markup and removing elements/attributes that aren't in a safe whitelist.

If that's what you want, use an existing library to do it (e.g. htmlpurifier). It's not a trivial task, and if you get it wrong you've given yourself XSS security holes.

bobince
Yeah, e.g. JavaScript functions will be called if they're in an attribute of an allowed tag. +1 for the link to htmlpurifier. I'll have a look.
dierre