views:

332

answers:

5

Hi guys. I run a website (sorta like a social network) that I wrote myself. I allow the members to send comments to each other. In the comment; i take the comment and then call this line before saving it in db..

$com = htmlentities($com);

When I want to display it; I call this piece of code..

$com = html_entity_decode($com);

This works out well most of the time. It allows the users to copy/paste youtube/imeem embed code and send each other videos and songs. It also allows them to upload images to photobucket and copy/paste the embed code to send picture comments.

The problem I have is that some people are basically putting in javascript code there as well that tends to do nasty stuff such as open up alert boxes, change location of webpage and things like that.. I am trying to find a good solution to solving this problem once and for all.. How do other sites allow this kind of functionality?

Thanks for your feedback

+1  A: 

Yes, this is a problem. A lot of sites solve it by only allowing their own custom markup in user fields.

But if you really want to allow HTML, you'll need to scrub out all "script" tags. I believe there are libraries available that do this. But that should be sufficient to prevent JS execution in user-entered code.

levand
+2  A: 

I so hope you are scrubbing the data before you send it to the database. It sounds like you are a prime target for a SQl injection attack. I know this is not your question, but it is something that you need to be aware of.

WolfmanDragon
i thought htmlentities would do it :(
shergill
htmlentities is a insanely poor method of preventing SQL injection. Injection has little to nothing to do with HTML.
Syntax
ok.. just added mysql_real_escape_string (and a stripslashes when i read it out).. thanks!
shergill
A: 

This is how Stackoverflow does it, I think, over at RefacterMyCode.

Peter Coulton
That is a very good solution, however I think his question is specifically PHP-related.
Cal Jacobson
But it's the same problem being solved, he just needs to translate it.
Peter Coulton
i guess what i'll have to do is rewrite this to handle the <object> and <embed> tags and such..
shergill
@Peter: yes, but if PHP libraries already exist to do this, why reinvent the wheel?
Cal Jacobson
@Carl.. what php library? (other than strip_tags)
shergill
@Carl: I agree, using something like the Zend Framework if the functinality is already there makes all kinds of sense.
Peter Coulton
@shergill: I've used Zend_Filter (see elsewhere in this thread)
Cal Jacobson
Stack Overflow doesn't allow a user to embed HTML code in their site. All strings are escaped.
troelskn
@troelskn: That is not correct - http://stackoverflow.com/questions/31657/what-html-tags-are-allowed-on-stack-overflow
Peter Coulton
I stand corrected.
troelskn
A: 

You may want to consider Zend Filter, it offers a lot more than strip_tags and you do not have to include the entire Zend Framework to use it.

Cal Jacobson
Correct me if I'm wrong, but isn't Zend_Filter just a wrapper around various buil-ins, such as strip_tags ?
troelskn
Zend_Filter_HtmlEntities seems to be little more than a wrapper, but Zend_Filter_StripTags also provides whitelist control over tag attributes (not just the tags themselves). Zend_Filter also provides other convenient filters such as Alnum and RealPath. Personally, I think it's a useful library.
Cal Jacobson
+7  A: 

First: htmlentities or just htmlspecialchars should be used for escaping strings that you embed into HTML. You shouldn't use it for escaping string when you insert them into a SQL query - Use mysql_real_escape_string (For MySql) or better yet - use prepared statements, which have bound parameters. Make sure that magic_quotes are turned off or disabled otherwise, when you manually escape strings.

Second: You don't unescape strings when you pull them out again. Eg. there is no mysql_real_unescape_string. And you shouldn't use stripslashes either - If you find that you need, then you probably have magic_quotes turned on - turn them off instead, and fix the data in the database before proceeding.

Third: What you're doing with html_entity_decode completely nullifies the intended use of htmlentities. Right now, you have absolutely no protection against a malicious user injecting code into your site (You're vulnerable to cross site scripting aka. XSS). Strings that you embed into a HTML context, should be escaped with htmlspecialchars (or htmlentities). If you absolutely have to embed HTML into your page, you have to run it through a cleaning-solution first. strip_tags does this - in theory - but in practise it's very inadequate. The best solution I currently know of, is HtmlPurifier. However, whatever you do, it is always a risk to let random user embed code into your site. If at all possible, try to design your application such that it isn't needed.

troelskn
Thanks for the detailed answer. I guess I dont really understand the 'htmlentities'.. so basically if i use Htmlpurifier; then I can just store the 'purified' html in the DB as is?
shergill
Yes. You use htmlentities if you don't want the data to be interpreted as HTML. In most cases, you should default to use that on strings that you insert into HTML. But in this case, you just pass it through (After filtering it through HtmlPurifier)
troelskn
And just to make sure it's clear: You still need to escape strings when you insert them into SQL. So: $_POST -> HtmlPurifier -> mysql_real_escape_string. When you select the data out from the database, *don't* unescape in any way.
troelskn