I am creating forum software using PHP with a MySQL backend, and I want to know the most secure way to escape user input for forum posts.

I know about htmlentities(), strip_tags(), htmlspecialchars(), mysql_real_escape_string(), and even JavaScript's escape(), but I don't know which to use where.

What would be the safest way to process these three different types of input (by process, I mean get, save in a database, and display):

  1. A title of a post (which will also be the basis of the URL permalink).
  2. The content of a forum post limited to basic text input.
  3. The content of a forum post which allows html.

I would appreciate an answer that tells me how many of these escape functions I need to use in combination and why. Thanks!

+4  A: 

mysql_real_escape_string() escapes everything that needs escaping before you put a string (inside quotes) into a MySQL query. But you should use prepared statements (in mysqli) instead, because they're cleaner and handle the escaping for you automatically.

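Anything else can be done with htmlspecialchars(), which neutralizes HTML in the input by converting characters like < and & into entities (it escapes the markup rather than removing it), and urlencode(), which puts things in a format suitable for URLs.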
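For the title-as-permalink case from the question, the same title ends up in two contexts: inside a URL and inside HTML. A minimal sketch in Python rather than PHP, assuming quote_plus() as the analogue of urlencode() (both turn spaces into '+') and html.escape(..., quote=True) as the analogue of htmlspecialchars() with quotes escaped; the /posts/<id>/<title> URL layout and the permalink_anchor name are hypothetical:

```python
import html
from urllib.parse import quote_plus


def permalink_anchor(post_id: int, title: str) -> str:
    """Build a link to a post whose URL contains the (hypothetical) title slug."""
    # 1. Encode the title for use as a URL component, like PHP's urlencode():
    #    spaces become '+', and '&', '<', '/', etc. become %XX sequences.
    href = f"/posts/{post_id}/{quote_plus(title)}"
    # 2. Escape for the HTML context: both the href attribute value and the
    #    visible link text, like htmlspecialchars() with quotes escaped.
    return f'<a href="{html.escape(href, quote=True)}">{html.escape(title, quote=True)}</a>'
```

For example, permalink_anchor(42, 'Tom & Jerry <3') gives '<a href="/posts/42/Tom+%26+Jerry+%3C3">Tom &amp; Jerry &lt;3</a>'. The point of the two steps is that each encoding matches one context; neither one substitutes for the other.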

Rusky
If the "content" field can contain some HTML, htmlspecialchars should not be used on it: it will escape all HTML, including the tags that are "allowed".
Pascal MARTIN
Exactly. htmlspecialchars() would be for the content limited to basic text input.
Rusky
+5  A: 

When generating HTML output (as you do to put data into the form's fields when someone edits a post, or if you need to re-display the form because the user forgot a field, for instance), you'd probably use htmlspecialchars: it always escapes &, < and >, and escapes " and ' depending on the flags you give it (ENT_QUOTES escapes both).

strip_tags will remove tags if the user has entered some, and you generally don't want something the user typed to just disappear ;-)
At least, not for the "content" field :-)


Once you have what the user entered in the form (i.e., when the form has been submitted), you need to escape it before sending it to the DB.
That's where functions like mysqli_real_escape_string become useful: they escape data for SQL.

You might also want to take a look at prepared statements (with mysqli, and with PDO), which might help you a bit ;-)
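To show what a prepared statement buys you, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for mysqli/PDO (in PHP the same pattern is PDO::prepare() with ? placeholders followed by PDOStatement::execute() with the values). The posts table and the make_db/save_post/load_post helpers are hypothetical:

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    # In-memory database standing in for the forum's MySQL schema (hypothetical table).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    return conn


def save_post(conn: sqlite3.Connection, title: str, body: str) -> int:
    # The SQL text never changes; title and body are passed as bound parameters,
    # so quotes inside them cannot alter the statement and need no manual escaping.
    cur = conn.execute("INSERT INTO posts (title, body) VALUES (?, ?)", (title, body))
    conn.commit()
    return cur.lastrowid


def load_post(conn: sqlite3.Connection, post_id: int) -> Optional[tuple]:
    # Placeholders work the same way in WHERE clauses.
    return conn.execute("SELECT title, body FROM posts WHERE id = ?", (post_id,)).fetchone()
```

Saving a title like "Robert'); DROP TABLE posts;--" with save_post() stores it verbatim, and load_post() returns exactly what was typed; the table is untouched. Note that nothing here makes the text safe for HTML: that is a separate step at display time.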

You should not use anything like addslashes: the escaping it does doesn't depend on the database engine. It is better/safer to use a function that fits the engine (MySQL, PostgreSQL, ...) you are working with: it'll know precisely what to escape, and how.


Finally, to display the data inside a page:

  • for fields that must not contain HTML, you should use htmlspecialchars: if the user entered HTML tags, they will be displayed as-is (as text), and not injected as HTML.
  • for fields that can contain HTML... This is a bit trickier: you will probably only want to allow a few tags, and strip_tags (which can do that) is not really up to the task: it keeps the attributes of the allowed tags, so something like an onmouseover handler would get through.
    • You might want to take a look at a tool called HTML Purifier: it lets you specify which tags and attributes should be allowed, and it generates valid HTML, which is always nice ^^
    • This might take some time to compute, and you probably don't want to re-generate that HTML each time it has to be displayed, so you can think about storing it in the database (either keeping only that clean HTML, or keeping both it and the not-clean one in two separate fields, which might be useful to let people edit their posts?)
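To make the attribute problem concrete, here is a rough Python imitation of strip_tags() with an allowlist. naive_strip_tags is a hypothetical helper, not PHP's actual implementation, but like strip_tags it decides only by tag name, so an event-handler attribute on an allowed tag passes straight through:

```python
import re

# Matches an opening or closing tag and captures its name; everything up to
# the next '>' (including any attributes) is treated as part of the tag.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def naive_strip_tags(text: str, allowed: set[str]) -> str:
    """Hypothetical strip_tags() look-alike: drop tags not in `allowed`."""
    def keep_or_drop(match: re.Match) -> str:
        name = match.group(1).lower()
        # Only the tag NAME is checked; attributes are kept untouched.
        return match.group(0) if name in allowed else ""

    return TAG_RE.sub(keep_or_drop, text)
```

With allowed={"b"}, the input '<b onmouseover="alert(1)">hi</b><script>x</script>' comes out as '<b onmouseover="alert(1)">hi</b>x': the script tags are gone, but the script-bearing attribute survived. That gap is what a tool that also filters attributes is for.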


Those are only a few pointers... hope they help you :-)
Don't hesitate to ask if you have more precise questions!

Pascal MARTIN
+1  A: 

The answer to this post is a good answer

Basically, using the PDO interface to parameterize your queries is much safer and less error-prone than escaping your inputs manually.

Charles Ma
You can also do that with MySQLi, btw.
Rusky
A: 
Dave
Uhhh... Ugly code!
Alix Axel
Ugly Sitiation (sic)
Dave
+3  A: 

There are two completely different types of attack you have to defend against:

  • SQL injection: input that tries to manipulate your DB. mysql_real_escape_string() and addslashes() are meant to defend against this. The former is better, but parameterized queries are better still.
  • Cross-site scripting (XSS): input that, when displayed on your page, tries to execute JavaScript in a visitor's browser to do all kinds of things (like stealing the user's account data). htmlspecialchars() is the definitive way to defend against this.

Allowing "some HTML" while avoiding XSS attacks is very, very hard, because there are endless ways of smuggling JavaScript into HTML. If you decide to allow formatting, the safe way is to use BBCode or Markdown, i.e. a limited set of non-HTML markup that you then convert to HTML, while neutralizing all real HTML with htmlspecialchars(). Even then you have to be careful not to allow javascript: URLs in links. Actually allowing users to input HTML is something you should only do if it's absolutely crucial for your site, and then you should spend a lot of time making sure you understand HTML, JavaScript, and CSS completely.
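The order of operations matters: escape all real HTML first, then turn a fixed set of markup into tags you generate yourself, and check link schemes against an allowlist. A minimal Python sketch of that idea, assuming a hypothetical three-tag BBCode subset ([b], [i], [url=...]) and using html.escape() in place of htmlspecialchars(); it illustrates the approach and is not a vetted BBCode library:

```python
import html
import re
from urllib.parse import urlparse

# Hypothetical minimal BBCode subset: [b]..[/b], [i]..[/i], [url=...]..[/url].
BOLD_RE = re.compile(r"\[b\](.*?)\[/b\]", re.DOTALL)
ITALIC_RE = re.compile(r"\[i\](.*?)\[/i\]", re.DOTALL)
URL_RE = re.compile(r"\[url=([^\]]+)\](.*?)\[/url\]", re.DOTALL)

ALLOWED_SCHEMES = ("http", "https")


def _link(match: re.Match) -> str:
    url, text = match.group(1), match.group(2)
    # The URL is already HTML-escaped at this point; unescape only to inspect
    # its scheme. Anything not on the allowlist (javascript:, data:, relative
    # or malformed URLs) is rendered as plain text without a link.
    if urlparse(html.unescape(url)).scheme.lower() not in ALLOWED_SCHEMES:
        return text
    return f'<a href="{url}">{text}</a>'


def bbcode_to_html(source: str) -> str:
    # 1. Neutralize ALL real HTML first (the htmlspecialchars() step),
    #    including quotes, so nothing can break out of the href attribute.
    out = html.escape(source, quote=True)
    # 2. Only then convert the limited markup into tags we generate ourselves.
    out = BOLD_RE.sub(r"<b>\1</b>", out)
    out = ITALIC_RE.sub(r"<i>\1</i>", out)
    return URL_RE.sub(_link, out)
```

'[b]hi[/b] <script>alert(1)</script>' becomes '<b>hi</b> &lt;script&gt;alert(1)&lt;/script&gt;', and '[url=javascript:alert(1)]click[/url]' becomes just 'click'. Checking for allowed schemes rather than blocking "javascript:" avoids having to anticipate every variant spelling.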

Michael Borgwardt
A: 

First of all, general advice: don't escape variables by hand when inserting them into the database. There are plenty of solutions that let you use prepared statements with variable binding. The reason not to do the escaping explicitly is that it is then only a matter of time before you forget it just once.

If you're inserting plain text into the database, don't try to clean it on insert; clean it on display instead. That is, use htmlentities to encode it as HTML (and pass the correct charset argument). You want to encode on display because then you're no longer trusting that the database contents are safe, which isn't necessarily a given.
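A minimal sketch of "store raw, encode on display" for the plain-text post type, in Python: html.escape() stands in for htmlentities()/htmlspecialchars() (it works on already-decoded text, so there is no charset argument; in PHP you would pass the page's charset, e.g. 'UTF-8'), and the newline handling roughly imitates PHP's nl2br() applied after escaping. render_plain_post and its markup are hypothetical:

```python
import html


def render_plain_post(title: str, body: str) -> str:
    """Render a plain-text post; the stored title/body are raw, unescaped text."""
    # Escape at output time, so even bad data already sitting in the DB
    # is displayed as text instead of being interpreted as HTML.
    safe_title = html.escape(title, quote=True)
    # Escape FIRST, then add our own markup (like nl2br(htmlspecialchars(...))),
    # otherwise the <br> we add would itself be escaped.
    safe_body = html.escape(body, quote=True).replace("\n", "<br>\n")
    return f"<h2>{safe_title}</h2>\n<p>{safe_body}</p>"
```

Because the raw text is what sits in the database, it can be shown unmodified in the edit form (escaped again for that context) and re-rendered if the display rules ever change.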

If you're dealing with rich text (HTML), things get more complicated. Removing the "evil" bits from HTML without destroying the message is a difficult problem. Realistically speaking, you'll have to resort to an established solution like HTMLPurifier. However, this is generally too slow to run on every page view, so you'll be forced to do it when writing to the database. You'll also have to ensure that users can see their "cleaned up" HTML and correct the cleaned-up version.

Definitely try to avoid "rolling your own" filter or encoding solution at any step. These problems are notoriously tricky, and you run a large risk of overlooking some minor detail that has big security implications.

Joeri Sebrechts
A: 

I second Joeri: do not roll your own. Go here to see some of the many possible XSS attacks:

http://ha.ckers.org/xss.html

htmlentities() - turns text into HTML by converting characters to entities. If you use UTF-8 encoding, use htmlspecialchars() instead, as the other entities are not needed. This is the best defence against XSS. I use it on every variable I output, regardless of type or origin, unless I intend it to be HTML. There is only a tiny performance cost, and it is easier than trying to work out what needs escaping and what doesn't.

strip_tags() - turns HTML into text by removing all HTML tags. Use this to ensure that there is nothing nasty in your input, as an adjunct to escaping your output.

mysql_real_escape_string() - escapes a string for MySQL and is your defence against SQL injection from little Bobby Tables (better to use mysqli and prepare/bind, as escaping is then done for you and you avoid lots of messy string concatenation)

The advice given above about avoiding HTML input unless it is essential, and opting for BBCode or similar (make your own up if need be), is very sound indeed.