Hi,

I am very confused over something and was wondering if someone could explain.

In PHP I validate user input, so htmlentities() and mysql_real_escape_string() are applied before inserting into the database. I don't apply them to everything, as I prefer to use regular expressions when I can, although I find them hard to work with. Obviously I will use mysql_real_escape_string() because the data is going into the database, but I'm not sure whether I should use htmlentities() only when getting data from the database and displaying it on a web page. Applying it beforehand alters the data the person entered, so it is no longer in its original form, which may cause problems if I want to use that data for something else later on.

So for example, I have a guestbook with three fields: name, subject and message. Obviously the fields can contain anything, including malicious code in script tags. What confuses me is this: let's say I am a malicious person and I submit the form with script tags and some malicious JavaScript. Now I have malicious, useless data in my database. If I use htmlentities() when outputting that code to the web page (the guestbook), it is not a problem, because htmlentities() has converted it to its safe equivalent. But at the same time I have useless malicious code in the database that I would rather not have.

So after saying all this, my question is: should I accept the fact that some data in the database may be malicious, useless data, and that as long as I use htmlentities() on output everything will be OK, or should I be doing something else as well?

I have read so many books that talk about filtering data on receiving it and escaping it on output so that the original form is kept, but they only ever give examples like ensuring a field is an int using functions already built into PHP. I have never found anything about something like a guestbook, where you want users to be able to type anything they want. How would you filter such data, apart from using mysql_real_escape_string() to ensure it does not break the DB query?

Could someone please finally clear up this confusion for me and tell me what I should be doing and what is best practice?

Thanks to anyone who can explain.

Cheers!

+1  A: 

mysql_real_escape_string() is all you need for the database operations. It'll ensure that a malicious user can't embed something into data that'll "break" your queries.

htmlentities() and htmlspecialchars() come into play when you're working with sending stuff to the client/browser. If you want to clean up potentially hostile HTML, you'd be better off using HTMLPurifier, which will strip the data to the bedrock and hose it down with bleach and rebuild it properly.
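
For the output side, a minimal sketch of escaping at display time might look like the following. The helper name e() and the sample message are made up for illustration and are not from the question; ENT_QUOTES and the UTF-8 charset are assumptions about how the page is served.

```php
<?php
// Hypothetical helper (the name e() is an assumption, not from the post).
function e(string $text): string
{
    // ENT_QUOTES escapes both single and double quotes, so the result is
    // also safe inside a quoted HTML attribute value.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Sample hostile input, as it might come back from the database.
$message = '<script>alert("xss")</script>';

echo '<p>' . e($message) . '</p>', PHP_EOL;
// prints: <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

The browser then shows the script text literally instead of running it.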

Marc B
Wow, thank you Marc B, I never knew I would get such a fast reply. Thanks for your input. I am going to check that link out, but this has already cleared everything up. Thankfully my site is very small so no worries, but at least I can now change my code where needed and do basically what I thought I would need to do. With your confirmation as well, I feel confident that I am on the right track :) Obviously if anyone else wants to add any other suggestions, please do. PS. Great site, wish I had found it ages ago, just registered :)
PHPLOVER
It's never too soon to start working on data security and integrity. There's really not that much to it, but the sooner you get into the habit of treating anything coming from outside as toxic waste, the better. As an added layer of security, you may want to investigate using PDO and prepared statements, unless you have to build queries that won't fit within its bounds.
Marc B
Thanks Marc and everyone else, you really have answered all my questions and more. I have learnt a lot from making this post and feel relaxed, to say the least :) You have all been a great help, so thanks to all of you.
PHPLOVER
Hi Marc, just to let you know I took a look at HTML Purifier and it just seems way above my needs. I also tried it but could not get it working. I found the docs hard to follow, as they jump from one thing to another. I did understand how to install it, but it still did not work. I know that would not be HTML Purifier's fault, but it seems too OTT for me, so I will just use htmlentities() with ENT_QUOTES when outputting data from the database. Thanks, PHPLOVER
PHPLOVER
A simple example of HTML Purifier in action in PHP is here: http://dev.juokaz.com/php/html-filtering-and-xss-protection
Marc B
+1  A: 

This is a long question, but I think what you're actually asking boils down to:

"Should I escape HTML before inserting it into my database, or when I go to display it?"

The generally accepted answer to this question is that you should escape the HTML (via htmlspecialchars) when you go to display it to the user, and not before putting it into the database.

The reason is this: a database stores data. What you are putting into it is what the user typed. When you call mysql_real_escape_string, it does not alter what is inserted into the database; it merely prevents the user's input from being interpreted as SQL statements. htmlspecialchars does the same thing for HTML; when you print the user's input, it prevents that input from being interpreted as HTML. If you were to call htmlspecialchars before the insert, you would no longer be storing a faithful copy of what the user typed.
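
To make the fidelity point concrete, here is a small sketch comparing the two orders of operation. The sample string and variable names are invented for illustration, and it assumes the display code always escapes on output, as recommended above.

```php
<?php
// Sample input, invented for illustration.
$typed = 'Tom & Jerry <3';

// Escaping at output time: the stored value is exactly what was typed.
$storedRaw = $typed;
$shownOnce = htmlspecialchars($storedRaw, ENT_QUOTES, 'UTF-8');

// Escaping before storage: the stored value is already altered, and a
// display layer that escapes again now double-encodes it.
$storedEscaped = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8');
$shownTwice    = htmlspecialchars($storedEscaped, ENT_QUOTES, 'UTF-8');

echo $shownOnce, PHP_EOL;   // Tom &amp; Jerry &lt;3
echo $shownTwice, PHP_EOL;  // Tom &amp;amp; Jerry &amp;lt;3
```

In the second case a visitor would see the literal text "&amp;" on the page rather than "&", and the database no longer holds the original string.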

You should always strive to have the maximum-fidelity representation you can get. Since storing the "malicious" code in your database does no harm (in fact, it saves you some space, since escaped HTML is longer than unescaped!), and you might in the future want that HTML (what if you use an XML parser on user comments, or some day let trusted users have a subset of HTML in their comments, or some such?), why not let it be?

You also ask a bit about other types of input validation (integer constraints, etc). Your database schema should enforce these, and they can also be checked at the application layer (preferably on input via JS and then again server side).
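
As a rough illustration of a server-side check, PHP's built-in filter_var() can validate an integer field. The "age" field, the parseAge() name and the 0 to 150 range are assumptions made up for this sketch, not something from the question.

```php
<?php
// Hypothetical validator for an "age" form field (name and range are
// assumptions for illustration). Returns the integer, or null if invalid.
function parseAge(string $raw): ?int
{
    $age = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);

    // filter_var() returns false when validation fails.
    return $age === false ? null : $age;
}

var_dump(parseAge('42'));     // int(42)
var_dump(parseAge('42abc'));  // NULL
var_dump(parseAge('-1'));     // NULL
```

A free-text field such as the guestbook message has no equivalent check beyond things like a length limit, which is why it is stored as typed and escaped on output.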

On another note, the best way to do database escaping with PHP is probably to use PDO, rather than calling mysql_real_escape_string directly. PDO supports prepared statements with bound parameters, which keep user input separate from the SQL text, and it lets you declare a type for each bound value.
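
A minimal sketch of the PDO approach, under some assumptions: it uses an in-memory SQLite database (so it needs the pdo_sqlite extension) purely to stay self-contained, where a real guestbook would connect with a MySQL DSN instead. The table and column names are guesses based on the three fields in the question.

```php
<?php
// In-memory SQLite keeps the sketch self-contained (an assumption; the
// question uses MySQL, which would only change the DSN and credentials).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema based on the guestbook fields in the question.
$pdo->exec('CREATE TABLE guestbook (name TEXT, subject TEXT, message TEXT)');

// Placeholders keep user input out of the SQL text entirely.
$insert = $pdo->prepare(
    'INSERT INTO guestbook (name, subject, message)
     VALUES (:name, :subject, :message)'
);
$insert->execute([
    ':name'    => "Robert'); DROP TABLE guestbook;--",
    ':subject' => 'Hello',
    ':message' => '<script>alert(1)</script>',
]);

// The row comes back exactly as typed; escape only when printing it.
$row = $pdo->query('SELECT name, message FROM guestbook')
           ->fetch(PDO::FETCH_ASSOC);

echo htmlspecialchars($row['message'], ENT_QUOTES, 'UTF-8'), PHP_EOL;
// prints: &lt;script&gt;alert(1)&lt;/script&gt;
```

The hostile name is stored as plain data and the table survives, because the bound values are never parsed as SQL.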

Borealid
A: 

There's no reason to worry about having malicious JavaScript code in the database if you're escaping the HTML when it comes out. Just make sure you always do escape anything that comes out of the DB.

Skilldrick