Stop!
You're making a mistake here. Oh, no, you've picked the right PHP function calls to make data a bit safer, that's fine. Your mistake is one of order of operations and the intent of the functions.
When users submit data, you need to make sure that the data is in the form you expect. If you expect something to be a number, make sure it's a number. If it needs to be an integer between 1 and 10, make sure it's really an integer between 1 and 10. If it appeared in a drop-down menu, make sure that the submitted value would have appeared in the menu. Same for radio buttons. If the field shouldn't have HTML in it, make sure to remove HTML or neutralize it. If the field should have HTML in it, make sure only the parts of HTML that you like are included.
Check out the featureful but weird-feeling built-in filter functions, and learn to love regexes.
This is data validation. Validation is not the same thing as sanitization. You need both!
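Here is a minimal sketch of the kind of validation described above, using the built-in filter functions. The helper names validate_rating and validate_choice and the list of colors are made up for illustration; only filter_var, FILTER_VALIDATE_INT and in_array come from PHP itself.

```php
<?php
// Hypothetical helper: returns the integer if $raw is a whole number
// between 1 and 10, or null if it is anything else.
function validate_rating($raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 10],
    ]);
    return $value === false ? null : $value;
}

// Hypothetical helper: accepts only values that were actually offered
// in the drop-down menu or radio group. The strict flag avoids loose
// comparisons such as 0 == "abc".
function validate_choice($raw, array $allowed): ?string
{
    return (is_string($raw) && in_array($raw, $allowed, true)) ? $raw : null;
}

// Assumed menu options, for illustration only.
$colors = ['red', 'green', 'blue'];

var_dump(validate_rating('7'));              // int(7)
var_dump(validate_rating('11'));             // NULL
var_dump(validate_choice('green', $colors)); // string(5) "green"
var_dump(validate_choice('pink', $colors));  // NULL
```

Returning null for anything unexpected keeps the "is this valid?" decision in one place, so the rest of the code never sees a half-checked value.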
When you insert data into the database, that's when you need to sanitize. Every single database API does it differently. You've already discovered mysql_real_escape_string, but that's not a good thing. The "mysql" API lacks a feature called prepared statements. Prepared statements let you use placeholders in your query:
SELECT ... FROM ... WHERE fieldname = ? -- That question mark is a placeholder
When working with placeholders, the very act of filling in the placeholder automatically sanitizes the data for you. If you're working with MySQL, check out the "mysql*i*" extension (prepare, bind, execute) and the PDO extension (prepare, bind, execute). Learning PDO will be worth it if you expect to work with other database types in the future, like SQLite or Postgres.
Yes, it's a bit more work than putting the string together yourself, but the end result is code you know is more secure against SQL injection.
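To make the prepare/bind/execute pattern concrete, here is a rough sketch using PDO. It assumes the pdo_sqlite driver is installed and uses an in-memory SQLite database so it can run on its own; the users table, its columns, and the hostile input string are all invented for the example. With MySQL you would change the DSN passed to the PDO constructor, and the rest would look much the same.

```php
<?php
// Assumption: pdo_sqlite is available. An in-memory database keeps the
// example self-contained; table and column names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

// Input that would break a query built by string concatenation.
$hostile = "Robert'); DROP TABLE users; --";

// The ? is a placeholder. The value is sent as data, never as SQL text.
$insert = $pdo->prepare('INSERT INTO users (name) VALUES (?)');
$insert->execute([$hostile]);

$select = $pdo->prepare('SELECT id, name FROM users WHERE name = ?');
$select->execute([$hostile]);
$row = $select->fetch(PDO::FETCH_ASSOC);

// The table still exists and the string was stored exactly as given.
echo $row['name'], "\n";
```

Passing the values as an array to execute() is the shortest form; bindValue() or bindParam() do the same job when you want to set the parameter type explicitly.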
When displaying data to the user, you need to make sure that malicious bits haven't found their way in. Run pretty much everything through htmlspecialchars, unless you know that the data is completely safe and sane (numbers pulled from a database, for example) or that it contains only safe or pre-sanitized HTML.
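As a small illustration of escaping at output time, something like the following would do. The wrapper name e() is just a common shorthand and not a PHP built-in, and the choice of ENT_QUOTES, ENT_SUBSTITUTE and UTF-8 assumes your pages are served as UTF-8.

```php
<?php
// Hypothetical convenience wrapper around htmlspecialchars.
// ENT_QUOTES also escapes single quotes, which matters inside attributes;
// ENT_SUBSTITUTE replaces invalid byte sequences instead of returning "".
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Made-up user input for the example.
$comment = '<script>alert("hi")</script>';

echo '<p>' . e($comment) . '</p>';
// Prints: <p>&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;</p>
```

The point is that escaping happens at the moment of output, on the way into HTML, not when the data is first received or stored.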
Overall, you need to remember to use the right type of data filtering on the right data at the right time. Don't run database escaping code on variables that will never see the database. HTML filtering is entirely unnecessary for a field that shouldn't contain HTML and must instead contain a value you can validate outright, like a number or something from a select menu.
Addendum: Others recommend htmlentities instead of htmlspecialchars. htmlentities turns HTML characters into entities, and then goes one step further and also turns things like accented characters into entities. This might not be what you want. In fact, if you've set up your character sets correctly, you probably don't need to do that. Keep this in mind.
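The difference is easy to see side by side. This is only a quick sketch with a made-up input string, assuming UTF-8 throughout:

```php
<?php
// "caf\u{e9}" is "café" written with an escape so the example does not
// depend on the encoding of the source file (requires PHP 7 or later).
$text = "caf\u{e9} <b>";

$special  = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // the accented letter is left alone, <b> is escaped
echo $entities, "\n"; // caf&eacute; &lt;b&gt;
```

Both calls neutralize the tag; only htmlentities rewrites the accented letter, which is harmless but unnecessary when the page is already declared as UTF-8.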