Hello, I am trying to figure out the best way to handle undesirable HTML tags that a user might include in their input:

  • strip_tags() - the tags are removed and they are not inserted in the database
  • the tags are inserted in the database, but when reading that field and displaying it to the user we would use htmlspecialchars()

Which is better, and are there any disadvantages to either approach?

Regards

+2  A: 

This depends on what your priority is:

  • if it's important to display special characters from user input (like on StackOverflow, for example), then you'll need to store this information in the database and sanitize it on display - in this case, you'll want to at least use htmlspecialchars() to display the output (if not something more sophisticated)
  • if you just want plain text comments, use strip_tags() before you stick it in the database - this way you'll reduce the amount of data that you need to store, and reduce processing time when displaying the data on the screen
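
To make the trade-off between the two options concrete, here is a minimal sketch that runs both functions on the same string. The sample input is an assumption chosen for illustration, not something from the question.

```php
<?php
// Hypothetical user input containing markup.
$input = 'Hello <b>world</b>';

// Option 1: remove the tags before storing. The markup is lost for good.
$stripped = strip_tags($input);

// Option 2: store $input unchanged and encode it only when rendering HTML.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // Hello world
echo $escaped, "\n";  // Hello &lt;b&gt;world&lt;/b&gt;
```

With the first option the database only ever sees the plain text; with the second, the original input stays recoverable and the browser shows the tags literally instead of interpreting them.
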
Dexter
+2  A: 

the tags are inserted in the database, but when reading that field and displaying it to the user we would use htmlspecialchars()

This. You usually want people to be able to type less-than signs and ampersands and have them displayed as such on the page. htmlspecialchars on every text-to-HTML output step (whether that text came directly from user input, or from the database, or from somewhere else entirely) is the right way to achieve this. Messing about with the input is a not-at-all-appropriate tactic for dealing with an output-encoding issue.

Of course, you will need a different escape — or parameterisation — for putting text in an SQL string.
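
As a rough sketch of the two separate steps described above (parameterisation on the way into SQL, htmlspecialchars on the way out to HTML), something like the following could work. It assumes the pdo_sqlite driver is installed so that it can run against an in-memory database; the `comments` table and its `body` column are hypothetical names used only for this example.

```php
<?php
// Assumption: pdo_sqlite is available. Other PDO drivers are used the same way.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table for the example.
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

$input = "1 < 2 & it's <b>true</b>";

// Writing: the placeholder keeps the value out of the SQL text entirely.
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$stmt->execute([':body' => $input]);

// Reading: the stored value is the raw input, unchanged.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();

// Displaying: encode at the text-to-HTML step, and only there.
$html = '<p>' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8') . '</p>';
echo $html, "\n"; // <p>1 &lt; 2 &amp; it&#039;s &lt;b&gt;true&lt;/b&gt;</p>
```

Note that the input itself is never modified: the less-than sign, ampersand and apostrophe all survive the round trip and are only encoded in the final HTML string.
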

bobince
A: 

The measures taken to secure user input depend entirely on the context in which the data is used. For instance:

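  • If you're inserting it into a SQL database, you should use parameterized statements. PHP's mysql_real_escape_string() works decently as well on older installations, but the mysql extension was removed in PHP 7.0; mysqli_real_escape_string() is its successor.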
  • If you're going to display it on an HTML page, then you need to strip or escape HTML tags.
  • In general, any time you're mixing user input with another form of mark-up or another language, that language's special elements need to be escaped or stripped from the input before it is put into that context.
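
The general rule in the last bullet can be illustrated with a small sketch that takes one raw string and encodes it differently for three target contexts. The sample input and variable names are assumptions made up for this example; SQL is left out here because parameterized statements avoid string escaping altogether.

```php
<?php
// Hypothetical raw user input, kept unmodified.
$input = 'Tom & "Jerry" <3';

// Context: HTML text or a quoted attribute value.
$forHtml = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Context: a single component of a URL (path segment or query value).
$forUrl = rawurlencode($input);

// Context: a JSON / JavaScript string literal. The HEX flags additionally
// encode <, > and & so the literal is safer inside an HTML <script> block.
$forJson = json_encode($input, JSON_HEX_TAG | JSON_HEX_AMP);

echo $forHtml, "\n"; // Tom &amp; &quot;Jerry&quot; &lt;3
echo $forUrl, "\n";  // Tom%20%26%20%22Jerry%22%20%3C3
echo $forJson, "\n";
```

Each function is only correct for its own context, which is why the raw value is kept and the encoding is chosen at the point of use.
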

The last point above segues into the next point: Many feel that the original input should always be maintained. This makes a lot of sense when, later, you decide to use the data in a different way and, for instance, HTML tags aren't a big deal in the new context. Also, if your site is in some way compromised, you have a record of the exact input given.

Specifically related to HTML tags in user input intended for display on an HTML page: If there is any conceivable reason for a user to input HTML tags, then simply escape them. If not, strip them before display.

Lucas Oman