I'm writing a PHP script to grab text box data from a submitted form. These are simple text boxes and I don't want to accept any HTML tags. I think I should at least use strip_tags() and addslashes(). Anything else? I wouldn't mind restricting the input to alphanumerics; should I use a regular expression to catch nonstandard characters?

This is a simple form that actually (ugh) gets emailed to the person processing it. (No database, sadly.) And it's simple data, first and last name sort of things.

Edit: I'd also like to know specifically what I should be looking for. What's the consensus on reasonable input filtering?

+6  A: 

Use the PHP filter functions.

You can use them for both sanitizing and validating input (e.g. email addresses).
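
As a rough sketch of the difference between validating and sanitizing with filter_var(): the helper names validated_email() and sanitize_int_chars() below are made up for illustration, only filter_var() and the two FILTER_* constants come from PHP itself.

```php
<?php
// Hypothetical helper: returns the trimmed address if it passes validation, otherwise null.
function validated_email(string $raw): ?string
{
    $candidate = trim($raw);
    // FILTER_VALIDATE_EMAIL returns the string on success and false on failure.
    $result = filter_var($candidate, FILTER_VALIDATE_EMAIL);
    return $result === false ? null : $result;
}

// Hypothetical helper: sanitizing rather than validating.
// FILTER_SANITIZE_NUMBER_INT removes everything except digits, "+" and "-".
function sanitize_int_chars(string $raw): string
{
    return filter_var($raw, FILTER_SANITIZE_NUMBER_INT);
}
```

Validation tells you whether to reject the value; sanitizing silently changes it, so it suits fields where a cleaned-up value is still meaningful.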

There are two approaches to validation (this also applies to security and lots of other things).

Firstly, you can default to allowing anything except what is explicitly disallowed. Or you can default to disallowing everything except what is specifically allowed.

Generally speaking the latter approach is more secure and should be used except in cases where you have a compelling reason not to (e.g. it's simply too hard to know what's allowed, you're doing an app for users who aren't deemed to be a security threat, and so on).
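
A minimal sketch of the "disallow everything except what is allowed" approach for a name field. The function name is_acceptable_name(), the 64-character limit and the exact set of permitted characters are assumptions for illustration, not a recommendation for every application; real names can contain characters this pattern rejects.

```php
<?php
// Hypothetical allowlist check: a letter first, then letters, combining marks,
// apostrophes, spaces and hyphens, up to 64 characters in total (arbitrary limit).
function is_acceptable_name(string $name): bool
{
    // \p{L} = any Unicode letter, \p{M} = combining mark; /u treats the input as UTF-8.
    // \A and \z anchor to the very start and end, so a trailing newline is rejected.
    return preg_match('/\A\p{L}[\p{L}\p{M}\' -]{0,63}\z/u', $name) === 1;
}
```

Anything that does not match is rejected outright instead of being "repaired", so the list of allowed characters is the only thing to maintain.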

You have to be careful with this, however. In people's names, characters like ' and - are perfectly valid, but naive implementations may reject them. What you generally want to guard against is:

  • SQL injection: always escape any input that goes into an SQL query, e.g. with mysql_real_escape_string() (removed in PHP 7; prepared statements via mysqli or PDO are the current equivalent);
  • XSS (cross-site scripting): generally speaking you should strip out HTML tags from user input. You will of course sometimes have to allow them (e.g. rich text editor boxes) but even in those cases you will have a list of tags that you allow and you should strip out all others (especially <script> tags);
  • Typically you should strip out low characters (below ASCII 32, i.e. 0x20, or so); and
  • Depending on your internationalization requirements you may want to strip out high characters (above ASCII 127).
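
The last two points can be sketched at the byte level roughly like this. The function names are hypothetical, and keeping tab, LF and CR is an assumption about what a form field should accept.

```php
<?php
// Hypothetical helper: remove bytes 0x00-0x1F and 0x7F,
// but keep tab (0x09), line feed (0x0A) and carriage return (0x0D).
function strip_low_chars(string $text): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $text);
}

// Hypothetical helper: remove every byte above 0x7F.
// This destroys all non-ASCII text (e.g. the "é" in "café"), so only use it
// when the field really is ASCII-only.
function strip_high_chars(string $text): string
{
    return preg_replace('/[\x80-\xFF]/', '', $text);
}
```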

A good default to use is (bearing in mind that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1):

$var = filter_var($var, FILTER_SANITIZE_STRING);

but pick the right filter for the situation.

cletus
Thanks, but I'd also like to know specifically what I should be looking for. What's the consensus on reasonable input filtering? Editing original question accordingly.
lynn
+1  A: 

This is a very common question with a lot of not-so-clear answers. Functions like addslashes() can actually do more harm than good in some setups. Some basic rules for dealing with user input: don't trust anything, and if it's not in the format you are expecting, don't try to fix it, just raise an error.

If you only require alphanumeric input, a simple regex will handle that, but a little more information would help.
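
Something along these lines would do for the alphanumeric case. It is only a sketch; require_alphanumeric() and its error message are invented for the example. Note that it raises an error instead of trying to clean the value up.

```php
<?php
// Hypothetical helper: accept the value only if it consists entirely of
// ASCII letters and digits (at least one character), otherwise raise an error.
function require_alphanumeric(string $value, string $fieldName): string
{
    // \A and \z anchor to the absolute start and end of the string.
    if (preg_match('/\A[A-Za-z0-9]+\z/', $value) !== 1) {
        throw new InvalidArgumentException("$fieldName must contain only letters and digits");
    }
    return $value;
}
```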

What are you going to be doing with the data? How are you handling the input (or planning to)? For example: the user submits a form, you process it and store the data in a DB to display later (like a comment engine).

Edit: If it is as simple as sending a text box via email for a human to process, my biggest concerns would be XSS and SMTP header injection (depending on how the email is being sent). Try to go with the simplest solution: if you just need to receive alphanumeric data for now, use a regex and only accept that. Another solution would be to use htmlentities() with ENT_QUOTES.
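
For the header injection part, a minimal sketch, assuming the form values end up in headers such as Subject or From. The helper safe_header_value() is hypothetical; the idea is simply to refuse any value containing a line break, since that is what lets an attacker start a new header.

```php
<?php
// Hypothetical helper: reject values containing CR or LF before they are
// placed into any mail header.
function safe_header_value(string $value): string
{
    if (preg_match('/[\r\n]/', $value) === 1) {
        throw new InvalidArgumentException('Line breaks are not allowed in header values');
    }
    return $value;
}

// Usage sketch (not executed here; the address and variables are placeholders):
// $subject = 'Form submission from ' . safe_header_value($lastName);
// mail('processor@example.com', $subject, $body);
```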

Gerry
Simple form that actually (ugh) gets emailed to the person processing it. And it's simple data, first and last name sort of things. (Editing question.)
lynn
+1  A: 

I don't want to accept any HTML tags. I think I should at least use strip_tags()

Maybe, but not if you want to allow people to type ‘<’/‘>’ characters that just mean less-than and greater-than, and aren't anything to do with tags.

On input for free-text fields you won't really want to filter out much more than the non-newline control characters (which you usually don't want anywhere), and, if you are using UTF-8, invalid/redundant sequences.
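
A rough sketch of that kind of input filter for UTF-8 text. The name clean_free_text() is made up, and returning null for invalid input (instead of trying to repair it) is a design assumption.

```php
<?php
// Hypothetical helper: returns the cleaned text, or null if the input is not valid UTF-8.
function clean_free_text(string $text): ?string
{
    // With the /u modifier, preg_match() returns false when the subject
    // is not a valid UTF-8 string, so an empty pattern works as a validity check.
    if (preg_match('//u', $text) !== 1) {
        return null;
    }
    // Remove control characters (Unicode category Cc) except tab, LF and CR.
    // The negated class matches anything that is neither "not Cc" nor one of those three.
    return preg_replace('/[^\P{Cc}\t\n\r]/u', '', $text);
}
```

Characters such as '<' and '>' pass through untouched here; dealing with them is the job of output escaping, not input filtering.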

Then when you output the value back to the page you will of course remember to use htmlspecialchars() so that ‘<’ gets escaped to ‘&lt;’ and appears as a literal ‘<’ on-screen, right? You need to be using htmlspecialchars() any time you output a text value into HTML in a template, regardless of whether that string came from a form submission, or the database, or somewhere else.
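
In a template that usually ends up as a small wrapper like the one below. The short name h() is just a common convention, not anything built in, and the ENT_QUOTES flag and UTF-8 charset are assumptions about the page.

```php
<?php
// Hypothetical shorthand for escaping a plain-text value on its way into HTML.
// ENT_QUOTES also escapes single quotes, which matters inside attribute values.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Usage sketch inside a template (not executed here):
// echo '<p>' . h($comment) . '</p>';
// echo '<input type="text" name="last" value="' . h($lastName) . '">';
```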

For non-free-text fields where you want all input to match a particular restricted format, then yes, a regexp can be a good way to match this.

and addslashes().

addslashes() is almost always the wrong thing. A good rule of thumb is: don't use this.

addslashes() is inadequate for SQL escaping because it does not match the actual SQL string literal escape format, so you can construct strings that are still dangerous when addslashed. When you're using MySQL, you should use mysql_real_escape_string() instead. Other databases have their own particular escaping functions. Use them (or, easier, use parameterised queries so you don't have to manually escape text to SQL at all).
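
For comparison, a parameterised query with PDO looks roughly like this. It is a sketch only: it assumes the pdo_sqlite extension is available and uses an in-memory SQLite database with a made-up people table in place of a real one.

```php
<?php
// Assumption: pdo_sqlite is installed; the in-memory database is a stand-in.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE people (first_name TEXT, last_name TEXT)');

$lastName = "O'Brien"; // would normally come from the submitted form

// The ? placeholders are bound separately from the SQL text,
// so the apostrophe needs no manual escaping.
$insert = $pdo->prepare('INSERT INTO people (first_name, last_name) VALUES (?, ?)');
$insert->execute(['Conan', $lastName]);

$select = $pdo->prepare('SELECT first_name FROM people WHERE last_name = ?');
$select->execute([$lastName]);
$firstName = $select->fetchColumn();
```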

(addslashes() is inadequate for HTML escaping because it doesn't attempt to do anything with HTML special characters at all. That's not what it's for.)

In any case, trying to cope with output-escaping at the input filtering stage is backwards. Instead, keep all the strings that are internal to your application as plain text, and escape them on the way out of the application: mysql_real_escape_string() when they're going out to take part in an SQL query, htmlspecialchars() when they're going out onto an HTML page, and so on.

bobince
A: 

I have a few php forms and am looking for a complete code to stop < and > tags from being submitted in the form. Can anyone email me the complete code that I may add to my form page? My email is [email protected]

anomaly
A: 

I'm just displaying user input... it's just an open discussion forum kind of thing... here is the link to my site: http://smokescreen.freehostia.com/vikas/main.htm

vikas