tags:

views:

125

answers:

3

I am working a lot on a PHP-based CMS at the moment, and while I'm at it I would like to move all the handling and sanitation of user input to one central place. (At the moment, it's a $_REQUEST here, a $_GET there, and so on).

I like filter_input() very much and would like to use it for basic sanitation, but I'm unclear as to whether this function is really production ready. For example, the documentation names the following parameters for $type

INPUT_GET, INPUT_POST, INPUT_COOKIE, INPUT_SERVER, INPUT_ENV, INPUT_SESSION (not implemented yet) and INPUT_REQUEST (not implemented yet).

the function exists since 5.2.0, why are two crucial elements not implemented yet? If I want to fetch data from $_REQUEST, you have to use a workaround from the user contributed notes. Is there a special reason for this? Is this function still in some kind of beta? Is it trustworthy as the first call to handle incoming data?

Maybe somebody familiar with the PHP development process can shed some light on this.

+1  A: 

In programming, you must be as restrictive on your input as possible. That goes for data sources as well. $_REQUEST contains everything in $_GET, $_POST and $_COOKIE, which may lead to problems.

Think for example what happens if a plugin of your CMS introduces a new special key in one of them, which happens to exist as a meaningful key in another plugin?

So DON'T ever use $_REQUEST. Use $_GET, $_POST or $_COOKIE, whichever fits your scenario. It's a good practice to be as strict as possible, and that has nothing to do with PHP, but with programming in general.

Flavius
Valid point for $_REQUEST, but then they should say so instead of leaving it unimplemented.
Pekka
+4  A: 

I would like to move all the handling and sanitation of user input to one central place

Yes, how lovely that would be. It can't be done. That's not how text processing works.

If you're inserting text from one context into another you need to use the right escapes. (mysql_real_escape_string for MySQL string literals, htmlspecialchars for HTML content, urlencode for URL parameters, others for specific contexts). At the start of your script when you're filtering, you don't know where your input is going to end up, so you don't know how to escape it.

Maybe one input string is going both into the database (needs to be SQL-escaped) and directly onto the page (needs to be HTML-escaped). There's no one escape that covers both those cases. You can use both escapes one after the other, but then the value in the HTML will have weird backslashes appearing in it and the copy in the database will be full of ampersands. A few rounds of this misencoding and you get that situation where every time you edit something, long strings of \\\\\\\\\\\\\\\\\\\\ and & come out.

The only way you can safely filter in one go at start time is by completely removing all characters that need to be escaped in any of the contexts you're going to be using them in. But that means no apostrophes or backslashes in your HTML, no ampersands or less-thans in your database, and probably a whole load of other URL-unfriendly punctuation has to go too. For a simple site that doesn't take arbitrary text you could maybe get away with that. But usually not.

So you can only escape on the fly when one type of text goes into another. The best strategy to avoid the problem is to avoid concatenating text into other contexts as much as much as you possibly can, for example by using parameterised queries instead of SQL string building, and either defining an echo(htmlspecialchars()) function with a nice short name to make it less work to type, or using an alternative templating system that HTML-escapes by default.

bobince
You do write awfully long answers (good explanation though).
C. Ross
@bobince: I say it can be done to a degree if you 1.) know what you will need in the script, and 2.) mark the sanitized variables as what they are. I have had my share of \\\\\\\\\\\\\\'s to know what you're talking about. :) My main goal is to have one basic "security checkpoint" with a defined set of checks, instead of pulling stuff from the arrays all over the code.
Pekka
A simple explanation is that filtering/sanitization is only part of the process your data needs to go through. Sanitized data still needs to be escaped. e.g. you would not stick an email address unquoted into a SQL query, no matter how valid it is.
Ben James
@Ben James: Yes, absolutely. But I still don't see what's wrong with pulling incoming data in one place, and run a number of generic checks on it first. Say you have a "from" parameter that redirects to a different URL. Regardless of how it is output, you have, say, the common characteristic that you don't want it to point to an external URL. I have read the links in @stereofrog's reply and acknowledge the points made; still, checks like this are something I want to check at input time, and not at output time. Maybe "sanitation" was the wrong choice of word, in that case I apologize.
Pekka
Well, you could make yourself a separate `$_POST_html_encoded` array, a `$_POST_sql_encoded` array and a `$_POST_url_encoded` array, then choose which one you needed at each point. But you'd still need to do the same for every other source of data, so each `$row` you grab from the database you have to make `$row_html_encoded`, `$row_sql_encoded` and so on. In the end this gets you nothing.
bobince
@bobince: Did you read my comment above?
Pekka
@Pekka: sure, you generally need to do application-specific, parameter-specific checks on input. But those kind of checks can't be automated or applied over everything at once as a cure-all; each has to be done individually. You *could* use `filter_input` to check each parameter one at a time, but it would be no different to just using whichever existing filter or escape you wanted for that particular field, and it wouldn't take care of escaping issues at all.
bobince
@bobince: I am not planning to automate anything. I discovered filter_input() not long ago, and thought to myself: Hey, why not use this? And if it's just to attach custom checking functions to it like, say, local_url_only(). Why not use filter_input() as an agreed coding standard to pull input from the outside rather than fetching $_XYZ[]? Still, your point is well noted. Anyway, my initial question was why the implementation of a basic PHP function is so blatantly incomplete, without any explanation, for a number of versions now. Is there a history behind this? A conflict? Sloppiness?
Pekka
Ah, okay, I guess I'm a little oversensitive to the word “sanitisation” and how it is usually applied to PHP. The filter functions are a messy bag of unrelated string-formatting functions, some of which are useful for validation but some of which are quite unsuitable for an input stage. I agree that it's a bit rough. Although INPUT_REQUEST wouldn't be needed much if you can use INPUT_POST|INPUT_GET|INPUT_COOKIE (who wants to lump cookies into form pars anyway?), it would seem trivial enough to implement. There doesn't seem to have been much development since filter moved from PECL to builtin.
bobince
I saw your comment only now. Cheers bobince, I didn't know these functions came from PECL. That explains a lot.
Pekka
+2  A: 

"input filtering" or "sanitation" is an absurd idea. Stay away from it.

Explanations and further discussion

http://stackoverflow.com/questions/129677/whats-the-best-method-for-sanitizing-user-input-with-php/130323#130323

http://stackoverflow.com/questions/1642635/what-else-should-i-be-doing-to-sanitize-user-input/1642830#1642830

stereofrog
Read my discussion with bobince above.
Pekka