Output or Input filtering?

I constantly see people writing "filter your inputs", "sanitize your inputs", and "don't trust user data", but I only agree with the last one: I consider trusting any external data a bad idea, even when the source is internal to the system.

Input filtering: The most common approach I see. Take the form post data, or any other external source of information, and enforce some boundaries when saving it: for example, make sure that text is text, that numbers are numbers, that SQL is valid SQL, and that HTML is valid HTML and contains no harmful markup. Then you save the "safe" data in the database.

But when fetching data you just use the raw data from the database.

In my personal opinion, the data is never really safe. It sounds easy to just filter everything you get from forms and URLs, but in reality it is much harder than that: data that is safe for one language may not be safe for another.

Output filtering: Doing it this way, I save the raw, unaltered data, whatever it might be, into the database with prepared statements, and then filter out the problematic code when accessing the data. This has its own advantages: it adds a layer between the HTML and the server-side script, which I consider to be data access separation of sorts.
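To make the "save raw data with prepared statements" step concrete, here is a minimal sketch in Python using the standard `sqlite3` module. The `comments` table and the `save_comment` helper are hypothetical names chosen for illustration; the point is only that the value is bound as a parameter and stored unmodified.

```python
import sqlite3

def save_comment(conn: sqlite3.Connection, body: str) -> int:
    """Store the text exactly as received; the driver binds it as data, not SQL."""
    cur = conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid

# Hypothetical schema, in-memory database for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")

raw = "Robert'); DROP TABLE comments;-- <b>hi</b>"
row_id = save_comment(conn, raw)

# The stored value is the raw input, and the table is still there.
stored = conn.execute(
    "SELECT body FROM comments WHERE id = ?", (row_id,)
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
```

Nothing was filtered on the way in, yet the SQL fragment inside the text never runs, because it is never concatenated into the query string.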

Now data is filtered depending on the context. For example, I can present the data from the database in an HTML document as escaped plain text, as HTML, or as anything else, anywhere.
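As a rough illustration of context-dependent output, the sketch below takes one raw value and encodes it differently for three destinations, using only the Python standard library. The helper names (`as_html_text`, `as_url_component`, `as_json`) are made up for this example.

```python
import html
import json
from urllib.parse import quote

def as_html_text(value: str) -> str:
    """Escape for placement in HTML text or a quoted attribute."""
    return html.escape(value, quote=True)

def as_url_component(value: str) -> str:
    """Percent-encode for use as a single URL query or path component."""
    return quote(value, safe="")

def as_json(value: str) -> str:
    """Serialize as a JSON string literal."""
    return json.dumps(value)

raw = 'Tom & "Jerry" <3'

html_out = as_html_text(raw)      # Tom &amp; &quot;Jerry&quot; &lt;3
url_out = as_url_component(raw)   # Tom%20%26%20%22Jerry%22%20%3C3
json_out = as_json(raw)           # "Tom & \"Jerry\" <3"
```

The same stored string yields three different safe representations, which is not possible if a single "safe" form was baked in at save time.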

The drawbacks are that you must never forget to add the filtering, which is a little harder than with input filtering, and that it uses a bit more CPU when serving data.

This does not mean that you don't need validation checks; you still do. It's just that you don't save filtered data: you validate it and show the user an error message if the data is invalid.
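A small sketch of what "validate, don't filter" could look like in Python. The form fields (`name`, `age`) and their limits are assumptions invented for the example; the function only reports problems and never rewrites the submitted values.

```python
def validate_signup(form: dict[str, str]) -> dict[str, str]:
    """Return a field -> error message map; an empty map means the form is valid."""
    errors: dict[str, str] = {}

    name = form.get("name", "").strip()
    if not 1 <= len(name) <= 50:
        errors["name"] = "Name must be between 1 and 50 characters."

    age = form.get("age", "")
    # Require plain ASCII digits so int() cannot fail on exotic digit characters.
    if not (age.isascii() and age.isdigit() and 0 < int(age) < 150):
        errors["age"] = "Age must be a whole number between 1 and 149."

    return errors

# Markup in the name is not a validation error here; it is stored raw
# and escaped later, when it is output.
ok = validate_signup({"name": "<b>Ann</b>", "age": "30"})
bad = validate_signup({"name": "", "age": "abc"})
```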

So instead of going with "filter your inputs" maybe it should be "validate your inputs, filter your outputs".

So should I go with "input validation and filtering" or "input validation and output filtering"?

A: 

The best solution is to filter both. Doing just one makes it more likely that you miss a case, and can leave you open to other types of attacks.

If you only do input filtering, an attacker could find a way to bypass your inputs and cause a vulnerability. This could be someone with access to your database entering data manually, it could be an attacker uploading a file through FTP or some other channel that is not checked, or many other methods.

If you only do output filtering, you can leave yourself open to SQL injection and other server-side attacks.

The best method is to filter both your inputs and outputs. It may cause more load, but greatly reduces the risk of an attacker finding a vulnerability.

Alan Geleynse
I find that filtering both ends up in data corruption :S
YuriKolovsky
If you end up with data corruption you are probably not filtering well. Ideally, your filter should just remove data that is invalid and leave the rest so when the second filter is applied nothing is done. You just need to make sure the result is not also invalid.
Alan Geleynse
The problem, as I find it, is that data validation often differs between scenarios, so doing it on both ends results in corruption. In my experience it's like running htmlspecialchars twice on one string.
YuriKolovsky
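The double-escaping problem mentioned in the comment above can be reproduced in a few lines. This sketch uses Python's `html.escape` as a stand-in for PHP's `htmlspecialchars`; the two are not identical, but they behave the same way for this case.

```python
import html

raw = "Fish & Chips"

# Escaped once (e.g. on input), then again (e.g. on output).
once = html.escape(raw)
twice = html.escape(once)

# A browser decodes entities exactly once, so the user would see
# the text of `once` ("Fish &amp; Chips") instead of the original.
displayed = html.unescape(twice)
```

Escaping is not idempotent, so applying the same filter at both ends changes what the user finally sees.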
A: 

Sounds like semantics to me. Either way, the important thing to remember is to make sure bad data doesn't get into the system.

Doing output filtering instead of input filtering is asking for SQL injection.

[Image: XKCD comic about SQL injection]

Byron Whitlock
Agreed, sanitization = filtering; validation of input is something else (size, diversity, etc.). Output filtering is beneficial in many cases, and is more about storing data in a normalized form rather than with expectations of how it'll be used. Plus I heart XKCD
Rudu
SQL injections are a completely different story, and the above pic has a fundamental flaw (see http://bobby-tables.com/). I think separating data is the solution, not input filtering.
YuriKolovsky
+1 for cartoon in response. Epic.
bpeterson76
-1 for posting this irrelevant comic. SO people are write-only. They never read.
Col. Shrapnel
A: 

There is no generic "filtering" for input and output.

Validate your input, escape your output. How you do this depends on context.

Validation is about making sure input falls within sensible ranges, like the length of strings, the numericality of dollar amounts or that a record being updated is owned by the user performing the update. This is about maintaining the logical consistency of your data and preventing people from doing things like zeroing the price of a product they are purchasing or deleting records they shouldn't have access to. It has nothing to do with "filtering" or escaping specific characters in your input.

Escaping is a matter of context, and only really makes sense when you're doing something with data that can be poisoned by injecting certain characters. Escape HTML characters in data you send to the browser. Escape SQL characters in data you send to the database. Escape quotes when you're writing data inside JavaScript <script> tags. Just be conscious of how the data you're dealing with is going to be interpreted by the system you're passing it to and escape accordingly.
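Taking the `<script>` case as one example of context-specific escaping, here is a hedged sketch in Python. The `js_literal` helper is a hypothetical name; it relies on `json.dumps` for quoting and then additionally encodes the characters that could close the script element or open an HTML comment.

```python
import json

def js_literal(value: str) -> str:
    """Serialize a string as a JavaScript literal for use inside a <script> tag."""
    # json.dumps handles quotes, backslashes, and control characters.
    encoded = json.dumps(value)
    # The HTML parser sees the script body first, so hide <, > and &
    # behind JavaScript unicode escapes.
    return (
        encoded.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )

raw = '</script><script>alert("x")</script>'
literal = js_literal(raw)

# Hypothetical template fragment embedding the value.
snippet = "<script>var comment = " + literal + ";</script>"
```

HTML escaping alone would be the wrong tool here, since the script body is parsed as JavaScript: the receiving system decides which escaping applies.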

meagar
@danlefree The question wasn't asking about a specific library, so my answer didn't reference a specific library.
meagar
meagar is right, I did not mean any library
YuriKolovsky