Hi guys, I'm wondering how to set up a clever way to keep all my input 'clean': a procedure to run at the beginning of each of my scripts. I thought of creating a class to do that, and then adding a 2-letter prefix at the beginning of every input name to identify the kind of input, for example:

in-mynumber
tx-name
ph-phone
em-email

So, at the top of my scripts I just run a function (for example):

function cleanInputs(){
    $cGet = array(); // cleaned copy of $_GET, returned to the caller
    // Example pattern only: letters, numbers, spaces and a few symbols
    $regExp = '/^[a-zA-Z0-9 .,\'-]+$/';
    foreach($_GET AS $taintedKey => $taintedValue){
        $prefix = substr($taintedKey, 0, 2);
        switch($prefix){
            case 'in':
                // I assume this input is an integer
                $cGet[$taintedKey] = intval($taintedValue);
                break;
            case 'tx':
                // I assume this input is normal text:
                // it can contain only letters, numbers and a few symbols
                if(preg_match($regExp, $taintedValue)){
                    $cGet[$taintedKey] = $taintedValue;
                }else{
                    $cGet[$taintedKey] = false;
                }
                break;
            case 'em':
                // I assume this input is a valid email
                // (the dot before the TLD is escaped; hyphens sit at the end of each class)
                if(preg_match('/^[a-zA-Z0-9_.-]+@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,4}$/', $taintedValue)){
                    $cGet[$taintedKey] = $taintedValue;
                }else{
                    $cGet[$taintedKey] = false;
                }
                break;
        }
    }
    return $cGet;
}

...so I'll create two other arrays, $cGet and $cPost, holding the clean data from $_GET and $_POST respectively, and in my scripts I'll use only those arrays and completely forget about $_GET/$_POST. I'm even thinking about adding a second prefix to determine the input's max length, for example: tx-25-name. But I'm not so sure about that, and if I go this way, maybe an OOP approach would be better.
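The length-prefix idea (tx-25-name) could be parsed roughly like this. This is only a sketch: the function names parseInputKey() and fitsMaxLength() and the shape of the returned array are made up for illustration, and the length is counted in bytes.

```php
<?php
// Hypothetical helper: splits a key such as "tx-25-name" into its parts.
// The "type-maxlength-name" layout follows the convention described above;
// the function name and return shape are assumptions of this sketch.
function parseInputKey(string $key): ?array
{
    // two-letter type, optional numeric max length, then the field name
    if (!preg_match('/^([a-z]{2})-(?:(\d+)-)?(.+)$/', $key, $m)) {
        return null; // the key does not follow the convention
    }
    return [
        'type'   => $m[1],
        'maxLen' => $m[2] === '' ? null : (int) $m[2],
        'name'   => $m[3],
    ];
}

// Hypothetical helper: true when no limit is set or the value fits it.
// strlen() counts bytes, so multibyte text would need a different check.
function fitsMaxLength(string $value, ?int $maxLen): bool
{
    return $maxLen === null || strlen($value) <= $maxLen;
}
```

For example, parseInputKey('tx-25-name') yields the type 'tx', a max length of 25 and the name 'name', while a key without a length, such as 'in-mynumber', gets a max length of null.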

What do you think about that? Does it seem like a good approach?

The negative points that I can see so far (I haven't used this approach yet, it's just an idea from this morning): 1. There must be many prefixes, and therefore many procedures, if I don't want my application to be too restrictive; 2. The names of the variables I send will become a little longer (but we are talking about 3-6 chars, which shouldn't be a problem).

Any suggestion is really appreciated!

EDIT:

I'm not trying to reinvent the wheel. My post wasn't about the system for sanitizing input, but about the procedure for doing it. I use htmlpurifier to clean possible XSS injection in HTML data, and of course I use parameterized queries. I'm just wondering whether it is better to handle the inputs one by one, or to sanitize them all at the beginning and consider them clean in the rest of the script. The method I thought of is not miraculous and is nothing new under the sun, but I think that truncating the input if it is not in the format I expect can be useful...

Why check for SQL injection in the 'name' field, which must contain just letters and the apostrophe char? Just remove everything that is not a letter or an apostrophe, add slashes for the latter, and run it through a parameterized query. Then, if you expect an email, just delete everything that is not an email...
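As a rough sketch of the 'name' case, the stripping step could look like this. The function name cleanName() is made up, and the pattern assumes ASCII letters only, so accented names would need a wider character class.

```php
<?php
// Hypothetical helper: removes every character that is not an ASCII
// letter or an apostrophe, as described for the 'name' field.
function cleanName(string $tainted): string
{
    return (string) preg_replace("/[^a-zA-Z']/", '', $tainted);
}
```

One caveat: when the value is then bound as a parameter of a parameterized query, adding slashes on top is not needed, because the backslashes would be stored literally as part of the data.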

A: 

What are you trying to do? If you need to sanitize input to save data to the database, there's nothing better than parameterized queries.

See this for an example.

Anton Gogolev
which is? (parameterized queries that is...)
Svish
Parametrized queries go something like this: ExecuteQuery("INSERT INTO MyTable (Field1, Field2) VALUES (?, ?)", $Param1, $Param2); The idea is that the query string contains some markers (like the question marks) that correspond to parameters given later. The query is sent like this to the server, which parses it with all the question marks still in place. When it's time to execute, the server is also given the parameter values *separately from the query*. There is no risk of injection whatsoever.
Vilx-
The precise syntax of doing this varies for different DB servers. For MySQL you can see an example here: http://php.net/manual/en/mysqli.prepare.php
Vilx-
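For reference, here is a minimal sketch of the ExecuteQuery() idea from the comment above, written with PDO. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database only so the example is self-contained; the table and field names are taken from the comment, and the sample values are invented.

```php
<?php
// Assumption: pdo_sqlite is installed. With MySQL only the DSN would differ.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE MyTable (Field1 TEXT, Field2 TEXT)');

// Invented sample values, including one that looks like an injection attempt
$param1 = "O'Brien";
$param2 = "x'); DROP TABLE MyTable; --";

// The query text contains only placeholders; the values travel separately
$stmt = $pdo->prepare('INSERT INTO MyTable (Field1, Field2) VALUES (?, ?)');
$stmt->execute([$param1, $param2]);

// Both values come back exactly as they were sent, and the table still exists
$row = $pdo->query('SELECT Field1, Field2 FROM MyTable')->fetch(PDO::FETCH_ASSOC);
$count = (int) $pdo->query('SELECT COUNT(*) FROM MyTable')->fetchColumn();
```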
+2  A: 

There are many well-made, tested PHP classes that already sanitize inputs. Why make another one? Besides, sanitizing input is more than just verifying data types. It implies checking for SQL injections, XSS attacks, etc...

m_oLogin
+1 for recommending tried-and-true existing code. If you depend on explicitly checking for known exploit vectors, you have a problem though.
David Schmitt
I'm not trying to reinvent the wheel. My post wasn't about the system for sanitizing input, but about the procedure for doing it. I use htmlpurifier to clean possible XSS injection in HTML data, and of course I use parameterized queries. I'm just wondering whether it is better to handle the inputs one by one, or to sanitize them all at the beginning and consider them clean in the rest of the script.
DaNieL
A: 

The idea is fine in itself; however, I wonder whether it will really be very useful.

For one thing, SQL injections and HTML injections can (and should) be handled in another way. SQL injections are prevented by parametrized queries (a must-have in this day and age), and HTML injections are prevented by the htmlspecialchars() function, which should be called right before outputting the string to the user. Don't store encoded strings in the DB or, even worse, encode them as soon as you receive them. Working with them later will be hell.
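A small sketch of "encode right before output": the value is kept raw everywhere else and only passes through htmlspecialchars() when it is written into HTML. The wrapper name e() and the sample string are made up for this example.

```php
<?php
// Hypothetical wrapper around the built-in htmlspecialchars().
function e(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// The raw value, as it would be stored in the DB (invented sample)
$stored = '<b>Tom & "Jerry"</b>';

// Encoding happens only at the point where the HTML is built
$html = '<p>' . e($stored) . '</p>';
```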

Other than these two injection attacks, what will your method do? Well, it can do some regexps for stuff like numbers, phone numbers, emails, names and dates. But that's about it. Unfortunately that's just a part of all the validations you will have to do. Other common cases that you cannot validate there are cross-checking of inputs (start date before end date), and checking that a value is in a list of allowed predefined values (say, for a <select> element). And there are an infinite number of custom validation steps that you will have in your application as well. Is it worth splitting all validation into "generic type validation" and "custom rule validation"? I don't know. Perhaps. Or perhaps this will just make a bigger mess.
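To illustrate the two cases mentioned above, here is a sketch of a check that a per-field prefix scheme cannot express. The booking scenario, the function name validateBooking() and the field names start, end and room are all invented for this example.

```php
<?php
// Hypothetical validator: compares two inputs against each other and
// checks a third one against a list of allowed values.
function validateBooking(array $input, array $allowedRooms): array
{
    $errors = [];

    // '!' resets the time part so that only the dates are compared
    $start = DateTime::createFromFormat('!Y-m-d', $input['start'] ?? '');
    $end   = DateTime::createFromFormat('!Y-m-d', $input['end'] ?? '');

    if (!$start || !$end) {
        $errors[] = 'dates must use the YYYY-MM-DD format';
    } elseif ($start >= $end) {
        $errors[] = 'start date must be before end date';
    }

    // Strict comparison against the predefined values (e.g. of a <select>)
    if (!in_array($input['room'] ?? null, $allowedRooms, true)) {
        $errors[] = 'room is not one of the allowed values';
    }

    return $errors;
}
```

An empty array means the input passed both rules; otherwise the array lists what failed.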

Vilx-
Agreed. I tend to just prefix my form input names with the type of input expected, i.e. s, i, b for string, integer or boolean. It's really an aide-mémoire more than a hard-and-fast preventative measure.
Cirieno