views:

96

answers:

4

What's an effective way to handle data validation, say, from a form submission?

Originally I had a bunch of if statements that checked each value and collected invalid values in an array for later retrieval (and listing).

// Store errors here
$errors = array();

// Hypothetical check if a string is alphanumeric
if (!preg_match('/^[a-z\d]+$/i', $fieldvalue))
{
    $errors[$fieldname] = 'Please only use letters and numbers for your street address';
}

// etc...

What I did next was create a class that handles various data validation scenarios and store the results in an internal array. After data validation was complete I would check to see if any errors occurred and handle accordingly:

class Validation
{
    private $errorList = array();

    public function isAlphaNumeric($string, $field, $msg = '')
    {
        if (!preg_match('/^[a-z\d]+$/i', $string))
        {
            $this->errorList[$field] = $msg;
        }
    }

    // more methods here

    public function creditCard($cardNumber, $field, $msg = '')
    {
        // Validate credit card number
    }

    // more methods here

    public function hasErrors()
    {
        return count($this->errorList);
    }
}

/* Client code */

$validate = new Validation();
$validate->isAlphaNumeric($fieldvalue1, $fieldname1, 'Please only use letters and numbers for your street address');
$validate->creditCard($fieldvalue2, $fieldname2, 'Please enter a valid credit card number');

if ($validate->hasErrors())
{
    // Handle as appropriate
}

Naturally it didn't take long before this class became bloated with the virtually unlimited types of data to be validated. What I'm doing now is using decorators to separate the different types of data into their own classes and call them only when needed leaving generic validations (i.e. isAlphaNumeric()) in the base class:

class Validation
{
    private $errorList = array();

    public function isAlphaNumeric($string, $field, $msg = '')
    {
        if (!preg_match('/^[a-z\d]+$/i', $string))
        {
            $this->errorList[$field] = $msg;
        }
    }

    // more generic methods here

    public function setError($field, $msg = '')
    {
        $this->errorList[$field] = $msg;
    }

    public function hasErrors()
    {
        return count($this->errorList);
    }
}

class ValidationCreditCard
{
    protected $validate;

    public function __construct(Validation $validate)
    {
        $this->validate = $validate;
    }

    public function creditCard($cardNumber, $field, $msg = '')
    {
        // Do validation
        // ...
        // if there is an error
        $this->validate->setError($field, $msg);
    }

    // more methods here
}

/* Client code */

$validate = new Validation();
$validate->isAlphaNumeric($fieldvalue, $fieldname, 'Please only use letters and numbers for your street address');

$validateCC = new ValidationCreditCard($validate);
$validateCC->creditCard($fieldvalue2, $fieldname2, 'Please enter a valid credit card number');

if ($validate->hasErrors())
{
    // Handle as appropriate
}

Am I on the right track? Or did I just complicate data validation more then I needed to?

A: 

This seems overly complicated to me.

Numeric data: just cast the $_POST value

$val=(int)$_POST["val"];

Emails: there are premade functions that do that (hope to find a correct one).

$email=check_email($_POST["email"]) or die("Ha!");

Names and addresses: do nothing, since it will come the day that a stranger enters a unicode character you didn't think about and be filtered out by your function.

Phone numbers: do nothing, if he wants to give a false number he will do that anyway.

Special codes, such as post code and stuff like that: you'll usually have a very strict standard, create a function that filter using that one and you're done.

Lo'oris
+1  A: 

If anything, you're not validating enough. For reading data in $_POST and $_GET you need at least:

  • To check whether it exists (array_key_exists)
  • To check whether it's an array or not
  • If expecting UTF-8, check whether it's valid UTF-8 (preg_match with the 'u' modifier is an option)
  • Then do validation specific to the type of field

By the way, the current hip way to do validation and sanitization in PHP is to use filters. In your specific case, here is an example:

<?php
$data = array(
    "arg1good" => "sdgdf790",
    "arg1bad"  => "sdgdf7/90",
    "arg1bad2" => array("sdgdf90", "sfdssf"),
    "arg2good" => "4567576456",
    "arg2bad"  => "45675764561",
);

$validateCredCard = function ($cc) {
    if (preg_match('/^\\d{10}$/', $cc))
        return $cc;
    else
        return false;
};

$arg1filt = array('filter'  => FILTER_VALIDATE_REGEXP,
                  'flags'   => FILTER_REQUIRE_SCALAR,
                  'options' => array('regexp' => '/^[a-z\d]+$/i'),
                  );
$arg2filt = array('filter'  => FILTER_CALLBACK,
                  'flags'   => FILTER_REQUIRE_SCALAR,
                  'options' => $validateCredCard,
                  );
$args = array(
    "arg1good" => $arg1filt,
    "arg1bad"  => $arg1filt,
    "arg1bad2" => $arg1filt,
    "arg2good" => $arg2filt,
    "arg2bad"  => $arg2filt,
);

var_dump(filter_var_array($data, $args));

gives:

array(5) {
  ["arg1good"]=>
  string(8) "sdgdf790"
  ["arg1bad"]=>
  bool(false)
  ["arg1bad2"]=>
  bool(false)
  ["arg2good"]=>
  string(10) "4567576456"
  ["arg2bad"]=>
  bool(false)
}
Artefacto
A: 

You don't seem to be very clear about what your objectives are - performance? simplicity of new code? overall maintainability?

Certainly, for performance reasons I'd suggest maintaining the validations as code rather than storing regexes (and thresholds, and....) as data. The problem seems to be how you map the data items you have to the appropriate validations. While you could set up a static map as an array, since you also need to know something about the data structure to render forms and map to data base columns, maybe you should consider implementing a more formal method of meta-data management within your code.

C.

symcbean
A: 

@Lo'oris Your answer regarding casting the values is not entirely complete. Please consider the following example:

$val_1 = (int)null;    // $val_1 equals 0
$val_2 = (int)false;   // $val_2 equals 0
$val_3 = (int)'';      // $val_3 equals 0
$val_4 = (int)array(); // $val_4 equals 0

As this example is showing, this strategy works only if you expect the variable to be an integer which is also greater than 0.

In terms of "check_email" functions - you're correct that there are many implementations that can be found on the internet, but most of them are either incomplete or incorrect.

Most of the implementations use regex expressions like this one:

"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"

or this one:

"^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$"

and both those regexes reject email addresses like these:

Abc\@[email protected]
customer/[email protected]
!def!xyz%[email protected] 

which are all valid (according to http://www.linuxjournal.com/article/9585?page=0,0).

Please also have a look at: http://www.regular-expressions.info/email.html

Jan Molak
I don't understand your point about casting: yes, those incorrect values will be cast to 0... fine for me! If "you" need something stricter, such as >0, well, just say that, and just code that, it's not something that exotic that needs to be explained...
Lo'oris