I could have asked these 3 separately, but decided to merge them.

I would like to ask for some expert opinion with examples on:

  1. how to properly validate an alphanumeric string? (only Latin letters & numbers)

  2. how to properly validate a written Unicode string? (like the above, but letters from any language allowed)

  3. how to properly validate that a string looks like an email? I'm guessing the best way is filter_var($string, FILTER_VALIDATE_EMAIL) (I guess it's the same for URL and IP)

Thank you.

A: 

You probably want to use regular expressions.

Sam Dufel
no you don't. Furthermore this answer isn't helpful at all -1
nikic
And exactly how helpful is your comment? Almost every other answer posted involved some form of regular expression matching.
Sam Dufel
A: 
  1. preg_match('/^[a-zA-Z0-9]+\z/', $str)
  2. Something with this I'd think
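
A pattern for this needs anchors on both ends, otherwise any string that merely contains one alphanumeric character passes. A minimal sketch of both cases, assuming UTF-8 input; the helper names are hypothetical, and the Unicode variant with \p{L} and \p{N} is one possible approach, not necessarily what was linked above:

```php
<?php
// Hypothetical helper names; the anchors and the /u modifier are the point.
function is_latin_alnum(string $str): bool
{
    // ^ and \z tie the match to the whole string.
    return preg_match('/^[a-zA-Z0-9]+\z/', $str) === 1;
}

function is_unicode_alnum(string $str): bool
{
    // \p{L} = any letter, \p{N} = any number; /u treats the subject as UTF-8.
    return preg_match('/^[\p{L}\p{N}]+\z/u', $str) === 1;
}

var_dump(is_latin_alnum('abc123'));    // bool(true)
var_dump(is_latin_alnum('abc 123!'));  // bool(false)
var_dump(is_unicode_alnum('Ünïcödé')); // bool(true)
```

Comparing against 1 also treats a preg_match() error (for example invalid UTF-8 under /u, which returns false) as a failed validation.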
Core Xii
A: 

filter_var is very neat and efficient for special purposes, but also limited.

With the sanitizing filters you also get only a filtered string back, which you have to compare against the original string to see whether it fits.
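
Note that the FILTER_VALIDATE_* filters return the value on success and false on failure, so for those the result can be checked directly. A minimal sketch; the wrapper function names are hypothetical, only filter_var() and the filter constants are built in:

```php
<?php
// Hypothetical wrappers around the built-in validation filters.
function looks_like_email(string $s): bool
{
    // filter_var() returns the address on success, false on failure.
    return filter_var($s, FILTER_VALIDATE_EMAIL) !== false;
}

function looks_like_url(string $s): bool
{
    return filter_var($s, FILTER_VALIDATE_URL) !== false;
}

function looks_like_ip(string $s): bool
{
    return filter_var($s, FILTER_VALIDATE_IP) !== false;
}

var_dump(looks_like_email('user@example.com')); // bool(true)
var_dump(looks_like_email('not-an-email'));     // bool(false)
```

The strict comparison with false matters because some filters (FILTER_VALIDATE_INT, for instance) can legitimately return a falsy value such as 0.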

There may be certain requirements and/or structures besides the allowed characters that you cannot check in this way.

The most common way is to use the PCRE functions, especially preg_match. It's very efficient as well, and you can work directly with the return value.

And you have the full power of regular expressions. Imagine, for example, that you want to validate that every name occurs in the exact form "Mr/Mrs Firstname Lastname, academic-title".
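
The name format mentioned above could be sketched roughly like this. The exact rules (capitalized names, a title made of letters and dots) and the function name are assumptions for illustration only:

```php
<?php
// Hypothetical helper: checks for "Mr/Mrs Firstname Lastname, title".
function is_titled_name(string $name): bool
{
    // (Mr|Mrs), then two capitalized words, a comma, and a title
    // consisting of letters and dots (e.g. "PhD" or "M.D.").
    $pattern = '/^(Mr|Mrs) \p{Lu}\p{L}+ \p{Lu}\p{L}+, [\p{L}.]+\z/u';
    return preg_match($pattern, $name) === 1;
}

var_dump(is_titled_name('Mrs Ada Lovelace, PhD')); // bool(true)
var_dump(is_titled_name('Ada Lovelace'));          // bool(false)
```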

Where it gets tricky is when you only want to allow certain ranges of Unicode characters.

For example, if you only want to allow U+0600–U+06FF (1536–1791) (Arabic), plus a certain range of dingbats and brackets or something.

There are no predefined character classes for that, and defining them would not be very elegant.

In this case the best way really would be to loop over the text character by character and check for ranges...
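
As an alternative to a hand-written loop, explicit code point ranges can also be written directly into a PCRE character class with \x{...} escapes when the /u modifier is set. A rough sketch, assuming UTF-8 input; the helper name and the choice of the Dingbats block (U+2700 to U+27BF), ASCII brackets and spaces are illustrative assumptions, not something specified above:

```php
<?php
// Hypothetical helper: accepts only Arabic (U+0600-U+06FF), Dingbats
// (U+2700-U+27BF, an illustrative choice), ASCII brackets and spaces.
function in_allowed_ranges(string $text): bool
{
    $pattern = '/^[\x{0600}-\x{06FF}\x{2700}-\x{27BF}()\[\] ]+\z/u';
    return preg_match($pattern, $text) === 1;
}

var_dump(in_allowed_ranges('مرحبا'));   // bool(true)
var_dump(in_allowed_ranges('مرحبا ✔')); // bool(true)
var_dump(in_allowed_ranges('hello'));   // bool(false)
```

Whether this is more readable than a loop depends on how many ranges are involved; for a whole script, a property such as \p{Arabic} may be shorter still.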

Joe Hopfgartner
+2  A: 

For #1, use ctype_alnum(). It's faster than a regex, and you don't have to worry about whether you got the regex right. I also think it's much neater.
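
A quick sketch of how that behaves, assuming the default "C" locale (where only a-z, A-Z and 0-9 count as alphanumeric); the wrapper name is hypothetical:

```php
<?php
// Hypothetical wrapper: ctype_alnum() expects a string, so guard the type.
function is_alnum_input($value): bool
{
    return is_string($value) && ctype_alnum($value);
}

var_dump(is_alnum_input('abc123'));  // bool(true)
var_dump(is_alnum_input('abc 123')); // bool(false)
var_dump(is_alnum_input(''));        // bool(false)
```

Note that the empty string is rejected, which is usually what you want for a required field.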

Alan
A: 
Chris Cox
It's full of syntax errors, for example `$domainLen 255`
Gordon
My bad; the editor stripped out everything between "less than" and "greater than", for some reason. Corrected.
Chris Cox
+1  A: 

The best email validation I have seen so far is (note: it also checks the email domain):

/**
 * Validates an email address against the RFC 3696 rules.
 * @source http://www.linuxjournal.com/article/9585
 * @param string $email_address Email address (raw input)
 * @return string Returns the email address unchanged if it has a valid
 *      format and the domain exists (empty input is returned as-is).
 * @throws VerificationException If the format is invalid or the domain
 *      cannot be found in DNS.
 */
public static function email($email_address) {
    if (empty($email_address)) return $email_address;

    $is_valid = true;
    $atIndex = strrpos($email_address, "@");
    if ($atIndex === false) {
        throw new VerificationException('The email address ('.$email_address.') does not contain an @ symbol');
    }
    else {
        $domain = substr($email_address, $atIndex+1);
        $local = substr($email_address, 0, $atIndex);
        $local_length = strlen($local);
        $domain_length = strlen($domain);
        if ($local_length < 1 || $local_length > 64) {
            // Local part empty or too long
            throw new VerificationException('The email address ('.$email_address.') local part is empty or exceeds maximum length');
        } else if ($domain_length < 1) {
            // Domain missing
            throw new VerificationException('The email address ('.$email_address.') is missing the domain part');
        } else if ($domain_length > 255) {
            // Domain part length exceeded
            throw new VerificationException('The email address ('.$email_address.') domain exceeds maximum length');
        } else if ($local[0] == '.' || $local[$local_length-1] == '.') {
            // Local part starts or ends with '.'
            throw new VerificationException('The email address ('.$email_address.') local part can not start or end with a dot (.)');
        } else if (preg_match('/\\.\\./', $local)) {
            // Local part has two consecutive dots
            throw new VerificationException('The email address ('.$email_address.') local part can not contain two consecutive dots (..)');
        } else if (!preg_match('/^[A-Za-z0-9\\-\\.]+$/', $domain)) {
            // Character not valid in domain part
            throw new VerificationException('The email address ('.$email_address.') domain contains invalid characters');
        } else if (preg_match('/\\.\\./', $domain)) {
            // Domain part has two consecutive dots
            throw new VerificationException('The email address ('.$email_address.') domain can not contain two consecutive dots (..)');
        } else if (!preg_match('/^(\\\\.|[A-Za-z0-9!#%&`_=\\/$\'*+?^{}|~.-])+$/', str_replace("\\\\","",$local))) {
            // Character not valid in local part unless
            // Local part is quoted
            if (!preg_match('/^"(\\\\"|[^"])+"$/',
            str_replace("\\\\","",$local))) {
                throw new VerificationException('The email address ('.$email_address.') contains invalid (non-escaped) characters');
            }
        }
        if ($is_valid && !(checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A'))) {
            // Domain not found in DNS
            throw new VerificationException('The email address ('.$email_address.') domain could not be found with a DNS lookup');
        }
    }
    return $email_address;
}
Petah