For example, say I'm doing some form input validation and I'm using the following code for the name field:

    preg_match("/^[a-zA-Z .-]+$/", $firstname);

If someone types in "Mr. (Awkward) Double-Barrelled", I want to be able to display a message saying "Invalid character(s): (, )".

+1  A: 

You could search your input for ([^a-zA-Z .-]), e.g. with preg_match_all(), to get all illegal characters.
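A minimal sketch of that suggestion, assuming the value is in $firstname and that each offending character should be reported only once (the array_unique() de-duplication is an addition, not part of the answer). The capturing parentheses aren't needed, because $matches[0] already holds every full match. The /u modifier assumes UTF-8 input, so that a multibyte character is reported whole rather than as separate bytes.

```php
<?php
// Assumption: input is UTF-8; /u makes [^...] match whole characters.
$firstname = 'Mr. (Awkward) Double-Barrelled';

// Every character that is NOT in the allowed class is a match.
preg_match_all('/[^a-zA-Z .-]/u', $firstname, $matches);

// $matches[0] holds the full matches; keep each character only once
// and re-index the array.
$illegal = array_values(array_unique($matches[0]));

if (count($illegal) > 0) {
    // Prints: Invalid character(s): (, )
    echo 'Invalid character(s): ', htmlspecialchars(implode(', ', $illegal)), "\n";
}
```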

Jens
+3  A: 

You can fetch all occurrences of characters that are not within your character class:
negate the class [...] -> [^...] and then fetch all matches.

$firstname = 'Mr. (Awkward) Double-Barrelled';

if ( 0 < preg_match_all("/[^a-zA-Z .-]+/", $firstname, $cap) ) {
  foreach( $cap[0] as $e ) {
    echo 'invalid character(s): ', htmlspecialchars($e), "\n";
  }
}

Using the PREG_OFFSET_CAPTURE flag described at http://docs.php.net/preg_match_all, you can even tell the user where that character is in the input.
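A short sketch of the PREG_OFFSET_CAPTURE variant, using the same input: each entry in $cap[0] becomes a pair of the matched text and its offset. Keep in mind that the offset is counted in bytes from 0, so it only equals the character position for single-byte (e.g. ASCII) input.

```php
<?php
$firstname = 'Mr. (Awkward) Double-Barrelled';

// With PREG_OFFSET_CAPTURE, each match is array(matched text, byte offset).
preg_match_all('/[^a-zA-Z .-]+/', $firstname, $cap, PREG_OFFSET_CAPTURE);

foreach ($cap[0] as $match) {
    $text   = $match[0];
    $offset = $match[1]; // byte offset, counted from 0
    echo 'invalid character(s) "', htmlspecialchars($text), '" at offset ', $offset, "\n";
}
```

For this input that reports "(" at offset 4 and ")" at offset 12.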

Edit: Alternatively, you can use preg_replace_callback() to mark the invalid characters visually, e.g. (using an anonymous function/closure, PHP 5.3+):

$firstname = 'Mr. (Awkward) Double-Barrelled';
$invalid = array();
$result = preg_replace_callback("/[^a-zA-Z .-]+/", function($c) use (&$invalid) {
  $invalid[] = $c[0];       // remember the offending run
  return '['.$c[0].']';     // and wrap it in brackets in the output
}, $firstname);
if ( $firstname !== $result ) {
  echo 'invalid characters: "', htmlspecialchars(join(', ', $invalid)), '" in your input: ', htmlspecialchars($result);
}

This prints:

invalid characters: "(, )" in your input: Mr. [(]Awkward[)] Double-Barrelled

VolkerK
Is that really of any use here? How many people are going to sit there and count characters to find it? They'll just scan it for the character you said was invalid. It's not worth the extra coding.
animuson
You don't have to print "invalid character at position xyz", but you can _point_ to the character. See the edit.
VolkerK
A: 

You could split along the allowed characters:

$result = preg_split('/[a-zA-Z .-]+/s', $subject);

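...and get a list of what remains, i.e. the runs of invalid characters. Note that the list may also contain empty strings, e.g. when the input starts or ends with an allowed character.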
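A sketch of the split approach on the question's example, showing the empty pieces and how the PREG_SPLIT_NO_EMPTY flag removes them (the flag is an addition; the answer's pattern is otherwise unchanged apart from dropping the /s modifier, which has no effect here). A run of adjacent invalid characters such as "()" would come back as a single piece.

```php
<?php
$subject = 'Mr. (Awkward) Double-Barrelled';

// Without flags, splitting on runs of allowed characters leaves empty
// strings where the subject starts or ends with an allowed character.
$raw = preg_split('/[a-zA-Z .-]+/', $subject);

// PREG_SPLIT_NO_EMPTY drops those empty pieces; -1 means "no limit".
$invalid = preg_split('/[a-zA-Z .-]+/', $subject, -1, PREG_SPLIT_NO_EMPTY);

if (count($invalid) > 0) {
    echo 'Invalid character(s): ', htmlspecialchars(implode(', ', $invalid)), "\n";
}
```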

Tim Pietzcker
+1  A: 

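preg_match("/[^a-zA-Z0-9\s\.\-]/", $text) should do the trick: it returns 1 if $text contains any character outside the class. (Note that this class also allows digits and any whitespace via \s, which the question's pattern does not.) You're really supposed to escape the ' ', '.', and '-' characters. Personally, I wouldn't bother writing extra code to figure out which characters are invalid. If the person can't figure it out from a message saying "Allowed characters: (whatever)", then there is no hope for them.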

Here's a list of regex characters which also includes a list of characters you're supposed to escape.
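Whether that escaping is necessary is debated in the comments on this answer; either way, it does not change what the class matches. A quick check comparing the question's unescaped class with a fully escaped spelling (the sample names are made up for illustration):

```php
<?php
// The question's class relies on '.' and a trailing '-' being literal
// inside [...]; the second spelling escapes space, dot and hyphen anyway.
$plain   = '/[^a-zA-Z .-]/';
$escaped = '/[^a-zA-Z\ \.\-]/';

// Hypothetical sample inputs.
$samples = array('Mr. (Awkward) Double-Barrelled', 'Anne-Marie', "O'Brien", 'Jo.', 'x_y');

$results = array();
foreach ($samples as $s) {
    // preg_match() returns 1 if an invalid character was found, 0 otherwise.
    $results[$s] = array(preg_match($plain, $s), preg_match($escaped, $s));
    echo $s, ': ', $results[$s][0], ' / ', $results[$s][1], "\n";
}
```

Both columns agree for every sample; the choice between them is a matter of readability.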

animuson
"You're really supposed to escape the ' ', '.', and '-' characters" - PCRE is smart enough to know that a dot or a _trailing_ hyphen has no special meaning within a character class. The meaning of the hyphen can change, e.g. [a-z], but the dot is always a dot in a character class. Regular expressions are hard enough to read as they are, and imho putting in extra escape characters doesn't help ;-)
VolkerK
It's just good practice. I always escape characters. What's the point in learning where you do and do not need to escape characters? If you just always escape them, you never run into problems.
animuson
Then be consistent and make it "yada \\. yadda": one backslash for PHP's escaping routine and one for PCRE's. (Just kidding, I guess it really is a subjective matter.)
VolkerK
@animuson, no, I don't think it's good practice. IMO, it clutters the regex. As VolkerK said: it's a personal preference of yours (some people will agree with you, some don't).
Bart Kiers
+1  A: 

You can also simply replace the valid characters with "nothing"; the rest, if any, will be invalid.

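$badchars = preg_replace('/[a-zA-Z .-]/', '', $input); // the question's allowed set
if (strlen($badchars)) {
    // only invalid characters are left, e.g. "()" for the question's example
    echo 'Invalid character(s): ', htmlspecialchars($badchars);
}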
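To get the exact message from the question, with each offending character listed once, the leftover string can be de-duplicated. A sketch using count_chars() in mode 3, which returns the distinct bytes of a string sorted by byte value (so the order may differ from the input order). Because it works on bytes, this assumes single-byte (e.g. ASCII) input.

```php
<?php
$input = 'Mr. (Awkward) Double-Barrelled';

// Delete every allowed character; whatever remains is invalid.
$badchars = preg_replace('/[a-zA-Z .-]/', '', $input);

if (strlen($badchars) > 0) {
    // Mode 3: a string of the distinct bytes, sorted by byte value.
    $unique = count_chars($badchars, 3);
    // Prints: Invalid character(s): (, )
    echo 'Invalid character(s): ', htmlspecialchars(implode(', ', str_split($unique))), "\n";
}
```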
stereofrog