With PHP, I'd like to use a preg_replace() filter for passwords such that the only characters available for passwords are US ASCII typable, minus control codes and NULL.

What's the regex to achieve that, which I can plug in to preg_replace()?

EDIT:

I've been advised to edit this question since I "get it" now: I won't be using this terribly unpopular technique, and will permit any typable character, even ones I might not have on my keyboard, as long as they aren't control codes.

+4  A: 

Here you go:

^[ -~]+$

assuming you don't want empty passwords; otherwise it's:

^[ -~]*$

to allow empty ones.

I'm not sure why you're asking about preg_replace - I'd be wary of manipulating the passwords that people type. Better to enforce the rule that you only accept printable ASCII, and tell the user if they break that rule (or, as others have said, to not have any rules, but I assume you have reasons for them).

If you're thinking of quietly removing the characters that don't match, and someone comes along with a password of Úéåæ, then you'll be storing an empty password for them without their knowledge.
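
A minimal sketch of the accept-or-reject approach described above, using preg_match() rather than preg_replace(). The function name isPrintableAscii() is made up for illustration. It anchors with \A and \z instead of ^ and $, because in PCRE a plain $ also matches just before a trailing newline, so ^[ -~]+$ would let "abc\n" through.

```php
<?php
// Hypothetical helper: true only if $password is non-empty and consists
// solely of printable US-ASCII characters (space, 0x20, through tilde, 0x7E).
function isPrintableAscii($password) {
    // \A and \z anchor to the very start and very end of the string.
    return preg_match('/\A[ -~]+\z/', $password) === 1;
}

var_dump(isPrintableAscii('correct horse')); // bool(true)
var_dump(isPrintableAscii("tab\there"));     // bool(false): contains a control code
var_dump(isPrintableAscii("abc\n"));         // bool(false): trailing newline rejected
var_dump(isPrintableAscii(''));              // bool(false): empty not allowed
```

When the check fails, the form can show an error and ask for a different password, so nothing is ever altered behind the user's back.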

RichieHindle
+8  A: 

Personally, I've always found it highly disturbing when a web site or service tried to force me to use passwords that follow a certain (usually downright stupid) limitation.

Isn't it the whole point of passwords that they are not too easily guessable? Why would you want them to be less complex than your users want them to be? I can't imagine a technical limitation that would require the use of "ASCII only" for passwords.

Let your users use any password they like, hash them and store them as Base64 strings. These are ASCII only.
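
As a rough illustration of why the password's character set need not matter for storage: PHP's password_hash() returns a salted hash made only of printable ASCII characters, whatever the input contained. Note that password_hash() is available from PHP 5.5, so it is newer than this thread, and it is swapped in here for the hash-plus-Base64 step. The sample password is an arbitrary choice.

```php
<?php
// Sample password containing non-ASCII characters, written as UTF-8 bytes.
$password = "f\xC3\xA4\xC3\xAEry"; // "fäîry"

// password_hash() salts and hashes; the result is a printable ASCII string.
$stored = password_hash($password, PASSWORD_DEFAULT);

// The stored value contains only characters between space and tilde.
var_dump(preg_match('/\A[ -~]+\z/', $stored) === 1); // bool(true)

// At login, compare the submitted password against the stored hash.
var_dump(password_verify($password, $stored)); // bool(true)
var_dump(password_verify('fry', $stored));     // bool(false)
```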

Tomalak
Well... US ASCII typable. The characters he's trying to filter are probably not valid, though this does lead me to ask: why would you even need to filter characters users could not possibly type? The HTML form should be a natural buffer against control characters.
Jieren
A: 

/[\p{Cc}]/ to get control characters (this covers 0-31, plus 127 and 128-159; add the u modifier if the input is UTF-8)

I agree with Richie. Use preg_match instead of preg_replace.
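
A sketch of that check with preg_match(), assuming the password arrives as UTF-8. The function name hasControlChars() is hypothetical. The u modifier makes PCRE treat the subject as UTF-8; with it, preg_match() returns false on malformed input, which this sketch treats as a reason to reject.

```php
<?php
// Hypothetical helper: true if $password contains a control character
// (Unicode category Cc) or is not valid UTF-8.
function hasControlChars($password) {
    $result = preg_match('/\p{Cc}/u', $password);
    // preg_match() returns false on error, e.g. malformed UTF-8 with /u.
    return $result === false || $result === 1;
}

var_dump(hasControlChars("abc\x07"));     // bool(true): BEL is a control code
var_dump(hasControlChars("f\xC3\xA4ry")); // bool(false): "färy" is fine
var_dump(hasControlChars("a\x7F"));       // bool(true): DEL is in category Cc
```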

Jieren
A: 
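function getTypable($s) {
  // STRIP_LOW drops bytes below 32 and STRIP_HIGH drops bytes above 127,
  // leaving ASCII 32 to 127 (so DEL, 127, is kept).
  // Note: FILTER_SANITIZE_STRING also strips HTML tags and encodes quotes,
  // and it is deprecated as of PHP 8.1.
  return filter_var($s, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH);
}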

...but it requires that the Filter extension be available in your version of PHP 5. It is enabled by default as of PHP 5.2.0, and for about a year now a lot of web hosting providers have been running a version that supports it.

Volomike
+1  A: 

Please don't filter your user passwords. That defeats a whole lot of the point. I wrote more about this here: http://www.evanfosmark.com/2009/06/why-do-so-many-websites-fail-with-password-restrictions/

Evan Fosmark
+2  A: 

As others have said, don't restrict the set of characters that are allowed in passwords. Just because your keyboard doesn't have ä, å, or ö on it is no reason to stop those of us who do have them (or know how to type them anyhow) from using those letters. You're going to be storing the password as a cryptographic hash anyhow (or at least as an encrypted string), aren't you? If so, then it doesn't matter whether your database can successfully/safely store the actual characters in the password anyhow, only the characters output by your crypto algorithm. (And if not, then storing passwords in plaintext is a far bigger problem than what characters the passwords may or may not contain - don't do that!)

Your apparent intent to enforce your character set restrictions by silently stripping the characters you dislike rather than by telling the user "Try again and, this time, only use these characters: a, e, i, o, u." makes your proposed method truly atrocious, as it means that if I attempt to use, say, the password fäîry (not incredibly secure, but should hold up against lightweight dictionary attacks), my actual password, unknown to me, will be fry (if your password is a three-letter word, straight out of the dictionary and in common use, you may as well not even bother). Ouch!
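
To see the problem concretely, here is roughly what silent stripping does to that password, assuming it arrives as UTF-8 bytes and the filter removes everything outside printable ASCII. The pattern is a guess at what such a filter would look like, not code from the question.

```php
<?php
$typed = "f\xC3\xA4\xC3\xAEry"; // "fäîry" as UTF-8 bytes

// Remove every byte that is not between space (0x20) and tilde (0x7E).
$stored = preg_replace('/[^ -~]/', '', $typed);

var_dump($stored); // string(3) "fry"
```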

Dave Sherohman
Hey Dave. I wanted to express my thanks to you for making this clear to me, and "I get it". The problem is that this lousy post I made is like an anchor on my reputation now in this crazy system. I can't delete it and I can't improve my reputation. I've filed a complaint to the developers of stackoverflow.com. People genuinely make mistakes -- we shouldn't have to suffer through them forever. With stackoverflow, you end up suffering for eternity unless they fix this.
Volomike
Happy to help! And, if I understand the rep system here correctly, your rep will no longer be affected by the question now that it's marked as "community wiki".
Dave Sherohman
A: 

I disagree that there is no reason to reject non-ASCII characters, although it's up to you to decide whether the pros outweigh the cons.

If you allow non-ASCII characters, then you are in fact committing to properly internationalize that portion of your web application. For many applications, internationalization is an afterthought. For web applications, it's a very non-trivial matter.

If you don't explicitly control the character encoding when you go between characters and bytes, then you are basically relying on whatever the defaults happen to be for your deployment. If your configuration ever changes (e.g. migrating from Windows to Linux, or switching to another web server), then your defaults have a good chance of changing from under you, and then the non-ASCII characters will serialize to a different byte sequence. So, all of a sudden, the hashes of people using them in their passwords will not match what's in the database, and they'll get locked out of their accounts.
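
A small sketch of that failure mode: the same visible password, serialized under two different encodings, hashes to different values. The byte strings are written out by hand so the example does not depend on any conversion library, and sha256 is used only to make the difference visible, not as a recommendation for password storage.

```php
<?php
$asUtf8   = "f\xC3\xA4ry"; // "färy" encoded as UTF-8 (ä is two bytes)
$asLatin1 = "f\xE4ry";     // "färy" encoded as ISO-8859-1 (ä is one byte)

$hashUtf8   = hash('sha256', $asUtf8);
$hashLatin1 = hash('sha256', $asLatin1);

// Same characters on screen, different bytes, so the hashes do not match.
var_dump($hashUtf8 === $hashLatin1); // bool(false)
```

A pure-ASCII password avoids this particular trap because its bytes are identical in both encodings.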

I do, of course, agree that it's completely unacceptable to just filter out those characters; you have to either accept or reject the password.

ykaganovich