views:

11761

answers:

6

I'm currently using this regex ^[A-Z0-9 _]*$ to accept letters, numbers, spaces and underscores. I need to modify it to require at least one number or letter somewhere in the string. Any help would be appreciated!

This would be for validating usernames for my website. I'd actually like to support as many characters as I can, but just want to ensure that I prevent code injection and that characters will display fine for all users. So I'm definitely open to regex validation suggestions that would support a wider set of characters.

+5  A: 
^[ _]*[A-Z0-9][A-Z0-9 _]*$

You can optionally have some spaces or underscores up front, then you need one letter or number, and then an arbitrary number of numbers, letters, spaces or underscores after that.

Something that contains only spaces and underscores will fail the [A-Z0-9] portion.
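As a rough illustration in JavaScript (the language the asker mentions later in the thread), the pattern can be checked against a few of the strings discussed in the comments. The helper name isValidName is made up for this sketch and is not from the original answer.

```javascript
// Pattern from the answer above: optional spaces/underscores, then one
// required letter or digit, then any mix of the allowed characters.
const hasLetterOrDigit = /^[ _]*[A-Z0-9][A-Z0-9 _]*$/;

// Hypothetical helper name, used only for this illustration.
function isValidName(name) {
  return hasLetterOrDigit.test(name);
}

console.log(isValidName("A"));        // true
console.log(isValidName("_999_888")); // true
console.log(isValidName("AAA_SS S")); // true
console.log(isValidName(" ___ "));    // false: no letter or digit
console.log(isValidName(""));         // false: empty string
console.log(isValidName("makeee"));   // false: lowercase is not in [A-Z]
```

Note that, as written, the pattern only accepts uppercase letters.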

Daniel LeCheminant
Doesn't work on strings like 'A', '9', 'AA', 'AA', '99','_999_888', 'AAA_SS S'
Renaud Bompuis
I don't believe you have tried this, Renaud, otherwise you would know it was false.
paxdiablo
I have, and you just need to look at the regex (the current anyway) to see that it would match a simple string like 'A'
Renaud Bompuis
That matches 'A' fine, Renaud. * matches 0 or more, meaning that the 'A' would go to the [A-Z0-9] part, and the other two would just be 0.
Chris Lutz
And why exactly wouldn't Daniel's (or mine for that matter) match it? Do you not know what the "*" means in REs?
paxdiablo
Sorry, I misread the question that it should require both a letter and a digit and I was wrong.
Renaud Bompuis
+7  A: 

You simply need to specify your current RE, followed by a letter/number followed by your current RE again:

^[A-Z0-9 _]*[A-Z0-9][A-Z0-9 _]*$

Since you've now stated they're Javascript REs, there's a useful site here where you can test the RE against input data.

If you want lowercase letters as well:

^[A-Za-z0-9 _]*[A-Za-z0-9][A-Za-z0-9 _]*$
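A small JavaScript sketch comparing the explicit lowercase ranges with the original uppercase pattern plus the i (ignore case) flag. The i-flag variant is my own assumption about an alternative, not something from the answer, and the sample names are only for illustration. For plain ASCII input the two should agree.

```javascript
// Explicit lowercase range, as in the pattern above.
const explicitCase = /^[A-Za-z0-9 _]*[A-Za-z0-9][A-Za-z0-9 _]*$/;

// Assumed alternative: the uppercase-only pattern with the `i` flag.
const ignoreCase = /^[A-Z0-9 _]*[A-Z0-9][A-Z0-9 _]*$/i;

// Illustrative inputs; "_ _" has no letter or digit and should be rejected.
const sampleNames = ["makeee", "AA", "a b_c", "_ _"];

for (const name of sampleNames) {
  // RegExp.prototype.test returns a plain boolean, which is enough
  // when only a yes/no answer is needed.
  console.log(JSON.stringify(name), explicitCase.test(name), ignoreCase.test(name));
}
```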
paxdiablo
On a string like " ___ ___ ___", one that contains none of the required numbers or letters, the regex you tried will try many combinations that cannot work. I think it will try about n combinations. Daniel L's answer works better.
TokenMacGuy
That would be a pretty unsophisticated RE engine, @token. Most of the ones I've seen have optimizations to look for specific values first, such as ^, $ and [A-Z0-9]. Backtracking searches only become necessary after all these other conditions are satisfied. Not satisfied means no match.
paxdiablo
Doesn't work on strings like 'AA', '99', '_999_888', ...
Renaud Bompuis
Ok, working fine in PHP, but in JS everything is failing the regex. Is there a mistake in my JS here? if (!name.match(/^[A-Z0-9 _]*[A-Z0-9][A-Z0-9 _]*$/)) { //do something }
makeee
In addition, it's easy to come up with a test string that gives an RE worst case performance. If speed is the issue, you would not be using REs for this at all - you'd use a simple character scanner.
paxdiablo
See http://www.regular-expressions.info/javascriptexample.html for a tester you can use for JS.
paxdiablo
Sorry, I misread the question that it should require both a letter and a digit and I was wrong.
Renaud Bompuis
[A-Z0-9]*[A-Z0-9][A-Z0-9]* doesn't seem to work in JS (tried it in the regex tester linked above). "makeee" passed [A-Z0-9] just fine, but not the full regex pattern. Any ideas?
makeee
1) There are no spaces or underscores (or ^/$ either) in that RE you just posted. What is the EXACT pattern and search string you're using?
paxdiablo
Regexp of ^[A-Z0-9 _]*[A-Z0-9][A-Z0-9 _]*$ and Subject string of AA works fine. makeee as a subject string fails because it's lowercase. If you want lowercase, see update.
paxdiablo
Sorry, "makeee" fails both ^[A-Z0-9 _]*[A-Z0-9][A-Z0-9 _]*$ and ^[A-Z0-9]*[A-Z0-9][A-Z0-9]*$ (also fails if ^/$ are removed).
makeee
ah ok, thanks Pax
makeee
Also, if you're just wanting to find out if there's a match (rather than getting the matches), I'd use [if (name.search(/^[A-Z0-9 _]*[A-Z0-9][A-Z0-9 _]*$/) >= 0){ //found one }].
paxdiablo
+3  A: 

You can use a lookaround:

^(?=.*[A-Za-z0-9])[A-Za-z0-9 _]*$

The lookahead first checks that the string contains a letter or number somewhere. If it does, the rest of the pattern checks that every character meets your requirements. This can probably be improved upon, but it seems to work with my tests.

UPDATE:

Adding modifications suggested by Chris Lutz:

^(?=.*[^\W_])[\w ]*$
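
A quick JavaScript sketch of both lookahead forms, assuming a JS engine where \w without flags means [A-Za-z0-9_]. The variable names and sample strings are only for illustration.

```javascript
// Lookahead with explicit ranges: the string must contain a letter or digit.
const lookaheadExplicit = /^(?=.*[A-Za-z0-9])[A-Za-z0-9 _]*$/;

// Same idea with the shorthand classes: [^\W_] is "word char but not _".
const lookaheadShort = /^(?=.*[^\W_])[\w ]*$/;

// Illustrative inputs.
const inputs = ["A", "_a_", "___", "", "a-b"];

for (const s of inputs) {
  console.log(JSON.stringify(s), lookaheadExplicit.test(s), lookaheadShort.test(s));
}
// "a-b" passes the lookahead (it has a letter) but fails the main
// character class because "-" is not allowed.
```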
gpojd
You shouldn't use \s. He said he wants spaces, but didn't mention tabs. But while we're at it, why not use [^\W_] instead of [A-Za-z0-9]?
Chris Lutz
Thanks. I replaced the part of the regex with something more explicit. Thanks for catching that.
gpojd
Doesn't work on strings like 'A', '9', 'AA', 'AA', '99','_999_888', 'AAA_SS S'
Renaud Bompuis
Renaud, it does in perl. What are you checking this with? perl -le '("A" =~ /(?=^.*[A-Za-z0-9])[A-Za-z0-9 _]*$/) ? print "match" : print "no match"'
gpojd
Sorry, I misread the question that it should require both a letter and a digit and I was wrong.
Renaud Bompuis
I would put the ^ anchor before the lookahead. It doesn't change the meaning, but it communicates your intention more clearly.
Alan Moore
I agree Alan. The anchor ended up there when I was toying with it and I never put it back. I moved the anchor to the front.
gpojd
+4  A: 

To go ahead and get a point out there, instead of repeatedly using these:

[A-Za-z0-9 _]
[A-Za-z0-9]

I have two (hopefully better) replacements for those two:

[\w ]
[^\W_]

The first one matches any word character (alphanumeric and _, as well as Unicode) and the space. The second matches anything that isn't a non-word character or an underscore (alphanumeric only, as well as Unicode).

If you don't want Unicode matching, then stick with the other answers. But these just look easier on the eyes (in my opinion). Taking the "preferred" answer as of this writing and using the shorter regexes gives us:

^[\w ]*[^\W_][\w ]*$

Perhaps more readable, perhaps less. Certainly shorter. Your choice.
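
For what it's worth, here is a small JavaScript check of the long and short forms side by side. One caveat that is specific to JavaScript: without extra flags, \w there only covers ASCII letters, digits and underscore, so the two forms should agree and neither accepts a character like "à". The sample strings are just for illustration.

```javascript
// Long form with explicit ranges.
const longForm = /^[A-Za-z0-9 _]*[A-Za-z0-9][A-Za-z0-9 _]*$/;

// Short form using \w and [^\W_].
const shortForm = /^[\w ]*[^\W_][\w ]*$/;

// "\u00e0" is the precomposed character "à".
const samples = ["A", "user_1", " _ ", "", "a-b", "\u00e0"];

for (const s of samples) {
  console.log(JSON.stringify(s), longForm.test(s), shortForm.test(s));
}
```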

EDIT:

Just as a note, I am assuming Perl-style regexes here. Your regex engine may or may not support things like \w and \W.

EDIT 2:

Tested mine with the JS regex tester that someone linked to and some basic examples worked fine. Didn't do anything extensive, just wanted to make sure that \w and \W worked fine in JS.

EDIT 3:

Having tried to test some Unicode with the JS regex tester site, I've discovered the problem: that page uses ISO instead of Unicode. No wonder my Japanese input didn't match. Oh well, that shouldn't be difficult to fix:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Or so. I don't know what should be done as far as JavaScript, but I'm sure it's not hard.
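
As far as JavaScript goes, one possible approach (an assumption on my part, and it needs an engine that supports Unicode property escapes with the u flag, which were added in ES2018) is to name the Unicode categories explicitly instead of relying on \w. A minimal sketch, with a made-up variable name:

```javascript
// \p{L} matches any Unicode letter and \p{N} any Unicode number.
// Both require the `u` flag on the regex.
const unicodeName = /^[\p{L}\p{N} _]*[\p{L}\p{N}][\p{L}\p{N} _]*$/u;

console.log(unicodeName.test("\u00e0"));   // true: "à" is a letter
console.log(unicodeName.test("日本語"));    // true: Japanese text
console.log(unicodeName.test("user_1"));   // true
console.log(unicodeName.test(" _ "));      // false: no letter or number
console.log(unicodeName.test("<script>")); // false: "<" and ">" are not allowed
```

Note that \p{N} is broader than 0-9 (it includes other numeric characters), so tighten it if that matters for display.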

Chris Lutz
+1 for the good suggestions. I updated my answer to show it your way.
gpojd
Thanks! What range of characters does unicode cover? I would love to be able to support characters such as "à".
makeee
Unicode covers basically everything, although you may have to do some more work to get webpages and programs to work with Unicode.
Chris Lutz
Whether \w matches Unicode (by which I assume you mean non-ASCII) characters varies from one regex flavor to the next. If you want to match characters from the full Unicode range, you should do so explicitly.
Alan Moore
@Alan - Okay. I think in terms of Perl, which is almost a de-facto standard against which other regex engines are measured, and I tend to expect PCRE-specific regex behaviors to work the way they do in Perl.
Chris Lutz
Unfortunately, regex flavors are all over the place on this one. Check out this table; it's almost exactly half and half (with both PHP and JS in the wrong half). http://www.regular-expressions.info/charclass.html
Alan Moore
Wow, Perl doesn't use PCRE. That's weird. Ah, well, PCRE will have to implement Unicode soon. It's almost impossible not to at this point.
Chris Lutz
PCRE is a C library that, like many other flavors, used Perl's regex flavor as its model--in other words, Perl came first. PCRE supports Unicode if that option was selected when it was compiled, but \w, \d and such still only match ASCII characters.
Alan Moore
A: 

Someone intent on code injection would turn off JavaScript in their browser before injecting.

daniel
A: 

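For me, @"^[\w ]+$" is working. It allows numbers, letters, underscores and spaces, and requires at least one character. Note that a string made up only of spaces or underscores would also pass, so on its own it does not guarantee a letter or number.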

SOFextreme