I would like to have a regular expression that checks if a string contains only upper and lowercase letters, numbers, and underscores.

A: 

How about:

^([A-Za-z]|[0-9]|_)+$

...if you want to be explicit, or:

^\w+$

...if you prefer concise (Perl syntax).

Drew Hall
This doesn't make sure that it only contains those chars, but extracts a string that contains only those chars. Almost there, just add a ^ to the beginning and a $ to the end and it'll work.
BenAlabaster
Drew Hall
You missed à.
Windows programmer
+2  A: 

Um...question: Does it need to have at least one character or no? Can it be an empty string?

^[A-Za-z0-9_]+$

This matches one or more upper- or lowercase letters, digits, or underscores. If the string may be empty, replace the + with *:

^[A-Za-z0-9_]*$
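
As a rough illustration of the + versus * choice, here is a minimal Python sketch. The names `NON_EMPTY`, `MAY_BE_EMPTY`, and `is_valid` are made up for this example, and `fullmatch` stands in for the ^ and $ anchors.

```python
import re

# Hypothetical names, for illustration only.
NON_EMPTY = re.compile(r"[A-Za-z0-9_]+")     # one or more allowed characters
MAY_BE_EMPTY = re.compile(r"[A-Za-z0-9_]*")  # zero or more allowed characters


def is_valid(text: str, allow_empty: bool = False) -> bool:
    """Return True if text contains only ASCII letters, digits, and underscores."""
    pattern = MAY_BE_EMPTY if allow_empty else NON_EMPTY
    # fullmatch succeeds only if the whole string matches the pattern.
    return pattern.fullmatch(text) is not None
```

With this sketch, an empty string is rejected by default and accepted only when `allow_empty=True` is passed.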

Edit:

If letters with diacritics need to be included (such as c with a cedilla, ç), use the word-character class \w instead. In Unicode-aware flavors it matches everything the class above does, plus letters with diacritics and letters from other scripts:

^\w+$

Or

^\w*$
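
To see the difference in one concrete flavor, here is a small Python 3 sketch, where \w is Unicode-aware by default for text patterns. The function names are invented for this example, and other flavors may behave differently.

```python
import re

WORD_CHARS = re.compile(r"\w+")             # Unicode-aware in Python 3 str patterns
ASCII_CHARS = re.compile(r"[A-Za-z0-9_]+")  # explicit ASCII-only class


def only_unicode_word_chars(text: str) -> bool:
    """True if every character is a word character in the Unicode sense."""
    return WORD_CHARS.fullmatch(text) is not None


def only_ascii_word_chars(text: str) -> bool:
    """True if every character is an ASCII letter, digit, or underscore."""
    return ASCII_CHARS.fullmatch(text) is not None


# "gar\u00e7on" is "garçon" written with an escape for the cedilla.
sample = "gar\u00e7on"
```

Here `only_unicode_word_chars(sample)` accepts the string, while `only_ascii_word_chars(sample)` rejects it because of the ç.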
BenAlabaster
You missed ç.
Windows programmer
Well now that you mention it, I also missed a whole bunch of other French characters...
BenAlabaster
\w is the same as [\w] with less typing effort
Jan Goyvaerts
Yeah, you still need the + or * and the ^ and $ - \w just checks that it contains word characters, not that it *only* contains word characters...
BenAlabaster
A: 

The following regex matches alphanumeric characters and underscore:

^[a-zA-Z0-9_]+$

For example, in Perl:

#!/usr/bin/perl
use strict;
use warnings;

# treat a missing argument as an empty string
my $arg1 = defined $ARGV[0] ? $ARGV[0] : '';

# check that the string contains *only* one or more alphanumeric chars or underscores
if ($arg1 !~ /^[a-zA-Z0-9_]+$/) {
    print "Failed.\n";
} else {
    print "Success.\n";
}
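
One subtlety may be worth knowing: in both Perl and Python, $ also matches just before a newline at the very end of the string, so "abc\n" passes a ^...$ check. The Python sketch below (function names are made up for this example) contrasts $ with \Z, which in Python matches only at the true end of the string. Perl's counterpart is \z.

```python
import re


def passes_with_dollar(text: str) -> bool:
    """Check using $, which tolerates a single trailing newline."""
    return re.match(r"^[a-zA-Z0-9_]+$", text) is not None


def passes_with_end_anchor(text: str) -> bool:
    """Check using \\Z, which matches only at the very end of the string."""
    return re.match(r"^[a-zA-Z0-9_]+\Z", text) is not None
```

Whether the trailing newline matters depends on where the input comes from; command-line arguments normally do not carry one, but lines read from a file often do.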
Jay
The pattern in your code is correct, but the pattern above only checks a single instance.
BenAlabaster
You missed ñ.
Windows programmer
That was intentional: the code sample was meant to clarify usage when actually checking a string. That is also why the code has the start- and end-of-line anchors, which were not in the regex example.
Jay
@Windows programmer - not sure if you're just trying to be humorous or clever, but alphanumeric specifically refers to the Latin alphabet and Arabic numerals, so it wouldn't include ñ or any of the other special chars you've referenced in the comments here.
Jay
When did ñ stop being Latin?
Windows programmer
@Jay: I think your answer would be a lot clearer if the regex above the source code snippet was the proper regex, rather than a partial regex. People who don't know Perl will look at your regex, but not at the Perl snippet.
Jan Goyvaerts
@Windows programmer - http://en.wikipedia.org/wiki/Alphanumeric - Latin *alphabet*, not "Latin character set", which is what includes diacritics etc. Purely a semantics issue, but I personally go with the common usage of the term alphanumeric as A-Z and 0-9.
Jay
@Jan - added the full regex anyway, though there's already an accepted answer so it probably doesn't matter. Helps if people specify the language they're working in in the first place so we don't have to guess ;)
Jay
ñ is a letter of the alphabet in Spanish, including in Latin America.
Windows programmer
"I would like to have a regular expression that checks if a string contains only upper and lowercase letters, numbers, and underscores" doesn't limit it to Latin letters. "The following regex matches alphanumeric characters and underscore" doesn't limit it to Latin letters. "^[a-zA-Z0-9_]+$" fails.
Windows programmer
A: 

To check the entire string and not allow empty strings, try

^[A-Za-z0-9_]+$
David Norman
You missed ö.
Windows programmer
+13  A: 

To match a string that contains only those characters (or an empty string), try

"^[a-zA-Z0-9_]*$"

This works for .NET regular expressions, and probably a lot of other languages as well.

Breaking it down:

^ : start of string
[ : beginning of character group
a-z : any lowercase letter
A-Z : any uppercase letter
0-9 : any digit
_ : underscore
] : end of character group
* : zero or more of the given characters
$ : end of string

If you don't want to allow empty strings, use + instead of *.
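
The same breakdown can be written directly into the pattern. This is a sketch using Python's `re.VERBOSE` mode rather than .NET; the names `PATTERN` and `matches` are made up for this example. The character class stays on one line because whitespace and # are taken literally inside a class, even in verbose mode.

```python
import re

PATTERN = re.compile(
    r"""
    ^                # start of string
    [a-zA-Z0-9_]*    # zero or more of: lowercase, uppercase, digit, underscore
    $                # end of string
    """,
    re.VERBOSE,
)


def matches(text: str) -> bool:
    """Return True if text is empty or contains only the allowed characters."""
    return PATTERN.match(text) is not None
```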

EDIT: As others have pointed out, some regex languages have a shorthand form for [a-zA-Z0-9_]. In the .NET regex language, you can turn on ECMAScript behavior and use \w as a shorthand (yielding ^\w*$ or ^\w+$). Note that in other languages, and by default in .NET, \w is somewhat broader and will match other sorts of Unicode characters as well (thanks to Jan for pointing this out). So if you really intend to match only those characters, using the explicit (longer) form is probably best.
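
Other flavors offer a similar switch. As a sketch in Python rather than .NET, the `re.ASCII` flag restricts \w to [a-zA-Z0-9_]; the constant names below are made up for this example.

```python
import re

# Default for str patterns in Python 3: \w is Unicode-aware.
UNICODE_WORD = re.compile(r"^\w+$")

# With re.ASCII, \w matches only [a-zA-Z0-9_].
ASCII_WORD = re.compile(r"^\w+$", re.ASCII)

# "stra\u00dfe" is "straße" written with an escape for the sharp s.
german = "stra\u00dfe"

unicode_result = UNICODE_WORD.match(german) is not None
ascii_result = ASCII_WORD.match(german) is not None
```

In this sketch `unicode_result` is True and `ascii_result` is False, which mirrors the default versus ECMAScript behavior described above.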

Charlie
You missed ß.
Windows programmer
Sorry, I'm not sure what you're saying...
Charlie
If you ever go to Germany or if you ever see just about any German text you'll see what I'm saying.
Windows programmer
\w and [A-Za-z0-9_] are not equivalent in most regex flavors. \w includes letters with diacritics, letters from other scripts, etc.
Jan Goyvaerts
@Jan: Thanks, I updated the comments on \w, hopefully it's more accurate now.
Charlie
A: 

Here is the regex for what you want, with a quantifier requiring at least 1 character and no more than 255 characters:

^[a-zA-Z0-9_]{1,255}$
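
A length-limited check can be sketched in Python as follows, assuming the intent is 1 to 255 characters drawn only from letters, digits, and underscores. The names `BOUNDED` and `is_valid_bounded` are made up for this example.

```python
import re

# Between 1 and 255 allowed characters; fullmatch anchors both ends.
BOUNDED = re.compile(r"[a-zA-Z0-9_]{1,255}")


def is_valid_bounded(text: str) -> bool:
    """True if text has 1..255 characters, all letters, digits, or underscores."""
    return BOUNDED.fullmatch(text) is not None
```

Note that a ^ placed just inside the opening bracket negates a character class, so it has to sit outside the brackets to act as a start-of-string anchor.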

mson
+2  A: 

You want to check that each character matches your requirements, which is why we use:

[A-Za-z0-9_]

And you can even use the shorthand version:

\w

Which is equivalent in some regex flavors (so make sure you check before you use it). Then, to require that the entire string matches, put:

^

at the front of the pattern, so the match must begin at the start of the string, and put:

$

at the back, so the match must finish at the end of the string. In between, use:

\w+ or \w*

to indicate "1 or more" or "0 or more" characters, respectively. Putting it all together, we have:

^\w*$
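
To illustrate why the anchors matter, here is a small Python sketch (function names are made up for this example) comparing an unanchored search with the anchored pattern.

```python
import re


def contains_some_word_chars(text: str) -> bool:
    """Unanchored: True if a run of word characters appears anywhere."""
    return re.search(r"\w+", text) is not None


def contains_only_word_chars(text: str) -> bool:
    """Anchored: True only if the whole string is word characters (or empty)."""
    return re.search(r"^\w*$", text) is not None
```

For the input "hello world!", the unanchored version succeeds because it finds "hello", while the anchored version fails because of the space and the exclamation mark.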
Anton
You missed д. Seeing that your name is Anton, somehow I think you already knew you missed н.
Windows programmer
\w and [A-Za-z0-9_] are not equivalent in most regex flavors. \w includes letters with diacritics, letters from other scripts, etc.
Jan Goyvaerts
+1  A: 

There's a lot of verbosity in here, and I'm deeply against it, so, my conclusive answer would be:

/^\w+$/

In ASCII-only flavors, \w is equivalent to [A-Za-z0-9_], which is pretty much what you want. (In Unicode-aware flavors, \w also matches letters and digits from other scripts.)

Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.

kch
A: 

Matching diacritics in a regexp opens a whole can of worms, especially when taking Unicode into consideration. You might want to read about POSIX locales in particular.

Jean-Denis Muys
A: 

For me, the issue was that I wanted to distinguish between alpha, numeric, and alphanumeric strings, so to ensure an alphanumeric string contains at least one alpha and at least one numeric character, I used: ^([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+$
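
One thing to watch for: | has the lowest precedence, so in a pattern of the form ^A|B$ the ^ applies only to the first alternative and the $ only to the second. With an unanchored search, a string such as a1!!! would therefore be accepted by the first alternative. An alternative approach, sketched here in Python with lookaheads, assumes that underscores are allowed but do not count as letters; the names are made up for this example.

```python
import re

# (?=.*[a-zA-Z])  lookahead: at least one ASCII letter somewhere
# (?=.*[0-9])     lookahead: at least one ASCII digit somewhere
# [a-zA-Z0-9_]+   the string itself: only letters, digits, underscores
LETTER_AND_DIGIT = re.compile(r"(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9_]+")


def has_letter_and_digit(text: str) -> bool:
    """True if text has only allowed characters, with at least one letter and one digit."""
    return LETTER_AND_DIGIT.fullmatch(text) is not None
```

Unlike a pattern built from alternating letter and digit runs, this does not depend on the order in which the letters and digits appear.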

mylesmckeown