I would like to have a regular expression that checks if a string contains only upper and lowercase letters, numbers, and underscores.
How about:
^([A-Za-z]|[0-9]|_)+$
...if you want to be explicit, or:
^\w+$
...if you prefer concise (Perl syntax).
One question first: does it need to have at least one character, or can it be an empty string?
^[A-Za-z0-9_]+$
This matches one or more uppercase or lowercase letters, digits, or underscores. If the string may be zero length, just replace the + with *:
^[A-Za-z0-9_]*$
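As a rough illustration of the difference between + and *, here is a minimal Python sketch. The function name is_identifier_like and its allow_empty parameter are made up for this example; only the two patterns come from the answer above.

```python
import re

# The two patterns from the answer above.
AT_LEAST_ONE = re.compile(r"^[A-Za-z0-9_]+$")
ZERO_OR_MORE = re.compile(r"^[A-Za-z0-9_]*$")

def is_identifier_like(text, allow_empty=False):
    """Return True if text has only ASCII letters, digits, and underscores."""
    pattern = ZERO_OR_MORE if allow_empty else AT_LEAST_ONE
    return pattern.match(text) is not None

print(is_identifier_like("user_42"))             # True
print(is_identifier_like(""))                    # False
print(is_identifier_like("", allow_empty=True))  # True
print(is_identifier_like("user name"))           # False (space is not allowed)
```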
Edit:
If letters with diacritics need to be included (such as c with cedilla, ç), then you would need to use the word character \w, which in Unicode-aware regex flavors matches the same characters as above plus accented letters:
^\w+$
Or
^\w*$
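To see the difference on a concrete string, here is a small Python 3 sketch. It assumes Python's re module, where \w is Unicode-aware for str patterns; other flavors may behave differently, so check yours.

```python
import re

explicit = re.compile(r"^[A-Za-z0-9_]+$")
word = re.compile(r"^\w+$")  # Unicode-aware for str patterns in Python 3

sample = "gar\u00e7on"  # "garçon", with a precomposed c-cedilla

print(explicit.match(sample) is not None)  # False: ç is outside A-Za-z
print(word.match(sample) is not None)      # True: ç counts as a word character
```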
The following regex matches alphanumeric characters and underscore:
^[a-zA-Z0-9_]+$
For example, in Perl:
#!/usr/bin/perl -w
my $arg1 = $ARGV[0];
# check that the string contains *only* one or more alphanumeric chars or underscores
if ($arg1 !~ /^[a-zA-Z0-9_]+$/) {
    print "Failed.\n";
} else {
    print "Success.\n";
}
To check the entire string and not allow empty strings, try
^[A-Za-z0-9_]+$
To match a string that contains only those characters (or an empty string), try
"^[a-zA-Z0-9_]*$"
This works for .NET regular expressions, and probably a lot of other languages as well.
Breaking it down:
^ : start of string
[ : beginning of character group
a-z : any lowercase letter
A-Z : any uppercase letter
0-9 : any digit
_ : underscore
] : end of character group
* : zero or more of the given characters
$ : end of string
If you don't want to allow empty strings, use + instead of *.
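The breakdown above can also be written directly into the pattern. This is a sketch using Python's re.VERBOSE flag (an assumption about your flavor; .NET has a similar IgnorePatternWhitespace option). The character group is kept on one line because Python does not ignore whitespace inside a group.

```python
import re

PATTERN = re.compile(
    r"""
    ^               # start of string
    [a-zA-Z0-9_]    # character group: lowercase, uppercase, digit, underscore
    *               # zero or more of the given characters
    $               # end of string
    """,
    re.VERBOSE,
)

print(PATTERN.match("Abc_123") is not None)  # True
print(PATTERN.match("") is not None)         # True (use + to reject this)
print(PATTERN.match("abc-123") is not None)  # False
```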
EDIT: As others have pointed out, some regex languages have a shorthand form for [a-zA-Z0-9_]. In the .NET regex language, you can turn on ECMAScript behavior and use \w as a shorthand (yielding ^\w*$ or ^\w+$). Note that in other languages, and by default in .NET, \w is somewhat broader, and will match other sorts of Unicode characters as well (thanks to Jan for pointing this out). So if you're really intending to match only those characters, using the explicit (longer) form is probably best.
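For comparison, Python has a loosely analogous switch: the re.ASCII flag restricts \w to [a-zA-Z0-9_]. The sketch below only illustrates the "broader \w" point in Python; it is not the .NET API, and the sample strings are made up.

```python
import re

unicode_word = re.compile(r"^\w+$")         # default: Unicode-aware \w
ascii_word = re.compile(r"^\w+$", re.ASCII)  # \w limited to [a-zA-Z0-9_]

samples = [
    "abc_123",
    "na\u00efve",           # "naïve"
    "\u0661\u0662\u0663",   # Arabic-Indic digits 1, 2, 3
]

for s in samples:
    print(s, unicode_word.match(s) is not None, ascii_word.match(s) is not None)
```

Only the first sample passes both patterns; the other two pass only the Unicode-aware one.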
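Here is the regex for what you want, with a quantifier to specify at least 1 character and no more than 255 characters. Note that the character class must not be negated (no leading ^ inside the brackets) or contain a space, and the pattern needs anchors so that the whole string is checked:
^[a-zA-Z0-9_]{1,255}$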
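A length-bounded check can be sketched in Python as follows. It assumes the intent is a whole string of 1 to 255 letters, digits, or underscores, so the character class is not negated and the pattern is anchored at both ends. The name is_valid_name is hypothetical.

```python
import re

# 1 to 255 ASCII letters, digits, or underscores, and nothing else.
BOUNDED = re.compile(r"^[a-zA-Z0-9_]{1,255}$")

def is_valid_name(text):
    """Return True if text is 1..255 characters from [a-zA-Z0-9_]."""
    return BOUNDED.match(text) is not None

print(is_valid_name("a" * 255))  # True
print(is_valid_name("a" * 256))  # False: too long
print(is_valid_name(""))         # False: too short
print(is_valid_name("a b"))      # False: space is not allowed
```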
You want to check that each character matches your requirements, which is why we use:
[A-Za-z0-9_]
And you can even use the shorthand version:
\w
which is equivalent (in some regex flavors, so make sure you check before you use it). To indicate that the entire string must match, anchor the pattern at both ends. Use
^
to indicate that the match must begin at the start of the string, and
$
to indicate that the match must finish at the end of the string. Then use
\w+ or \w*
to indicate "1 or more" or "0 or more". Putting it all together, we have:
^\w*$
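One caveat is worth checking in your flavor: in Python (and Perl), $ also matches just before a trailing newline, so ^\w*$ accepts "abc_123\n". The sketch below shows two stricter options in Python, \Z and re.fullmatch; the helper name matches_whole_string is made up for illustration.

```python
import re

def matches_whole_string(text):
    """Strict check: the entire string, including any newline, must be \\w chars."""
    return re.fullmatch(r"\w*", text) is not None

text = "abc_123\n"
print(re.match(r"^\w*$", text) is not None)   # True: $ allows a trailing newline
print(re.match(r"^\w*\Z", text) is not None)  # False: \Z is the very end only
print(matches_whole_string(text))             # False
print(matches_whole_string("abc_123"))        # True
```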
There's a lot of verbosity in here, and I'm deeply against it, so my conclusive answer would be:
/^\w+$/
\w is equivalent to [A-Za-z0-9_], which is pretty much what you want (unless we introduce Unicode to the mix).
Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.
Matching diacritics in a regexp opens a whole can of worms, especially when taking Unicode into consideration. You might want to read about POSIX locales in particular.
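For me the issue was that I wanted to distinguish between alpha, numeric, and alphanumeric strings. To ensure an alphanumeric string contains at least one alpha and at least one numeric character, I used:
^([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+$
Be aware that without an outer group, the ^ applies only to the first alternative and the $ only to the second, so this also accepts strings such as a1!!. A stricter form uses lookaheads to require one character of each kind and then checks the whole string:
^(?=.*[a-zA-Z_])(?=.*\d)[a-zA-Z0-9_]+$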
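The alpha / numeric / alphanumeric distinction can be sketched in Python with lookaheads. The names ORIGINAL, ALPHA, NUMERIC, ALNUM, and classify are made up for this example, and treating underscore as an "alpha" character follows the character class in the pattern above rather than any stated rule.

```python
import re

# The pattern as posted: ^ binds to the first alternative only, $ to the second.
ORIGINAL = re.compile(r"^([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+$")

ALPHA = re.compile(r"^[a-zA-Z_]+$")
NUMERIC = re.compile(r"^[0-9]+$")
# Lookaheads require at least one alpha and one digit somewhere in the string.
ALNUM = re.compile(r"^(?=.*[a-zA-Z_])(?=.*[0-9])[a-zA-Z0-9_]+$")

def classify(text):
    """Return 'alpha', 'numeric', 'alphanumeric', or 'other'."""
    if ALPHA.match(text):
        return "alpha"
    if NUMERIC.match(text):
        return "numeric"
    if ALNUM.match(text):
        return "alphanumeric"
    return "other"

print(classify("abc"))   # alpha
print(classify("123"))   # numeric
print(classify("a1b"))   # alphanumeric
print(classify("a1!!"))  # other
print(ORIGINAL.search("a1!!") is not None)  # True: the posted pattern accepts it
```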