ansaurus

Question

Regular Expression for alphanumeric and underscores

Answer 1

A:

How about:

^([A-Za-z]|[0-9]|_)+$

...if you want to be explicit, or:

^\w+$

...if you prefer concise (Perl syntax).

Drew Hall 2008-12-03 04:31:17

This doesn't make sure that it only contains those chars, but extracts a string that contains only those chars. Almost there, just add a ^ to the beginning and a $ to the end and it'll work.

BenAlabaster 2008-12-03 04:34:42

Drew Hall 2008-12-03 04:40:14

You missed à.

Windows programmer 2008-12-03 04:45:13

Answer 2

+2 A:

Um...question: Does it need to have at least one character or no? Can it be an empty string?

^[A-Za-z0-9_]+$

This matches one or more upper- or lowercase letters, digits, or underscores. If the string may be empty, replace the + with *:

^[A-Za-z0-9_]*$

Edit:

If letters with diacritics need to be included (such as c with a cedilla, ç), then you would need to use the word character class \w, which does the same as the above but, in most regex flavors, also matches letters with diacritics:

^\w+$

Or

^\w*$
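As a rough illustration in Python (an assumption, since the question does not name a language), the sketch below shows how the + and * variants treat the empty string and how \w handles ç. The sample strings and the helper name are made up.

```python
import re

# Patterns taken from the answer above.
explicit_plus = re.compile(r"^[A-Za-z0-9_]+$")
explicit_star = re.compile(r"^[A-Za-z0-9_]*$")
word_plus = re.compile(r"^\w+$")  # in Python 3, \w is Unicode-aware for str patterns


def matches(pattern, text):
    """Return True if the compiled pattern matches the text."""
    return pattern.search(text) is not None


empty_with_plus = matches(explicit_plus, "")          # + needs at least one character
empty_with_star = matches(explicit_star, "")          # * accepts the empty string
cedilla_explicit = matches(explicit_plus, "garçon")   # ç is outside A-Za-z
cedilla_word = matches(word_plus, "garçon")           # \w covers ç in this flavor
```

Whether \w covers ç depends on the regex flavor and its options, so it is worth checking against your own engine.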

BenAlabaster 2008-12-03 04:31:41

You missed ç.

Windows programmer 2008-12-03 04:45:43

Well now that you mention it, I also missed a whole bunch of other French characters...

BenAlabaster 2008-12-03 05:54:21

\w is the same as [\w] with less typing effort

Jan Goyvaerts 2008-12-03 07:49:29

Yeah, you still need the + or * and the ^ and $ - \w just checks that it contains word characters, not that it *only* contains word characters...

BenAlabaster 2008-12-03 14:30:48

Answer 3

A:

The following regex matches alphanumeric characters and underscore:

^[a-zA-Z0-9_]+$

For example, in Perl:

#!/usr/bin/perl -w

my $arg1 = $ARGV[0];

# check that the string contains *only* one or more alphanumeric chars or underscores
if ($arg1 !~ /^[a-zA-Z0-9_]+$/) {
    print "Failed.\n";
} else {
    print "Success.\n";
}
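One caveat worth checking in your own flavor: in Perl and Python, $ also matches just before a trailing newline, so a string such as "abc\n" passes the pattern above. A minimal Python sketch (Python is an assumption here, and the helper name is hypothetical) that uses re.fullmatch to avoid this:

```python
import re

NAME_RE = re.compile(r"[a-zA-Z0-9_]+")


def is_valid_name(text):
    """True only if the whole string is letters, digits or underscores."""
    return NAME_RE.fullmatch(text) is not None


# With ^...$ and re.search, a trailing newline slips through.
dollar_accepts_newline = re.search(r"^[a-zA-Z0-9_]+$", "abc\n") is not None
# fullmatch requires the pattern to cover every character, newline included.
fullmatch_accepts_newline = is_valid_name("abc\n")
```

In Perl, the equivalent is to end the pattern with \z instead of $.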

Jay 2008-12-03 04:31:51

The pattern in your code is correct, but the pattern above only checks a single instance.

BenAlabaster 2008-12-03 04:35:41

You missed ñ.

Windows programmer 2008-12-03 04:46:21

That was intentional; the code sample was meant to clarify how to actually check a string. That is also why the code has the beginning- and end-of-line markers, which are not in the regex example.

Jay 2008-12-03 04:46:23

@Windows programmer - not sure if you're just trying to be humorous or clever, but alphanumeric specifically refers to the Latin alphabet and Arabic numerals, so it wouldn't include ñ or any of the other special chars you've referenced in the comments here.

Jay 2008-12-03 05:04:20

When did ñ stop being Latin?

Windows programmer 2008-12-03 06:41:37

@Jay: I think your answer would be a lot clearer if the regex above the source code snippet was the proper regex, rather than a partial regex. People who don't know Perl will look at your regex, but not at the Perl snippet.

Jan Goyvaerts 2008-12-03 07:48:43

@Windows programmer - http://en.wikipedia.org/wiki/Alphanumeric - Latin *alphabet*, not "Latin character set", which is what includes diacritics etc. Purely a semantics issue, but I personally go with the common usage of the term alphanumeric as A-Z and 0-9.

Jay 2008-12-05 04:55:35

@Jan - added the full regex anyway, though there's already an accepted answer so it probably doesn't matter. Helps if people specify the language they're working in in the first place so we don't have to guess ;)

Jay 2008-12-05 04:56:21

ñ is a letter of the alphabet in Spanish, including in Latin America.

Windows programmer 2008-12-05 05:57:34

"I would like to have a regular expression that checks if a string contains only upper and lowercase letters, numbers, and underscores" doesn't limit it to Latin letters. "The following regex matches alphanumeric characters and underscore" doesn't limit it to Latin letters. "^[a-zA-Z0-9_]+$" fails.

Windows programmer 2008-12-05 06:02:04

Answer 4

A:

To check the entire string and not allow empty strings, try

^[A-Za-z0-9_]+$

David Norman 2008-12-03 04:33:10

You missed ö.

Windows programmer 2008-12-03 04:46:57

Answer 5

+13 A:

To match a string that contains only those characters (or an empty string), try

"^[a-zA-Z0-9_]*$"

This works for .NET regular expressions, and probably a lot of other languages as well.

Breaking it down:

^ : start of string
[ : beginning of character group
a-z : any lowercase letter
A-Z : any uppercase letter
0-9 : any digit
_ : underscore
] : end of character group
* : zero or more of the given characters
$ : end of string

If you don't want to allow empty strings, use + instead of *.

EDIT As others have pointed out, some regex languages have a shorthand form for [a-zA-Z0-9_]. In the .NET regex language, you can turn on ECMAScript behavior and use \w as a shorthand (yielding ^\w*$ or ^\w+$). Note that in other languages, and by default in .NET, \w is somewhat broader, and will match other sorts of unicode characters as well (thanks to Jan for pointing this out). So if you're really intending to match only those characters, using the explicit (longer) form is probably best.
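The same distinction exists in Python, used here as an assumed example language: \w is Unicode-aware by default for str patterns, and the re.ASCII flag narrows it to [a-zA-Z0-9_], roughly analogous to the ECMAScript option mentioned above. The sample strings are hypothetical.

```python
import re

unicode_word = re.compile(r"^\w*$")           # default: Unicode word characters
ascii_word = re.compile(r"^\w*$", re.ASCII)   # \w restricted to [a-zA-Z0-9_]
explicit = re.compile(r"^[a-zA-Z0-9_]*$")     # the explicit form from the answer

samples = ["user_42", "straße", "", "two words"]

# For each sample: (Unicode \w, ASCII-only \w, explicit class)
results = {
    s: (
        bool(unicode_word.search(s)),
        bool(ascii_word.search(s)),
        bool(explicit.search(s)),
    )
    for s in samples
}
```

Only the default Unicode-aware \w accepts "straße"; the ASCII-flagged \w and the explicit class agree on every sample here.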

Charlie 2008-12-03 04:33:50

You missed ß.

Windows programmer 2008-12-03 04:47:28

Sorry, I'm not sure what you're saying...

Charlie 2008-12-03 05:12:57

If you ever go to Germany or if you ever see just about any German text you'll see what I'm saying.

Windows programmer 2008-12-03 06:42:35

\w and [A-Za-z0-9_] are not equivalent in most regex flavors. \w includes letters with diacritics, letters from other scripts, etc.

Jan Goyvaerts 2008-12-03 07:45:35

@Jan: Thanks, I updated the comments on \w, hopefully it's more accurate now.

Charlie 2008-12-03 16:19:52

Answer 6

A:

Here is the regex for what you want with a quantifier to specify at least 1 character and no more than 255 characters

^[a-zA-Z0-9_]{1,255}$
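A minimal Python sketch of a length-limited check, assuming the intent is to accept 1 to 255 letters, digits or underscores and nothing else. Note the assumptions: the character class is not negated, it contains no space, and the whole string must match. The function name is made up for illustration.

```python
import re

# Assumed intent: 1 to 255 characters, each a letter, digit or underscore.
BOUNDED_RE = re.compile(r"[a-zA-Z0-9_]{1,255}")


def is_bounded_identifier(text):
    """True if text is 1..255 characters long and uses only the allowed set."""
    return BOUNDED_RE.fullmatch(text) is not None
```

For instance, is_bounded_identifier("a" * 255) passes while a 256-character string, the empty string, or "a b" does not.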

mson 2008-12-03 04:44:06

Answer 7

+2 A:

You want to check that each character matches your requirements, which is why we use:

[A-Za-z0-9_]

And you can even use the shorthand version:

\w

Which is equivalent (in some regex flavors, so make sure you check before you use it). Then to indicate that the entire string must match, you use:

^

To indicate the string must start with that character, then use

$

To indicate the string must end with that character. Then use

\w+ or \w*

To indicate "1 or more", or "0 or more". Putting it all together, we have:

^\w*$
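To see why the anchors and the quantifier matter, here is a small Python sketch (Python being an assumption, since the question names no language). The unanchored \w only checks that a word character occurs somewhere; the full pattern checks the entire string. The sample strings are made up.

```python
import re

text = "not valid!"

# Unanchored: succeeds because "n" is a word character.
contains_word_char = re.search(r"\w", text) is not None
# Anchored with a quantifier: fails because of the space and the "!".
only_word_chars = re.search(r"^\w*$", text) is not None

clean = "valid_123"
clean_only_word_chars = re.search(r"^\w*$", clean) is not None
```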

Anton 2008-12-03 05:08:09

You missed д. Seeing that your name is Anton, somehow I think you already knew you missed н.

Windows programmer 2008-12-03 06:44:27

\w and [A-Za-z0-9_] are not equivalent in most regex flavors. \w includes letters with diacritics, letters from other scripts, etc.

Jan Goyvaerts 2008-12-03 07:45:01

Answer 8

+1 A:

There's a lot of verbosity in here, and I'm deeply against it, so, my conclusive answer would be:

/^\w+$/

\w is equivalent to [A-Za-z0-9_], which is pretty much what you want (unless we introduce Unicode to the mix).

Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.

kch 2008-12-05 05:25:04

Answer 9

A:

Matching diacritics in a regexp opens a whole can of worms, especially when taking Unicode into consideration. You might want to read about POSIX locales in particular.

Jean-Denis Muys 2009-07-10 08:56:41

Answer 10

A:

For me there was an issue in that I wanted to distinguish between alpha, numeric and alphanumeric, so to ensure an alphanumeric string contains at least one alpha and at least one numeric character, I used:

^([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+$
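Note that in the pattern above the alternation | binds more loosely than the anchors, so ^ applies only to the first branch and $ only to the second. An alternative sketch in Python (an assumption, and not the author's pattern) uses lookaheads to require at least one letter and at least one digit while restricting the whole string to letters, digits and underscores. Unlike the original, it does not count the underscore as a letter. The function name is hypothetical.

```python
import re

MIXED_RE = re.compile(
    r"(?=.*[a-zA-Z])"   # at least one ASCII letter somewhere
    r"(?=.*[0-9])"      # at least one digit somewhere
    r"[a-zA-Z0-9_]+"    # whole string: letters, digits, underscores
)


def has_letter_and_digit(text):
    """True if text is alphanumeric/underscore with at least one letter and one digit."""
    return MIXED_RE.fullmatch(text) is not None


# The original pattern, for comparison: its first branch has no end anchor,
# so trailing junk after "a1" is still accepted by a search.
ORIGINAL = r"^([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+$"
original_accepts_junk = re.search(ORIGINAL, "a1!!!") is not None
```

With this sketch, "abc123", "123abc" and "a1a" pass, while "abc", "123" and "a1!" do not.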

mylesmckeown 2010-06-24 09:25:57
