I'm not a RegEx expert so I'm using the following borrowed RegEx to validate email addresses:

^[\w\.=-]+@[\w\.-]+\.[\w]{2,3}$

A user has reported it's rejecting their email address of [email protected]. It's the "info" that's being rejected as "inf" works. So I did a bit of reading and learnt what the [\w]{2,3} syntax means and yes, that's why info is getting rejected as it's four characters. Changing it to [\w]{2,4} worked.
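For anyone who wants to reproduce this, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the engine in use here for these constructs, and the address is a hypothetical stand-in, not the one the user reported.

```python
import re

# Original pattern: the last part allows only 2 or 3 word characters.
OLD = re.compile(r"^[\w\.=-]+@[\w\.-]+\.[\w]{2,3}$")
# Adjusted pattern: the last part allows 2 to 4 word characters.
NEW = re.compile(r"^[\w\.=-]+@[\w\.-]+\.[\w]{2,4}$")

# Hypothetical address, used only for illustration.
address = "rob@example.info"

old_ok = OLD.match(address) is not None  # "info" is four characters, too long for {2,3}
new_ok = NEW.match(address) is not None  # fits within {2,4}
print(old_ok, new_ok)
```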

I like to understand my problems, so I dwelled on this fragment. My question is: why is the \w inside square brackets? Would \w{2,4} not work just as well?

Cheers, Rob.

A: 

You define character sets in square brackets. For example, the first bracketed part means "any word character (letter, digit or underscore) OR a dot OR an equals sign OR a dash". The "+" that follows then tells the parser that a character from this set must appear at least once.

In the last part, since you only use \w, which by itself means "any word character", you do not need the square brackets. The quantifiers (+, ?, *, {n,m}) affect only the preceding character or character set (a set being defined with square brackets, as explained above). And yes, "\w" counts as a single character when the RegExp is parsed.
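To illustrate how a quantifier binds to the single preceding token, here is a small sketch in Python. The sample strings are made up, and other regex flavours may differ in details not exercised here.

```python
import re

# "+" applies only to the token right before it, here the single "b".
b_repeated = bool(re.fullmatch(r"ab+", "abbb"))      # "a" followed by several "b"s
ab_repeated = bool(re.fullmatch(r"ab+", "abab"))     # "ab" is not repeated as a unit

# With a bracketed set, "+" repeats "any one character from the set".
set_repeated = bool(re.fullmatch(r"[ab]+", "abab"))

# \w is itself a single token, so it can be quantified without brackets.
four_chars = bool(re.fullmatch(r"\w{2,4}", "info"))
six_chars = bool(re.fullmatch(r"\w{2,4}", "museum"))  # six characters exceed {2,4}

print(b_repeated, ab_repeated, set_repeated, four_chars, six_chars)
```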

BYK
A: 

The outer character class in [\w] is not necessary, as \w already denotes a character class. You only need the character class notation […] if you want to combine characters or predefined character classes, as in [\w\s] (word characters and whitespace characters) or [\w-] (word characters and the hyphen). So [\w] is equivalent to \w.
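A quick way to check the equivalence is to run both forms over the same inputs. This is a sketch in Python with made-up sample strings; it shows agreement on these samples, not a proof for every input.

```python
import re

# Made-up samples: valid lengths, an underscore, too long, a hyphen, too short.
samples = ["ab", "info", "a_1", "abcde", "a-b", "a"]

bare = [bool(re.fullmatch(r"\w{2,4}", s)) for s in samples]
bracketed = [bool(re.fullmatch(r"[\w]{2,4}", s)) for s in samples]

# Brackets only matter once something is combined with \w, e.g. a hyphen.
with_hyphen = [bool(re.fullmatch(r"[\w-]{2,4}", s)) for s in samples]

print(bare == bracketed)   # the two forms agree on every sample
print(with_hyphen)         # "a-b" is now accepted as well
```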

Gumbo
very concise and to the point answer. +1
Roland Bouman
\w includes \d so [\w\d] example is a bit confusing.
BYK
@BYK: You’re right, thanks. Fixed it.
Gumbo
A: 

Whilst both would work, it is generally better style and more readable to group, as it makes clear exactly what is being repeated.

I would actually make this more explicit, so as to cut out some valid but unlikely cases (note that if you were to allow all technically valid email addresses, pretty much anything would go).

^[\w\.=-]+@([\w\d-]+\.){1,3}[a-zA-Z]{2,4}$
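To see what this pattern accepts and rejects, here is a rough sketch in Python. All addresses are hypothetical, and the expected values in the comments reflect Python's `re` engine only.

```python
import re

PATTERN = re.compile(r"^[\w\.=-]+@([\w\d-]+\.){1,3}[a-zA-Z]{2,4}$")

# Hypothetical addresses chosen to exercise each part of the pattern.
candidates = [
    "rob@example.info",         # one label plus a four-letter ending
    "rob@mail.example.co.uk",   # three dot-terminated labels plus "uk"
    "rob@example.c0m",          # digit in the final part
    "rob@a.b.c.d.example.com",  # more than three dot-terminated labels
    "rob@example.museum",       # final part longer than four letters
]

results = {addr: PATTERN.match(addr) is not None for addr in candidates}
for addr, ok in results.items():
    print(addr, ok)
```

The last case shows the trade-off: a longer top-level domain is rejected just as "info" was by the original {2,3}.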
ternaryOperator