Similar to this topic.

I am trying to validate a username with the following restrictions:

  • Must start with a letter or number
  • Must be 3 to 15 characters in length
  • Symbols include: . - _ ( ) [ ]
  • Symbols cannot appear consecutively, but letters and numbers can

Edit:

  • Letters and numbers are a-z A-Z 0-9

Been stumped for a while. I'm new to regex.

+7  A: 

It is not so clean to express a set of unrelated rules in a single regular expression, but it can be done by using lookaround assertions (Rubular):

@"^(?=[A-Za-z0-9])(?!.*[._()\[\]-]{2})[A-Za-z0-9._()\[\]-]{3,15}$"

Explanation:

(?=[A-Za-z0-9])            Must start with a letter or number
(?!.*[._()\[\]-]{2})       Cannot contain two consecutive symbols
[A-Za-z0-9._()\[\]-]{3,15} Must consist of between 3 and 15 allowed characters
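
As a minimal sketch of how this pattern might be used from C#: the class name `UsernameValidator` and the method `IsValid` are hypothetical names chosen for illustration, not part of the answer itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UsernameValidator
{
    // The pattern from the answer, written as a verbatim string so the
    // backslashes do not need to be doubled.
    private static readonly Regex Pattern = new Regex(
        @"^(?=[A-Za-z0-9])(?!.*[._()\[\]-]{2})[A-Za-z0-9._()\[\]-]{3,15}$");

    // Hypothetical helper: returns true when the input satisfies every rule.
    public static bool IsValid(string username)
    {
        return username != null && Pattern.IsMatch(username);
    }
}
```

With this sketch, `UsernameValidator.IsValid("john.doe")` should return true, while `"john..doe"` (consecutive symbols), `".john"` (starts with a symbol), and `"ab"` (too short) should all be rejected.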

You might want to consider whether this would be easier to read and more maintainable as a list of simpler regular expressions, all of which must match, or as ordinary C# code.
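
The "ordinary C# code" alternative could look roughly like the sketch below, with one check per rule from the question. The class name `UsernameRules` and its members are hypothetical, and this is only one of several reasonable ways to split the rules.

```csharp
using System;

public static class UsernameRules
{
    // Symbols permitted by the question: . - _ ( ) [ ]
    private const string Symbols = "._()[]-";

    // Letters and numbers are restricted to a-z, A-Z, 0-9 (per the question's edit).
    private static bool IsAsciiLetterOrDigit(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    public static bool IsValid(string username)
    {
        if (username == null) return false;

        // Rule: must be 3 to 15 characters in length.
        if (username.Length < 3 || username.Length > 15) return false;

        // Rule: must start with a letter or number.
        if (!IsAsciiLetterOrDigit(username[0])) return false;

        for (int i = 0; i < username.Length; i++)
        {
            char c = username[i];
            bool isSymbol = Symbols.IndexOf(c) >= 0;

            // Rule: only letters, numbers, and the listed symbols are allowed.
            if (!isSymbol && !IsAsciiLetterOrDigit(c)) return false;

            // Rule: symbols cannot appear consecutively.
            if (isSymbol && i > 0 && Symbols.IndexOf(username[i - 1]) >= 0) return false;
        }

        return true;
    }
}
```

One difference worth knowing about: in .NET, `$` also matches just before a final newline, so the single-regex version accepts `"abc\n"`, whereas this loop rejects it. Ending the pattern with `\z` instead of `$` would make the regex equally strict.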

Mark Byers
Using a regex for a length check is inefficient, unclear, and subject to breakage. (I think Mark's code is broken, as it looks like it requires a letter or number followed by 3-15 MORE characters, so it actually matches 4-16 characters.) Using a separate test for each requirement will be far easier to understand and maintain.
Ben Voigt
Thank you for the speedy reply. This did work too, but the period is missing from the (?!.*[_()\[\]-]{2}) part. It should be (?!.*[._()\[\]-]{2}); otherwise a period can follow any symbol.
Marlon
@Ben: unclear, yes, but Mark's code uses a zero-width assertion (it consumes no characters), so the length constraint is preserved.
Chris Schmich
@Marlon: You are correct, that was a copy/paste error. I've updated my post.
Mark Byers
I think this is a clear and succinct way to describe the rules. The only ugliness is that all the rules are mashed together in one string, rather than being split up into 3 variables. To mitigate this, you could use string concatenation and split up the different rules into multiple lines: `"^" <newline> + "(?=[A-Z...])" <newline> + "(?!...)" <newline> + "[A-Z...]{3,15}" <newline> + "$"`
Merlyn Morgan-Graham
+5  A: 

As an optimization to Mark's answer:

^(?=.{3,15}$)([A-Za-z0-9][._()\[\]-]?)*$

Explanation:

(?=.{3,15}$)                 Must be 3-15 characters in the string
([A-Za-z0-9][._()\[\]-]?)*   The string is a sequence of alphanumerics,
                             each of which may be followed by a symbol

This one permits Unicode alphanumerics:

^(?=.{3,15}$)((\p{L}|\p{N})[._()\[\]-]?)*$

This one is the Unicode variant, plus uses non-capturing groups:

^(?=.{3,15}$)(?:(?:\p{L}|\p{N})[._()\[\]-]?)*$
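
A small sketch for trying the ASCII and Unicode variants side by side in C#. The class name `UsernamePatterns` and the field names are hypothetical; the patterns themselves are copied from above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UsernamePatterns
{
    // ASCII-only variant: each alphanumeric may be followed by one symbol.
    public static readonly Regex Ascii = new Regex(
        @"^(?=.{3,15}$)([A-Za-z0-9][._()\[\]-]?)*$");

    // Unicode variant with non-capturing groups: \p{L} is any letter,
    // \p{N} is any number.
    public static readonly Regex Unicode = new Regex(
        @"^(?=.{3,15}$)(?:(?:\p{L}|\p{N})[._()\[\]-]?)*$");
}
```

Both variants should accept `"john.doe"` and reject `"john..doe"`, `".john"`, and `"ab"`. A name such as `"\u00E9mile_2"` (starting with an accented letter) should pass only the Unicode variant.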
Ben Voigt
+1 That's quite a useful variation because it avoids repeating the definitions of "letter or number" and "symbol". You don't need the innermost parentheses.
Mark Byers
You're right, that was a parenthesized atom. Redundant parens removed.
Ben Voigt
It should also be faster, since there's only one lookahead, and it's much simpler.
Ben Voigt