This seems to match the rules I have defined, but I only started learning regex tonight, so I am wondering if it is correct.

Rules:

  • Usernames can consist of lowercase and capital letters
  • Usernames can consist of alphanumeric characters
  • Usernames can consist of underscores, hyphens and spaces
  • Cannot have two underscores, two hyphens or two spaces in a row
  • Cannot have an underscore, hyphen or space at the start or end

Regex pattern:

/^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$/

Thank you.

+2  A: 

If you download a tool such as Expresso 3.0, you can have the tool 'explain' a RegEx's pattern.

Mitch Wheat
+3  A: 

I suggest writing some unit tests to put the regex through its paces. This will also help a few months from now when you find a problem with the regex and need to update it.
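As a rough illustration of that idea, here is a minimal table-driven check in Python. The case table, the `failing_cases` helper and the `QUESTION_PATTERN` name are all made up for this sketch, and treating a one-character name as valid is an assumption, since the rules give no minimum length.

```python
import re

# Expected verdicts derived from the five rules in the question.
# Treating a single-character name as valid is an assumption; the rules
# do not state a minimum length.
CASES = {
    "a": True,
    "abc": True,
    "a_b": True,
    "John_Doe": True,
    "user 42": True,
    "": False,
    "_abc": False,
    "abc-": False,
    "a__b": False,
    "a  b": False,
}


def failing_cases(pattern: str) -> list:
    """Return the names whose match result differs from the expected verdict."""
    regex = re.compile(pattern)
    return [
        name
        for name, expected in CASES.items()
        if (regex.fullmatch(name) is not None) != expected
    ]


# The pattern from the question, without the surrounding slashes.
QUESTION_PATTERN = r"^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$"

if __name__ == "__main__":
    print(failing_cases(QUESTION_PATTERN))
```

With this table, the pattern from the question is flagged for `a` and `a_b`; any replacement pattern can be passed to the same helper, and new cases can be added whenever a bug turns up.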

apathetic
+1. It's definitely a good idea to cover regexes with several unit tests.
Mitch Wheat
+4  A: 
 ([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*

is zero or more repetitions of alphanum, dashspace, alphanum (where "dashspace" means an underscore, hyphen or space).

So it would match

a_aa_aa_a

but not

aaaaa

The complete regexp can't match

a_aaaaaaaaa_a for example.

Let's look back at what you want:

* Usernames can consist of lowercase and capitals or alphanumeric characters
* Usernames can consist of alphanumeric characters
* Usernames can consist of underscores, hyphens and spaces
* Cannot have two underscores, two hyphens or two spaces in a row
* Cannot have an underscore, hyphen or space at the start or end

The beginning is simple ... just match an alphanum, then (ignoring the two-in-a-row rule) an (alphanum or dashspace)* and at the end an alphanum again.

To prevent the two dashspaces in a row you probably need to understand lookahead/lookbehind.

Oh, and regarding the other answer: Please download Expresso, it REALLY helps you understand those things.

froh42
give a man a fish, or teach a man to fish...? It's a dilemma
Mitch Wheat
Easy: give a man a fish *whilst* you teach him to fish. :)
Peter Boughton
+2  A: 
  1. Alphanumeric isn't just [a-zA-Z0-9]; it also covers accented, Cyrillic, Greek and other letters, which can be used in usernames.

  2. (_|-| ) can be replaced by [-_ ] character class
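Both points can be sketched together in Python, under the assumption that "alphanumeric" should mean any Unicode letter or digit. In Python 3 string patterns `\w` covers Unicode letters, digits and the underscore, so `[^\W_]` is used here as "letter or digit"; the overall pattern shape and the names `UNICODE_USERNAME` and `is_valid` are only illustrative.

```python
import re

# [^\W_] means "a word character that is not an underscore", which for
# Python 3 str patterns is any Unicode letter or digit.
# [-_ ] is the character-class form of (_|-| ).
UNICODE_USERNAME = re.compile(r"[^\W_]+(?:[-_ ][^\W_]+)*")


def is_valid(name: str) -> bool:
    """True if the whole name fits the separator rules with Unicode alphanumerics."""
    return UNICODE_USERNAME.fullmatch(name) is not None


if __name__ == "__main__":
    for sample in ["Zoë-Müller", "Иван_Петров", "a--b", "_x"]:
        print(repr(sample), is_valid(sample))
```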

ymv
`[_- ]` is "everything between underscore and space". You want to have the hyphen first to have it interpreted properly: `[-_ ]`
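How a misplaced hyphen is handled depends on the engine. As one data point, Python's `re` module refuses to compile the reversed range at all, while the hyphen-first form compiles as a plain three-character class. The `compiles` helper below is just a convenience for this sketch.

```python
import re


def compiles(pattern: str) -> bool:
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    # Underscore (0x5F) sorts after space (0x20), so "_- " is a reversed range.
    print("[_- ] compiles:", compiles(r"[_- ]"))
    # With the hyphen first it is a literal, not a range operator.
    print("[-_ ] compiles:", compiles(r"[-_ ]"))
```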
Welbog
+1  A: 

Another recommendation for Expresso 3.0 here - very easy to use and build up strings with.

Daniel May
+1 from me!....
Mitch Wheat
A: 

Your regex doesn't work. The hard part is the check for consecutive spaces/hyphens. You could use this one, which uses look-behind:

/^[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9](?<![_\s\-]{2,}.*)$/
Philippe Leybaert
This won't work in the majority of regexp engines because the look-behind string is not fixed length.
mikej
I'm spoiled by the .NET regex engine :)
Philippe Leybaert
Yeah, I meant to mention in my comment that the .NET engine is one of a handful where it *will* work :-)
mikej
Some engines are OK with a bounded-length lookbehind rather than a strictly fixed-length one, e.g. just use `.{0,999}` instead of `.*`
Peter Boughton
(For a username, you can probably get away with {0,99} or even lower.)
Peter Boughton
The `{0,n}` trick would work in Java, but most regex flavors require a fixed-length expression, if they support lookbehinds at all.
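Python's built-in `re` module is one of the flavors that insists on a fixed-width lookbehind, so it makes a convenient test bed. The sketch below only checks whether each variant compiles; `compile_error` is a hypothetical helper name.

```python
import re
from typing import Optional

# The look-behind version from the answer, and the same idea with a bounded tail.
UNBOUNDED = r"^[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9](?<![_\s\-]{2,}.*)$"
BOUNDED = r"^[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9](?<![_\s\-]{2}.{0,99})$"


def compile_error(pattern: str) -> Optional[str]:
    """Return the error message if re rejects the pattern, else None."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)


if __name__ == "__main__":
    for label, pattern in [("unbounded", UNBOUNDED), ("bounded", BOUNDED)]:
        print(label, "->", compile_error(pattern))
```

Both variants are rejected by `re`, because neither lookbehind has a single fixed width.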
Alan Moore
A: 

By the looks of it, that rule wouldn't match something like "a_bc", "ab_c", "a_b" or "a_b_c".

Try: /^[a-zA-Z0-9]+([_\s\-]?[a-zA-Z0-9])*$/ which matches the above cases but not any combination of spaces, dashes or underscores next to each other. Eg: "_-" or " _" are not allowed.

PhantomCode
A: 

Using the POSIX character class for alphanumeric characters to make it work for accented and other foreign alphabetic characters:

/^[[:alnum:]]+([-_ ]?[[:alnum:]])*$/

More efficient (prevents captures):

/^[[:alnum:]]+(?:[-_ ]?[[:alnum:]]+)*$/

These also prevent sequences of more than one space/hyphen/underscore in combination. It doesn't follow from your specification whether that is desirable, but your own regex seems to indicate this is what you want.

Lars Haugseth
+10  A: 

It's always fascinating to see the responses this kind of question elicits.

The specs in the question aren't very clear, so I'll just assume the string can contain only ASCII letters and digits, with hyphens, underscores and spaces as internal separators. The meat of the problem is ensuring that the first and last characters are not separators, and that there's never more than one separator in a row (that part seems clear, anyway). Here's the simplest way:

/^[A-Za-z0-9]+(?:[ _-][A-Za-z0-9]+)*$/

After matching one or more alphanumeric characters, if there's a separator it must be followed by one or more alphanumerics; repeat as needed.
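For anyone who wants to try it, here is a minimal Python sketch of that pattern in use. The function name `is_valid_username` and the sample names are made up for illustration; `fullmatch` is used so that a trailing newline is not silently accepted.

```python
import re

# One or more alphanumerics, then any number of
# "single separator + one or more alphanumerics" groups.
USERNAME = re.compile(r"^[A-Za-z0-9]+(?:[ _-][A-Za-z0-9]+)*$")


def is_valid_username(name: str) -> bool:
    """True if name starts and ends with an alphanumeric and never has two separators in a row."""
    return USERNAME.fullmatch(name) is not None


if __name__ == "__main__":
    for sample in ["a", "a_b", "John Doe-Smith", "a__b", "a -b", "-abc", "abc "]:
        print(repr(sample), is_valid_username(sample))
```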

Lars's regex:

/^[[:alnum:]]+(?:[-_ ]?[[:alnum:]]+)*$/

...is effectively the same (assuming your regex flavor supports the POSIX character-class notation), but why make the separator optional? The only reason you'd be in that part of the regex in the first place is if there's a separator or some other, invalid character.

On the other hand, PhantomCode's regex:

/^[a-zA-Z0-9]+([_\s\-]?[a-zA-Z0-9])*$/

...only works because the separator is optional. After the first separator, it can only match one alphanumeric at a time. To match more, it has to keep repeating the whole group: zero separators followed by one alphanumeric, over and over. If the second [a-zA-Z0-9] were followed by a plus sign, it could find a match by a much more direct route.

Then there's Philippe's regex:

/^[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9](?<![_\s\-]{2,}.*)$/

It can be made to work in flavors other than .NET by changing the lookbehind to a lookahead:

/^(?!.*[_\s-]{2,})[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9]$/

...but it's still way more complicated than it needs to be.
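A quick way to see the lookahead version working in a flavor without variable-length lookbehind is to run it under Python's `re`. This is only a sketch with made-up sample names; note that this pattern needs at least two characters and, because of `\s`, accepts tabs and other whitespace as separators.

```python
import re

# Negative lookahead at the start: fail if two or more separators
# appear next to each other anywhere in the string.
LOOKAHEAD = re.compile(r"^(?!.*[_\s-]{2,})[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9]$")


def accepts(name: str) -> bool:
    """True if the lookahead-based pattern matches the whole name."""
    return LOOKAHEAD.fullmatch(name) is not None


if __name__ == "__main__":
    for sample in ["ab", "a_b", "a b-c", "a__b", "a_-b", "a", "_ab"]:
        print(repr(sample), accepts(sample))
```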

Your own regex:

/^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$/

...requires the string to start and end with two alphanumeric characters--I don't think that's what you want. Also, as @froh42 pointed out, if there are two separators within the string, there have to be exactly two alphanumerics between them--again, probably not what you wanted. And, as @ymv pointed out (with an assist from @Welbog), the (_|-| ) in your regex should be [-_ ]. That part's not incorrect, but if you have a choice between an alternation and a character class, you should always go with the character class: they're much, much more efficient.
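Those two restrictions are easy to confirm by running the pattern from the question against a few hand-picked names. The Python sketch below is only a demonstration; the sample names are chosen for illustration.

```python
import re

# The pattern from the question, unchanged apart from dropping the slashes.
ORIGINAL = re.compile(r"^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$")


def original_accepts(name: str) -> bool:
    """True if the question's pattern matches the whole name."""
    return ORIGINAL.fullmatch(name) is not None


if __name__ == "__main__":
    samples = [
        "ab",             # two alphanumerics, no separator
        "ab_cd",          # two alphanumerics on each side of the separator
        "ab_cd_ef",       # exactly two alphanumerics between separators
        "a",              # too short for the two mandatory + groups
        "a_b",            # only one alphanumeric before the separator
        "ab_cde_fg",      # three alphanumerics between separators
        "a_aaaaaaaaa_a",  # froh42's example
    ]
    for sample in samples:
        print(repr(sample), original_accepts(sample))
```

The first three names are accepted and the last four are rejected, even though all seven satisfy the stated rules.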

Again, I'm not worried about whether "alphanumeric" is supposed to include non-ASCII characters, or the exact meaning of "space", just how to enforce a policy of non-contiguous internal separators with a regex.

Alan Moore
Nice, elegant solution and summary of the other suggestions.
mikej
A: 

Hi, I like this, and I needed just lowercase letters, digits and hyphens:

/^[a-z0-9]+([-]?[a-z0-9])*$/

The question is: how can I also require a minimum and maximum length "in one go" using {2,50}? Where do I put it? I could not figure out whether it is possible.

Thank You

Feha
Use a lookahead: `/^(?=.{2,50}$)[a-z0-9]+(?:-[a-z0-9]+)*$/`. Note: since the hyphen is your only separator character, it doesn't need to be in a character class--and it never *did* need to be optional. I also changed your capturing group - `(...)` - to non-capturing - `(?:...)`. Rule of thumb: never use capturing groups if you don't have to.
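A minimal Python sketch of the length lookahead, assuming each hyphen-separated segment may hold one or more characters (hence the `+` after the second `[a-z0-9]`). The name `is_valid_slug` is hypothetical.

```python
import re

# (?=.{2,50}$) checks the total length first without consuming anything;
# the rest of the pattern then enforces the character and separator rules.
SLUG = re.compile(r"^(?=.{2,50}$)[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_valid_slug(name: str) -> bool:
    """True if name is 2 to 50 characters of lowercase alphanumerics with single internal hyphens."""
    return SLUG.fullmatch(name) is not None


if __name__ == "__main__":
    for sample in ["ab", "my-user-name", "a", "a--b", "-ab", "x" * 51]:
        print(repr(sample[:20]), len(sample), is_valid_slug(sample))
```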
Alan Moore
Hi Alan Thank You
Feha