Hi, I'm looking for some regex code that I can use to check for a valid username.

I would like for the username to have letters (both upper case and lower case), numbers, spaces, underscores, dashes and dots, but the username must start and end with either a letter or number.

Ideally, it should also not allow for any of the special characters listed above to be repeated more than once in succession, i.e. they can have as many spaces/dots/dashes/underscores as they want, but there must be at least one number or letter between them.

I'm also interested to find out if you think this is a good system for a username? I've had a look for some regex that could do this, but none of them seem to allow spaces, and I would like for the usernames to have some spaces in them.

Thank you :)

+2  A: 

Although I'm sure someone will shortly post a million-line regex that does exactly what you want, I don't think a regex is a good solution in this case.

Why don't you write a good old-fashioned parser? It will take about as long as writing a regex that does everything you mentioned, but it will be much easier to read and maintain.

In particular, this is the tricky part:

it should also not allow for any of the special characters listed above to be repeated more than once in succession

Alternatively, you can always do a hybrid of the two: a regex for the other checks (^[a-zA-Z0-9][a-zA-Z0-9 _.-]*[a-zA-Z0-9]$, with the dash last in the character class so that it is not read as a range) and a non-regex method for the no-repeat requirement.
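A minimal Python sketch of that hybrid, in case it helps. The names SHAPE, SEPARATORS and is_valid_username are made up for illustration, and the character class is written as [A-Za-z0-9 _.-] so the dash is literal. Note that this shape requires at least two characters, which may or may not be what you want.

```python
import re

# Regex part: allowed characters, plus a letter/digit at both ends.
# (Hypothetical names; fullmatch makes explicit ^...$ anchors unnecessary.)
SHAPE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 _.-]*[A-Za-z0-9]")
SEPARATORS = set(" _.-")


def is_valid_username(name: str) -> bool:
    """Return True if name passes the regex check and has no adjacent separators."""
    if SHAPE.fullmatch(name) is None:
        return False
    # Non-regex part: reject any two separators in a row.
    return not any(
        a in SEPARATORS and b in SEPARATORS for a, b in zip(name, name[1:])
    )
```

Here zip(name, name[1:]) walks over each pair of neighbouring characters, so the no-repeat rule stays a single readable line instead of being folded into the pattern.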

Krevan
+1  A: 

You don't have to use a regex for everything. I find that requirements like "no two consecutive special characters" usually make regexes so ugly that it's better to do that bit with a simple procedural loop.

I'd just use something like ^[A-Za-z0-9][A-Za-z0-9 \.\-_]*[A-Za-z0-9]$ (or an equivalent such as the POSIX class [:alnum:] if your regex engine supports it) and then check every character in a loop to make sure a special character is never immediately followed by another special character.

By doing it procedurally, you can check all the other rules you're likely to want at some point without resorting to what I call "regex gymnastics", things like:

  • not allowed to contain your first or last name.
  • no more than two consecutive digits.

and so forth.
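As a rough sketch of the fully procedural version in Python (the function name username_problems, the max_digit_run parameter and the message wording are all assumptions for illustration), including the "no more than two consecutive digits" rule as an example of an extra check:

```python
import string

ALNUM = set(string.ascii_letters + string.digits)
SEPARATORS = set(" _.-")


def username_problems(name: str, max_digit_run: int = 2) -> list:
    """Return a list of rule violations; an empty list means the name is fine."""
    if not name:
        return ["username is empty"]
    problems = []
    if name[0] not in ALNUM or name[-1] not in ALNUM:
        problems.append("must start and end with a letter or digit")
    digit_run = 0
    prev = ""
    for ch in name:
        if ch not in ALNUM and ch not in SEPARATORS:
            problems.append(f"character {ch!r} is not allowed")
        if ch in SEPARATORS and prev in SEPARATORS:
            problems.append("two special characters in a row")
        # Track the length of the current run of digits.
        digit_run = digit_run + 1 if ch in string.digits else 0
        if digit_run == max_digit_run + 1:  # report each long run only once
            problems.append(f"more than {max_digit_run} consecutive digits")
        prev = ch
    return problems
```

Returning a list of messages rather than a bare True/False also lets you tell the user which rule they broke, which is hard to get out of a single regex.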

paxdiablo
+3  A: 

So it looks like you want your username to have a "word" part (sequence of letters or numbers), interspersed with some "separator" part.

The regex will look something like this:

^[a-z0-9]+(?:[ _.-][a-z0-9]+)*$

Here's a schematic breakdown:

           _____sep-word…____
          /                  \
^[a-z0-9]+(?:[ _.-][a-z0-9]+)*$             i.e. "word ( sep word )*"
|\_______/   \____/\_______/  |
| "word"     "sep"   "word"   |
|                             |
from beginning of string...   till the end of string

So essentially we want to match things like word, word-sep-word, word-sep-word-sep-word, etc.

  • There will be no consecutive sep without a word in between
  • The first and last char will always be part of a word (i.e. not a sep char)

Note that in [ _.-], the - is last so that it's a literal dash rather than a range definition metacharacter. The (?:…) is what is called a non-capturing group. We need the parentheses to group the part being repeated (i.e. (…)*), but since we don't need the capture, we can use (?:…)* instead.

To allow uppercase letters, Unicode letters, etc., just expand the character class or add flags (such as case-insensitive matching) as necessary.
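For example, here is one way this could look in Python, using flags for the uppercase letters. The names USERNAME and is_valid_username are just for illustration. re.ASCII is added because, in Python, [a-z] combined with re.IGNORECASE alone also matches a few non-ASCII letters.

```python
import re

# The pattern from above. IGNORECASE lets [a-z] match A-Z as well;
# ASCII keeps the case folding limited to the 52 ASCII letters.
USERNAME = re.compile(
    r"^[a-z0-9]+(?:[ _.-][a-z0-9]+)*$",
    re.IGNORECASE | re.ASCII,
)


def is_valid_username(name: str) -> bool:
    # fullmatch must consume the entire string, so a trailing newline
    # (which a bare $ would tolerate) is rejected too.
    return USERNAME.fullmatch(name) is not None
```

With this, "John Smith" and "j.r.r-tolkien_99" are accepted, while "john..smith", ".john" and "john " are rejected.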


polygenelubricants
See matches on rubular: http://www.rubular.com/r/UX0l8RroUN
polygenelubricants