I have some code validating a string of 1 to 32 characters, which may contain only alpha-numerics and hyphens ('-') but may not begin or end with a hyphen.

I'm using PCRE regular expressions & PHP (albeit the PHP part is not really important in this case).

Right now the pseudo-code looks like this:

if (match("/^[\p{L}0-9][\p{L}0-9-]{0,31}$/u", string) 
    and
    not match("/-$/", string))

   print "success!"

That is, I'm checking first that the string has the right contents, doesn't begin with a '-' and is of the right length, and then I'm running another test to see that it doesn't end with a '-'.
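For reference, the two-step check described above could look roughly like this in actual PHP. This is only a sketch: the function name isValidTwoStep is made up for illustration, and the two patterns are copied unchanged from the pseudo-code.

```php
<?php
// Hypothetical helper; mirrors the two-step pseudo-code from the question.
function isValidTwoStep(string $s): bool
{
    // Step 1: first character is a letter or digit, followed by up to
    // 31 letters, digits or hyphens (1 to 32 characters in total).
    $shapeOk = preg_match('/^[\p{L}0-9][\p{L}0-9-]{0,31}$/u', $s) === 1;

    // Step 2: separately reject a trailing hyphen.
    $endsWithHyphen = preg_match('/-$/', $s) === 1;

    return $shapeOk && !$endsWithHyphen;
}

var_dump(isValidTwoStep('abc-123')); // bool(true)
var_dump(isValidTwoStep('abc-'));    // bool(false)
var_dump(isValidTwoStep('-abc'));    // bool(false)
```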

Any suggestions on merging this into a single PCRE regular expression?

I've tried using look-ahead / look-behind assertions but couldn't get it to work.

+1  A: 

Try this regular expression:

/^[\p{L}0-9](?:[\p{L}0-9-]{0,30}[\p{L}0-9])?$/u

And if you want to use look-around assertions:

/^[\p{L}0-9][\p{L}0-9-]{0,31}$(?<!-)/u
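
A quick way to compare the two patterns is to run both against a few edge cases. The sketch below assumes PHP with preg_match; the function names are hypothetical, and the patterns are the two given above.

```php
<?php
// Hypothetical wrappers around the two patterns from this answer.
function isValidNoLookaround(string $s): bool
{
    // One leading letter/digit, optionally followed by up to 30 middle
    // characters and one closing letter/digit: 1 to 32 characters in total.
    return preg_match('/^[\p{L}0-9](?:[\p{L}0-9-]{0,30}[\p{L}0-9])?$/u', $s) === 1;
}

function isValidLookbehind(string $s): bool
{
    // Same content and length check as in the question, plus a look-behind
    // after $ asserting that the preceding character is not a hyphen.
    return preg_match('/^[\p{L}0-9][\p{L}0-9-]{0,31}$(?<!-)/u', $s) === 1;
}

$samples = ['a', 'a-b', 'a-', '-a', str_repeat('a', 32), str_repeat('a', 33)];
foreach ($samples as $s) {
    // Print the verdict of each variant; they agree on all of these inputs.
    var_dump(isValidNoLookaround($s), isValidLookbehind($s));
}
```

On these samples only 'a', 'a-b' and the 32-character string are accepted by either variant.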
Gumbo
The first suggestion is nice and seems to pass all my tests, thanks! BTW, do you know if there's any reason to prefer one over the other?
Shahar Evron
@Shahar Evron: There are regular expression implementations that do not support look-behind assertions, or look-around assertions at all. In that case it’s good to know an alternative.
Gumbo
+1  A: 

A slightly different approach would be to keep your character class in one piece and be specific about the points where you don't want to allow the hyphen.

/^(?!-)[\p{L}0-9-]{1,32}(?<!-)$/Du

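Also note the D modifier (PCRE_DOLLAR_ENDONLY), which everyone always seems to forget. Without it, $ also matches just before a trailing newline, so a string with a newline at the end would still pass.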
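
To see what the D modifier changes, here is a small sketch comparing the pattern with and without it on an input that ends in a newline. The variable names are just for illustration.

```php
<?php
// Same pattern twice; the only difference is the D modifier.
$withoutD = '/^(?!-)[\p{L}0-9-]{1,32}(?<!-)$/u';
$withD    = '/^(?!-)[\p{L}0-9-]{1,32}(?<!-)$/Du';

// Double quotes so that \n is a real newline character.
$input = "abc\n";

var_dump(preg_match($withoutD, $input)); // int(1): $ matches before the final newline
var_dump(preg_match($withD, $input));    // int(0): $ matches only at the very end
```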

Finally, just to be sure, you are aware that \pL will match much more than a-zA-Z, right? Just checking.
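
A rough illustration of the difference, assuming the source file and the input strings are UTF-8 encoded. The ASCII-only pattern is not from the question; it is included only for comparison.

```php
<?php
// Pattern from this answer: \p{L} matches any Unicode letter.
$unicodeLetters = '/^(?!-)[\p{L}0-9-]{1,32}(?<!-)$/Du';
// Comparison pattern (an assumption, for contrast): ASCII letters only.
$asciiLetters   = '/^(?!-)[a-zA-Z0-9-]{1,32}(?<!-)$/D';

foreach (['hello', 'héllo', 'привет'] as $s) {
    // The first result is 1 for every sample; the second is 1 only for 'hello'.
    var_dump(preg_match($unicodeLetters, $s), preg_match($asciiLetters, $s));
}
```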

salathe
Yes, I'm aware of that, that's why I'm using it :)
Shahar Evron