Ok so I managed to solve a problem at work with regex, but the solution is a bit of a monster.

The string to be validated must be:

zero or more: A-Z a-z 0-9, spaces, or these symbols: . - = + ' , : ( ) /

But neither the first nor the last character may be a forward slash /

This was my solution (used with PHP's preg_match function):

"/^[a-z\d\s\.\-=\+\',:\(\)][a-z\d\s\.\-=\+\',\/:\(\)]*[a-z\d\s\.\-=\+\',:\(\)]$|^[a-z\d\s\.\-=\+\',:\(\)]$/i"

A colleague thinks this is too big and complicated. Well it works, so is it really that bad? Anyone in the mood for some regex-golf?

+3  A: 

You can simplify your expression to this:

/^(?:[a-z\d\s.\-=+',:()]+(?:\/+[a-z\d\s.\-=+',:()]+)*)?$/i

The outer (?:…)? allows an empty string. The [a-z\d\s.\-=+',:()]+ requires the string to start with one or more of the specified characters other than /. Any run of one or more / must in turn be followed by one or more of the other specified characters ((?:\/+[a-z\d\s.\-=+',:()]+)*), so the string can never end with a /. The / inside the pattern is escaped because / is also the regex delimiter.

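Furthermore, inside a character set, you only need to escape the characters \ and ], and, depending on the position, also ^ and -. In PHP, the delimiter character (here /) must be escaped as well wherever it appears in the pattern.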
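
To make the behaviour concrete, here is a minimal sketch that runs the simplified pattern through preg_match. The helper name is_valid_simplified and the sample inputs are made up for illustration; the inner slash is written as \/ because / is the delimiter.

```php
<?php
// Hypothetical helper: wraps the simplified pattern in a boolean check.
function is_valid_simplified(string $s): bool
{
    // Single-quoted PHP string, so only the single quote needs a PHP-level escape.
    $pattern = '/^(?:[a-z\d\s.\-=+\',:()]+(?:\/+[a-z\d\s.\-=+\',:()]+)*)?$/i';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $s) === 1;
}

// Made-up sample inputs covering the edge cases from the question.
foreach (['', 'a/b', 'a//b', '/ab', 'ab/', 'ab@c'] as $sample) {
    printf("%-6s => %s\n", var_export($sample, true),
        is_valid_simplified($sample) ? 'valid' : 'invalid');
}
```

The empty string, a/b and a//b should come out valid; /ab, ab/ and ab@c should not.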

Gumbo
+2  A: 

Try something like this instead

function validate($string) {
   return (preg_match("/[a-zA-Z0-9.\-=+',:()/]*/", $string) && substr($string, 0,1) != '/' && substr($string, -1) != '/'))
}

It's a lot simpler to check the first and last characters explicitly. Otherwise you have to deal with a lot of overhead for edge cases such as empty strings. Your regex, for example, requires the string to be at least one character long, so "" doesn't validate even though it fits your criteria.

Swizec Teller
I like the idea, but 1)you have to escape the / within your [] 2)$string like ab@c will validate as you are not using ^ nor $ 3)btw, your last closing ) should be a ;
Julien
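
For reference, a sketch of what the function might look like with those three points applied: the slash escaped, the pattern anchored with ^ and $, and the stray ) replaced by a ;. The \s in the character class is an extra assumption on my part so that spaces pass, as the question requires; the sample calls are made up.

```php
<?php
// Sketch only: the validate() function from the answer with the comment's fixes applied.
function validate(string $string): bool
{
    // ^ and $ anchor the match so every character must come from the allowed set.
    // \/ escapes the slash because / is the delimiter; \s is an added assumption.
    $allowedOnly = preg_match('/^[a-zA-Z0-9\s.\-=+\',:()\/]*$/', $string) === 1;

    // Explicit checks on the first and last character, as in the original answer.
    return $allowedOnly
        && substr($string, 0, 1) !== '/'
        && substr($string, -1) !== '/';
}

var_dump(validate(''));      // empty string is allowed
var_dump(validate('a/b'));   // slash in the middle is allowed
var_dump(validate('/ab'));   // leading slash is rejected
var_dump(validate('ab@c'));  // @ is not in the allowed set
```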
+2  A: 
'#^(?!/)[a-z\d .=+\',:()/-]*$(?<!/)#i'

As others have observed, most of those characters don't need to be escaped inside a character class. Additionally, the hyphen doesn't need to be escaped if it's the last thing listed, and the slash doesn't need to be escaped if you use a different character as the regex delimiter (# in this case, but ~ is a popular choice, too).

I also ditched the double-quotes in favor of single-quotes, which meant I had to escape the single-quote in the regex. That's worth it because single-quoted strings are so much simpler to work with: no $variable interpolation, no embedded executable {code}, and the only characters you have to escape for them are the single-quote and the backslash.

But the main innovation here is the use of lookahead and lookbehind to exclude the slash as the first or last character. That's not just a code-golf tactic, either; I would write the regex this way anyway, because it expresses my intent so much better. Why force the next guy to parse those almost-identical character classes, when you can just say what you mean? "...but the first and last character can't be slashes."
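
A quick way to try this out, as a rough sketch: the helper name matches_lookaround and the sample strings are hypothetical, while the pattern is the one given above.

```php
<?php
// Hypothetical helper around the lookahead/lookbehind pattern from this answer.
function matches_lookaround(string $s): bool
{
    // (?!/) right after ^ rejects a leading slash;
    // (?<!/) right after $ rejects a trailing slash.
    $pattern = '#^(?!/)[a-z\d .=+\',:()/-]*$(?<!/)#i';

    return preg_match($pattern, $s) === 1;
}

// Made-up inputs: the first two should pass, the rest should fail.
foreach (['', 'a/b', '/ab', 'ab/', 'ab@c'] as $sample) {
    printf("%-6s => %s\n", var_export($sample, true),
        matches_lookaround($sample) ? 'valid' : 'invalid');
}
```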

Alan Moore
The problem here is that lookbehind is nonconsuming. So while it will check that the first character isn't /, the character class will then allow for the slash as first character and I'm fairly certain that could be a problem.
Swizec Teller
If the first character is a slash, the lookahead will fail, and the regex will fail without ever applying the character class. It would be the same as if you rearranged your solution to put the `substr($string, 0,1) != '/'` first; if that part failed, the `preg_match` and the other `substr` would never even be called.
Alan Moore
Thanks for this solution: using a lookbehind to check the last char, NOT a lookahead like `'#^(?!/)[a-z\d .=+\',:()/-]*(?!/)$#i'`. Also, does the position of the lookbehind after the `$` matter?
Julien
@Julien: It doesn't have to be after the `$`, but it does have to be a *lookbehind*, not a *lookahead*. `(?<!/)$` means "I'm at the end of the string, and the final character was not a slash" (the order doesn't matter, since they're both zero-width assertions). `(?!/)$` would mean "I'm at the end of the string, and the **next** character isn't a slash"--of course it isn't: there *is* no next character! (Full disclosure: `$` can also match before a newline at the end of the string. If your target strings might end with newlines, you should use `\z` instead.)
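
A small sketch that puts the variants side by side. The function name ends_ok and the test strings are made up for illustration; only the tail of the pattern differs between the calls.

```php
<?php
// Hypothetical helper: builds the pattern with a caller-supplied ending
// so the different end-of-string assertions can be compared directly.
function ends_ok(string $tail, string $s): bool
{
    $pattern = '#^(?!/)[a-z\d .=+\',:()/-]*' . $tail . '#i';

    return preg_match($pattern, $s) === 1;
}

// Lookbehind: "the final character was not a slash" rejects a trailing slash.
var_dump(ends_ok('(?<!/)$', 'ab/'));   // bool(false)

// Lookahead at the end: there is no next character, so the check is vacuous.
var_dump(ends_ok('(?!/)$', 'ab/'));    // bool(true)

// $ also matches just before a final newline, so "ab\n" gets through...
var_dump(ends_ok('(?<!/)$', "ab\n"));  // bool(true)

// ...while \z only matches at the very end of the string.
var_dump(ends_ok('(?<!/)\z', "ab\n")); // bool(false)
```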
Alan Moore