Hi everyone. I need a regular expression to match a very specific sentence format. The format is as follows:

word(that can contain ,()[]&^%# and whitespace), word(that can contain ,()[]&^%# and whitespace), word(that can contain ,()[]&^%# and whitespace)

So basically it's word, word, word, but every word can contain some special characters and whitespace. Can someone help me, please?


Here are some examples:

  1. Various Artists, Total 6, I Built This City (Michael Mayer Mix)
  2. Ada, Blindhouse/ Luckycharm, Luckycharm
  3. Hector, Orale, Orale (Alex Picone Remix)
+1  A: 

I'm not sure that you want to include the `,` that delimits your words in the allowed character set. If you want to match three comma-separated words:

/^[a-z()\[\]&^%#\/\s]+,[a-z()\[\]&^%#\/\s]+,[a-z()\[\]&^%#\/\s]+$/i
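
To check a pattern like this against the examples from the question, here is a minimal sketch in Python (an assumption, since the thread doesn't name a host language). The brackets inside the character class are escaped, and `/` and digits are added to the class as assumptions so that "Blindhouse/ Luckycharm" and "Total 6" are accepted.

```python
import re

# Three comma-separated fields. Each field may contain letters, digits,
# whitespace and the listed punctuation, but not a comma.
# "/" and "0-9" are assumptions added so the sample lines match.
PATTERN = re.compile(
    r"^[a-z0-9()\[\]&^%#/\s]+,[a-z0-9()\[\]&^%#/\s]+,[a-z0-9()\[\]&^%#/\s]+$",
    re.IGNORECASE,
)

examples = [
    "Various Artists, Total 6, I Built This City (Michael Mayer Mix)",
    "Ada, Blindhouse/ Luckycharm, Luckycharm",
    "Hector, Orale, Orale (Alex Picone Remix)",
]

for line in examples:
    # Prints True for every example line.
    print(bool(PATTERN.match(line)), line)
```

Because the comma is not part of the class, a line with only two fields or with four fields is rejected.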
Nev Stokes
I've just noticed that your second example has a / in it which doesn't match the pattern you define.
Nev Stokes
Gumbo
Of course, silly me! Edited to reflect your comment
Nev Stokes
Furthermore you should allow numbers. Either use `a-z0-9` or maybe better `\w`.
nikic
+2  A: 

I would use this solution:

/(?x)([a-z\d\s()[\]&^%#\/]+),((?1)),((?1))/i

This way you don't have to repeat your pattern.
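
For engines without `(?1)`, the same "define once, use three times" idea can be approximated by building the pattern from a single string. The sketch below uses Python's built-in `re` module, which does not support subroutine calls; `FIELD` and `split_fields` are hypothetical names, and anchoring with `fullmatch` is an assumption (the pattern above is unanchored).

```python
import re

# One definition of a field, reused three times below. Plain string
# formatting stands in for PCRE's (?1) subroutine call.
FIELD = r"[a-z\d\s()\[\]&^%#/]+"
PATTERN = re.compile(rf"({FIELD}),({FIELD}),({FIELD})", re.IGNORECASE)


def split_fields(line):
    """Return the three fields with surrounding whitespace stripped, or None."""
    m = PATTERN.fullmatch(line)
    return [g.strip() for g in m.groups()] if m else None


print(split_fields("Hector, Orale, Orale (Alex Picone Remix)"))
# ['Hector', 'Orale', 'Orale (Alex Picone Remix)']
```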

Code on ideone

Colin Hebert
Wouldn't that only work if the first word was repeated three times though?
Nev Stokes
@Nev Stokes, no (see the code on ideone). `(?1)` is a subpattern call that re-applies the first group's pattern; it isn't the backreference `\1`.
Colin Hebert
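
The difference between a backreference and a re-applied subpattern can be seen in a small sketch. It is written in Python as an assumption; since Python's `re` has no `(?1)`, the "reused" variant simply repeats the same sub-pattern string, which is the behaviour `(?1)` gives in PCRE. `WORD` is a simplified, hypothetical stand-in for the full character class.

```python
import re

WORD = r"[a-z]+"

# Backreference: \1 must repeat the exact text captured by group 1.
backref = re.compile(rf"({WORD}),\1,\1", re.IGNORECASE)

# Reused subpattern: each position is matched independently against
# the same pattern, so the three words may differ.
reused = re.compile(rf"({WORD}),({WORD}),({WORD})", re.IGNORECASE)

print(bool(backref.fullmatch("abc,abc,abc")))  # True
print(bool(backref.fullmatch("abc,def,ghi")))  # False
print(bool(reused.fullmatch("abc,def,ghi")))   # True
```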
+1, that one's new to me. Could you provide a reference? (I only found `\1` in the manual.)
nikic
@nikic, on http://www.pcre.org/pcre.txt search for "SUBPATTERNS AS SUBROUTINES"
Colin Hebert
Thanks, I appreciate the link. There seems to be much, much more to PCRE that I don't know and that isn't documented on php.net. Thanks again.
nikic