I'm trying to write a regular expression that matches all words inside a specific string, but skips words inside brackets. I currently have one regex that matches all words:

/[a-z0-9]+(-[a-z0-9]+)*/i

I also have a regex that matches all words inside brackets:

/\[(.*)\]/i

I basically want to match everything that the first regex matches, but without everything the second regex matches.

Sample input text: http://gist.github.com/222857

It should match every word separately, except the words inside the brackets.

Any help is appreciated. Thanks!

+3  A: 

Perhaps you could do it in two steps:

  1. Remove all the text within brackets.
  2. Use a regular expression to match the remaining words.

Using a single regular expression to try to do both these things will end up being more complicated than it needs to be.
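The two steps above can be sketched in Ruby roughly as follows. This is only an illustration: the sample text is made up (the gist from the question is not reproduced here), and the non-greedy `.*?` and the non-capturing `(?:...)` group are my own adjustments to the regexes in the question, so that several bracketed groups on one line are removed separately and `scan` returns whole words.

```ruby
# Hypothetical sample text, not the gist from the question.
text = "foo bar [skip-me] baz-qux [two words] end"

# Step 1: remove each bracketed run. The non-greedy .*? stops at the first
# closing bracket, so text between two bracketed groups survives. Replacing
# with a space keeps the neighbouring words from being glued together.
stripped = text.gsub(/\[.*?\]/, " ")

# Step 2: match the remaining words. The group is non-capturing so that
# String#scan returns the full matches instead of the group contents.
words = stripped.scan(/[a-z0-9]+(?:-[a-z0-9]+)*/i)

p words  # => ["foo", "bar", "baz-qux", "end"]
```

Keeping the steps separate means each regex stays simple enough to check by eye.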

Greg Hewgill
Yea, that's exactly what I'll do too.
o.k.w
A: 

I don't think I understand the question properly. Why not just make a new string that leaves out whatever the second regex matches, like so:

string1 = string1.gsub(/\[(.*)\]/, '')

Off the top of my head, that should delete the bracketed text and store the result back in string1. I have not tested this yet, though. I might test it later.

Robert Massaioli
A: 

I agree with Shhnap. Without more info, it sounds like the easiest way is to remove what you don't want, but the regex needs to be the non-greedy /\[(.*?)\]/ instead. After that you can split on \s.

If you are trying to iterate through each word, and you want each word to match, maybe you can cheat a little with string.split(/\W+/). You will lose the quotation marks and whatnot, but you get each word.
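As a rough sketch of that shortcut, assuming a made-up sample string and that the bracketed text has already been removed with a non-greedy pattern, note that `\W+` also splits on hyphens, so hyphenated words come back in pieces:

```ruby
# Hypothetical sample text for illustration only.
text = "foo bar [skip-me] baz-qux"

# Remove the bracketed parts first (non-greedy, one group at a time).
without_brackets = text.gsub(/\[.*?\]/, " ")

# Split on runs of non-word characters. The hyphen is a non-word character,
# so "baz-qux" is split into "baz" and "qux".
tokens = without_brackets.split(/\W+/)

p tokens  # => ["foo", "bar", "baz", "qux"]
```

If the string starts with punctuation, `split` returns an empty first element, which can be dropped with `reject(&:empty?)`.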

cgr
A: 

How 'bout this:

your_text.scan(/\[.*?\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i) - [[nil]]
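To show why the nil entries have to be filtered out, here is a small sketch with a made-up sample string. It uses a non-greedy `.*?` inside the brackets, which is my assumption about the intended behaviour. When a pattern contains a capturing group, `scan` returns one array of captures per match, so a match of the bracket alternative yields `[nil]`.

```ruby
# Hypothetical sample text for illustration only.
text = "foo [skip-me] baz-qux"

# The first alternative consumes a whole bracketed run without capturing it;
# the second alternative captures an ordinary word.
raw = text.scan(/\[.*?\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i)
p raw    # => [["foo"], [nil], ["baz-qux"]]

# Flatten the one-element arrays and drop the nils left by bracket matches.
words = raw.flatten.compact
p words  # => ["foo", "baz-qux"]
```

Subtracting `[[nil]]` removes the same entries but leaves each word wrapped in its own one-element array.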
glenn mcdonald
+1  A: 

Which Ruby version are you using? If it's 1.9 or later, this should do what you want:

/(?<![\[a-z0-9-])[a-z0-9]+(-[a-z0-9]+)*(?![\]a-z0-9-])/i
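A quick way to try this out, as a sketch on a made-up sample string: I have switched the inner group to non-capturing so that `scan` returns whole matches. The lookarounds reject a word only when it directly touches a bracket, so in this sample the middle word of a multi-word bracketed phrase is still matched.

```ruby
# Hypothetical sample text for illustration only.
text = "foo [skip-me] baz-qux [a b c]"

# Same lookbehind/lookahead as above, with (?:...) instead of (...) so that
# String#scan yields the full match rather than the last group.
pattern = /(?<![\[a-z0-9-])[a-z0-9]+(?:-[a-z0-9]+)*(?![\]a-z0-9-])/i

words = text.scan(pattern)

# "skip-me", "a" and "c" are rejected because they touch a bracket;
# "b" sits between spaces and therefore still matches.
p words  # => ["foo", "baz-qux", "b"]
```

If the bracketed text in the real input can contain more than two words, removing it first may be the safer route.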
Alan Moore