I have this string:

string = "<p>para1</p><p>para2</p><p>para3</p>"

I want to split on the para2 text, so that I get this:

["<p>para1</p>", "<p>para3</p>"]

The catch is that para2 might not always be wrapped in p tags, and there might be optional whitespace both outside and inside the tags. I thought that this would do it:

string.split(/\s*(<p>)?\s*para2\s*(<\/p>)?\s*/)

But I get this:

["<p>para1</p>", "<p>", "</p>", "<p>para3</p>"]

It's not pulling the start and end p tags into the match; they should be eliminated as part of the split. Ruby's regular expressions are greedy by default, so I thought the tags would get pulled in. This seems to be confirmed if I do a gsub instead of a split:

string.gsub(/\s*(<p>)?\s*para2\s*(<\/p>)?\s*/, "XXX")
=> "<p>para1</p>XXX<p>para3</p>"

The tags are pulled in and removed here, but not by the split. Any ideas, anyone?

Thanks, Max

+4  A: 

Replace your capturing groups (…) with non-capturing groups (?:…):

/\s*(?:<p>)?\s*para2\s*(?:<\/p>)?\s*/
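For comparison, here is a small sketch that runs both patterns against the string from the question. The variable names and the unwrapped input in the last call are made up for illustration; the expected results for the wrapped string are the ones reported in the question.

```ruby
string = "<p>para1</p><p>para2</p><p>para3</p>"

# Capturing groups: split adds each group's matched text to the result.
capturing = /\s*(<p>)?\s*para2\s*(<\/p>)?\s*/
with_captures = string.split(capturing)
# => ["<p>para1</p>", "<p>", "</p>", "<p>para3</p>"]

# Non-capturing groups: the tags are still matched as part of the
# delimiter, but nothing extra is added to the result.
non_capturing = /\s*(?:<p>)?\s*para2\s*(?:<\/p>)?\s*/
wrapped = string.split(non_capturing)
# => ["<p>para1</p>", "<p>para3</p>"]

# Hypothetical input where para2 is not wrapped in p tags
# and has spaces around it.
bare = "<p>para1</p> para2 <p>para3</p>".split(non_capturing)
# => ["<p>para1</p>", "<p>para3</p>"]
```

The match itself is identical in both cases, which is why gsub gave the expected result with the original pattern; only what split returns changes.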
Gumbo
This answer is correct. When you split on a regex that contains capturing groups, split also puts the captured text into the resulting array, which lets you do more complex scanning/splitting operations.
mckeed
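As a rough illustration of when keeping the captures is useful, the sketch below splits a made-up arithmetic string and keeps the operators as tokens. The input and variable names are hypothetical, not from the question.

```ruby
expression = "10+20-5"

# The delimiter is wrapped in a capturing group, so each operator
# is kept in the result between the pieces it separated.
tokens = expression.split(/([+-])/)
# => ["10", "+", "20", "-", "5"]

# Without the group, the operators are discarded.
numbers_only = expression.split(/[+-]/)
# => ["10", "20", "5"]
```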
Nifty...didn't know we had that in Ruby!
btelles
Thanks Gumbo, that does the trick. I'd never even heard of non-capturing groups before, that's a really useful bit of knowledge.
Max Williams