Hey and happy holidays

I'm working in Ruby on Rails and need the following:

Remove all "br" HTML tags between "code" HTML tags in a string of HTML. The "code" tags might occur more than once.

Now, it's not screen scraping I'm trying to do. I have a blog and would like to allow people to use only the code HTML tags in the comments. When formatting the string I normally use simple_format, but I'd like it to ignore code HTML tags.

Thanks in advance.

A: 

Using Hpricot or an HTML parser of your choice would be a far, far better idea.

squeeks
A: 

I second Hpricot, but what are you trying to do? Are you attempting some sort of web scraping, or are you parsing HTML from a model?

Rilindo
I'm trying to edit the simple_format helper to ignore code tags and everything in between.
Jaan J
+1  A: 

If you absolutely positively have to use regexp, try this one, which catches all <br>, <br/> and <br /> tags:

str.gsub(/<code>.+?<\/code>/) {|s| s.gsub(/<br\s*\/?>/, "")}

Tested with:

str = "Lorem ipsum dolor sit amet<br />, <code>consectetur adipisicing elit<br />, sed do eiusmod tempor incididunt ut labore<br> et dolore magna aliqua</code>. Ut enim ad minim veniam,<br> quis nostrud exercitation ullamco laboris nisi<br/> ut aliquip ex ea commodo consequat. <code>Duis aute irure dolor in reprehenderit<br /> in voluptate velit esse cillum dolore<br/> eu fugiat nulla pariatur.</code> Excepteur sint occaecat cupidatat non proident,<br /> sunt in culpa qui officia deserunt mollit anim id est laborum."
p str.gsub(/<code>.+?<\/code>/) {|s| s.gsub(/<br\s*\/?>/, "")}
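One caveat worth noting: in Ruby, "." does not match a newline unless the regexp has the m flag, so the pattern above skips any code block that spans several lines, which is likely once newlines are involved in comment text. Below is a minimal sketch of a variant that also tolerates attributes on the code tag and uppercase tags. The method name strip_br_in_code is hypothetical, not something from Rails or from this thread.

```ruby
# Hypothetical helper; the name is an assumption made for illustration.
def strip_br_in_code(html)
  # m flag: "." also matches newlines, so multi-line code blocks are caught.
  # i flag: <CODE> and <BR> are matched as well.
  # <code\b[^>]*> accepts attributes, e.g. <code class="ruby">.
  html.gsub(/<code\b[^>]*>.*?<\/code>/mi) do |block|
    # Inner gsub runs only on the matched code block.
    block.gsub(/<br\s*\/?>/i, "")
  end
end

sample = "line one<br />\n<code class=\"ruby\">puts 1<br />\nputs 2<br>\n</code><br/>after"
puts strip_br_in_code(sample)
```

The br tags outside the code block are left alone; only those inside it are removed. Nested code tags are not handled, which is one more reason a parser is the safer route.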

If you don't have to use regexp, use an HTML parser like Nokogiri.

vonconrad
Thank you. I already found another simple_format hack that did it for me. The problem was the logic, as I could not figure out how to use a regexp inside a regexp.
Jaan J