This seems like the hardest problem I have had yet, but maybe I am making it harder than it needs to be. I need to remove an unknown number of nested span elements that may or may not appear at the beginning of a sentence. Together, these spans render a list of words in parentheses. For example, in the sentence:

(cryptography, slang) An internet firewall.

In the HTML, (cryptography, slang) looks like this:

<span class="ib-brac"><span class="qualifier-brac">(</span></span><span class="ib-content"><span class="qualifier-content">cryptography<span class="ib-comma"><span class="qualifier-comma">,</span></span> <a href="/wiki/Appendix:Glossary#slang" title="Appendix:Glossary">slang</a></span></span><span class="ib-brac"><span class="qualifier-brac">)</span></span>

I was thinking a good solution would be to use a regex and Nokogiri to check whether the opening '(' is present and, if it is, remove all the spans up to and including the closing ')'. However, I have no idea how to do this. The solution I am using now does not account for a variable number of spans:

if definition.inner_html =~ /^<span class/
  # Removes only the first five spans matched; the count is hard-coded
  definition.search("span")[0..4].each do |span|
    span.remove
  end
end
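The idea described above (start at the opening "(" span and remove siblings until the closing ")" span has been removed) can be sketched as follows. This is a minimal sketch under stated assumptions, not the poster's code. It uses Ruby's bundled REXML library in place of Nokogiri so that it runs without extra gems. The helper names `strip_leading_qualifier` and `element_text` and the wrapping `<div>` are hypothetical. It also assumes the markup is well-formed and that the definition's first child node is the opening-bracket span, which is what the `/^<span class/` check implies. The same walk over `children` with `remove` should carry over to Nokogiri nodes.

```ruby
require 'rexml/document'

# Concatenate all text inside a node, including nested descendants.
def element_text(node)
  case node
  when REXML::Text then node.value
  when REXML::Element then node.children.map { |child| element_text(child) }.join
  else ''
  end
end

# Hypothetical helper: if the definition's first child is a span whose text
# is "(", remove direct children in order until the span whose text is ")"
# has been removed. Anything after the closing bracket is kept.
def strip_leading_qualifier(definition)
  first = definition.children.first
  return definition unless first.is_a?(REXML::Element) &&
                           first.name == 'span' && element_text(first) == '('

  # Iterate over a copy so removing nodes does not disturb the loop.
  definition.children.dup.each do |node|
    closing = node.is_a?(REXML::Element) && node.name == 'span' &&
              element_text(node) == ')'
    node.remove
    break if closing
  end
  definition
end

# Usage with the markup from the question, wrapped in a hypothetical <div>.
html = '<div class="definition">' \
       '<span class="ib-brac"><span class="qualifier-brac">(</span></span>' \
       '<span class="ib-content"><span class="qualifier-content">cryptography' \
       '<span class="ib-comma"><span class="qualifier-comma">,</span></span> ' \
       '<a href="/wiki/Appendix:Glossary#slang" title="Appendix:Glossary">slang</a>' \
       '</span></span>' \
       '<span class="ib-brac"><span class="qualifier-brac">)</span></span>' \
       ' An internet firewall.</div>'

doc = REXML::Document.new(html)
strip_leading_qualifier(doc.root)
puts element_text(doc.root) # prints " An internet firewall."
```

Because the loop stops at the closing bracket instead of counting spans, it should not matter how many spans the qualifier contains, and text after the ")" is left alone.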

A:

I'm not 100% sure what you're trying to do, but your code above can delete a variable number of spans if you just leave off the [0..4] range:

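if definition.inner_html =~ /^<span class/
  # With no range, this removes every <span> inside definition,
  # however many there are (including any after the qualifier)
  definition.search("span").each do |span|
    span.remove
  end
end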

jn80842