Is there a way to edit the text of a Nokogiri element? I have a Nokogiri element that contains a list item (<li>), and I would like to remove some characters from its text while preserving the <li> HTML. Specifically, I want to remove a leading ":" character from the text if it exists. It doesn't look like there's a text= method for Nokogiri elements, but I wanted to make sure.

Maybe I will have to use regular expressions? If so, how would I remove a leading ":" if it looks something like:

<li>: blah blah blah</li>

P.S. I am using Ruby.

+2  A: 
#!/usr/bin/ruby1.8

require 'rubygems'
require 'nokogiri'

html = <<EOS
  <ul>
    <li>: blah blah blah</li>
    <li>: foo bar baz</li>
  </ul>
EOS

doc = Nokogiri::HTML.parse(html)
# Select only the text nodes directly inside each <li>, so child elements are untouched.
doc.xpath('//li/text()').each do |li|
  li.content = li.content.gsub(/^: */, '')
end
puts doc.to_html

# => <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# => <html><body><ul>
# => <li>blah blah blah</li>
# =>     <li>foo bar baz</li>
# =>   </ul></body></html>
Wayne Conrad
Shouldn't that be `li.content = li.content.gsub(/^: */, '')` ?
Daniel Vandersluis
@Daniel, you caught me. This code went through some iterations before I posted it. Then I noticed a variable name left over from a previous iteration and decided to just edit the answer to use the good name. But I missed one.
Wayne Conrad
What about preserving links that are in the list element?
TenJack
Just change the xpath to '//li/text()'. I'll edit the answer accordingly.
Wayne Conrad
Cool, thanks a lot. I also tried `definition.inner_html = definition.inner_html.gsub(/^: */, '')`, which seemed to work as well.
TenJack