
I have a title, doc.at('head/title').inner_html, that comes out as &amp; when it should be &.

My original document is:

<head><title>Foo & Bar</title></head>

but it comes out as the following:

>> doc = Nokogiri::HTML.parse(file, nil, "UTF-8")
>> doc.at('head/title')
=> #<Nokogiri::XML::Element:0x..fdb851bea name="title" children=#<Nokogiri::XML::Text:0x..fdb850808 "Foo & Bar">>
>> doc.at('head/title').inner_html
=> "Foo &amp; Bar"

I don't want to use Iconv or CGI, like this:

>> require 'cgi'
>> CGI.unescapeHTML(doc.at('head/title').inner_html)
=> "Foo & Bar"

That is ugly and inconvenient.

Please help me; I just can't figure it out. :(

+4  A: 

Use content instead of inner_html to get the element's text as plain text, with entities such as &amp; decoded, instead of as serialized (X)HTML.

>> doc.at('head/title').content
=> "Foo & Bar"
Ben James