ansaurus

Question

How to translate this Hpricot code to Nokogiri?

Answer 1

A:

Nokogiri and Hpricot are pretty much interchangeable for this kind of task, i.e. Nokogiri(html) is the equivalent of Hpricot(html). I'm not really sure what the linked article is trying to achieve, but if the goal is to:

Extract text from HTML body which includes ignoring large white spaces between tags and words.

then there is an easier approach in Hpricot that removes the need for the hpricot.search("script").remove bits: just get the body in the first place.

Hpricot(html).search('body').inner_text.gsub("\r"," ").gsub("\n"," ").split(" ").join(" ")

And in Nokogiri:

Nokogiri(html).search('body').inner_text.gsub("\r"," ").gsub("\n"," ").split(" ").join(" ")
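The whitespace cleanup at the end of both one-liners can be tried on its own, without either parser. Below is a minimal sketch in plain Ruby; the helper name collapse_whitespace and the sample string are made up for illustration and are not part of Hpricot or Nokogiri. It also suggests that the two gsub calls are probably redundant, since split(" ") already treats carriage returns and newlines as separators.

```ruby
# Hypothetical helper (not part of Hpricot or Nokogiri): collapse every run
# of whitespace into a single space and drop leading/trailing whitespace.
def collapse_whitespace(text)
  # split(" ") with a single-space argument splits on any run of whitespace
  # (spaces, tabs, \r, \n) and ignores leading whitespace.
  text.split(" ").join(" ")
end

# Made-up sample standing in for the string returned by inner_text.
raw = "  Hello,\r\n\r\n   world!\t This is\n  some   text.  "

# The full chain used in the answer above.
chained = raw.gsub("\r", " ").gsub("\n", " ").split(" ").join(" ")

# The shorter form gives the same string for this input.
short = collapse_whitespace(raw)

puts short
puts chained == short
```

If that holds for your input as well, the tail of either one-liner could be shortened to .inner_text.split(" ").join(" ").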

i5m 2010-04-16 08:51:46
