ansaurus

Question

Answer 1

+5 A:

You can do this using the XPath text() selector.

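require 'hpricot'
require 'open-uri'

# URI.open (from open-uri) fetches the page; plain Kernel#open no longer opens URLs in Ruby 3.0+
doc  = URI.open("http://stackoverflow.com/") { |f| Hpricot(f) }
text = (doc/"//*/text()") # array of text nodes
puts text.join("\n")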

However, this is a fairly expensive operation. A better solution might be available.

Simone Carletti 2009-08-07 09:41:53
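To see what the `//text()` selector actually returns, here is a small offline sketch. It swaps Hpricot for REXML from Ruby's standard distribution purely so it runs without a gem or network access, and it assumes the input is well-formed XHTML (REXML, unlike Hpricot, will not tolerate sloppy real-world HTML). The sample page and the `all_text` helper are hypothetical. It also drops the whitespace-only text nodes that sit between tags, which a plain `//text()` query includes.

```ruby
require "rexml/document"

# Hypothetical sample page; REXML needs well-formed XML/XHTML, unlike Hpricot.
html = <<~XHTML
  <html>
    <body>
      <h1>Title</h1>
      <p>First <b>bold</b> paragraph.</p>
      <div>Fish &amp; chips</div>
    </body>
  </html>
XHTML

# Hypothetical helper: collect every non-blank text node matched by the XPath text() test.
def all_text(doc)
  REXML::XPath.match(doc, "//text()")
    .map(&:value)       # Text#value returns the text with entities unescaped
    .map(&:strip)
    .reject(&:empty?)   # drop the whitespace-only nodes between tags
end

doc   = REXML::Document.new(html)
texts = all_text(doc)
puts texts.join("\n")
```

Note that text inside nested inline elements (the `<b>` here) comes back as its own node, so the paragraph is split into several pieces rather than returned as one string.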

Answer 2

A:

@weppos: This will be a bit better:

text = doc/"//p|div/text()" # array of text values

vulcan_hacker 2009-08-07 11:01:03

Yeah, but this assumes he only wants the text from p and div elements. I think he wants everything.

Geo 2009-08-07 11:04:41
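The difference raised in this comment can be made concrete with a rough sketch. As above, it assumes well-formed XHTML and uses REXML from the standard distribution instead of Hpricot; the sample page and the `inside?` helper are hypothetical. Rather than an XPath union, it selects all text nodes once and then keeps only those with a p or div ancestor, so text in a p nested inside a div is not counted twice.

```ruby
require "rexml/document"

# Hypothetical sample page with text outside <p> and <div>.
html = <<~XHTML
  <html>
    <body>
      <h1>Title</h1>
      <div><p>Nested paragraph</p></div>
      <ul><li>List item</li></ul>
    </body>
  </html>
XHTML

# Hypothetical helper: true if +node+ has an ancestor element whose name is in +tags+.
def inside?(node, tags)
  el = node.parent
  while el.is_a?(REXML::Element)
    return true if tags.include?(el.name)
    el = el.parent
  end
  false
end

doc   = REXML::Document.new(html)
# Every non-blank text node in the document.
nodes = REXML::XPath.match(doc, "//text()").reject { |t| t.value.strip.empty? }

everything = nodes.map { |t| t.value.strip }
p_and_div  = nodes.select { |t| inside?(t, %w[p div]) }.map { |t| t.value.strip }

puts "everything: #{everything.inspect}"
puts "p/div only: #{p_and_div.inspect}"
```

Restricting to p and div silently loses the heading and the list item, which is the concern if the goal is all of the document's text.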

Hpricot, Get all text from document
