Using Nokogiri:

doc = Nokogiri::HTML(your_html)
doc.xpath("//text()").to_s

This does the job; however, it puts everything into one flat string.

I need to take each piece of text enclosed in HTML tags, for example

<b> text</b>
<h1>text3</h1>

and put them into an array: ["text", "text3"]

What is the recommended approach?

I thought of doing

doc.xpath("*").text

but I don't know how to iterate through it all.

+2  A: 
doc = Nokogiri::HTML(your_html)
doc.xpath("//text()").to_a
khelll
Wow, this works really well! It even handles text that is only separated by <br> tags.
Kim Jong Woo