ansaurus

Question

How to get the XPath of text between <br> or <br />?

Answer 1

+3 A:

Well, you could use "//br/text()", but that will return all the text nodes inside the <br> tags. But since the above isn't well-formed XML, I'm not sure how you are going to use XPath on it. Regex is usually a poor choice for HTML, but there are HTML (not XHTML) parsers available. I hesitate to suggest one for Ruby simply because that isn't "my area" and I'd just be googling...

Marc Gravell 2009-09-28 03:59:27

I am using Nokogiri.

2009-09-28 04:02:20

Answer 2

+1 A:

There are several issues here:

XPath works on XML. You have HTML, which is not XML (basically, the tags don't match, so an XML parser will throw an exception when you give it that text).
XPath normally also works by finding the attributes inside tags. Seeing as your <br> tags don't actually contain the text, they're just in between it, this will also prove difficult.

Because of this, what you probably want to do is use XPath (or similar) to get the contents of the div, and then split the string on <br> occurrences.
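A minimal sketch of the splitting step, in plain Ruby with no parser involved. It assumes you have already pulled the contents of the div out as a string; `inner_html` and `split_on_br` are made-up names for illustration, and the sample string just stands in for whatever your parser gives you:

```ruby
# Hypothetical input: the div's inner HTML, already extracted as a string.
inner_html = "\napple\n<br>\nbanana\n<br/>\nwatermelon\n<br />\norange\n"

# Hypothetical helper: split on <br>, <br/> or <br /> (case-insensitive),
# trim surrounding whitespace, and drop pieces that end up empty.
def split_on_br(html)
  html.split(/<br\s*\/?>/i).map(&:strip).reject(&:empty?)
end

fruits = split_on_br(inner_html)
puts fruits
```

The regex here only matches the <br> tag itself, so it sidesteps the usual problems of parsing HTML with regular expressions, but it will not cope with <br> tags that carry attributes.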

As you've tagged this question with ruby, I'd suggest looking into hpricot, as it's a really nice and fast HTML (and XML) parsing library, which should be much more useful than mucking around with XPath.

Orion Edwards 2009-09-28 04:02:18

Yes. I am using the Nokogiri HTML parsing library. I guess exploding is the best working solution here.

2009-09-28 04:07:51

Answer 3

+1 A:

Try the following, which gets all text siblings of <br> tags as an array of strings stripped of leading and trailing whitespace:

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::HTML(DATA)

fruits =
  doc.xpath('//br/following-sibling::text()
           | //br/preceding-sibling::text()').map do |fruit| fruit.to_s.strip end

puts fruits

__END__
</div>
apple
<br>
banana
<br/>
watermelon
<br>
orange

Is this what you want?

andre-r 2009-09-28 13:48:58
