I'm trying to use Nokogiri to parse an HTML file with some fairly eccentric markup. Specifically, I'm trying to grab divs that have an id, multiple classes, and an inline style all defined. The markup looks something like this:

<div id="foo">
  <div id="bar" class="baz bang" style="display: block;">
    <h2>title</h2>
    <dl>
      List of stuff
    </dl>
  </div>
</div>

I'm attempting to grab the <dl> which sits inside the problem div. I can get divs with a single id attribute with no problem, but I can't figure out a way of getting Nokogiri to grab divs with both ids and classes. So these work fine:

content = @doc.xpath("//div[@id='foo']")

content = @doc.css('div#foo')

But these don't return anything:

content = @doc.xpath("//div[@id='bar']")

content = @doc.xpath("div#bar")

Is there something obvious that I'm missing here?

+1  A: 

I think content = @doc.xpath("div#bar") is a typo: div#bar is a CSS selector, not an XPath expression, so it should be content = @doc.css("div#bar") or, more simply, content = @doc.css("#bar"). The first expression in your second code chunk looks fine.

floatless
+2  A: 

I can get divs with a single id attribute with no problem, but I can't figure out a way of getting Nokogiri to grab divs with both ids and classes.

You want:

//div[@id='bar' and @class='baz bang' and @style='display: block;']
Dimitre Novatchev
+1  A: 

The following works for me.

require 'rubygems'
require 'nokogiri'

html = %{
<div id="foo">
  <div id="bar" class="baz bang" style="display: block;">
    <h2>title</h2>
    <dl>
      List of stuff
    </dl>
  </div>
</div>
}

doc = Nokogiri::HTML.parse(html)
content = doc
  .xpath("//div[@id='foo']/div[@id='bar' and @class='baz bang']/dl")
  .inner_html

puts content
AboutRuby
A: 

You wrote:

I'm trying to grab divs which have both ids, multiple classes and styles defined

And

I'm attempting to grab the <dl> which sits inside the problem div

So, use this XPath 1.0 expression:

//div[@id][contains(normalize-space(@class),' ')][@style]/dl
Alejandro