
Run the following; it's supposed to return the company name. The XPath works in Firefox and returns the company name. In Nokogiri, however, this isn't happening: it just returns an empty string!

require 'rubygems'
require 'nokogiri'
require 'open-uri'

# URI.open is open-uri's entry point for URLs; plain Kernel#open stopped fetching URLs in Ruby 3.0
doc = Nokogiri::HTML(URI.open('http://www.careerbuilder.com/JobSeeker/Jobs/JobDetails.aspx?IPath=QHKCV&ff=21&APath=2.21.0.0.0&job_did=J3G71D73BM9HCK1M84Z&cbRecursionCnt=1&cbsid=6d2aee1515ed404b8306d1a583592cd4-314600403-JQ-5'))
companyname = doc.xpath("/html[1]/body[1]/div[1]/div[1]/form[1]/div[1]/table[1]/tbody[1]/tr[2]/td[1]/div[1]/table[1]/tbody[1]/tr[1]/td[1]/div[1]/div[2]/table[1]/tbody[1]/tr[1]/td[2]").to_s

puts companyname
A: 

Your XPath is not correct :)

You should omit the tbody part. The browser inserts it into its DOM, but Nokogiri does not add it when parsing the raw HTML!

doc.xpath("/html[1]/body[1]/div[1]/div[1]/form[1]/div[1]/table[1]/tr[2]/td[1]/div[1]/table[1]/tr[1]/td[1]/div[1]/div[2]/table[1]/tr[1]/td[2]").to_s

 

NB: Your XPath will also be more stable against changes to the HTML page if you use class or id attributes to select nodes, rather than the full path. For example, you could use

doc.xpath("//div[@class='job_desc'][1]/table[1]/tr[1]/td[2]")

or, even simpler, just use a CSS selector

doc.css("div.job_desc td")[1]
Adrian
Any idea why the browser adds tbody?
gpow
The HTML standard requires browsers to do so: a tr found directly inside a table gets an implied tbody. See also http://stackoverflow.com/questions/938083/why-do-browsers-insert-tbody-element-into-table-elements
Adrian