I'm trying to extract some information from a table-based website with Hpricot. I got the XPath from Firebug:
/html/body/div/table/tbody/tr/td/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr[3]/td/table[3]/tbody/tr
This doesn't work. Apparently Firebug's XPath is the path in the rendered HTML (the DOM, where the browser inserts <tbody> elements), not in the actual HTML sent by the site. I read that removing the tbody steps may solve the problem.
So I tried:
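To see why a browser-derived path can miss, here is a minimal sketch using REXML from Ruby's standard library (not Hpricot). It runs on a hypothetical, well-formed snippet that stands in for raw server markup with no `<tbody>`. The path that includes `tbody` matches nothing, while the same path without it matches both rows. REXML is a strict XML parser, so this only illustrates the idea; it would not parse most real-world HTML.

```ruby
require 'rexml/document'

# Hypothetical raw markup as a server might send it: the table has no <tbody>.
# (REXML needs well-formed XML, so this only stands in for real-world HTML.)
RAW_HTML = <<~HTML
  <html><body><div>
    <table>
      <tr><td>first</td></tr>
      <tr><td>second</td></tr>
    </table>
  </div></body></html>
HTML

doc = REXML::Document.new(RAW_HTML)

# Path as a DOM inspector would show it, including the browser-inserted <tbody>.
with_tbody = REXML::XPath.match(doc, '/html/body/div/table/tbody/tr')
# The same path with the tbody step removed, matching the raw markup.
without_tbody = REXML::XPath.match(doc, '/html/body/div/table/tr')

puts with_tbody.size     # → 0
puts without_tbody.size  # → 2
```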
/html/body/div/table/tr/td/table/tr[2]/td/table/tr/td[2]/table/tr[3]/td/table[3]/tr
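Removing the tbody steps by hand is error-prone on a path this long. A minimal sketch of doing it mechanically with a plain regular expression is below; `strip_tbody` is a hypothetical helper, not part of Hpricot. Applied to the Firebug path above, it produces exactly the tbody-free path tried here.

```ruby
# Hypothetical helper: drop every /tbody step (with or without a [n] index)
# from a path copied from a DOM inspector such as Firebug.
def strip_tbody(xpath)
  # The lookahead makes sure only a whole "tbody" step is removed.
  xpath.gsub(%r{/tbody(\[\d+\])?(?=/|\z)}, '')
end

firebug_path = '/html/body/div/table/tbody/tr/td/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr[3]/td/table[3]/tbody/tr'
puts strip_tbody(firebug_path)
# → /html/body/div/table/tr/td/table/tr[2]/td/table/tr/td[2]/table/tr[3]/td/table[3]/tr
```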
It still doesn't work. After a little more research, I found that some people got their XPath working by also removing the position indices, so I tried this:
/html/body/div/table/tr/td/table/tr/td/table/tr/td/table/tr/td/table/tr
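Removing the `[n]` indices does more than loosen the path: it changes what the path selects. In XPath, `table[3]` picks only the third `table` child, while a bare `table` picks all of them, so the unindexed path can return rows from tables you didn't mean. A minimal REXML sketch on a hypothetical snippet:

```ruby
require 'rexml/document'

# Hypothetical cell holding three sibling tables with one row each.
MARKUP = '<td><table><tr><td>one</td></tr></table>' \
         '<table><tr><td>two</td></tr></table>' \
         '<table><tr><td>three</td></tr></table></td>'

doc = REXML::Document.new(MARKUP)

# With a positional predicate: only the rows of the third table child.
indexed = REXML::XPath.match(doc, '/td/table[3]/tr')
# Without it: the rows of every table child.
unindexed = REXML::XPath.match(doc, '/td/table/tr')

p indexed.map { |tr| tr.elements['td'].text }    # → ["three"]
p unindexed.map { |tr| tr.elements['td'].text }  # → ["one", "two", "three"]
```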
Still no luck.
So I decided to do it step by step, like this:
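require 'hpricot'
require 'pp'

# doc is the Hpricot document for the page (loading code not shown)
(doc/"html/body/div/table/tr").each do |aaa|
  (aaa/"td").each do |bbb|
    pp bbb                    # the data I need shows up here...
    (bbb/"table/tr").each do |ccc|
      pp ccc                  # ...but not here
    end
  end
end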
The information I need shows up when printing bbb, but not when printing ccc.
What am I doing wrong? Or is there a better tool for scraping HTML with long, complex XPaths?
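One general XPath technique for making a long path tolerate the `<tbody>` difference is the `//` (descendant-or-self) step, which skips over any intermediate elements. A minimal sketch, again with REXML rather than Hpricot and on two hypothetical versions of the same table (raw and browser-normalized), shows one path matching both. Caveat: `//tr` also matches rows of any tables nested further down, so it can return more rows than intended.

```ruby
require 'rexml/document'

# Two hypothetical versions of the same table: raw server markup (no <tbody>)
# and the browser-normalized form (with <tbody>).
RAW = '<html><body><div><table><tr><td>a</td></tr><tr><td>b</td></tr></table></div></body></html>'
RENDERED = '<html><body><div><table><tbody><tr><td>a</td></tr><tr><td>b</td></tr></tbody></table></div></body></html>'

# "//" lets the path skip an optional <tbody> level between table and tr.
ROW_PATH = '/html/body/div/table//tr'

# Return the text of the first cell of every row matched by ROW_PATH.
def row_texts(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, ROW_PATH).map { |tr| tr.elements['td'].text }
end

p row_texts(RAW)       # → ["a", "b"]
p row_texts(RENDERED)  # → ["a", "b"]
```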