Last week I started writing a script in Ruby. I needed to scrape some data from the web, so I was advised to use Mechanize and then Nokogiri.

The Mechanize documentation says Mechanize uses Nokogiri to parse HTML. What does this mean for you? You can treat a Mechanize page like a Nokogiri object. After you have used Mechanize to navigate to the page that you need to scrape, scrape it using Nokogiri methods.

I know that I can use .xpath and .at_xpath, because they were part of one answer to my question, but I do not know the exact syntax of these methods, the difference between them, and so on. I tried searching the Nokogiri website.

I was told in this answer that I often use the text() expression, and that this is not required with Nokogiri: you can retrieve the node and then call the text method on the node, which is much less expensive. I searched the Nokogiri website but didn't find anything on that.

Is there somebody out there who can help me learn how to read the Nokogiri documentation?

Let's say I want to know how to use the text method instead of text().

+2  A: 

I am not really sure what the problem is when reading Nokogiri documentation. A quick search for "nokogiri" on Google returns "nokogiri.org" as the first hit. That is the documentation page.

In Ruby, .text() is the same as .text if you are not passing parameters. .text() is an alias for .inner_text(), which will "Get the inner text of all contained Node objects". http://nokogiri.org/search?q=text will get you started.

Greg
+1  A: 

I think one of the things the author means is that the documentation on the site is not presented in the same standard format as on other sites that use RDoc and similar tools, i.e. it is hard to read.

To answer, or at least try to: I've had luck searching GitHub for projects that use Nokogiri and reading their source from there.

sullivan.t