scrubyt

How to get 'Next Page' link with Scrubyt

I'm trying to use Scrubyt to get the details from this page http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?section=events. I've managed to get the titles and detail URLs from the list, but I can't use next_page to get the scraper to go to the next page. I assume that's because I'm not using the correct pattern for the next p...
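If next_page cannot be made to work, the pagination loop can be written by hand. The sketch below uses plain Ruby rather than scrubyt. A hypothetical in-memory PAGES hash stands in for fetched pages, and the example.com URLs and `page=` parameter are placeholders. It shows the shape of the loop: collect items, read the "Next" href, resolve it against the current page URL, and stop when there is no next link.

```ruby
require 'uri'

# Hypothetical stand-in for the site: each page URL maps to the titles on
# that page and the raw (relative) href of its "Next" link, or nil on the
# last page. A real scraper would fetch and parse each page instead.
PAGES = {
  'http://example.com/events/list.php?page=1' => { titles: %w[A B], next_href: 'list.php?page=2' },
  'http://example.com/events/list.php?page=2' => { titles: %w[C],   next_href: 'list.php?page=3' },
  'http://example.com/events/list.php?page=3' => { titles: %w[D],   next_href: nil }
}.freeze

# Collect titles page by page, following "Next" until there is none.
# Each relative href is resolved against the URL of the page it came from;
# max_pages and the visited list guard against pagination loops.
def crawl(start_url, pages, max_pages: 50)
  titles  = []
  visited = []
  url = start_url
  while url && visited.size < max_pages && !visited.include?(url)
    visited << url
    page = pages.fetch(url)
    titles.concat(page[:titles])
    url = page[:next_href] && URI.join(url, page[:next_href]).to_s
  end
  titles
end

titles = crawl('http://example.com/events/list.php?page=1', PAGES)
```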

Any scrubyt command that clicks a link returns a 403 Forbidden Error

I'm trying to use Scrubyt to navigate around a website, but whenever I use it to click any links it gives me 403 Forbidden errors. The website doesn't require logins or anything, so I don't understand this. Might it need some kind of session variable, or the right User-Agent string? Any idea how I might fix this? ...
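One possible cause of a 403 on a site without logins is a server that rejects requests whose User-Agent does not look like a browser, or that expects a session cookie set by an earlier page. This is an assumption to test, not a diagnosis. The sketch uses plain Net::HTTP, not scrubyt, to build a request carrying both headers. The build_request helper is hypothetical, and the URL, UA string and cookie value are placeholders.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a GET request with a
# browser-like User-Agent and, optionally, a session cookie captured from
# an earlier response's Set-Cookie header.
def build_request(url, user_agent:, cookie: nil)
  uri = URI.parse(url)
  req = Net::HTTP::Get.new(uri)
  req['User-Agent'] = user_agent    # overrides Net::HTTP's default "Ruby"
  req['Cookie'] = cookie if cookie  # e.g. a session id from a prior page
  req
end

req = build_request('http://example.com/some/page.php',
                    user_agent: 'Mozilla/5.0 (compatible; MyScraper/0.1)',
                    cookie: 'PHPSESSID=abc123')

# Sending it (not run here):
# res = Net::HTTP.start(req.uri.host, req.uri.port) { |http| http.request(req) }
```

If the request succeeds with these headers but fails without them, that narrows down what the site is checking.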

Scrubyt gives 404 Error when clicking link using _details method

This might be similar to my two earlier questions (see here and here), but I'm trying to use the _detail command to automatically click the link so I can scrape the details page for each individual event. The code I'm using is: require 'rubygems' require 'scrubyt' nuffield_data = Scrubyt::Extractor.define do fetch 'http://w...

Transitioning from Scrubyt to Nokogiri: Write to XML or Hash?

I'm trying to transition this bit of code from scrubyt to nokogiri, and am stuck trying to write my results to either a hash or xml. In scrubyt it looks like the following: require 'rubygems' require 'scrubyt' result_data = Scrubyt::Extractor.define do fetch "http://rads.stackoverflow.com/amzn/click/0061673730" results "//d...
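Once the values have been pulled out of the page, building a Hash or an XML document from them does not depend on the parser. Here is a minimal sketch using only the standard library, with REXML for the XML side. The rows and the element names `root`, `result`, `title` and `url` are hypothetical stand-ins for whatever the Nokogiri queries return.

```ruby
require 'rexml/document'

# Hypothetical scraped rows; in practice these would be built from
# Nokogiri queries on the fetched page.
rows = [
  { 'title' => 'First result',  'url' => 'http://example.com/1' },
  { 'title' => 'Second & more', 'url' => 'http://example.com/2' }
]

# Hash form: map each title to its URL.
by_title = rows.map { |r| [r['title'], r['url']] }.to_h

# XML form: one <result> element per row, one child element per field.
doc  = REXML::Document.new
root = doc.add_element('root')
rows.each do |r|
  result = root.add_element('result')
  r.each { |name, value| result.add_element(name).text = value }
end

xml = String.new
doc.write(xml)   # REXML escapes text, so "&" is written as "&amp;"
```

Keeping the rows as an array of hashes first makes either output a small last step.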

Hpricot or scRUBYt

I'm having problems deciding between hpricot and scrubyt and I was wondering if someone who has worked with them could provide an advantages/disadvantages list for each. ...

Scraping hidden HTML (when visible = false) using Hpricot (Ruby on Rails)

Hi, I've come across an issue I can't seem to get past. I'm also new to Ruby on Rails, hence the number of questions. I am attempting to scrape a web page such as the following: http://www.yellowpages.com.mt/Malta/Grocers-Mini-Markets-Retail-In-Malta-Gozo.aspx I would like to scrape The Addres...
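If the content is hidden only with CSS (for example `style="display:none"`), it is still in the page source. An HTML parser does not evaluate CSS, so it finds the element like any other. If instead it was hidden server-side (an ASP.NET control with Visible set to false is not rendered), it is never sent and no parser can reach it. The sketch uses REXML on a hypothetical, well-formed fragment rather than Hpricot, and the class names and address are placeholders.

```ruby
require 'rexml/document'

# A hypothetical, well-formed fragment: the address is present in the
# markup but hidden with CSS, as a browser would hide it.
html = <<~HTML
  <div class="listing">
    <span class="name">Example Grocer</span>
    <span class="address" style="display:none">1 Main Street, Valletta</span>
  </div>
HTML

doc = REXML::Document.new(html)

# XPath matches on the markup, not on rendered visibility, so the hidden
# span is selected like any other element.
address = REXML::XPath.first(doc, "//span[@class='address']").text
```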

scrubyt: multiple forms in one page

How do I target one specific form when there are two forms on the same page, like this page? http://screener.finance.yahoo.com/stocks.html Here's my sample code: require 'rubygems' require 'scrubyt' extractor = Scrubyt::Extractor.define do fetch 'http://screener.finance.yahoo.com/stocks.html' select_option('prmin', '5') select...
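Independent of scrubyt's own form handling, which is not covered here, the general technique is to locate the form element first and then look for fields only inside it. Done that way, a same-named field in the other form cannot be matched. The sketch uses REXML on hypothetical markup. The form names `quicksearch` and `screener` and the helper form_fields are placeholders, not taken from the Yahoo page.

```ruby
require 'rexml/document'

# Hypothetical markup with two forms on one page.
html = <<~HTML
  <body>
    <form name="quicksearch" action="/search">
      <input name="q" type="text"/>
    </form>
    <form name="screener" action="/screen">
      <select name="prmin"><option value="0">0</option><option value="5">5</option></select>
      <select name="prmax"><option value="10">10</option></select>
    </form>
  </body>
HTML

doc = REXML::Document.new(html)

# Hypothetical helper: names of all named fields inside one form,
# in document order. Every lookup is scoped to the chosen form.
def form_fields(doc, form_name)
  form = REXML::XPath.first(doc, "//form[@name='#{form_name}']")
  return [] unless form
  REXML::XPath.match(form, './/*[@name]').map { |el| el.attributes['name'] }
end
```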

Scrubyt: Using big5 strings in query_field for fill_textfield

Does anyone know of a way to get fill_textfield to accept a big5-encoded string in the query_field? I keep getting an "unterminated string meets end of file" error with this: require 'rubygems' require 'scrubyt' search_data = Scrubyt::Extractor.define do fetch 'http://www.google.com/ncr' fill_textfield 'q', '你好世界' submit end ...
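One plausible cause, offered as an assumption rather than a diagnosis: if the script itself is saved in Big5, some Big5 characters have 0x5C (the ASCII backslash) as their second byte. When Ruby reads the file as ASCII or UTF-8, that byte can escape the closing quote and produce an "unterminated string" error. A common workaround is to keep the source file in UTF-8 and transcode the string at runtime with String#encode, as sketched below. Whether fill_textfield then submits the Big5 bytes as intended is not shown here.

```ruby
# encoding: utf-8
# The magic comment above declares the source encoding; Ruby 1.9 needs it
# for non-ASCII literals, and UTF-8 is already the default from Ruby 2.0.

text = '你好世界'          # a UTF-8 string literal
big5 = text.encode('Big5') # transcode to Big5 at runtime

# Each of these four characters is a two-byte sequence in Big5.
```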

How to export scrubyt extractor?

I've written a scrubyt extractor based on the 'learning' technique - that is, specifying the current text on the page and getting it to work out the XPath expressions itself. However, I now want to export the extractor so that it can be used even when the page has changed. The documentation for scrubyt seems to be all over the place now...

Is it possible to set the referer with Scrubyt?

I can't seem to get a page to load with scrubyt, and I think it's because the page I am navigating to checks the referer. Is it possible to set the referer on the fetch action? ...
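Whether scrubyt's fetch accepts a referer option is not established here. At the HTTP level, though, the referer is just a request header, so one way to test the theory is to send the request yourself with it set. A minimal sketch with Net::HTTP; both URLs are placeholders.

```ruby
require 'net/http'
require 'uri'

# Placeholder URLs: the page to load, and the page that links to it.
target  = URI.parse('http://example.com/listing/detail.php?id=42')
referer = 'http://example.com/listing/index.php'

req = Net::HTTP::Get.new(target)
req['Referer'] = referer # the HTTP header is spelled "Referer"

# Sending it (not run here):
# res = Net::HTTP.start(target.host, target.port) { |http| http.request(req) }
```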

scrubyt -> Check for tag existence?

I'm trying to use scrubyt to scrape a page and have everything working except for a decent way of advancing to the next page of results. The next_page approach isn't working because the URL is relative. I figured out a simple way to do it, but it all hinges on being able to use something like: if node_exists("//div[@class='pagina...
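Outside scrubyt, an existence check is simply "does this XPath match anything". The sketch uses REXML, where XPath.first returns nil when nothing matches. The helper name node_exists? is hypothetical: it mirrors the name in the question and is not a scrubyt method. The markup and class names are placeholders.

```ruby
require 'rexml/document'

# Hypothetical results-page fragment with a "next" link but no "prev" link.
page = REXML::Document.new(<<~HTML)
  <div>
    <div class="pagination"><a class="next" href="?page=3">Next</a></div>
  </div>
HTML

# Hypothetical helper: true when the XPath matches at least one node.
def node_exists?(doc, xpath)
  !REXML::XPath.first(doc, xpath).nil?
end

has_next = node_exists?(page, "//div[@class='pagination']/a[@class='next']")
has_prev = node_exists?(page, "//div[@class='pagination']/a[@class='prev']")
```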

How to use Scrubyt properly to grab URLs from the XML output

I am by no means a master of Ruby and am quite new to Scrubyt. I was just trying out some examples found on their wiki page. The example I was working on was getting the search results returned by Google when you search for 'ruby', and I had the idea of grabbing the URL of each result so I could go ahead and fetch that page as well. The...
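If the extractor's result is available as an XML string, the URLs can be pulled out with an XPath query and then fetched one by one. The sketch below is not scrubyt's API. The XML shape, with `root`, `result`, `title` and `url` elements, is a hypothetical example rather than scrubyt's actual output, and the URLs are placeholders.

```ruby
require 'rexml/document'

# Hypothetical XML result; element names are placeholders.
xml = <<~XML
  <root>
    <result><title>First hit</title><url>http://example.com/a</url></result>
    <result><title>Second hit</title><url>http://example.org/b</url></result>
  </root>
XML

doc = REXML::Document.new(xml)

# Text of every <url> under a <result>, in document order.
urls = REXML::XPath.match(doc, '//result/url').map(&:text)

# Each URL could then be fetched in turn, e.g. Net::HTTP.get(URI(url)).
```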

Scrubyt "next_page" not working with relative links?

Hello all. I'm trying to scrape the Yellow Pages website, specifically this link: http://www.yellowpages.com/santa-barbara-ca/restaurants. My code works perfectly except for one small problem. Because the "Next" link to go to the next page of restaurants is a relative link, Scrubyt's "next_page" function doesn't work...apparently...
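A relative href only becomes fetchable once it is resolved against the URL of the page it appeared on, and URI.join in Ruby's standard library does that resolution. The sketch wraps it in a hypothetical helper, absolute_url. The base URL comes from the question. The hrefs used in the checks are hypothetical examples of the forms a "Next" link can take, not the site's actual markup.

```ruby
require 'uri'

# Hypothetical helper: resolve an href (relative or absolute) against the
# URL of the page it was found on.
def absolute_url(page_url, href)
  URI.join(page_url, href).to_s
end

base = 'http://www.yellowpages.com/santa-barbara-ca/restaurants'

# Root-relative href: keeps the scheme and host, replaces the path.
next_from_root = absolute_url(base, '/santa-barbara-ca/restaurants?page=2')

# Path-relative href: replaces the last path segment of the base URL.
next_from_path = absolute_url(base, 'restaurants?page=2')
```

The resolved string can then be fetched directly, or fed to whatever loop walks the result pages.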