tags:

views:

604

answers:

1

I'm trying to use Scrubyt to get the details from this page http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?section=events. I've managed to get the titles and detail URLs from the list, but I can't use next_page to get the scraper to go to the next page. I assume that's cause I'm not using the correct pattern for the next page link. I tried the string "Next Page", and I've also tried the XPath. Any other ideas?

The code is below:

require 'rubygems'
require 'scrubyt'

nuffield_data = Scrubyt::Extractor.define do
  fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?section=events'

  event do
    title 'The Coast of Mayo'
    #url "href", :type => :attribute
    link_url
  end

  next_page "Next Page", :limit => 2


end

  nuffield_data.to_xml.write($stdout,1)
+2  A: 

Try this with a slightly different URL:

fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'

scrubyt seems to be having issues with "?section=events" query on the end of the URL.

When it looks for the next page it is trying to return this URL:

http://www.nuffieldtheatre.co.uk/cn/events/?pageNum_rsSearch=1&totalRows_rsSearch=39&section=events

instead of:

http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php?pageNum_rsSearch=1&totalRows_rsSearch=39&section=events

Removing the query string on the end of the URL seems to fix this - you might want to file this as a bug.