ansaurus

Question

Scrubyt gives 404 Error when clicking link using _details method

Answer 1

+1 A:

    sudo gem install ruby-debug

This gives you access to a nice Ruby debugger. Start the debugger by altering your script:

    require 'rubygems'
    require 'ruby-debug'
    Debugger.start
    Debugger.settings[:autoeval] = true if Debugger.respond_to?(:settings)

    require 'scrubyt'

    nuffield_data = Scrubyt::Extractor.define do
      fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'

      event do
        title 'The Coast of Mayo'
        link_url
        event_detail do
          dates "1-4 October"
          times "7:30pm"
        end
      end

      next_page "Next Page", :limit => 2

    end

    nuffield_data.to_xml.write($stdout,1)

Then find out where scrubyt is throwing an exception - in this case:

    /Library/Ruby/Gems/1.8/gems/scrubyt-0.3.4/lib/scrubyt/core/navigation/fetch_action.rb:52:in `fetch'
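If the backtrace scrolls past or is cut short, one option is to rescue the exception around the extractor run and print the top frames yourself. This is only a minimal sketch: `run_extractor` is a hypothetical stand-in that simulates a failure (it does not call scrubyt), and `top_frames` is a helper name invented for the illustration.

```ruby
# Hypothetical stand-in for the code that runs the scrubyt extractor.
# It only simulates a failure so the sketch runs without scrubyt installed.
def run_extractor
  raise "simulated fetch failure"
end

# Hypothetical helper: return the first few backtrace frames of an exception.
def top_frames(error, count = 5)
  (error.backtrace || []).first(count)
end

begin
  run_extractor
rescue StandardError => e
  # Print the exception class and message, then where it was raised.
  warn "#{e.class}: #{e.message}"
  top_frames(e).each { |frame| warn "  #{frame}" }
end
```

The first frame printed is the file and line to open when adding the rescue clause.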

Find the scrubyt gem on your system, and add a rescue clause to the method in question so that the end of the method looks like this:

      if @@current_doc_protocol == 'file'
        @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(open(@@current_doc_url).read))
      else
        @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc.body))
        store_host_name(self.get_current_doc_url)   # in case we're on a new host
      end
    rescue
      debugger
      self # the self is here because debugger doesn't like being at the end of a method
    end

Now run the script again and you should be dropped into a debugger when the exception is raised. Try typing this at the debug prompt to see what the offending URL is:

    @@current_doc_url

You can also add a debugger statement anywhere in that method if you want to check what is going on. For example, you may want to add one between lines 51 and 52 of this method to check how the URL being called changes, and why.

This is basically how I figured out the answer to your previous questions.

Good luck.

2008-10-04 22:56:48

Answer 2

A:

Thank you for the very useful answer. I have tried this and @@current_doc_url seems to be nil when I look at it. Any ideas why this might be the case? I've tried to use the debugger to look at the value that has been passed into the function - but that seems to be nil too!

Any ideas where else I could look, or why it might be coming up as nil?

robintw 2008-10-05 08:22:43

Answer 3

A:

Sorry, I have no idea why this would be nil. Every time I have run this it returns a URL. The method self.fetch requires a URL, which you should be able to access as the local variable doc_url. If this also returns nil, maybe you should post the code where you have included the debugger call.

2008-10-05 20:19:40

Answer 4

A:

I've tried to access doc_url but that seems to also return nil. When I have access to my server (later in the day) I'll post the code with the debugging bit in it.

robintw 2008-10-06 08:07:44

Answer 5

+1 A:

I had the same issue with relative links and fixed it like this: you have to set the :resolve param to the correct base URL.

    event do
      title 'The Coast of Mayo'
      link_url
      event_detail :resolve => 'http://www.nuffieldtheatre.co.uk/cn/events' do
        dates "1-4 October"
        times "7:30pm"
      end
    end
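To see why the base URL matters for a relative link, here is a small sketch using Ruby's standard `uri` library. It illustrates general relative-URL resolution only, not scrubyt's internal logic. The href `event_details.php?id=1` is a made-up example, since the real link target is not shown in this thread, and `resolve_link` is a helper name invented for the illustration.

```ruby
require 'uri'

# The listing page fetched in the question.
LISTING_URL = 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'

# Hypothetical relative href, standing in for whatever the event link contains.
RELATIVE_HREF = 'event_details.php?id=1'

# Hypothetical helper: resolve a relative href against a base URL
# using the standard URI resolution rules.
def resolve_link(base, href)
  URI.join(base, href).to_s
end

# Resolved against the bare host, the /cn/events/ directory is lost,
# which is the kind of URL that would come back as a 404:
puts resolve_link('http://www.nuffieldtheatre.co.uk/', RELATIVE_HREF)

# Resolved against the listing page, the directory is kept:
puts resolve_link(LISTING_URL, RELATIVE_HREF)
```

If the URL printed by the debugger looks like the first form, the base used for resolution is the likely culprit.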

Rohan 2009-10-15 13:02:06
