tags:

views:

179

answers:

1

This is my first time asking a question, please be gentle!

I have a Rails application that handles content for a whole bunch of domains (over 100 so far). Each domain either points to where my app is hosted (Heroku, if you're interested), or the original place it was hosted. Every time a domain is ready, it needs to point to the heroku servers, so that my app can serve content for it.

To check to see if a domain has successfully been changed over from its original location to my application, I'm writing a script that looks for a special hidden tag I included in them. If it finds the tag, then the domain is pointing to my app. If not, it hasn't been changed, which I record.

The problem is that, at least for one domain so far, I'm getting a 404 OpenURI::HTTPError exception for my script. Which is strange, because I can visit the site just fine and I can even get it via curl. Does anyone know why a working site would get an error like this? Here's the important snippet:

require 'rubygems'
require 'open-uri'
require 'hpricot'
...
url = "http://www.#{domainname}.com"
doc = Hpricot(open(url)) #<---- Problem right here.
...

Thanks for all of your help!

A: 

Welcome to SO!

Here would be my debugging method:

  1. See if you can replicate in irb with open-uri alone, no Hpricot:

$ irb -rubygems -ropen-uri

>> open('http://www.somedomain.com')

  1. Look in your Heroku log to see if it even touches the server.
  2. Look in your original server's log for the same.
  3. Throw open something like Wireshark to see the HTTP transaction, and see if a 404 is indeed coming back.

Start with that, and come back with your results.

Colin Curtin
I tried it with irb; does the same thing without Hpricot. The site I'm asking for isn't pointing at the heroku version yet (which is why it makes a good test; it should be recorded for fixing), but it is stopping my script from this error.In case you are curious, the site I'm trying to get is http://www.aquarium-equipment.com. I just need ruby to open the darn thing!
G. Martin
Ok, I've made progress. I abandoned the open-uri approach, and instead decided to go with Net::HTTP. This has produced output, instead of an error. However, the response is a 404 error! The site is totally visitable, so I don't understand why. Since curl gets the right response, I must assume it has something to do with ruby, or the way ruby interacts with the server.
G. Martin
Your server is expecting a User-Agent. By default, OpenURI does not send one. This works: `open('http://www.aquarium-equipment.com', "User-Agent" => "Ruby/#{RUBY_VERSION}")`
Colin Curtin
Oh thank you! I'm still learning Ruby, so I really appreciate help like this. Thanks!
G. Martin