So I have a web app where both users and non-users can submit links to content. On my site users receive donations for their content, so I make sure that a user verifies they are the owner of a website before receiving donations for it.

When users submit a link from their own site, they get a form with a select field of all the sites they have registered and verified. When users submit links for other people's sites, they get a form with just a text field for an :unclaimed_domain (which is not in the db, but an attr_accessor that is used as part of a find_or_create method).
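Roughly, the setup looks something like this (a simplified, illustrative sketch, not my exact code; the class, callback, and finder names are just stand-ins):

    class Link < ActiveRecord::Base
      belongs_to :website
      attr_accessor :unclaimed_domain  # text field shown when submitting someone else's site

      before_validation :resolve_unclaimed_domain

      # If the submitter typed a domain instead of picking one of their
      # verified sites, find or create a Website record for it.
      def resolve_unclaimed_domain
        return if unclaimed_domain.nil? || unclaimed_domain == ""
        self.website = Website.find_or_create_by_domain(unclaimed_domain)
      end
    end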

...I hope that's enough background info.

I'm trying to implement a feature that keeps users from submitting links that turn out to be 404 errors.

This is my current, unsatisfactory validation (it never passes):

    require 'net/http'
    require 'nokogiri'
    require 'open-uri'

    def should_not_have_a_404_link
      if unclaimed_domain != nil || unclaimed_domain != "" #attr_accessor field
        domain = unclaimed_domain
      else
        domain = Website.find_by_website_id(self.website_id).domain #select_field
      end
      response = Net::HTTP.start(domain, 80) { |http| http.head("/self.link") }
      if response.code == "404"
        errors.add(:link, "404 Error")
      end
    end

Another method I tried was this:

    def should_not_have_a_404_link
      if unclaimed_domain != nil || unclaimed_domain != "" #attr_accessor field
        domain = unclaimed_domain
      else
        domain = Website.find_by_website_id(self.website_id).domain #select_field
      end
      begin
        Nokogiri::HTML(open("http://#{domain}/self.link"))
      rescue
        errors.add(:link, "404 error")
      end
    end

This works when a brand new site is submitted, but doesn't work for an existing site.

I also have a callback called :make_full_link, which could be part of the problem:

    before_create :make_full_link
    before_update :make_full_link

    def make_full_link
      website = Website.find(website_id)
      unless self.link.index("http://#{website.domain}")
        old_link = self.link
        self.link = "http://#{website.domain}/#{old_link}"
      end
    end

Any ideas?

+1  A: 

I imagine I could lock up your Rails processes (all the dynamic content on your site) by hosting my own super-slow HTTP server that sends one or two characters every few minutes and then submitting three or four or ten requests. It'd tie up all your queue processors, and no more dynamic content could be served.

I suggest adding a timeout to your requests (if you really want to perform online validity checking), or just moving the validity checking to an offline task run from cron or on demand.
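For the timeout, something like this (a rough, untested sketch; the helper name and the five-second values are just placeholders):

    require 'net/http'

    # Sketch of a HEAD request with connect and read timeouts.
    # A blown timeout raises Timeout::Error on the Ruby of this era.
    def head_with_timeout(domain, path)
      http = Net::HTTP.new(domain, 80)
      http.open_timeout = 5  # seconds to wait for the TCP connection
      http.read_timeout = 5  # seconds to wait for each read
      http.start { |conn| conn.head(path) }
    rescue Timeout::Error, SystemCallError, SocketError
      nil  # unreachable or slow hosts: decide in the validation how to report these
    end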

Further, you're just checking that you're not getting a 200 response code, but e.g. 301, 302, and 307 are fine; they just require more processing on the part of clients.

Rather than adding "404 error" to your errors, I suggest adding the actual response code. That would be far more useful if, for example, access controls have been put in place to allow per-netblock access and your host is receiving 403 responses, but you're reporting to the user that it's getting 404s. Those are very different things and should be reported correctly. (Have you ever seen the useless IE error message "sorry, couldn't connect to the remote host, maybe it is offline, maybe it is firewalled, maybe the DNS doesn't exist, maybe you aren't connected to the internet any more"? I hate it; it gives no details on how to fix the situation.)
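Put together, the validation could look something like this (again just a sketch: it builds on the `head_with_timeout` helper above and assumes you've already worked out `domain` and the path from `unclaimed_domain` / the selected `Website`, as in your branches):

    def should_not_have_a_broken_link
      response = head_with_timeout(domain, "/#{link}")
      if response.nil?
        errors.add(:link, "could not be checked (timeout or connection error)")
      elsif !response.is_a?(Net::HTTPSuccess) && !response.is_a?(Net::HTTPRedirection)
        # report the real status code, not a hard-coded "404 error"
        errors.add(:link, "returned #{response.code} #{response.message}")
      end
    end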

sarnold
Thanks for the advice. I'll just change that line to "if response.code == '404'". I'm not looking to eliminate all invalid links, just most of them, so they don't clog up the database. How would you go about adding those timeouts, by the way?
Kenji Crosland
Cool, the `net/http` library provides a `read_timeout=()` setter that looks like it will give good enough timeout support: http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTP.html#M000666
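Something like this, untested (the host and the 5-second values are just placeholders):

    http = Net::HTTP.new('example.com', 80)
    http.read_timeout = 5  # abort reads that take longer than 5 seconds
    http.open_timeout = 5  # also bound the connect; read_timeout alone won't catch a slow connect
    response = http.start { |conn| conn.head('/') }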
sarnold
Hmm, the timeout doesn't seem to be working. It's probably something else I'm doing wrong.
Kenji Crosland