This is what I'd like to achieve:

  1. The user enters a URL (http://google.com, for example)
  2. GET the page and render it on my own domain
  3. Traverse the DOM (using JS) and all that jazz

The problem is that I don't want to use an iframe, because then I can't traverse the DOM of the loaded page (it comes from a different domain).

The only solution I can think of is to parse the page for relative URLs and rewrite them as absolute ones, which might not work everywhere. Another way is to run wget and save everything (even images) in a temporary folder, but that would not scale.

Any other ideas?

+2  A: 

Sounds like a simple proxy. Your Rails back end can use open-uri to load the site in an action and render the same HTML.

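require 'open-uri'

class ProxyController < ActionController::Base

  # GET /proxy/get?url=... fetches the remote page and returns its HTML.
  def get
    uri = URI.parse(params[:url])
    # Accept only http(s); passing raw user input to open is unsafe.
    return head(:bad_request) unless uri.is_a?(URI::HTTP)

    # :html sends the body as-is; :inline would evaluate it as an ERB template.
    # (Older Rails versions use render :text => ... instead.)
    render :html => uri.read.html_safe
  end

end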

Access it with something like the following (don't forget to URL-encode the url parameter as necessary):

http://mysite.com/proxy/get?url=http://www.proxiedsite.com
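
For the encoding step, here is a minimal sketch using Ruby's standard library. The host mysite.com and the /proxy/get path are just the placeholders from the example above, and proxy_link is a hypothetical helper name, not part of Rails.

```ruby
require 'uri'

# Hypothetical helper: builds the proxy link for a target URL.
# URI.encode_www_form percent-encodes the value so that characters such as
# "?", "&" and "=" in the target do not leak into the proxy's own query string.
def proxy_link(target)
  "http://mysite.com/proxy/get?" + URI.encode_www_form(url: target)
end

puts proxy_link("http://www.proxiedsite.com/search?q=rails&page=2")
```

Rails decodes the parameter again, so params[:url] in the action should hold the original address.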

You can parse the HTML before rendering it and add whatever you want to the page, including your own JavaScript.
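
One way this could look, as a rough sketch rather than a robust implementation: insert a <base> tag so the page's relative URLs resolve against the original site (which sidesteps rewriting each one), and add a script tag for your own DOM-traversal code. prepare_html and its parameters are hypothetical names, and the string substitutions assume reasonably well-formed HTML; a real HTML parser would be more reliable. Note that once <base> is present, the script's src must be an absolute URL on your own domain.

```ruby
require 'cgi'

# Hypothetical helper: prepares fetched HTML before the proxy renders it.
# base_url is the page's original address; script_src is an absolute URL
# to your own script.
def prepare_html(html, base_url, script_src)
  base_tag   = %(<base href="#{CGI.escapeHTML(base_url)}">)
  script_tag = %(<script src="#{CGI.escapeHTML(script_src)}"></script>)

  # Put <base> right after the opening <head> tag (but not <header>).
  # If the page has no <head>, the HTML is left without a <base>.
  with_base = html.sub(/<head(?:\s[^>]*)?>/i) { |head| head + base_tag }

  # Put the script just before </body>; append it if there is no </body>.
  if with_base =~ /<\/body>/i
    with_base.sub(/<\/body>/i) { |close| script_tag + close }
  else
    with_base + script_tag
  end
end
```

In the action above, the fetched body would be passed through this helper before being rendered.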

I assume you'll consult the terms of use for whatever content you are proxying.

fullware