I would like to write a Rails app that can capture a webpage the way the Evernote clipper does. If you are not familiar with it: you click a button on the browser toolbar and it captures a fairly accurate copy of the webpage layout. For an example, go to http://www.evernote.com/pub/jssmith072/shared and click on the single note on that page to see a webpage I captured. There are a few reasons I have no idea where to start:

  • How can I get a rendered webpage programmatically in a Rails app? Can/should I use WebKit?
  • How can I store this webpage in my database?
  • How can I display this webpage archive consistently across browsers?
A: 

Personally, I'd be inclined not to store it in the database at all. Instead, spawn a background job to pull down the site, parse it, filter it with your readability port, and save it to the filesystem using a directory scheme that identifies it uniquely. The location can be public or non-public depending on your needs; you can easily write an asset-serving controller to expose non-public static content.
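As a rough sketch of what a "directory scheme that identifies it uniquely" could look like, one option is to hash the URL and shard on the first few hex characters. The helper names `archive_dir_for` and `write_snapshot`, the `index.html` filename, and the choice of SHA-256 are my own assumptions for illustration; fetching and filtering the page is assumed to have happened already.

```ruby
require "digest"
require "fileutils"

# Hypothetical helper: maps a URL to a stable, unique directory under `root`.
# The two leading shard levels keep any single directory from growing huge.
def archive_dir_for(url, root)
  digest = Digest::SHA256.hexdigest(url)
  File.join(root, digest[0, 2], digest[2, 2], digest)
end

# Hypothetical helper: writes already-fetched, already-filtered HTML to disk
# and returns the path, which can then be stored on the database record.
def write_snapshot(url, html, root)
  dir = archive_dir_for(url, root)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "index.html")
  File.write(path, html)
  path
end
```

If `root` sits under the app's public directory, the web server can serve the file directly; otherwise the returned path is what a serving controller would read from.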

That way you don't need to do any horrible, complicated stuff and can just let the web server do what it is good at, serving static files, rather than having to write something custom to pull potentially large amounts of data out of a database every time the page is viewed.

To do something like that, all you need is a simple database entry with an id, a URL, some kind of flag to indicate it has been downloaded successfully (or when it last failed, so it can be retried later), the path on the filesystem where it should/will be stored, and perhaps a text column with a dump of the page's text for search purposes.
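To make that record concrete, here is a minimal plain-Ruby sketch of the fields described above. In a real Rails app this would presumably be an ActiveRecord model with matching columns; the name `PageCapture`, the method names, and the one-hour retry wait are all hypothetical choices of mine, not anything prescribed.

```ruby
# Hypothetical stand-in for an ActiveRecord model with the same columns.
PageCapture = Struct.new(:id, :url, :downloaded, :last_failed_at,
                         :path, :text_dump, keyword_init: true) do
  # Record a successful download: where it was stored and its searchable text.
  def mark_downloaded(path, text)
    self.downloaded = true
    self.last_failed_at = nil
    self.path = path
    self.text_dump = text
    self
  end

  # Record a failure so the background job can retry later.
  def mark_failed(now = Time.now)
    self.downloaded = false
    self.last_failed_at = now
    self
  end

  # True if the page has never been fetched, or if the last failure was at
  # least `wait_seconds` ago (the one-hour default is an arbitrary assumption).
  def retry_due?(now = Time.now, wait_seconds = 3600)
    !downloaded && (last_failed_at.nil? || now - last_failed_at >= wait_seconds)
  end
end
```

The background job would create the record first, then call `mark_downloaded` or `mark_failed` depending on how the fetch went, and a periodic task could pick up anything for which `retry_due?` is true.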

darkliquid
Yeah, that sounds like a much better idea than storing it in the database.
Jake