Hi All,

I'm trying to edit the readability.js file from http://code.google.com/p/arc90labs-readability/.

It's a bookmarklet that "cleans" the current page by stripping everything except for the web page/web article title and body.

However, I'd like to edit the script so that when the bookmarklet runs, it leaves the current page untouched and instead writes the "cleaned" HTML file to a specified local directory.

Can anyone help? Thank you!

Note: After the script runs, the clean HTML is available as 'document.body.innerHTML'.

A: 

You don't really need to modify the readability code. Just pull the contents of:

document.getElementById("readability-content");

You can then pass that on to a local script to be saved.
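As a rough sketch of that idea (the function name getReadabilityHtml is made up for illustration, and it assumes readability has already run and left an element with the id "readability-content" on the page):

```javascript
// Hypothetical helper: returns the cleaned article markup, or null if
// readability has not produced a "readability-content" element.
// `doc` is the page's document object (passed in so it is easy to test).
function getReadabilityHtml(doc) {
  var el = doc.getElementById("readability-content");
  return el ? el.innerHTML : null;
}

// In the bookmarklet you would call it as:
//   var cleaned = getReadabilityHtml(document);
```

How the string then reaches a local script (a form POST to a local server, for instance) is left open here, since the answer doesn't specify it.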

Jonathan Sampson
A: 

To begin with, it can't be done without touching the original page. The way the script works, it edits the current page in place (so image URLs continue to work, etc.). The best you could do would be to store the innerHTML of the root html element (or store the head and body separately) and then restore it after you have grabbed the content. It would look something like this:

  1. First, store the existing innerHTML of the html element.
  2. Next, have the script run as needed, with the readability-controls part removed.
  3. Get the HTML contents of either the readability-content element or the whole document and store it in a variable.
  4. Restore the original content using the content stored in step 1 (so the page goes back to how it was before).
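The four steps above could be sketched roughly like this. Both captureCleanedHtml and the runReadability callback are hypothetical names, not part of readability.js; the callback stands in for however you end up invoking the modified script.

```javascript
// Hypothetical sketch of steps 1-4. `doc` is the page's document and
// `runReadability` is an assumed callback that runs the (modified) script.
function captureCleanedHtml(doc, runReadability) {
  var root = doc.documentElement;

  // Step 1: remember the page as it is now.
  var original = root.innerHTML;
  var cleaned;

  try {
    // Step 2: let readability rewrite the page in place.
    runReadability();

    // Step 3: grab the cleaned markup, falling back to the whole body.
    var el = doc.getElementById("readability-content");
    cleaned = el ? el.innerHTML : doc.body.innerHTML;
  } finally {
    // Step 4: put the original markup back, even if something threw.
    root.innerHTML = original;
  }

  return cleaned;
}
```

Keep in mind that restoring innerHTML rebuilds the DOM from a string, so event handlers that were attached to the old elements are not brought back.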

At this point, depending on your browser, you could either use a data URI or dynamically add a reference to the Downloadify library (plus its images, etc.) and add its download button to the page. When the "Download" button is clicked, you could pre-supply the filename and the data stored in step 3, but the save location would have to be selected every time.
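For the data URI route, a minimal sketch might look like the following. The function name htmlToDataUri is made up for illustration, and the HTML string is assumed to be the one captured in step 3.

```javascript
// Hypothetical helper: wraps an HTML string in a data URI.
// encodeURIComponent escapes characters such as <, > and / so the
// markup survives being embedded in a URI.
function htmlToDataUri(html) {
  return "data:text/html;charset=utf-8," + encodeURIComponent(html);
}

// Example: htmlToDataUri("<p>hi</p>")
//   returns "data:text/html;charset=utf-8,%3Cp%3Ehi%3C%2Fp%3E"
```

In a browser, the result could be used as the href of a link. Whether that opens the page or offers to save it depends on the browser, and either way the user still picks the save location.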

Sorry this is so hypothetical, but it would take quite a bit of work to put this together.

Doug Neiner