Is it possible to get the source of the current HTML document, exactly as it was loaded, in text form? (i.e. not the "Generated source" after parsing and DOM manipulation.)

Note: Issuing an extra AJAX request to retrieve the HTML page again is not an option in this case: The document could have changed.

Most browsers have a "view source" function, which would provide exactly what I want, so browsers apparently keep the original HTML content anyway. It would be nice if I could access that...

+3  A: 

You can't do this with JavaScript; the browser is under no obligation to keep the original document. Is making an AJAX request with a timestamp an option? You could store the page's load time with new Date() and pass that timestamp to the server when asking for the document again, provided the server keeps a history of the document.
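A minimal sketch of the timestamp idea, assuming the server can return the document as it existed at a given time. The `asOf` query parameter and the `buildSnapshotUrl` helper are hypothetical names used only for illustration; nothing in the question says such an endpoint exists.

```javascript
// Record the load time as early as possible in the page.
const pageLoadedAt = new Date();

// Build a URL that asks the server for the document as of `loadedAt`.
// The "asOf" parameter name is an assumption; the server has to support it.
function buildSnapshotUrl(pageUrl, loadedAt) {
  const url = new URL(pageUrl);
  url.searchParams.set("asOf", loadedAt.toISOString());
  return url.toString();
}
```

In the page itself this could be called as `buildSnapshotUrl(location.href, pageLoadedAt)` and the result requested via AJAX. It only helps if the server really does keep old versions of the document.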

Other than that... I'm not sure how you'd do this with JavaScript/HTML. What is your actual end goal here, though? Are you checking whether a <form> and its inputs changed, or something else?

Nick Craver
I'm thinking about ways to make it harder for an attacker to modify the page (by manipulating HTTP traffic). I would compute its MD5 sum and have that checked by a script loaded via HTTPS. It's just a rough idea -- I really don't know if I can make that work... (The page itself has to be loaded via HTTP due to SOP issues, but it's possible to include HTTPS scripts in such a page!)
Chris Lercher
@chris_l - With http/https you'll possibly have some cross-domain issues... I'd *really* press for that SOP getting changed. My current employer is ISO certified and we're under the same restrictions, but getting them changed for the overall good is worth it every time. You can put a hash in the page that verifies against something on the server, an IP/session variable changing every load, etc... but none of that really prevents man-in-the-middle attacks. HTTPS/SSL is definitely your best option, if you're able to push for that SOP getting changed at all.
Nick Craver
@Nick: Oh, I meant "Same Origin Policy", not "Standard Operating Procedure" - I just realized the ambiguity of that abbreviation... :-) I'm afraid I can't change that: there will have to be images included from foreign HTTP pages (and I can't copy them to my own server).
Chris Lercher
@chris_l - Images are fine. Only the page itself and its scripts need to come from the same scheme/domain; that's all the same-origin policy affects. Try serving your page and scripts over HTTPS, with images over HTTP if necessary. That shouldn't be an issue for same-origin.
Nick Craver
@Nick: Having HTTP images on an HTTPS page requires users to click away a message box like "This page contains insecure elements - do you want to show these elements Yes/No" - can't do that (and I also don't want to train people to click away warnings)...
Chris Lercher
Accepted, probably my idea is impossible to implement - would have been too good to be true. :-)
Chris Lercher
+3  A: 

As far as I know, there is no way of doing so.

You may try to grab the HTML very early and store it in a variable, but that's a very poor alternative because:

  • if "very early" is too early (before all DOM nodes are loaded), you'll run into trouble trying to read the innerHTML property
  • if "very early" is when the DOM is ready for manipulation, it might already be too late (if you have things like <script>document.write(stuff);</script>, you may already be seeing a different view of the HTML content)
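To make the "grab it early" idea concrete, here is a rough sketch. `captureEarlySnapshot` is a hypothetical helper, and it takes the document as a parameter only so it can be tried outside a browser. Note that what it returns is the browser's re-serialization of the DOM built so far, not the text the server sent, which is exactly the weakness described above.

```javascript
// Serialize whatever the parser has built so far.
// `doc` is normally the global `document`.
function captureEarlySnapshot(doc) {
  const root = doc && doc.documentElement;
  // outerHTML is generated from the current DOM; it omits the doctype and
  // already reflects any parser fix-ups or earlier script changes.
  return root ? root.outerHTML : null;
}

// In a browser, from the first <script> in the page:
//   window.earlySnapshot = captureEarlySnapshot(document);
```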

Re-fetching the document with AJAX, despite its many possible implications, may be your best alternative regarding this matter.

Miguel Ventura
A: 

A very hacky workaround would be to load the page using only JS: serve a blank page that makes a single AJAX call to get the actual content of the page.
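A sketch of that shell-page approach, under the assumption that the real content is available at its own URL. `loadThroughShell` is a hypothetical name, and `fetchImpl` and `render` are passed in so the same code can run outside a browser; in a page they would be `fetch` and whatever function puts the markup into the document.

```javascript
// Fetch the real page as text, keep the untouched source, then render it.
async function loadThroughShell(contentUrl, fetchImpl, render) {
  const response = await fetchImpl(contentUrl);
  if (!response.ok) {
    throw new Error("Failed to load " + contentUrl + ": " + response.status);
  }
  // The response body decoded as text, before any parsing or DOM changes.
  const originalSource = await response.text();
  render(originalSource);
  return originalSource;
}
```

The returned string is the copy you would hash or compare later. How `render` inserts the markup is left open here, since each option (document.write, innerHTML, an iframe) has its own side effects.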

However, before doing that, I'd rethink what you are trying to do and why you need the "saved state."

Aaron Harun