I'm looking for a tool that will give me the proper generated source, including DOM changes made by AJAX requests, for input into the W3C validator. I've tried the following methods:

  1. Web Developer Toolbar - Generates invalid source according to the doctype (e.g. it removes the self-closing slash from tags). Loses the doctype portion of the page.
  2. Firebug - Fixes potential flaws in the source (e.g. unclosed tags). Also loses the doctype and injects the console, which itself is invalid HTML.
  3. IE Developer Toolbar - Generates invalid source according to the doctype (e.g. it makes all tags uppercase, against the XHTML spec).
  4. Highlight + View Selection Source - Frequently difficult to get the entire page; also excludes the doctype.

Is there any program or add-on out there that will give me the exact current version of the source, without fixing or changing it in some way? So far, Firebug seems the best, but I worry it may fix some of my mistakes.

Solution

It turns out there is no exact solution to what I wanted, as Justin explained. The best solution seems to be validating the source inside Firebug's console, even though it will contain some errors caused by Firebug. I'd also like to thank Forgotten Semicolon for explaining why "View Generated Source" doesn't match the actual source. If I could mark two best answers, I would.

+6  A: 

[updating in response to more details in the edited question]

The problem you're running into is that, once a page is modified by Ajax requests, the current HTML exists only inside the browser's DOM; there's no longer any independent source HTML that you can validate other than what you can pull out of the DOM.

As you've observed, IE's DOM stores tags in upper case, fixes up unclosed tags, and makes lots of other alterations to the HTML it got originally. This is because browsers are generally very good at taking HTML with problems (e.g. unclosed tags) and fixing up those problems to display something useful to the user. Once the HTML has been canonicalized by IE, the original source HTML is essentially lost from the DOM's perspective, as far as I know.

Firefox most likely makes fewer of these changes, so Firebug is probably your better bet.

A final (and more labor-intensive) option may work for pages with simple Ajax alterations, e.g. fetching some HTML from the server and inserting it into the page inside a particular element. In that case, you can use Fiddler or a similar tool to manually stitch together the original HTML with the Ajax HTML. This is probably more trouble than it's worth, and is error-prone, but it's one more possibility.
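The stitching step can be roughed out in a few lines once you have both pieces (the original page from View Source or Fiddler, and the Ajax response body from Fiddler). This is a naive string-splicing sketch, not an HTML parser: `stitchFragment` is a hypothetical helper, and it assumes the target element was empty in the original page and carries a double-quoted `id` attribute.

```javascript
// Hypothetical helper: naively splice an Ajax-loaded fragment into the
// original page source. Assumes the target element was empty in the original
// and has a double-quoted id attribute. This is string surgery, not parsing.
function stitchFragment(originalHtml, containerId, fragmentHtml) {
  const marker = 'id="' + containerId + '"';
  const idPos = originalHtml.indexOf(marker);
  if (idPos === -1) {
    throw new Error("container not found: " + containerId);
  }
  // The first ">" after the id ends the container's opening tag.
  const tagEnd = originalHtml.indexOf(">", idPos);
  if (tagEnd === -1) {
    throw new Error("unterminated opening tag for: " + containerId);
  }
  return (
    originalHtml.slice(0, tagEnd + 1) +
    fragmentHtml +
    originalHtml.slice(tagEnd + 1)
  );
}

// Example:
// stitchFragment('<body><div id="results"></div></body>', "results", "<p>Loaded</p>")
// returns '<body><div id="results"><p>Loaded</p></div></body>'
```

The splice is purely textual, so a matching `id` inside a comment, a `data-id` attribute, or a `>` inside an attribute value would throw it off; that is part of why this route is error-prone.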

[Original response here to the original question]

Fiddler (http://www.fiddlertool.com/) is a free, browser-independent tool which works very well to fetch the exact HTML received by a browser. It shows you the exact bytes on the wire as well as the decoded, unzipped, etc. content, which you can feed into any HTML analysis tool. It also shows headers, timings, HTTP status, and lots of other good stuff.

You can also use Fiddler to copy and rebuild requests if you want to test how a server responds to slightly different headers.

Fiddler works as a proxy server, sitting in between your browser and the website, and logs traffic going both ways.

Justin Grant
I'm familiar with Fiddler; it's not an easy way of doing what I want (viewing the generated source of a page after it's been changed by the user).
jeremy
hi downvoter - care to comment about what's wrong with this answer so I can address the problem?
Justin Grant
he wants the source of the page after javascript has modified the dom.
Byron Whitlock
I'm not the downvoter, but your answer has nothing to do with the question itself. The question may have been edited since you commented.
bradlis7
yep, I know that now... the original question didn't mention that important detail, though. :-) Once I got the new info from the OP, I just updated my answer. But I think my original answer was a reasonable answer to the original question. Even though it's not the best answer (I like Forgotten Semicolon's much better!), I'm wondering what made my answer worthy of a downvote. Not a big deal, just wondering.
Justin Grant
Thanks for this explanation regarding the current HTML existing only inside of the browser's DOM. This is the crux of my problem and I didn't understand that when I asked. It makes me believe that what I'm asking for is essentially impossible.
jeremy
Cool, glad to help. Yep, I think what you're trying to do is impossible. Check out the links in Forgotten Semicolon's answer; they explain the situation well.
Justin Grant
Nice teamwork! :-)
Forgotten Semicolon
+5  A: 

In the Web Developer Toolbar, have you tried the Tools -> Validate HTML or Tools -> Validate Local HTML options?

The Validate HTML option sends the url to the validator, which works well with publicly facing sites. The Validate Local HTML option sends the current page's HTML to the validator, which works well with pages behind a login, or those that aren't publicly accessible.
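To submit a specific markup string yourself, such as the DOM-generated source, one option is the W3C Nu HTML Checker's web service, which accepts a document POSTed as the request body to `https://validator.w3.org/nu/?out=json`. A sketch, assuming that interface and a `fetch`-capable environment (a modern browser console or Node 18+); `buildCheckerRequest`, `countErrors`, and `validateMarkup` are hypothetical names.

```javascript
// Sketch: send a markup string to the W3C Nu HTML Checker and count errors.
// Assumes the checker's web-service interface: POST the document as the body,
// ?out=json for a JSON reply of the form { messages: [{ type, message, ... }] }.
const CHECKER_URL = "https://validator.w3.org/nu/?out=json";

function buildCheckerRequest(html) {
  return {
    method: "POST",
    // text/html makes the checker use its HTML parser (assumption to verify)
    headers: { "Content-Type": "text/html; charset=utf-8" },
    body: html,
  };
}

// Counts messages whose type is "error" (other types, e.g. "info", are skipped).
function countErrors(result) {
  return (result.messages || []).filter(function (m) {
    return m.type === "error";
  }).length;
}

async function validateMarkup(html) {
  const response = await fetch(CHECKER_URL, buildCheckerRequest(html));
  if (!response.ok) {
    throw new Error("checker returned HTTP " + response.status);
  }
  const result = await response.json();
  return { errors: countErrors(result), messages: result.messages };
}
```

From the console you might call `validateMarkup(markup).then(r => console.log(r.errors))`, where `markup` is `document.documentElement.outerHTML` with the page's doctype prepended (outerHTML omits it). Whether the request works from an arbitrary page depends on the checker allowing cross-origin requests, so running it from Node with a copied string is the safer assumption. For XHTML, `application/xhtml+xml` may be the better Content-Type so the checker uses its XML parser.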

You may also want to try View Source Chart (archive.org). An interesting note there:

Q. Why does View Source Chart change my XHTML tags to HTML tags?

A. It doesn't. The browser is making these changes, VSC merely displays what the browser has done with your code. Most common: self closing tags lose their closing slash (/). See this article on Rendered Source for more information (archive.org).

Forgotten Semicolon
A downvote for a solution that actually gets the HTML to the validator?
Forgotten Semicolon
I didn't do the downvote, but "validate HTML" will not send the generated HTML, but the original source. (See the edited question)
Pekka
I just tried this, it doesn't appear to submit the generated source (i.e. the source with DOM changes), but the source that would be seen with firefox's "view source" option.
jeremy
Changing the goalposts on me!
Forgotten Semicolon
I thought "view generated source" would make that part of the question clear, but judging by the 4 answers so far I was clearly mistaken :)
jeremy
"Generated" could have been from serverside code. ;-)
Forgotten Semicolon
hey, great answer.
Justin Grant
The link to View Source Chart is broken
Casebash
That does happen. I added archive.org links for that content. It's slow, but still available.
Forgotten Semicolon
A: 

Using the Firefox Web Developer Toolbar (https://addons.mozilla.org/en-US/firefox/addon/60)

Just go to View Source -> View Generated Source

I use it all the time for the exact same thing.

lewsid
And I now see your edit where you cite the Doctype issue with the Toolbar. That's a fair criticism and I have nothing else to suggest.
lewsid
A: 

Rest client

I use it (and it satisfies me :)).

The best thing for me is that it's a Firefox plugin.

Trick
A: 

Run your code through a validator before you manipulate the DOM, then use firebug. If you inject invalid markup via javascript, well, you're on your own!

ScottE
+1  A: 

If you load the document in Chrome, the Developer|Elements view will show you the HTML as fiddled by your JS code. It's not directly HTML text and you have to open (unfold) any elements of interest, but you effectively get to inspect the generated HTML.

Carl Smotricz
In Google Chrome, in Inspect Element, you can right click on any element and "Copy as HTML"
plutext
+2  A: 

Justin is dead on. The key point here is that HTML is just a language for describing a document. Once the browser reads it, it's gone. Open tags, close tags, and formatting are all taken care of by the parser and then go away. Any tool that shows you HTML is generating it based on the contents of the document, so syntax slips in the original, such as unclosed tags, never show up in it; that does not mean the output is valid, though.

I had to explain this to another web developer once, and it took a little while for him to accept it.

You can try it for yourself in any JavaScript console:

var el = document.createElement('div');
el.innerHTML = "<p>Some text<P>More text";
el.innerHTML; // <p>Some text</p><p>More text</p>

The un-closed tags and uppercase tag names are gone, because that HTML was parsed and discarded after the second line.

The right way to modify the document from JavaScript is with document methods (createElement, appendChild, setAttribute, etc.) and you'll observe that there's no reference to tags or HTML syntax in any of those functions. If you're using document.write, innerHTML, or other HTML-speaking calls to modify your pages, the only way to validate it is to catch what you're putting into them and validate that HTML separately.
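Following the advice to catch what goes into `innerHTML`, here is a minimal sketch of a wrapper that records each string before the browser parses it. `setInnerHtml` and `injectedMarkup` are hypothetical names; the recorded strings can then be validated on their own.

```javascript
// Hypothetical wrapper: remember every HTML string before it reaches
// innerHTML, so the markup your script injects can be validated separately.
const injectedMarkup = [];

function setInnerHtml(element, html) {
  injectedMarkup.push(html); // the raw string, exactly as your code built it
  element.innerHTML = html;  // from here on the browser parses and may repair it
}

// Later, in the console: injectedMarkup.join("\n") gives text to paste into a validator.
```

Route your Ajax callbacks through `setInnerHtml(container, responseText)` instead of assigning `innerHTML` directly. Because the log keeps the raw string, unclosed or uppercase tags survive there even though the DOM repairs them.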

That said, the simplest way to get at the HTML representation of the document is this:

document.documentElement.innerHTML
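Note that `innerHTML` on `document.documentElement` leaves out both the `<html>` tag itself and the doctype, which is the same doctype loss the question complains about. A sketch that puts them back, assuming a browser with `outerHTML` support; `doctypeToString` and `generatedSource` are hypothetical helper names, while `name`, `publicId`, and `systemId` are the standard `DocumentType` properties.

```javascript
// Sketch: rebuild the doctype line that innerHTML/outerHTML leave out.
// Accepts any object with DocumentType's name, publicId and systemId fields
// (in a browser, pass document.doctype).
function doctypeToString(doctype) {
  if (!doctype) {
    return ""; // the document has no doctype node
  }
  let text = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    text += ' PUBLIC "' + doctype.publicId + '"';
  }
  if (doctype.systemId) {
    // A system id without a public id needs the SYSTEM keyword.
    text += (doctype.publicId ? "" : " SYSTEM") + ' "' + doctype.systemId + '"';
  }
  return text + ">";
}

// Doctype plus the whole <html> element: outerHTML, unlike innerHTML,
// includes the element's own start and end tags.
function generatedSource(doc) {
  return doctypeToString(doc.doctype) + "\n" + doc.documentElement.outerHTML;
}
```

In the console, `generatedSource(document)` returns the string, and in Chrome or Firefox devtools `copy(generatedSource(document))` should put it on the clipboard. Everything after the doctype is still the browser's re-serialization, so for pages served as `text/html` the XHTML self-closing slashes will not come back.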
Sidnicious
A: 

Why not type this in the URL bar?

javascript:alert(document.body.innerHTML)
Mike