Selenium Remote Control has a method, "get_html_source", that returns the source of the current page as a string.

AFAIK, this method works in all cases in Firefox and Safari, but when it's invoked in Internet Explorer, it returns incorrect source.

Does anyone know whether this is a bug in Selenium or in Internet Explorer, and whether there's a fix?

+2  A: 

I'm 99% sure get_html_source uses the browser's innerHTML property. innerHTML returns the browser's internal representation of a document, and it has always been inconsistent and "wonky" between browsers.

You can test this by temporarily adding the following onload attribute to the body tag of your page.

onload="var oArea = document.createElement('textarea');oArea.rows=80;oArea.cols=80;oArea.value = document.getElementsByTagName('html')[0].innerHTML;document.getElementsByTagName('body')[0].appendChild(oArea)"

This will add a text area to the bottom of your page containing the document's innerHTML. If you see the same "incorrect" HTML source there, you know IE's the culprit here.
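If the one-liner is hard to read, the same steps can be written out as a function. This is only a sketch of the onload attribute above: the name showInnerHtml is made up here, and taking the document as a parameter is my own choice so it can be tried with a stand-in object outside a browser.

```javascript
// Hypothetical helper (not part of Selenium): same steps as the onload one-liner.
function showInnerHtml(doc) {
  var oArea = doc.createElement('textarea');
  oArea.rows = 80;
  oArea.cols = 80;
  // Capture the browser's own serialization of the <html> element
  // before the textarea itself is added to the page.
  oArea.value = doc.getElementsByTagName('html')[0].innerHTML;
  doc.getElementsByTagName('body')[0].appendChild(oArea);
  return oArea;
}
```

With the function included in a script tag on the page, the body tag would just need onload="showInnerHtml(document)".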

A possible workaround is to run the source through HTML Tidy or some other cleaner if you're after valid markup. I don't know of anything that will give you consistent source between browsers.
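If all you need is to compare the source returned by different browsers, a rough normalization step may be enough. The sketch below assumes the differences are mostly tag-name case and whitespace (older IE versions are known to return uppercase tag names from innerHTML). normalizeHtml is a hypothetical helper, not a Selenium or Tidy function, and it uses regular expressions rather than a real parser, so it will also touch anything that looks like a tag inside scripts or comments.

```javascript
// Hypothetical helper: rough normalization for comparing HTML source strings.
// Assumption: the differences are limited to tag-name case and whitespace.
function normalizeHtml(source) {
  return source
    // Lowercase tag names in opening and closing tags: <DIV ...> becomes <div ...>
    .replace(/<(\/?)([A-Za-z][A-Za-z0-9]*)/g, function (match, slash, name) {
      return '<' + slash + name.toLowerCase();
    })
    // Collapse every run of whitespace (including line breaks) to one space
    .replace(/\s+/g, ' ')
    // Trim leading and trailing whitespace
    .replace(/^\s+|\s+$/g, '');
}
```

Two sources that differ only in those ways should then compare equal, for example normalizeHtml(ieSource) === normalizeHtml(firefoxSource), where both variables hold strings returned by get_html_source. It does not fix unquoted attributes or reordered attributes; a proper cleaner is still the better option for that.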

Alan Storm
+1  A: 
Howard