Firefox uses the W3C standard Node::textContent, but its behavior differs "slightly" from that of MSHTML's proprietary innerText (copied by Opera as well, some time ago, among dozens of other MSHTML features). First of all, textContent represents whitespace differently from innerText. Second, and more importantly, textContent includes the full contents of SCRIPT elements, whereas innerText doesn't.
Just to make things more entertaining, Opera - besides implementing standard textContent - decided to also add MSHTML's innerText but changed it to act as textContent - i.e. including SCRIPT contents (in fact, textContent and innerText in Opera seem to produce identical results, probably being just aliased to each other).
And finally, Safari 2.x also has a buggy innerText implementation. In Safari, innerText functions properly only if an element is neither hidden (via style.display == "none") nor orphaned from the document. Otherwise, innerText results in an empty string.
I was playing with a textContent abstraction (to work around these deficiencies), but it turned out to be rather complex.
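For reference, the naive starting point for such an abstraction is plain feature detection. This is only a sketch: getText is a made-up name, and it does nothing about the whitespace, SCRIPT, or Safari differences described above, which is exactly where the complexity comes from.

```javascript
// Hypothetical helper (not a library function): prefer the standard
// property and fall back to the proprietary one.
function getText(el) {
  // textContent is a string on element nodes in browsers that support it.
  if (typeof el.textContent === "string") {
    return el.textContent;
  }
  // Older MSHTML only exposes innerText.
  if (typeof el.innerText === "string") {
    return el.innerText;
  }
  return "";
}
```

Which property gets picked still changes the result from browser to browser, so this is a fallback, not a normalization.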
Your best bet is to first define your exact requirements and go from there. It is often possible to simply strip the tags from an element's innerHTML, rather than deal with all of the possible textContent/innerText deviations.
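A rough sketch of the tag-stripping approach, assuming you want SCRIPT and STYLE contents dropped and whitespace collapsed (adjust to your own requirements). stripTags is a hypothetical name. The regular expressions are deliberately simple: they will trip on attribute values containing ">" and they do not decode entities such as &amp;.

```javascript
// Hypothetical helper: reduce an HTML string to its visible text.
function stripTags(html) {
  return html
    // Drop SCRIPT and STYLE elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop the remaining tags but keep the text between them.
    .replace(/<\/?[a-z][^>]*>/gi, "")
    // Collapse whitespace runs (an assumption about the desired output).
    .replace(/\s+/g, " ")
    .trim();
}
```

In a page you would call it as stripTags(element.innerHTML). Since it only works on a string, it behaves the same whether the element is hidden or detached from the document.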
Another possibility, of course, is to walk the DOM tree and collect text nodes recursively.
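Something along these lines, as a sketch only. collectText and the skipScripts flag are names I made up, and the function only relies on the standard nodeType, nodeName, nodeValue and childNodes properties.

```javascript
// Hypothetical helper: concatenate the text nodes under `node`.
// Pass skipScripts = true to ignore SCRIPT and STYLE elements
// (innerText-like); pass false to include them (textContent-like).
function collectText(node, skipScripts) {
  var type = node.nodeType;
  // 3 = text node, 4 = CDATA section.
  if (type === 3 || type === 4) {
    return node.nodeValue;
  }
  // 1 = element node; optionally skip script/style subtrees.
  if (type === 1 && skipScripts) {
    var name = String(node.nodeName).toUpperCase();
    if (name === "SCRIPT" || name === "STYLE") {
      return "";
    }
  }
  // Comments and similar nodes have no children, so they contribute nothing.
  var children = node.childNodes || [];
  var text = "";
  for (var i = 0; i < children.length; i++) {
    text += collectText(children[i], skipScripts);
  }
  return text;
}
```

This leaves whitespace exactly as it appears in the text nodes, so any normalization is still up to you.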