Is there a way (unobtrusive to the user) to get all the text in a page with JavaScript? I could get the HTML, parse it, remove all the tags, etc., but I'm wondering if there's a way to get the text from the already rendered page.

To clarify, I don't want to grab text from a selection, I want the entire page.

Thank you!

+2  A: 

I suppose you could do something like this, if you don't mind loading jQuery.

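// Start from an empty string; otherwise the result begins with "undefined".
var theText = '';
$('p,h1,h2,h3,h4,h5').each(function(){
  // text() returns the element's text with the markup stripped
  theText += $(this).text();
});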

When it's all done, "theText" should contain most of the text on the page. Add any relevant selectors I may have left out.

Greg W
Actually, that's not a bad idea at all, I don't think I'll be needing any text outside these... However, won't this also pick up links inside paragraphs, etc?
Stavros Korokithakis
I think that since we're using jQuery's text() method, the tags get stripped out for us: the link text stays in, but the anchor markup doesn't. If we had used the html() method, it would definitely carry the anchor tags along.
Greg W
Ah, thank you, I will try that.
Stavros Korokithakis
Inside the each(), you could `switch` on the element type of `this` and decide how to handle its content (the value from a drop-down, the innerHTML from an anchor/paragraph/etc.)... You could get pretty fancy if you wanted to.
Cory Larson
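A rough sketch of what that per-element handling might look like. The helper name `textForElement` is made up for illustration, and it uses `textContent` rather than innerHTML for ordinary elements so that no markup ends up in the result; both are assumptions, not something from the comment above.

```javascript
// Hypothetical helper: choose the text that best represents one element.
// `el` is a plain DOM element, e.g. the `this` inside jQuery's each().
function textForElement(el) {
  // tagName is upper case in HTML documents, so normalise it first
  switch (el.tagName.toLowerCase()) {
    case 'select':
    case 'input':
    case 'textarea':
      // Form controls: use the current value (for a drop-down, this is
      // the value of the selected option)
      return el.value;
    default:
      // Anchors, paragraphs, headings, etc.: the text without any tags
      return el.textContent || '';
  }
}

// Possible use with the snippet from the answer (requires jQuery):
//   $('p,h1,h2,h3,h4,h5,select').each(function () {
//     theText += textForElement(this);
//   });
```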
The point is to rely on the browser's rendering to get the text, though, not to get it from the HTML... Something like "select all text, get selected text".
Stavros Korokithakis
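For relying on the rendering rather than the markup, one option that may fit is the `innerText` property of `document.body`. Unlike `textContent`, it takes the rendered layout into account, so text hidden with CSS and the contents of script and style elements are left out. This is only a sketch: `getRenderedText` is a hypothetical name, and `innerText` was not supported in every browser for a long time (older Firefox versions lacked it), so the fallback below is an approximation rather than an equivalent.

```javascript
// Hypothetical helper: return the text of a page as the browser renders it.
// `doc` is expected to be the page's document object.
function getRenderedText(doc) {
  var body = doc.body;
  // innerText follows the rendered layout, so hidden text is skipped
  if (typeof body.innerText === 'string') {
    return body.innerText;
  }
  // Fallback where innerText is missing: textContent returns every text
  // node, including hidden ones, so this only approximates the rendering
  return body.textContent || '';
}

// In a browser:
//   var pageText = getRenderedText(document);
```

This reads the text without touching the user's selection, which a "select all, get selected text" approach would change.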