
I'm trying to convert an HTML page to plain text with the line below. Is it efficient? Am I missing something?

// Note: .remove() detaches these elements from the live page, not from a copy
var txt = $("body").find("script,noscript,style,:hidden").remove().end().text();

Thanks!

A:

HTML is text.

EDIT: Try this:

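// Get the current body markup (.text() would already have stripped the tags)
var html = $("body").html();

// Parse the markup into a new, detached jQuery object, remove the unwanted
// elements from that copy, and read the remaining text. The live page is untouched.
// :hidden is left out because every element in a detached copy counts as hidden.
var text = $("<div>").html(html).find("script,noscript,style").remove().end().text();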
Josh Stodola
I am looking for the plain text version of my HTML document.
bosh
Answer updated. I think it will work for you.
Josh Stodola
Indeed, I eventually ended up doing something similar, since once you remove the elements it's basically impossible to plug them back. Now if only we had a recursive version of clone()...
bosh
You could use jQuery extend to do a deep copy like this: `var html = $.extend(true, {}, $("body").text());` and then use the same `var text` line in my answer. Try that; if it works as you expect, I will edit my answer.
Josh Stodola
Josh, you are basically copying a flat string (`$("body").text()`). Why use `extend(true,...)`? I ended up creating a jQuery.html2text function (based on `jQuery.text()`) specifically for converting from HTML.
bosh
I guess I misunderstood your previous comment :D
Josh Stodola
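bosh's actual `jQuery.html2text` isn't shown in the thread, so here is a hypothetical stand-in with the same goal. It walks the node tree in the spirit of the node walk behind `jQuery.text()` (collecting text and CDATA nodes recursively), but skips `script`, `noscript` and `style` subtrees. Because it only reads nodes, nothing has to be removed from the page. The names `html2text` and `SKIP_TAGS` are assumptions, and hidden elements are not filtered out.

```javascript
// Hypothetical html2text (not bosh's code): collect the text of a DOM-like
// tree, skipping elements whose contents aren't readable page text.
var SKIP_TAGS = { SCRIPT: true, NOSCRIPT: true, STYLE: true };

function html2text(node) {
  var out = "";
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType === 3 || child.nodeType === 4) {
      // Text and CDATA nodes contribute their contents directly
      out += child.nodeValue;
    } else if (child.nodeType === 1 && !SKIP_TAGS[child.nodeName.toUpperCase()]) {
      // Recurse into elements that are not on the skip list;
      // comments and other node types are ignored
      out += html2text(child);
    }
  }
  return out;
}
```

In a browser you would call `var txt = html2text(document.body);`. The DOM is only read, so there is nothing to "plug back" afterwards.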
A: 

If you're just trying to render it to the screen, you might be able to use:

<pre>
    some html here
</pre>
David
I'm trying to use a client side script to convert HTML to plain text.
bosh
When you say plain text, do you mean you want to remove the tags?
Lewis
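One caveat on the `<pre>` suggestion: `<pre>` preserves whitespace, but the browser still parses any tags inside it. To display the markup itself, the characters HTML treats as markup have to be escaped first. A minimal sketch, where the helper name `escapeHtml` is hypothetical:

```javascript
// Hypothetical helper: escape markup characters so a string is shown
// literally (e.g. inside <pre>) instead of being parsed as HTML.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")  // must run first, or the entities below get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

With jQuery, `$("pre").text(markup)` has the same visible effect, since setting text never parses tags.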
A: 

You want `element.textContent` (`element.innerText` for older versions of IE).

Eli Grey
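A small feature-detecting wrapper for the suggestion above. It assumes the only case to handle is IE before version 9, which lacks `textContent`. The function name `getText` is hypothetical.

```javascript
// Hypothetical helper: prefer the standard textContent and fall back to
// innerText where textContent is missing (IE before version 9).
function getText(el) {
  return typeof el.textContent === "string" ? el.textContent : el.innerText;
}
```

Usage: `var txt = getText(document.body);`. Keep in mind that `textContent` returns all text, including the contents of `script` and `style` elements and of hidden elements. `innerText` in current browsers reflects rendered text, so it leaves those out.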
A: 
// Each .html() call returns the contents of the *first* matching element only
var scriptContents = $('body').find('script').html();
var noScriptContents = $('body').find('noscript').html();
var styleContents = $('body').find('style').html();
Ambrosia