views:

6986

answers:

7

Is there an easy way to take a string of HTML in JavaScript and strip out the HTML?

A: 
function stripHtml(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/\t/g, '&nbsp;&nbsp;&nbsp;').replace(/\n/g, '<br />');
}
hypoxide
I think you're doing the opposite of what was asked.
Laurence Gonsalves
+10  A: 
myString.replace(/<.*?>/g, '');
nickf
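One caveat with the regex above: `.` does not match line breaks in JavaScript, so a tag whose attributes wrap onto a second line would survive `/<.*?>/g`. Here is a minimal sketch of a variant that uses a negated character class instead, which does cross line breaks. The name `stripTags` is made up for illustration, and like any regex approach it will not decode entities or cope with a `>` inside an attribute value.

```javascript
// Hypothetical helper (not from the answer above): remove anything that
// looks like a tag. [^>] matches newlines too, unlike "." in a JS regex.
function stripTags(html) {
    return String(html).replace(/<[^>]*>/g, '');
}

// The opening tag spans two lines; whitespace in the text is left as-is
var sample = '<p class="a"\n   id="b">AB  12</p>';
console.log(stripTags(sample)); // prints "AB  12"
```

Since nothing is parsed or re-rendered, runs of spaces in the text come through unchanged, which matters if the result is compared against other data later.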
+32  A: 

If you're running in a browser, then the easiest way is just to let the browser do it for you...

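function strip(html)
{
   // Let the browser parse the markup in a detached element
   var tmp = document.createElement("DIV");
   tmp.innerHTML = html;
   // textContent is the standard property; innerText covers older IE
   return tmp.textContent || tmp.innerText || "";
}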
Shog9
+1 good answer!
nickf
nice one!
knittl
Just remember that this approach is rather inconsistent and will fail to strip certain characters in certain browsers. For example, in Prototype.js, we use this approach for performance, but work around some of the deficiencies - http://github.com/kangax/prototype/blob/a223833c8b49ae55f03b1e1a3a5b7e9fb647c139/src/lang/string.js#L476
kangax
Remember your whitespace will be messed about. I used to use this method, and then had problems as certain product codes contained double spaces, which ended up as single spaces after I got the innerText back from the DIV. Then the product codes did not match up later in the application.
Magnus Smith
@Magnus Smith: Yes, if whitespace is a concern - or really, if you have any need for this text that doesn't directly involve the specific HTML DOM you're working with - then you're better off using one of the other solutions given here. The primary advantages of this method are that it is 1) trivial, and 2) will reliably process tags, whitespace, entities, comments, etc. in *the same way as the browser you're running in*. That's frequently useful for web client code, but not necessarily appropriate for interacting with other systems where the rules are different.
Shog9
Requisite reference on this topic: http://stackoverflow.com/questions/1359469/innertext-works-in-ie-but-not-in-firefox/1359822#1359822
Crescent Fresh
I was just looking for this and this is brilliant!
aip.cd.aish
+2  A: 

Another, admittedly less elegant solution than nickf's or Shog9's, would be to recursively walk the DOM starting at the <body> tag and append each text node.

var bodyContent = document.getElementsByTagName('body')[0];
var result = appendTextNodes(bodyContent);

function appendTextNodes(element) {
    var text = '';

    // Loop through the childNodes of the passed in element
    for (var i = 0, len = element.childNodes.length; i < len; i++) {
        // Get a reference to the current child
        var node = element.childNodes[i];
        // Append the node's value if it's a text node
        if (node.nodeType == 3) {
            text += node.nodeValue;
        }
        // Recurse through the node's children, if there are any,
        // and append the text they contain
        if (node.childNodes.length > 0) {
            text += appendTextNodes(node);
        }
    }
    // Return the final result
    return text;
}
Bryan
yikes. if you're going to create a DOM tree out of your string, then just use shog's way!
nickf
Yes, my solution wields a sledge-hammer where a regular hammer is more appropriate :-). And I agree that yours and Shog9's solutions are better, and basically said as much in the answer. I also failed to reflect in my response that the html is already contained in a string, rendering my answer essentially useless as regards the original question anyway. :-(
Bryan
To be fair, this has value - if you absolutely must preserve /all/ of the text, then this has at least a decent shot at capturing newlines, tabs, carriage returns, etc... Then again, nickf's solution should do the same, and do it much faster... eh.
Shog9
A: 

Check out the ticked answer to this:

http://stackoverflow.com/questions/795512/how-might-one-go-about-implementing-a-forward-index-in-php

karim79
this is javascript though.
nickf
+1  A: 
Jibberboy2000
A: 

I built this JavaScript library for a Konfabulator widget that does exactly that. It completely strips out comments and <style> and <script> tags and tries to be somewhat smart about converting <br/>'s and <p/>'s into newlines as well.

http://github.com/mtrimpe/jsHtmlToText

Michiel Trimpe
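For anyone who just wants the general idea described in the answer above, here is a rough sketch of that kind of cleanup done with plain regexes. It is not the code from jsHtmlToText; the function name `htmlToText` and the exact rules (one newline per `<br>`, a blank line after each `</p>`) are assumptions for illustration, and entities such as `&amp;` are left undecoded.

```javascript
// Hypothetical sketch, not taken from jsHtmlToText
function htmlToText(html) {
    return String(html)
        // Drop comments, and <script>/<style> elements along with their contents
        .replace(/<!--[\s\S]*?-->/g, '')
        .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
        // Turn <br> into a newline and each closing </p> into a blank line
        .replace(/<br\s*\/?>/gi, '\n')
        .replace(/<\/p\s*>/gi, '\n\n')
        // Remove whatever tags remain, then trim the ends
        .replace(/<[^>]*>/g, '')
        .trim();
}

var page = '<style>p{color:red}</style><p>Hello<br/>world</p>' +
           '<!-- x --><script>alert(1)</script><p>Bye</p>';
console.log(JSON.stringify(htmlToText(page))); // prints "Hello\nworld\n\nBye"
```

The order matters here: comments, scripts and styles have to go before the generic tag removal, otherwise their contents would be left behind as text.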