Is there an easy way to take a string of html in JavaScript and strip out the html?
A:
function stripHtml(s) {
    // Note: this escapes HTML so it is displayed literally; it does not remove tags.
    return s.replace(/&/g, '&amp;')
            .replace(/</g, '&lt;')
            .replace(/>/g, '&gt;')
            .replace(/\t/g, ' ')
            .replace(/\n/g, '<br />');
}
hypoxide
2009-05-04 22:41:38
I think you're doing the opposite of what was asked.
Laurence Gonsalves
2009-05-04 22:47:48
A:
If you're running in a browser, then the easiest way is just to let the browser do it for you...
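function strip(html)
{
    // Let the browser's own parser turn the string into a detached DOM element...
    var tmp = document.createElement("DIV");
    // Caution: with untrusted input, handlers such as <img onerror> can still run
    // here even though the element is never attached to the page.
    tmp.innerHTML = html;
    // ...then read back only its text. textContent is standard; innerText is the
    // fallback for older versions of Internet Explorer, which lack textContent.
    return tmp.textContent || tmp.innerText || "";
}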
Shog9
2009-05-04 22:48:21
Just remember that this approach is rather inconsistent and will fail to strip certain characters in certain browsers. For example, in Prototype.js, we use this approach for performance, but work around some of the deficiencies - http://github.com/kangax/prototype/blob/a223833c8b49ae55f03b1e1a3a5b7e9fb647c139/src/lang/string.js#L476
kangax
2009-09-14 16:08:02
Remember your whitespace will be messed about. I used to use this method, and then had problems as certain product codes contained double spaces, which ended up as single spaces after I got the innerText back from the DIV. Then the product codes did not match up later in the application.
Magnus Smith
2009-09-17 15:03:41
@Magnus Smith: Yes, if whitespace is a concern - or really, if you have any need for this text that doesn't directly involve the specific HTML DOM you're working with - then you're better off using one of the other solutions given here. The primary advantages of this method are that it is 1) trivial, and 2) will reliably process tags, whitespace, entities, comments, etc. in *the same way as the browser you're running in*. That's frequently useful for web client code, but not necessarily appropriate for interacting with other systems where the rules are different.
Shog9
2009-09-17 21:05:03
Requisite reference on this topic: http://stackoverflow.com/questions/1359469/innertext-works-in-ie-but-not-in-firefox/1359822#1359822
Crescent Fresh
2009-12-22 13:56:27
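Where the extracted text has to match data elsewhere exactly (the double-space product codes mentioned above, for instance), one option is to skip the DOM and work on the string itself. Below is a minimal sketch of that idea using regular expressions, assuming reasonably simple, well-formed markup. The name stripTagsPreservingWhitespace is hypothetical, only a handful of entities are decoded, and unlike the browser's parser it does not understand '>' inside attribute values or skip the contents of <script> and <style> elements.

```javascript
// Hypothetical helper: strips tags but leaves whitespace exactly as it was.
// Assumes simple markup; a '>' inside an attribute value will confuse it.
function stripTagsPreservingWhitespace(html) {
  // Only a few common named entities are decoded; the browser knows many more.
  const named = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: '\u00A0' };

  return html
    // Remove comments first so markup inside them is not left behind as text.
    .replace(/<!--[\s\S]*?-->/g, '')
    // Remove anything that looks like a tag.
    .replace(/<[^>]*>/g, '')
    // Decode entities in a single pass, so "&amp;lt;" becomes "&lt;", not "<".
    .replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, ref) => {
      if (ref[0] === '#') {
        // Numeric reference: hex (&#x41;) or decimal (&#65;).
        const code = ref[1] === 'x' || ref[1] === 'X'
          ? parseInt(ref.slice(2), 16)
          : parseInt(ref.slice(1), 10);
        // Leave out-of-range references untouched instead of throwing.
        return code <= 0x10ffff ? String.fromCodePoint(code) : match;
      }
      const key = ref.toLowerCase();
      // Unknown names are left exactly as written.
      return Object.prototype.hasOwnProperty.call(named, key) ? named[key] : match;
    });
}

// The double space inside the "product code" survives.
console.log(stripTagsPreservingWhitespace('<p>AB  12 &amp; CD</p>')); // prints "AB  12 & CD"
```

Decoding in one replace call is deliberate: a separate pass that decoded &amp; first would turn "&amp;lt;" into "<", which the original text never contained.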
A:
Another, admittedly less elegant, solution than nickf's or Shog9's would be to recursively walk the DOM, starting at the <body> tag, and append each text node.
var bodyContent = document.getElementsByTagName('body')[0];
var result = appendTextNodes(bodyContent);

function appendTextNodes(element) {
    var text = '';
    // Loop through the childNodes of the passed-in element
    for (var i = 0, len = element.childNodes.length; i < len; i++) {
        // Get a reference to the current child
        var node = element.childNodes[i];
        // Append the node's value if it's a text node (nodeType 3, i.e. Node.TEXT_NODE)
        if (node.nodeType === 3) {
            text += node.nodeValue;
        }
        // Recurse through the node's children, if there are any,
        // and keep the text the recursive call collects
        if (node.childNodes.length > 0) {
            text += appendTextNodes(node);
        }
    }
    // Return the final result
    return text;
}
Bryan
2009-05-04 23:14:30
Yikes. If you're going to create a DOM tree out of your string, then just use Shog's way!
nickf
2009-05-04 23:21:26
Yes, my solution wields a sledge-hammer where a regular hammer is more appropriate :-). And I agree that yours and Shog9's solutions are better, and basically said as much in the answer. I also failed to reflect in my response that the html is already contained in a string, rendering my answer essentially useless as regards the original question anyway. :-(
Bryan
2009-05-05 00:08:42
To be fair, this has value - if you absolutely must preserve /all/ of the text, then this has at least a decent shot at capturing newlines, tabs, carriage returns, etc... Then again, nickf's solution should do the same, and do it much faster... eh.
Shog9
2009-05-05 04:58:56
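The walk in Bryan's answer can also be written without recursion, using an explicit stack, which sidesteps call-stack limits on very deeply nested markup. This is a sketch, not the answerer's code: it assumes only that each node exposes nodeType, nodeValue and childNodes, as DOM nodes do, so the plain objects in the usage lines stand in for real elements outside a browser. The name collectText is hypothetical; in a browser you would pass it document.body.

```javascript
// Node type constant for text nodes, as defined by the DOM (Node.TEXT_NODE === 3).
const TEXT_NODE = 3;

// Hypothetical helper: concatenates text nodes in document order without recursion.
// Works on anything shaped like a DOM node (nodeType, nodeValue, childNodes).
function collectText(root) {
  let text = '';
  // Nodes still to visit. Children are pushed in reverse order so that
  // the first child is popped, and therefore visited, first.
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.nodeType === TEXT_NODE) {
      text += node.nodeValue;
    }
    const children = node.childNodes || [];
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push(children[i]);
    }
  }
  return text;
}

// Plain objects standing in for <p>Hello <b>big</b> world</p>.
const textNode = (value) => ({ nodeType: TEXT_NODE, nodeValue: value, childNodes: [] });
const element = (...children) => ({ nodeType: 1, nodeValue: null, childNodes: children });

const paragraph = element(textNode('Hello '), element(textNode('big')), textNode(' world'));
console.log(collectText(paragraph)); // prints "Hello big world"
```

As with the recursive version, the contents of <script> and <style> elements are text nodes too, so they end up in the result unless those elements are skipped explicitly.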
A:
Check out the ticked answer to this:
http://stackoverflow.com/questions/795512/how-might-one-go-about-implementing-a-forward-index-in-php
karim79
2009-05-04 23:15:55
A:
I built this JavaScript library for a Konfabulator widget that does exactly that. It completely strips out comments and <style> and <script> tags and tries to be somewhat smart about converting <br/>'s and <p/>'s into newlines as well.
Michiel Trimpe
2009-09-14 13:25:15
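The library itself isn't reproduced here, but the behavior described (dropping comments and <style>/<script> blocks entirely, and turning <br/> and paragraph tags into newlines) can be approximated with a few ordered regular-expression passes. This is an illustrative sketch, not the widget's code: htmlToPlainText is a hypothetical name, it does not decode entities such as &amp;, and the patterns assume reasonably simple markup (no '>' inside attribute values, no "</script>" inside script strings).

```javascript
// Hypothetical sketch: HTML string -> plain text, keeping some line structure.
function htmlToPlainText(html) {
  return html
    // 1. Drop comments, including ones spanning several lines.
    .replace(/<!--[\s\S]*?-->/g, '')
    // 2. Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
    // 3. Turn <br>, <br/> and <br /> into a newline.
    .replace(/<br\s*\/?>/gi, '\n')
    // 4. Treat a closing </p> (or a self-closing <p/>) as the end of a paragraph.
    .replace(/<\/p\s*>|<p\s*\/>/gi, '\n')
    // 5. Remove every remaining tag.
    .replace(/<[^>]*>/g, '')
    // 6. Trim whitespace left over at the very start and end.
    .trim();
}

const sample =
  '<style>p { color: red; }</style>' +
  '<p>First paragraph</p><!-- note -->' +
  '<p>Line one<br/>Line two</p>' +
  '<script>alert("hi");</script>';

console.log(htmlToPlainText(sample));
// prints:
// First paragraph
// Line one
// Line two
```

The order of the passes matters: comments and script/style blocks go first, so any markup inside them is discarded rather than converted into newlines or leaked into the output.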