ansaurus

Question

Answer 1

A:

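function stripHtml(s) {
    // Note: this escapes HTML special characters rather than removing tags.
    // '&' is replaced first so the entities produced below are not re-escaped.
    return s.replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/\t/g, '&nbsp;&nbsp;&nbsp;')
        .replace(/\n/g, '<br />');
}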

hypoxide 2009-05-04 22:41:38

I think you're doing the opposite of what was asked.

Laurence Gonsalves 2009-05-04 22:47:48

Answer 2

+10 A:

myString.replace(/<.*?>/g, '');
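
One caveat with this pattern: in JavaScript `.` does not match line terminators, so a tag whose attributes wrap onto a second line is left in place. A minimal sketch of a variant that uses a negated character class instead is shown here. The name `stripTags` and the sample string are hypothetical, and like any regex approach it neither decodes entities nor copes with a `>` inside an attribute value.

```javascript
// Hypothetical helper, not part of the original answer.
// [^>]* matches any character except '>', including newlines,
// so tags that wrap across lines are removed as well.
function stripTags(html) {
    return html.replace(/<[^>]*>/g, '');
}

// Sample input with a double space and a tag split across two lines.
var sample = 'AB  12 <a\nhref="#">link</a> <b>bold</b>';

// The original pattern misses the multi-line <a ...> tag.
console.log(JSON.stringify(sample.replace(/<.*?>/g, '')));
// The variant removes it and leaves the text, including the double space, untouched.
console.log(JSON.stringify(stripTags(sample)));
```

Because this works on the string directly, whitespace in the text is returned exactly as it was supplied.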

nickf 2009-05-04 22:42:52

Answer 3

+32 A:

If you're running in a browser, then the easiest way is just to let the browser do it for you...

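function strip(html)
{
   // Let the browser parse the markup in a detached element, then read back its text.
   var tmp = document.createElement("DIV");
   tmp.innerHTML = html;
   // textContent is the standard property; innerText covers older IE.
   // The final "" keeps the return value a string when both are empty or missing.
   return tmp.textContent || tmp.innerText || "";
}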

Shog9 2009-05-04 22:48:21

+1 good answer!

nickf 2009-05-04 22:50:31

nice one!

knittl 2009-09-14 13:32:25

Just remember that this approach is rather inconsistent and will fail to strip certain characters in certain browsers. For example, in Prototype.js, we use this approach for performance, but work around some of the deficiencies - http://github.com/kangax/prototype/blob/a223833c8b49ae55f03b1e1a3a5b7e9fb647c139/src/lang/string.js#L476

kangax 2009-09-14 16:08:02

Remember your whitespace will be messed about. I used to use this method, and then had problems as certain product codes contained double spaces, which ended up as single spaces after I got the innerText back from the DIV. Then the product codes did not match up later in the application.

Magnus Smith 2009-09-17 15:03:41

@Magnus Smith: Yes, if whitespace is a concern - or really, if you have any need for this text that doesn't directly involve the specific HTML DOM you're working with - then you're better off using one of the other solutions given here. The primary advantages of this method are that it is 1) trivial, and 2) will reliably process tags, whitespace, entities, comments, etc. in *the same way as the browser you're running in*. That's frequently useful for web client code, but not necessarily appropriate for interacting with other systems where the rules are different.

Shog9 2009-09-17 21:05:03

Requisite reference on this topic: http://stackoverflow.com/questions/1359469/innertext-works-in-ie-but-not-in-firefox/1359822#1359822

Crescent Fresh 2009-12-22 13:56:27

I was just looking for this and this is brilliant!

aip.cd.aish 2010-10-07 19:02:26

Answer 4

+2 A:

Another, admittedly less elegant solution than nickf's or Shog9's, would be to recursively walk the DOM starting at the <body> tag and append each text node.

var bodyContent = document.getElementsByTagName('body')[0];
var result = appendTextNodes(bodyContent);

function appendTextNodes(element) {
    var text = '';

    // Loop through the childNodes of the passed in element
    for (var i = 0, len = element.childNodes.length; i < len; i++) {
        // Get a reference to the current child
        var node = element.childNodes[i];
        // Append the node's value if it's a text node (nodeType 3)
        if (node.nodeType === 3) {
            text += node.nodeValue;
        }
        // Recurse through the node's children, if there are any,
        // and keep the text they return
        if (node.childNodes.length > 0) {
            text += appendTextNodes(node);
        }
    }
    // Return the final result
    return text;
}

Bryan 2009-05-04 23:14:30

Yikes. If you're going to create a DOM tree out of your string, then just use Shog9's way!

nickf 2009-05-04 23:21:26

Yes, my solution wields a sledge-hammer where a regular hammer is more appropriate :-). And I agree that yours and Shog9's solutions are better, and basically said as much in the answer. I also failed to reflect in my response that the html is already contained in a string, rendering my answer essentially useless as regards the original question anyway. :-(

Bryan 2009-05-05 00:08:42

To be fair, this has value: if you absolutely must preserve /all/ of the text, then this has at least a decent shot at capturing newlines, tabs, carriage returns, etc. Then again, nickf's solution should do the same, and do it much faster... eh.

Shog9 2009-05-05 04:58:56

Answer 5

A:

Check out the ticked answer to this:

http://stackoverflow.com/questions/795512/how-might-one-go-about-implementing-a-forward-index-in-php

karim79 2009-05-04 23:15:55

this is javascript though.

nickf 2009-05-04 23:23:40

Answer 6

+1 A:

Jibberboy2000 2009-08-06 08:30:22

Answer 7

A:

I built this JavaScript library for a Konfabulator widget that does exactly that. It completely strips out comments and <style> and <script> tags and tries to be somewhat smart about converting <br/>'s and <p/>'s into newlines as well.

http://github.com/mtrimpe/jsHtmlToText
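
The behaviour described above can be roughly sketched with a few regular-expression passes. This is not the code from jsHtmlToText: the function name `htmlToText` and the exact rules are assumptions made for illustration, and a regex-based pass like this will not cope with every malformed document.

```javascript
// Hypothetical sketch; NOT the jsHtmlToText implementation.
function htmlToText(html) {
    return html
        // Drop comments, then <script> and <style> elements together with their contents.
        .replace(/<!--[\s\S]*?-->/g, '')
        .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
        // Turn <br> into a newline and the end of a paragraph into a blank line.
        .replace(/<br\s*\/?>/gi, '\n')
        .replace(/<\/p\s*>/gi, '\n\n')
        // Remove whatever tags remain, then trim leading and trailing whitespace.
        .replace(/<[^>]*>/g, '')
        .trim();
}

var page = '<style>p{color:red}</style><!-- note --><p>Hello<br/>world</p>' +
           '<script>alert(1)</script><p>Bye</p>';
console.log(JSON.stringify(htmlToText(page))); // "Hello\nworld\n\nBye"
```

The order of the passes matters: comments and script/style blocks are removed before the generic tag pass, so their contents never reach the output.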

Michiel Trimpe 2009-09-14 13:25:15

Strip HTML from Text JavaScript
