I am processing XHTML using JavaScript. I am getting the text content for a div node by concatenating the nodeValue of all child nodes where nodeType == Node.TEXT_NODE.

The resulting string sometimes contains a non-breaking space entity. How do I replace this with a regular space character?

My div looks like this...

<div><b>Expires On</b> Sep 30, 2009 06:30&nbsp;AM</div>
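
A minimal sketch of the text-collection step described above. The helper name `getDirectText` is hypothetical, and the constant 3 stands in for `Node.TEXT_NODE` so the snippet can also run outside a browser:

```javascript
// Hypothetical helper: concatenates the nodeValue of every direct child text node.
// 3 is the numeric value of Node.TEXT_NODE in the DOM.
var TEXT_NODE = 3;

function getDirectText(node) {
  var text = "";
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    // Skip elements such as <b>; only direct text nodes contribute.
    if (child.nodeType === TEXT_NODE) {
      text += child.nodeValue;
    }
  }
  return text;
}
```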

The following suggestions found on the web did not work:

var cleanText = text.replace(/^\xa0*([^\xa0]*)\xa0*$/g,"");


var cleanText = replaceHtmlEntities(text);

var replaceHtmlEntites = (function() {
  var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
  var translate = {
    "nbsp": " ",
    "amp" : "&",
    "quot": "\"",
    "lt"  : "<",
    "gt"  : ">"
  };
  return function(s) {
    return ( s.replace(translate_re, function(match, entity) {
      return translate[entity];
    }) );
  }
})();

Any suggestions?

+1  A: 

I think when you define a function with "var foo = function() {...};", the function is only defined after that line. In other words, try this:

var replaceHtmlEntities = (function() {
  var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
  var translate = {
    "nbsp": " ",
    "amp" : "&",
    "quot": "\"",
    "lt"  : "<",
    "gt"  : ">"
  };
  return function(s) {
    return ( s.replace(translate_re, function(match, entity) {
      return translate[entity];
    }) );
  }
})();

var cleanText = text.replace(/^\xa0*([^\xa0]*)\xa0*$/g,"");
cleanText = replaceHtmlEntities(text);

Edit: Also, only use "var" the first time you declare a variable (you're using it twice on the cleanText variable).

Edit 2: The problem is the spelling of the function name. You have "var replaceHtmlEntites =". It should be "var replaceHtmlEntities ="

Kip
Yes, in my script I have the function before the place where I use it. Just forgot to make it that way when I posted here. But it did not work.
A: 

If you only need to replace &nbsp; then you can use a far simpler regex:

var textWithNBSpaceReplaced = originalText.replace(/&nbsp;/g, ' ');

Also, there is a typo in your div example, it says &nnbsp; instead of &nbsp;.

bobbymcr
How does that interact with &nbsp; character strings in CDATA blocks (since this is XHTML)?
cletus
It doesn't really cover that case. If there is a need to go that far, a regex is probably the wrong solution.
bobbymcr
I put the typo in my post - Stack Overflow was converting the entity into an actual space in the post preview if I used &nbsp;.
When I inspect the variable in Firebug, I do not see the &nbsp; - the string looks like a valid date. Pasting the value into a hex editor using UTF-8 encoding revealed that the nbsp has been replaced with the two-byte sequence C2 A0, which is the UTF-8 encoding of the character \u00A0.
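
The hex editor observation can be reproduced from script. This is only a small sketch with illustrative variable names, assuming the string holds the non-breaking space character; `encodeURIComponent` percent-escapes the UTF-8 bytes of a character, which makes them visible:

```javascript
// The non-breaking space is a single character with code 160 (0xA0).
var nbsp = String.fromCharCode(160);
var codePoint = nbsp.charCodeAt(0);

// encodeURIComponent escapes the UTF-8 encoding of the character,
// so the bytes C2 A0 appear as "%C2%A0".
var utf8Escaped = encodeURIComponent(nbsp);
```

So the string contains one character, \u00A0, which simply takes two bytes when saved as UTF-8.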
A: 

This is much easier than you're making it. The text node will not have the literal string "&nbsp;" in it; it'll have the corresponding character with code 160.

function replaceNbsps(str) {
  var re = new RegExp(String.fromCharCode(160), "g");
  return str.replace(re, " ");
}

textNode.nodeValue = replaceNbsps(textNode.nodeValue);
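
As a quick check of why entity-based patterns never match, here is a sketch against a string shaped like the one in the question. The sample value and variable names are assumptions, and `\u00A0` stands in for what the parser produces from the entity:

```javascript
// Sample value as it would appear in the text node after parsing:
// the entity has already become the single character with code 160.
var sample = " Sep 30, 2009 06:30\u00A0AM";

// The literal entity text is not in the string, so searching for it finds nothing.
var hasEntityText = sample.indexOf("&nbsp;") !== -1;

// Matching the character itself works.
var nbspPattern = new RegExp(String.fromCharCode(160), "g");
var cleaned = sample.replace(nbspPattern, " ");
```

Here `hasEntityText` is false, while `cleaned` contains an ordinary space between the time and "AM".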
Tim Down
thanks tim. this worked and proved to be easier than I was making it :)
A: 

That first line is pretty messed up. It only needs to be:

var cleanText = text.replace(/\xA0/g,' ');

That should be all you need.

brianary
Thanks - this worked as well!
A: 

I used this, and it worked:

var cleanText = text.replace(/&amp;nbsp;/g,"");
mohamida