I'm using JavaScript to set the value of an input with text that may contain HTML character entities such as &amp;, &nbsp;, etc. I'm trying to find one regex that will match these entities and replace each with the appropriate character ("&" and " " respectively), but I can't figure out the regex to do it. Here's what I'm trying to do. First, make an object that contains the matches and a reference to the replacement value:

var specialChars = {
  "&nbsp;" : " ",
  "&amp;"  : "&",
  "&gt;"   : ">",
  "&lt;"   : "<"
};

Then, I want to match my string

var stringToMatch = "This string has special chars &amp; and &nbsp;"

I tried something like

stringToMatch.replace(/(&nbsp;|&amp;)/g, specialChars["$1"]);

but it doesn't work. I don't really understand how to capture the special tag and replace it. Any help is greatly appreciated.

A: 

You can use a function based replacement to do what you want to do.

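var myString = '&'+'nbsp;&'+'nbsp;&tab;&copy;';
// replace() returns a new string, so keep the result
var decoded = myString.replace(/&\w+?;/g, function( e ) {
 // e is the full match, e.g. '&nbsp;'
 switch(e) {
  case '&nbsp;': 
   return ' ';
  case '&tab;': 
   return '\t';
  case '&copy;': 
   return String.fromCharCode(169);
  default: 
   // leave unknown entities untouched
   return e;
 }
});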

However, I urge you to consider your situation. If you're receiving &nbsp; and &copy; and other HTML entities in your text values, do you really want to remove them? Should you be converting them afterward?

Just something to keep in mind. Cheers!

coderjoe
+5  A: 

I think you can use the functions from a question on a slightly different subject (http://stackoverflow.com/questions/286921).

Jason Bunting's answer has some nice ideas + the necessary explanation, here is his solution with some modifications to get you started (if you find this helpful, upvote his original answer as well, as this is his code, essentially).

var replaceHtmlEntites = (function() {
  var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
  var translate = {
    "nbsp": " ", 
    "amp" : "&", 
    "quot": "\"",
    "lt"  : "<", 
    "gt"  : ">"
  };
  return function(s) {
    return ( s.replace(translate_re, function(match, entity) { 
      return translate[entity]; 
    }) );
  }
})();

callable as

var stringToMatch = "This string has special chars &amp; and &nbsp;";
var stringOutput  = replaceHtmlEntites(stringToMatch);

Numbered entities are even easier: you can replace them much more generically using a little math and String.fromCharCode().
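As a rough sketch of that idea (not part of the code above; replaceNumericEntities is a made-up name), decimal and hexadecimal references can be parsed with parseInt() and handed to String.fromCharCode(). It assumes only code points up to 0xFFFF need decoding and leaves anything larger untouched.

```javascript
// Hypothetical helper: decodes numeric character references such as
// &#169; (decimal) and &#xA9; (hexadecimal). Named entities are ignored.
function replaceNumericEntities(s) {
  return s.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, function (match, hex, dec) {
    // Exactly one of the two groups matched; pick the right radix.
    var code = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    // Assumption: String.fromCharCode() only covers 0..0xFFFF, so
    // larger code points are returned unchanged.
    if (code > 0xFFFF) {
      return match;
    }
    return String.fromCharCode(code);
  });
}

var copyright = replaceNumericEntities("&#169; 2009"); // "© 2009"
```

It can be chained with the named-entity function, e.g. replaceNumericEntities(replaceHtmlEntites(stringToMatch)).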

Tomalak
brad
The answer has been modified a little bit to accommodate for this. I guess you've tried with the original code. The above works for me, I've just tried it out (again).
Tomalak
Really? No, I copied your code and ran it through the debugger; the value being passed in (s) was the whole &nbsp;. Very odd. I'm using Safari and I tested in Firefox. I'll try a few other browsers too. Anyway, thanks again.
brad
Sorry, I just noticed the changes: the extra entity parameter in the replacement function. Thanks again!
brad
Right, that's it. :)
Tomalak
+1  A: 

Another way would be creating a div object

var tmp = document.createElement("div");

Then assigning the text to its innerHTML

tmp.innerHTML = mySpecialString;

And finally reading the element's text content

var output = tmp.textContent || tmp.innerText; // innerText for older IE compatibility

And there you go...
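The steps above could be wrapped in a small function along these lines. This is only a sketch: decodeEntities and its optional doc parameter are illustrative names, and it swaps the div for a textarea, on the assumption that the input may contain markup you don't want the browser to turn into real elements.

```javascript
// Hypothetical wrapper around the steps above. Needs a DOM, so it is
// meant for the browser; "doc" defaults to the page's document.
function decodeEntities(encoded, doc) {
  doc = doc || document;
  // A textarea treats its content as plain text: character references
  // are decoded, but tags in the input stay text instead of becoming nodes.
  var tmp = doc.createElement("textarea");
  tmp.innerHTML = encoded;
  return tmp.value;
}

// Usage in a page (someInput is an assumed reference to an input element):
//   someInput.value = decodeEntities("This string has special chars &amp; and &nbsp;");
```

Compared with the regex approaches, this lets the browser's own parser handle every entity it knows, at the cost of depending on a DOM being available.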

BYK
I'm using the text to set the value of an input (with jQuery), i.e. $(input).val(someText); it's someText that needs the replacement.
brad
Okay I got the point. When you do what I have suggested, all the values are converted by the HTML engine of the browser since the "textContent" or "innerText" property contains the "resultant text".
BYK