I need to convert strings pasted into a text area so that, if they were pasted from MS Word, the weird quotation marks and apostrophes that Word likes to use will get converted to regular single and double quotes. Unfortunately my text editor seems to already convert such quotes when I save, so any regular expression I make seems to get messed up. So something like this

string = string.replace(new RegExp("“", "g"), '"').replace(new RegExp("”", "g"), '"').replace(new RegExp("’", "g"), "'");

doesn't seem to work. (and I don't even know if it will post correctly here)

How do I construct the regular expression to find these quotation marks using only "regular" characters? Presumably with an escape sequence? I prefer to avoid the RegExp "literal" notation, even if creating objects is slower.

A: 

Try getting the character codes of the characters you want replaced and use them instead of the characters themselves:

string = string.replace(new RegExp("\xe2\x80\x9c", "g"), '"').replace(new RegExp("\xe2\x80\x9d", "g"), '"').replace(new RegExp("\xe2\x80\x99", "g"), "'");

(I used Notepad++ with the HexEdit plugin to get the code)

Edit:
See http://www.regular-expressions.info/unicode.html for more details.
Your code should be (hope I didn't mix the codes...):

string = string.replace(new RegExp("\u201c", "g"), '"').replace(new RegExp("\u201d", "g"), '"').replace(new RegExp("\u2019", "g"), "'");
Dror
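As a minimal sketch of the \u escape approach from the edit above: the source below contains only plain ASCII characters, so an editor that rewrites curly quotes on save has nothing to change. The function name normalizeQuotes is hypothetical, and also handling the left single quote (\u2018) is an assumption that goes beyond the three characters named in the question.

```javascript
// Hypothetical helper (name not from the thread): replaces Word-style
// curly quotes with plain ASCII quotes using only \u escapes.
function normalizeQuotes(text) {
  return text
    // U+201C and U+201D: left and right double quotation marks
    .replace(new RegExp("[\u201c\u201d]", "g"), '"')
    // U+2019: right single quotation mark (Word's apostrophe);
    // U+2018 (left single quote) is an assumed extra.
    .replace(new RegExp("[\u2018\u2019]", "g"), "'");
}

var sample = "\u201cIt\u2019s fine,\u201d she said.";
console.log(normalizeQuotes(sample)); // "It's fine," she said.
```

Using a character class per target quote keeps this to two passes over the string instead of one pass per character.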
That didn't seem to work. If I check using charCodeAt(), I get 8217, 8220, and 8221 in decimal, which is 2019, 201C, and 201D in hex. Not sure how to form regexes out of that.
rob
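One way to get from those charCodeAt() values to a regular expression is to convert each decimal code to four hex digits and prefix it with \u. This is only a sketch; the helper name toUnicodeEscape and the variable names are hypothetical and not from the thread.

```javascript
// Hypothetical helper: turns a decimal code unit such as 8217 into the
// regex escape text \u2019 (backslash, "u", four hex digits).
function toUnicodeEscape(code) {
  var hex = code.toString(16);
  while (hex.length < 4) {
    hex = "0" + hex; // pad to the four digits that \u requires
  }
  return "\\u" + hex;
}

// 8220 and 8221 are the curly double quotes, 8217 the curly apostrophe.
var doubleQuotes = new RegExp(
  "[" + toUnicodeEscape(8220) + toUnicodeEscape(8221) + "]", "g");
var apostrophe = new RegExp(toUnicodeEscape(8217), "g");

// Build the test input from the codes too, so this file stays pure ASCII.
var pasted = String.fromCharCode(8220) + "Don" + String.fromCharCode(8217) +
  "t" + String.fromCharCode(8221);
var cleaned = pasted.replace(doubleQuotes, '"').replace(apostrophe, "'");
console.log(cleaned); // "Don't"
```

When the codes are known in advance, writing the escapes directly in the pattern string, for example new RegExp("\u2019", "g"), should be equivalent.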