Hi,

I have a free-text field on my form where users can type anything. Some users paste text into it from Word documents, including odd characters that I don't want in my DB (e.g. Webdings font characters). I'm trying to write a regular expression that keeps only alphanumeric and punctuation characters. But when I try the following, the output still contains all the characters. How can I strip them out?

<html><body><script type="text/javascript">var str="";document.write(str.replace(/[^a-zA-Z 0-9 [:punct]]+/g, " "));</script></body></html>
+1  A: 

If you want only ASCII, use /[^ -~]+/ as the regex. The problem is your [:punct:] class. Perhaps JavaScript does not support [:punct:]?
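
A minimal sketch of the printable-ASCII approach suggested above. The function name `keepPrintableAscii` and the sample string are made up for illustration, and `\uF04A` is only an assumed stand-in for a symbol-font character pasted from Word. The `g` flag is added so that every run is replaced, not just the first one.

```javascript
// Hypothetical helper; the name is not from the original post.
function keepPrintableAscii(input) {
  // [^ -~] matches any character outside the range space (U+0020) to tilde (U+007E).
  // The g flag makes replace() act on every matching run, not only the first.
  return input.replace(/[^ -~]+/g, " ");
}

// \uF04A and \uF04B are assumed stand-ins for pasted symbol-font characters.
var sample = "Hello\uF04A\uF04Bworld, ok?";
console.log(keepPrintableAscii(sample)); // "Hello world, ok?"
```

Note that this range also drops accented letters such as ñ, as well as tabs and line breaks, so it only fits if plain ASCII is really all that is wanted.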

ZyX
Or perhaps the problem was that it is written as `[:punct]` rather than `[:punct:]`?
Senseful
`[:punct:]` is an example of a POSIX character class, and no, JavaScript doesn't support them.
Alan Moore
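To illustrate the point about POSIX classes, here is a small sketch of how JavaScript reads such a pattern (without the `u` or `v` flag): the character class ends at the first `]`, so `[`, `:`, `p`, `u`, `n`, `c`, `t` become ordinary members of the class, and the trailing `]+` is a literal that must follow. The variable names and the explicit punctuation list are illustrative assumptions, not something from the thread.

```javascript
// Parsed as: negated class [^a-zA-Z 0-9 [:punct:] followed by one or more literal "]".
var posixAttempt = /[^a-zA-Z 0-9 [:punct:]]+/g;

// No "]" follows the "!" here, so nothing matches and the text is unchanged.
console.log("a!b".replace(posixAttempt, " ")); // "a!b"

// Only an excluded character followed by "]" matches.
console.log("x!]y".replace(posixAttempt, " ")); // "x y"

// One alternative is to spell out the ASCII punctuation explicitly.
var explicitPunct = /[^a-zA-Z0-9 !"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]+/g;
console.log("a!b\uF04Ac".replace(explicitPunct, " ")); // "a!b c"
```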
@eagle I tested with the correct spelling. And, as @Alan Moore said, I was right.
ZyX
@ZyX - The one you suggested doesn't work either. I'm testing my regex at "http://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_replace3". Do you think that might be the cause? I modified my webapp as well, but it takes a while to build and run, so I'm testing it at the above URL. So far both [:punct] and [:punct:] are defunct for me! :) Any other suggestions?
DS
So far my expression stands at this. It accounts for a-z (both cases), 0-9, Spanish accented characters and some special characters, and that's all I need. I want the rest to fall off. But when I pass in the characters from the original question above, they still pass through. Somebody help!
/[^a-zA-Z 0-9 áÁéÉíÍóÓúÚñÑ`~!@#$%^"':\/?.>,<]+/g
DS
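
A hedged sketch of the same whitelist with the accented letters written as `\u` escapes, so the pattern does not depend on how the page or script file is encoded. Whether encoding is actually the cause here is an assumption; the thread does not confirm it. The function name `keepAllowed` and the sample input are hypothetical, and `\uF04A` again stands in for a pasted symbol-font character.

```javascript
// Same allowed set as the expression above; accented letters as escapes:
// á Á é É í Í ó Ó ú Ú ñ Ñ
var allowed = /[^a-zA-Z 0-9\u00E1\u00C1\u00E9\u00C9\u00ED\u00CD\u00F3\u00D3\u00FA\u00DA\u00F1\u00D1`~!@#$%^"':\/?.>,<]+/g;

// Hypothetical helper: replaces every run of disallowed characters with a space.
function keepAllowed(input) {
  return input.replace(allowed, " ");
}

// "Ni\u00F1o" is "Niño"; \uF04A is the assumed unwanted character.
console.log(keepAllowed("Ni\u00F1o\uF04Aok")); // "Niño ok"
```

Testing with escaped input like this separates the regex itself from the test page: if the escaped version filters correctly, the pattern is fine and the input or page encoding is the next thing to check.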