I need a regex that also matches Chinese, Greek, Russian, ... letters. What I basically want to do is remove punctuation and numbers.

So far I have been removing punctuation and numbers "manually", but that approach does not seem very consistent.

Another thing I have tried is

/[\p{L}]/

but that syntax is not supported by Mozilla's JavaScript engine (I use this in a Firefox extension).
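A note for current engines: ECMAScript 2018 added Unicode property escapes, so the `\p{L}` idea from the question now works natively once the `u` flag is set. It did not work in the Firefox of the time. Below is a minimal sketch, assuming an engine with that support. `stripNonLetters` is a hypothetical helper name, and keeping `\p{M}` (combining marks) alongside letters is a choice made here rather than something the question asked for.

```javascript
// Requires an engine with ES2018 Unicode property escapes (the "u" flag).
// \p{L} = any letter in any script, \p{M} = combining marks (kept so that
// scripts such as Devanagari keep their vowel signs), \s = whitespace.
// Everything else (punctuation, digits, symbols) is matched and removed.
var NON_LETTER = /[^\p{L}\p{M}\s]/gu;

// Hypothetical helper: strip punctuation and numbers, keep letters.
function stripNonLetters(text) {
  return text.replace(NON_LETTER, "");
}

console.log(stripNonLetters("Ниндзя, 42 ninjas! 忍者。")); // → "Ниндзя  ninjas 忍者"
```

The double space in the output comes from the spaces around the removed "42"; collapse whitespace afterwards if that matters.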

+1  A: 

You can find a lot of complaints about the current ECMAScript spec not handling Unicode characters in regular expressions the way it should, e.g. a blog entry by Scott Hanselman that links back to an SO question ;-)
There's no "real" solution to this problem yet, but take a look at the answers to http://stackoverflow.com/questions/280712/javascript-unicode (your question is more or less a duplicate of that one). (Edit: I take that back, the Unicode plugin Jonathan Lonowski suggests looks pretty nice.)

VolkerK
+2  A: 

Have you given XRegExp and the Unicode plugin a try/look?

<script src="xregexp.js"></script>
<script src="xregexp-unicode.js"></script>
<script>
    // \p{L} (any Unicode letter) is provided by the Unicode plugin loaded above
    var unicodeWord = XRegExp("^\\p{L}+$");
    alert(unicodeWord.test("Ниндзя")); // -> true
</script>
Jonathan Lonowski
Thanks, that's exactly what I was looking for. However, I don't really want to include an 8 KB library that I only use once in my extension. The Unicode ranges in the Unicode plugin are very helpful, and I think I will use them to write something myself.
slosd
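Along the lines of the plan above, here is a minimal sketch of building a "not a letter" regex from a table of code point ranges, so no library and no `u` flag are needed (only ES5 features are used). The ranges are an illustrative subset chosen here (Basic Latin, Latin-1, Greek, Cyrillic, CJK Unified Ideographs), not the plugin's actual table. The names `LETTER_RANGES`, `buildNonLetterRegExp` and `stripNonLetters` are hypothetical.

```javascript
// Illustrative subset only: a few BMP letter ranges, NOT the full table
// from the XRegExp Unicode plugin. Extend this list for other scripts.
var LETTER_RANGES = [
  [0x0041, 0x005A], // Basic Latin A-Z
  [0x0061, 0x007A], // Basic Latin a-z
  [0x00C0, 0x00D6], // Latin-1 letters (skips U+00D7 MULTIPLICATION SIGN)
  [0x00D8, 0x00F6], // Latin-1 letters (skips U+00F7 DIVISION SIGN)
  [0x00F8, 0x00FF], // Latin-1 letters
  [0x0386, 0x0386], // Greek capital alpha with tonos
  [0x0388, 0x03CE], // Greek letters (skips U+0387 ANO TELEIA)
  [0x0400, 0x0481], // Cyrillic letters
  [0x048A, 0x04FF], // Cyrillic letters (skips signs and combining marks)
  [0x4E00, 0x9FFF]  // CJK Unified Ideographs
];

// Format a BMP code point as a \uXXXX escape for a RegExp source string.
function toEscape(codePoint) {
  return "\\u" + ("000" + codePoint.toString(16).toUpperCase()).slice(-4);
}

// Build a global RegExp matching any character that is neither one of the
// listed letters nor whitespace. All ranges lie in the BMP, so the pattern
// works without the ES2015 "u" flag.
function buildNonLetterRegExp(ranges) {
  var parts = ranges.map(function (r) {
    return r[0] === r[1] ? toEscape(r[0]) : toEscape(r[0]) + "-" + toEscape(r[1]);
  });
  return new RegExp("[^" + parts.join("") + "\\s]", "g");
}

var NON_LETTER = buildNonLetterRegExp(LETTER_RANGES);

// Remove punctuation, digits and anything outside the listed letter ranges.
function stripNonLetters(text) {
  return text.replace(NON_LETTER, "");
}

console.log(stripNonLetters("Ниндзя, 42 ninjas! 忍者。 Ελλας")); // → "Ниндзя  ninjas 忍者 Ελλας"
```

Building the class from a table keeps the ranges readable and easy to extend, at the cost of silently treating letters from unlisted scripts as non-letters.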