ansaurus

Question

Regex in JavaScript working with the Cyrillic (Russian) character set

Answer 1

+3 A:

It should work if you just save the JavaScript file as UTF-8. Then you should be able to enter any character in a string.

Edit: I just made a quick example with some Cyrillic characters from Wikipedia:

var cyrillic = 'абвгдеёжзийклмнопрстуфхцчшщъыьэюяабвгдеёжзийклмнопрстуфхцчшщъыьэюя';
cyrillic.match('л.+а')[0];
// returns as expected: "лмнопрстуфхцчшщъыьэюяа"

poke 2009-12-31 16:51:57

But if I try this, it doesn't work: var str1 = "абв"; var regexp = new RegExp("[бв]\b", "g"); alert(str1.replace(regexp, "е"));

samuel 2009-12-31 17:18:45

Is your file 100% UTF-8 encoded? Can you try with a single character?

Pekka 2009-12-31 18:01:26

It seems that the word boundary `\b` is not working correctly. If I remove it, the replacement works, so try replacing it with `[ ]` or something like that.
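
A minimal sketch of why the snippet in the comment above fails, assuming a standard ECMAScript engine (the variable names other than `str1` are illustrative). Two things appear to be going on: inside a string literal, `"\b"` is a backspace character rather than a word-boundary assertion, and even when written as `"\\b"`, the boundary is defined in terms of `\w`, which only covers `[A-Za-z0-9_]`.

```javascript
var str1 = "абв";

// In a string literal, "\b" is the backspace character (U+0008),
// so this pattern looks for [бв] followed by a literal backspace.
var backspaceSource = "[бв]\b";
var lastCharCode = backspaceSource.charCodeAt(backspaceSource.length - 1);
var backspacePattern = new RegExp(backspaceSource, "g");

// Escaping the backslash yields a real \b assertion, but \b relies on \w,
// which is ASCII-only, so no boundary is seen after a Cyrillic letter.
var boundaryPattern = new RegExp("[бв]\\b", "g");

var resultBackspace = str1.replace(backspacePattern, "е"); // unchanged
var resultBoundary = str1.replace(boundaryPattern, "е");   // unchanged

// Without \b, both letters are replaced.
var resultPlain = str1.replace(/[бв]/g, "е");
```

So removing `\b` makes the replacement run, but it also drops the "end of word" condition, which has to be expressed some other way.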

poke 2009-12-31 18:08:44

I need to know the right "something like that". This is a simple example; I need more complex regex patterns to work, and I need to use those tags.

samuel 2009-12-31 18:46:16

Answer 2

+1 A:

According to this:

JavaScript, which does not offer any Unicode support through its RegExp class, does support \uFFFF for matching a single Unicode code point as part of its string syntax.

so you can at least use code points, but seemingly nothing more (no classes).
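
As a small illustration of the code-point approach, here is a sketch that builds a character class from `\uXXXX` escapes. The range U+0430 to U+044F covers the lowercase Russian letters а through я; ё (U+0451) sits outside that range, so it is listed separately. The variable names and the sample text are made up for the example.

```javascript
// Lowercase Russian letters: а-я is U+0430..U+044F, ё is U+0451.
var lowerRussian = /[\u0430-\u044F\u0451]+/g;

var text = "hello привет ёж world";

// Collects every run of lowercase Russian letters in the text.
var words = text.match(lowerRussian);
```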

Also check out this duplicate of your question.

Pekka 2009-12-31 16:53:49

That site is incorrect. JavaScript supports Unicode in regexps.

Eli Grey 2009-12-31 17:54:59

I can't find any reference to anything beyond matching single code points, as I quoted above (see e.g. http://www.w3schools.com/jsref/jsref_obj_regexp.asp). Do you have a source?

Pekka 2009-12-31 18:03:38

Answer 3

+1 A:

Here is a good article on JavaScript regular expressions and Unicode. Strings in JavaScript are 16-bit, so strings and RegExp objects can contain Unicode characters, but most of the special escapes such as '\b', '\d' and '\w' only support ASCII. So your regular expression does not work as expected because of the '\b'. It seems you'll have to find a different way to detect word boundaries.
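
One possible workaround, sketched here under the assumption that a "word character" means an ASCII letter, digit, underscore, or anything in the Cyrillic block U+0400 to U+04FF: replace the trailing `\b` with a lookahead that requires either a non-word character or the end of the string. `WORD_CHARS`, `END_OF_WORD` and `replaceAtWordEnd` are hypothetical names for this example, not part of any library.

```javascript
// Assumed definition of a "word character" for this sketch.
var WORD_CHARS = "A-Za-z0-9_\\u0400-\\u04FF";

// Emulates a trailing \b: the next character is not a word character,
// or there is no next character at all.
var END_OF_WORD = "(?=[^" + WORD_CHARS + "]|$)";

// Replaces any of the given letters when it is the last letter of a word.
function replaceAtWordEnd(input, letters, replacement) {
  var pattern = new RegExp("[" + letters + "]" + END_OF_WORD, "g");
  return input.replace(pattern, replacement);
}

var single = replaceAtWordEnd("абв", "бв", "е");
var several = replaceAtWordEnd("абв, вб! аб", "бв", "е");
```

A leading boundary is harder to emulate in engines without lookbehind; one common approach is to capture the preceding character in a group and put it back in the replacement string.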

Annie 2009-12-31 18:23:56
