The regex \w doesn't match UTF-8 characters in Ruby 1.9.2. Has anybody faced the same problem?

Example:

/[\w\s]+/u

In my Rails application.rb I've added config.encoding = "utf-8".

A: 

I encountered the same problem with PHP; I had to add all the UTF-8 special characters to the character class. If anyone has a better idea, I'm interested!

MatTheCat
+1  A: 

Define "doesn't match utf-8 characters"? If you expect \w to match anything other than exactly the uppercase and lowercase ASCII letters, the ASCII digits, and underscore, it won't -- Ruby has defined \w to be equivalent to [A-Za-z0-9_] regardless of Unicode. Maybe you want \p{Word} or something similar instead.
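
A minimal sketch of the difference, assuming a UTF-8 source file (the magic comment is only needed on 1.9). The sample strings and the `matches?` helper are made up for illustration and are not from the question:

```ruby
# encoding: utf-8

# \w is ASCII-only in Ruby 1.9+: it behaves like [A-Za-z0-9_].
ascii_words = /\A[\w\s]+\z/
# \p{Word} also covers non-ASCII letters, marks, and digits.
unicode_words = /\A[\p{Word}\s]+\z/

# Hypothetical helper: true if the whole string matches the pattern.
def matches?(pattern, text)
  !(pattern =~ text).nil?
end

puts matches?(ascii_words, "hello world")    # plain ASCII input
puts matches?(ascii_words, "héllo wörld")    # accented letters fall outside \w
puts matches?(unicode_words, "héllo wörld")  # \p{Word} accepts them
puts matches?(unicode_words, "привет мир")   # Cyrillic is accepted as well
```

The POSIX bracket `[[:word:]]` should behave like `\p{Word}` here, if you prefer that spelling.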

hobbs
That works in .NET. It looks like a bug in Ruby's regex implementation.
Alexey Zakharov
+1 Ruby has defined `\w` to be equivalent to `[A-Za-z0-9_]` regardless of Unicode. It's so obvious that I didn't notice it either :)
kfl62