How can I determine whether a character is a Chinese character using Ruby?

+2  A: 

Ruby 1.9

# encoding: utf-8
"漢" =~ /\p{Han}/   # => 0 (index of the match; nil if there is none)
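
A minimal sketch of how the \p{Han} match could be wrapped for reuse, assuming Ruby 1.9 or later. The method names han_char? and han_chars are hypothetical and not part of the answer; the /u flag is added here as an assumption, to force the pattern to UTF-8 regardless of the source encoding.

```ruby
# Hypothetical helpers built on the \p{Han} script property (Ruby 1.9+).

# True if the one-character string +ch+ is a Han ideograph.
def han_char?(ch)
  !(ch =~ /\A\p{Han}\z/u).nil?
end

# Returns every Han character found in +str+, in order of appearance.
def han_chars(str)
  str.scan(/\p{Han}/u)
end

sample = "Ruby \u6F22\u5B57 1.9"   # the two escapes are the characters 漢 and 字
puts han_char?("\u6F22")           # Han ideograph
puts han_char?("R")                # Latin letter
puts han_chars(sample).length      # number of Han characters in the sample
```

Note that \p{Han} matches the Han script as a whole, so it also matches ideographs used in Japanese and Korean text; it cannot tell which language a character belongs to.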
steenslag
I tried this code, but it doesn't work. This is the error: invalid character property name {Han}: /\p{Han}/
HelloWorld
@HelloWorld: Update your version of Ruby. All character classes are documented now: http://github.com/ruby/ruby/blob/trunk/doc/re.rdoc (cool nick, BTW)
Marc-André Lafortune
A: 

An interesting article on encodings in Ruby: http://blog.grayproductions.net/articles/bytes_and_characters_in_ruby_18 (it's part of a series - check the table of contents at the start of the article also)

I haven't worked with Chinese characters before, but this seems to be the list supported by Unicode: en.wikipedia.org/wiki/List_of_CJK_Unified_Ideographs (see that page for the block ranges). Also note that it's a unified system that includes characters used in Japanese and Korean (some characters are shared between them), so I'm not sure you can distinguish which ones are Chinese only.

I think you can check whether the character at index n of the string str is a CJK character with this:

def check_char(str, n)
  # Decode the UTF-8 string into an array of Unicode code points
  list_of_chars = str.unpack("U*")
  char = list_of_chars[n]
  # No character at index n
  return false if char.nil?
  # main block (CJK Unified Ideographs)
  return true if char >= 0x4E00 && char <= 0x9FFF
  # extended block A
  return true if char >= 0x3400 && char <= 0x4DBF
  # extended block B
  return true if char >= 0x20000 && char <= 0x2A6DF
  # extended block C
  return true if char >= 0x2A700 && char <= 0x2B73F
  false
end
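
The same range check can also be written as a table of ranges, which may be easier to extend. This is only a sketch: the names CJK_RANGES, cjk_codepoint? and cjk_flags are hypothetical, the block boundaries are the four listed above, and any CJK blocks outside those four are not covered.

```ruby
# Code point ranges for the four blocks checked above (assumed to be sufficient
# for this sketch; other CJK blocks are not listed).
CJK_RANGES = [
  0x4E00..0x9FFF,    # main block
  0x3400..0x4DBF,    # extended block A
  0x20000..0x2A6DF,  # extended block B
  0x2A700..0x2B73F   # extended block C
].freeze

# True if the integer code point +cp+ falls inside one of the ranges.
def cjk_codepoint?(cp)
  CJK_RANGES.any? { |range| range.cover?(cp) }
end

# Maps each character of the UTF-8 string +str+ to true or false.
def cjk_flags(str)
  str.unpack("U*").map { |cp| cjk_codepoint?(cp) }
end

p cjk_flags("a\u6F22b")   # prints [false, true, false]
```

Checking a whole string at once avoids decoding it again for every index, which the per-index version does on each call.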
Andrei Fierbinteanu
Thank you very much.
HelloWorld