How can I parse a string of fullwidth Unicode digit characters into an integer in Ruby?

Attempting the obvious results in:

irb(main):011:0> a = "\uff11"
=> "1"
irb(main):012:0> Integer(a)
ArgumentError: invalid value for Integer: "\xEF\xBC\x91"
      from (irb):12:in `Integer'
      from (irb):12
      from /export/home/henry/apps/bin/irb:12:in `<main>'
irb(main):013:0> a.to_i
=> 0

The equivalent in Python gives:

>>> a = u"\uff11"
>>> print a
1
>>> int(a)
1
A:

Ruby 1.9's numeric parsing only understands ASCII digits. I don't think there are any convenient, elegant parsing methods that properly handle fullwidth Unicode numeric code points.

A quick and filthy hack:

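def parse_utf(utf_integer_string)
  # ASCII digits and their fullwidth counterparts (U+FF10..U+FF19), in the same order
  ascii_numeric_chars = "0123456789"
  utf_numeric_chars = "\uff10\uff11\uff12\uff13\uff14\uff15\uff16\uff17\uff18\uff19"
  # Translate each fullwidth digit to its ASCII equivalent, then parse as usual
  utf_integer_string.tr(utf_numeric_chars, ascii_numeric_chars).to_i
end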

Pass in a string of fullwidth numeric characters and get out an integer.

animal
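A slightly tighter variant of the same tr trick, offered as a sketch assuming Ruby 1.9.2+ (the method name parse_fullwidth_integer and the constant FULLWIDTH_DIGITS are hypothetical). It uses tr character ranges instead of spelling out every digit, and Integer(..., 10) instead of to_i, so that junk input raises ArgumentError rather than silently returning 0, and a leading zero is not read as octal.

```ruby
# Hypothetical constant: fullwidth digits U+FF10..U+FF19 map one-to-one
# onto ASCII 0..9, so String#tr can translate them as a range.
FULLWIDTH_DIGITS = "\uff10-\uff19"

# Hypothetical helper name.
def parse_fullwidth_integer(str)
  # Integer() raises ArgumentError on non-numeric input (to_i returns 0);
  # the explicit base 10 stops "010" from being parsed as octal.
  Integer(str.tr(FULLWIDTH_DIGITS, "0-9"), 10)
end

puts parse_fullwidth_integer("\uff11\uff12\uff13") # prints 123
puts parse_fullwidth_integer("42")                 # prints 42 (ASCII passes through)
```

Whether strict or lenient parsing is right depends on the input; swap Integer(...) back to .to_i if "0 on garbage" is the behaviour you want.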
That does the trick; thanks very much.
henryl
If you're doing this all over the place, it might be worth monkeypatching to_i on String so the translation applies throughout your application.
animal
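One way the monkeypatch suggested in that comment could look, sketched with Module#prepend (Ruby 2.0+); the module name FullwidthToI is hypothetical. Note that this changes String#to_i for all code in the process, including libraries.

```ruby
# Hypothetical module: prepended to String, it intercepts every to_i call.
module FullwidthToI
  FULLWIDTH_DIGITS = /[\uff10-\uff19]/

  def to_i(base = 10)
    if self =~ FULLWIDTH_DIGITS
      # Translate, then call to_i on the new string. It contains no
      # fullwidth digits, so that inner call takes the super branch.
      tr("\uff10-\uff19", "0-9").to_i(base)
    else
      super # the original String#to_i, called with the same base
    end
  end
end

class String
  prepend FullwidthToI
end

puts "\uff11\uff12".to_i # prints 12
```

The guard on FULLWIDTH_DIGITS is what prevents infinite recursion. If a global patch is too broad, a refinement (refine String inside a module, activated with using) is the more scoped alternative.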
A: 

Convert ‘compatibility’ characters such as the fullwidth digits to their normalized forms (plain ASCII digits in this case) before parsing as an integer, for example with Unicode::normalize_KC or UnicodeUtils::nfkc.

bobince
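On Ruby 2.2 and later, the NFKC normalization described above is available without a gem through the built-in String#unicode_normalize. This sketch substitutes that stdlib method for the gems named in the answer (the helper name parse_integer_nfkc is hypothetical):

```ruby
# Requires Ruby 2.2+ for String#unicode_normalize (no gem needed).
# Hypothetical helper name.
def parse_integer_nfkc(str)
  # NFKC maps compatibility characters such as the fullwidth digits
  # U+FF10..U+FF19 to their plain ASCII equivalents.
  Integer(str.unicode_normalize(:nfkc), 10)
end

puts parse_integer_nfkc("\uff11\uff12\uff13") # prints 123
```

Be aware that NFKC folds other compatibility forms too (for example, the circled digit ① normalizes to 1), which is broader than the tr approach and may or may not be what you want.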