PHP has a lot of trouble with multibyte strings (non-ASCII characters). The entire language was built on the assumption that each character is one byte. To solve this, the mbstring extension was added; its mb_* functions work fine and can be used in place of the standard string functions.

strlen($str);    // counts bytes, so it is wrong for multibyte text
mb_strlen($str); // correct

However, this is really a pain: you either have to verify that any code you download or find online uses these functions, or enable the mbstring.func_overload setting, which might then break code that actually needs char = byte calculations.

Does Ruby share this problem?

+1  A: 
irb(main):002:0> 'ÿ'.length
=> 2
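
The 2 above is a byte count: ÿ (U+00FF) takes two bytes in UTF-8. As a rough sketch, assuming a UTF-8 encoded string, the code points can be counted by unpacking them. The helper names `char_count` and `byte_count` are made up for illustration.

```ruby
# encoding: utf-8

# Hypothetical helper: decode the string as UTF-8 and count code points.
def char_count(str)
  str.unpack('U*').length
end

# Hypothetical helper: count the raw bytes behind the string.
def byte_count(str)
  str.unpack('C*').length
end

char_count('ÿ')  # => 1
byte_count('ÿ')  # => 2
```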
TML
+3  A: 

It shares the problem. It's covered here at SO. You can use ActiveSupport::Multibyte for mb_chars support.

>> s =  "Iñtërnâtiônàlizætiøn"
=> "Iñtërnâtiônàlizætiøn"
>> puts s[0..3]
Iñt
=> nil
>> puts s.mb_chars[0..3]
Iñtë
=> nil
>> puts s.mb_chars.size
20
=> nil
>> puts s.size
27
=> nil
Chandra Patni
+1  A: 

I think Ruby 1.9 clears up this underlying assumption.
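
To make that concrete, here is a minimal sketch, assuming Ruby 1.9 or later and a UTF-8 source file. Strings there carry an encoding, so `length` and indexing work on characters while `bytesize` reports bytes. `string_summary` is a hypothetical helper used only for this example.

```ruby
# encoding: utf-8

# Hypothetical helper: collect the character-based and byte-based views of a string.
def string_summary(str)
  {
    :chars      => str.length,    # characters, according to the string's encoding
    :bytes      => str.bytesize,  # raw bytes
    :first_four => str[0..3]      # indexing is by character, not by byte
  }
end

summary = string_summary("Iñtërnâtiônàlizætiøn")
summary[:chars]       # => 20
summary[:bytes]       # => 27
summary[:first_four]  # => "Iñtë"
```

Code that really needs the byte view can still ask for it explicitly, for example with `bytesize` or `bytes`.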

RyanWilcox
How so? Can you expand on that thought?
Xeoncross
http://blog.grayproductions.net/articles/ruby_19s_string documents how strings are different in Ruby 1.9
RyanWilcox