In my Rails (2.3, Ruby 1.8.7) application, I need to truncate a string to a certain length. The string is Unicode, and when testing in the console with, for example, `'א'.length`, I noticed that the length returned is double the number of characters. I would like an encoding-agnostic length, so that the same truncation is applied whether the string is Unicode or Latin-1 encoded.

I've gone over most of the Unicode material for Ruby, but I'm still a little in the dark. How should this problem be tackled?

A: 

You can use something like `str.chars.slice(0, 50).join` to get the first 50 characters of a string, regardless of how many bytes each character uses.

Chris Heald
In the console, I get: undefined method `slice' for #<Enumerable::Enumerator:0xb67a0ed4>
shmichael
+1  A: 

Rails has an `mb_chars` method, which returns a multibyte-aware wrapper around the string so that operations work on characters rather than bytes. Try `unicode_string.mb_chars.slice(0, 50)`.
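
Outside Rails, roughly the same character-based truncation can be sketched in plain Ruby. This is only an illustration under some assumptions: `truncate_chars` is a hypothetical helper name (not part of Ruby or Rails), and the input is assumed to already be valid UTF-8, so a Latin-1 string would need converting first. It counts code points, so a character built from combining marks may count as more than one.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper (not a Ruby or Rails method): keep the first `limit`
# characters of a UTF-8 string, however many bytes each character takes.
def truncate_chars(str, limit)
  # unpack('U*') decodes the UTF-8 bytes into an array of code points
  codepoints = str.unpack('U*')
  # keep the first `limit` code points and re-encode them as UTF-8
  codepoints[0, limit].pack('U*')
end

hebrew = "אבגדה"  # two bytes per character in UTF-8
latin  = "abcde"  # one byte per character

puts truncate_chars(hebrew, 3)  # prints "אבג"
puts truncate_chars(latin, 3)   # prints "abc"
```

Both calls keep three characters, even though the Hebrew result is six bytes long and the Latin one is three.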

Teoulas
Here's what I ultimately used: http://pastie.org/1129327
shmichael