What is the idiomatic Ruby way to retrieve a single character from a string as a one-character string? There is the str[n] method, of course, but as of Ruby 1.8 it returns the character code as a Fixnum, not a string. How do you get a single-character string?

A: 
'abc'[1].chr # => "b"
Thiago Arrais
This doesn't work in Ruby 1.8 or earlier. In Ruby before 1.9, Strings are sequences of bytes, not characters. If your String has a multibyte encoding, you will index right into the middle of a multibyte character.
Jörg W Mittag
A: 
'abc'[1..1] # => "b"
Thiago Arrais
This doesn't work in Ruby 1.8 or earlier. In Ruby before 1.9, Strings are sequences of bytes, not characters. If your String has a multibyte encoding, you will index right into the middle of a multibyte character.
Jörg W Mittag
+6  A: 

Before Ruby 1.9:

'Hello'[1].chr  # => "e"

Ruby 1.9+:

'Hello'[1]  # => "e"

A lot has changed in Ruby 1.9, including string semantics.

Robert Gamble
Upvoted because the notion of a character changed in 1.9, and that is an important change...
Keltia
The first one won't work. In Ruby before 1.9, Strings are sequences of bytes, not characters. If your String has a multibyte encoding, you will index right into the middle of a multibyte character.
Jörg W Mittag
The problem is that support for multibyte strings is very spotty in Ruby 1.8. This can be seen in many other areas of the language as well; for example, 'µsec'.reverse won't work well.
Robert Gamble
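To make the 'µsec'.reverse remark concrete, here is a rough sketch that reproduces a byte-wise reversal on Ruby 1.9 or later. It assumes UTF-8 and is only a simulation of the 1.8 behaviour, not actual 1.8 code; the variable names are illustrative.

```ruby
s = "\u00B5sec"   # "µsec" in UTF-8; 'µ' is the two bytes 0xC2 0xB5

# Ruby 1.9+ reverses by character, so 'µ' stays intact.
char_reversed = s.reverse   # => "cesµ"

# Simulate a byte-oriented reverse (what Ruby 1.8 effectively did):
# the two bytes of 'µ' end up swapped, which is no longer valid UTF-8.
byte_reversed = s.bytes.to_a.reverse.pack('C*').force_encoding('UTF-8')
byte_reversed.valid_encoding?   # => false
```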
+6  A: 

Should work for Ruby before and after 1.9:

'Hello'[2,1]  # => "l"

Please see Jörg Mittag's comment: this is correct only for single-byte character sets.
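As a minimal sketch of why the start-and-length form is the portable spelling: str[n, 1] returns a String in both 1.8 and 1.9+, whereas the return type of str[n] changed between them. The helper name char_at is made up for illustration, and the multibyte caveat above still applies on 1.8.

```ruby
# Hypothetical helper: one-character string at index n,
# or nil when n is past the end of the string.
def char_at(str, n)
  str[n, 1]
end

char_at('Hello', 2)    # => "l"
char_at('Hello', -1)   # => "o"  (negative indexes count from the end)
char_at('Hello', 10)   # => nil
```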

Brent.Longborough
+1 for pointing out the portable solution.
Robert Gamble
This doesn't work in Ruby 1.8 or earlier. In Ruby before 1.9, Strings are sequences of bytes, not characters. If your String has a multibyte encoding, you will index right into the middle of a multibyte character.
Jörg W Mittag
+4  A: 

In Ruby 1.9, it's easy: Strings are encoding-aware sequences of characters, so you can just index into one and get a single-character string back:

'µsec'[0] # => 'µ'

However, in Ruby 1.8, Strings are sequences of bytes and thus completely unaware of the encoding. If you index into a string and that string uses a multibyte encoding, you risk indexing right into the middle of a multibyte character (in this example, the 'µ' is encoded in UTF-8):

'µsec'[0] # => 194
'µsec'[0].chr # => Garbage
'µsec'[0,1] # => Garbage
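The byte-versus-character distinction can still be observed on Ruby 1.9 or later through the byte-oriented String methods. This is a sketch assuming a UTF-8 string; it mimics what the 1.8 calls above were doing rather than running them.

```ruby
s = "\u00B5sec"   # "µsec" in UTF-8

s.length     # => 4 (characters)
s.bytesize   # => 5 (bytes; 'µ' is encoded as 0xC2 0xB5)
s.getbyte(0) # => 194, the value s[0] returned in Ruby 1.8

# Slicing one byte cuts 'µ' in half, like s[0, 1] did in Ruby 1.8.
first_byte = s.byteslice(0, 1)
first_byte.valid_encoding?   # => false

# Character-based indexing returns the whole character.
first_char = s[0]            # => "µ"
```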

However, Regexps and some specialized string methods support at least a small subset of popular encodings, among them some Japanese encodings (e.g. Shift-JIS) and (in this example) UTF-8:

'µsec'.split('')[0] # => 'µ'
'µsec'.split(//u)[0] # => 'µ'
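For comparison, a short sketch of the same idea on Ruby 1.9 or later, where the regexp-based split still works and String#chars gives the characters directly. It assumes a UTF-8 string; the to_a call is only there because chars returned an enumerator rather than an array in 1.9.

```ruby
s = "\u00B5sec"   # "µsec" in UTF-8

via_split = s.split(//u)   # => ["µ", "s", "e", "c"]
via_chars = s.chars.to_a   # => ["µ", "s", "e", "c"]

via_split[0]   # => "µ"
via_chars[0]   # => "µ"
```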
Jörg W Mittag
If you are using multibyte strings in Ruby 1.8, you should really be using the appropriate character handler classes (like UTF8Handler from Active Support) instead of trying to come up with silly hacks. Because strings are byte arrays, you will have the same issues with reverse, strip, downcase, slice, and so on.
Robert Gamble