ansaurus

Question

How to extract a single character (as a string) from a larger string in Ruby?

Answer 1

A:

'abc'[1].chr # => "b"

Thiago Arrais 2008-12-16 13:45:17

This doesn't work in Ruby 1.8 or earlier. In Ruby before 1.9, Strings are sequences of bytes, not characters. If your String has a multibyte encoding, you will index right into the middle of a multibyte character.

Jörg W Mittag 2008-12-17 11:35:51

Answer 2

A:

'abc'[1..1] # => "b"

Thiago Arrais 2008-12-16 13:46:00

This doesn't work in Ruby 1.8 or earlier. In Ruby before 1.9, Strings are sequences of bytes, not characters. If your String has a multibyte encoding, you will index right into the middle of a multibyte character.

Jörg W Mittag 2008-12-17 11:39:56

Answer 3

+6 A:

Before Ruby 1.9:

'Hello'[1].chr  # => "e"

Ruby 1.9+:

'Hello'[1]  # => "e"

A lot has changed in Ruby 1.9 including string semantics.
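As a rough illustration of that change (a sketch that assumes Ruby 1.9 or later): indexing now returns a one-character String, and the integer that 1.8 used to return is still available through `ord` or `getbyte`.

```ruby
# Sketch only; assumes Ruby 1.9 or later.
s = 'Hello'

char = s[1]          # one-character String
code = s[1].ord      # Integer code point of that character
byte = s.getbyte(1)  # raw byte at byte offset 1

puts char.inspect    # "e"
puts code            # 101
puts byte            # 101
```

For a pure-ASCII string like this one, the code point and the byte value are the same number.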

Robert Gamble 2008-12-16 13:51:38

Upvoted because the notion of a character changed in 1.9, and that is an important change.

Keltia 2008-12-16 13:52:48

The first one won't work. In Ruby before 1.9, Strings are sequences of bytes, not characters. If your String has a multibyte encoding, you will index right into the middle of a multibyte character.

Jörg W Mittag 2008-12-17 11:28:16

The problem is that support for multibyte strings is very spotty in Ruby 1.8. This shows up in many other areas of the language as well; for example, 'µsec'.reverse won't work well.

Robert Gamble 2008-12-17 13:30:01
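The `reverse` problem mentioned in the comment above can be approximated on a modern Ruby. This is a sketch that assumes Ruby 1.9 or later and a UTF-8 string; the byte-wise reversal only imitates what a byte-oriented 1.8 string would do.

```ruby
# Sketch only; assumes Ruby 1.9 or later.
# "\u00B5" is the micro sign, so s is 'µsec' encoded as UTF-8.
s = "\u00B5sec"

# Character-aware reversal keeps the two-byte 'µ' intact.
by_chars = s.reverse

# Byte-wise reversal, roughly what a byte-oriented string would do:
# the two bytes of 'µ' end up in the wrong order.
by_bytes = s.bytes.to_a.reverse.pack('C*')
by_bytes.force_encoding('UTF-8')

puts by_chars                  # cesµ
puts by_bytes.valid_encoding?  # false
```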

Answer 4

+6 A:

This should work in Ruby both before and after 1.9:

'Hello'[2,1]  # => "l"

Please see Jörg Mittag's comment: this is correct only for single-byte character sets.

Brent.Longborough 2008-12-16 14:45:39

+1 for pointing out the portable solution.

Robert Gamble 2008-12-16 15:29:16

This doesn't work in Ruby 1.8 or earlier. In Ruby before 1.9, Strings are sequences of bytes, not characters. If your String has a multibyte encoding, you will index right into the middle of a multibyte character.

Jörg W Mittag 2008-12-17 11:27:15

Answer 5

+4 A:

In Ruby 1.9, it's easy. Strings are encoding-aware sequences of characters, so you can just index into one and get a single-character string back:

'µsec'[0] # => 'µ'

However, in Ruby 1.8, Strings are sequences of bytes and thus completely unaware of the encoding. If you index into a string and that string uses a multibyte encoding, you risk indexing right into the middle of a multibyte character (in this example, the 'µ' is encoded in UTF-8):

'µsec'[0] # => 194
'µsec'[0].chr # => Garbage
'µsec'[0,1] # => Garbage
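The byte-versus-character distinction behind those results can be inspected on a newer Ruby. This is a sketch that assumes Ruby 1.9.3 or later (for `byteslice`) and a UTF-8 string; it does not reproduce 1.8 itself, only the underlying bytes.

```ruby
# Sketch only; assumes Ruby 1.9.3 or later.
# "\u00B5" is the micro sign, so s is 'µsec' encoded as UTF-8.
s = "\u00B5sec"

char_count = s.length               # counts characters
byte_count = s.bytesize             # counts bytes
mu_bytes   = s.bytes.to_a.first(2)  # the two bytes that encode 'µ'

# Taking only the first byte cuts 'µ' in half, which is what
# byte-based indexing does to a multibyte character.
half_char = s.byteslice(0, 1)

puts char_count                 # 4
puts byte_count                 # 5
puts mu_bytes.inspect           # [194, 181]
puts half_char.valid_encoding?  # false
```

The 194 here is the same value that `'µsec'[0]` returns in 1.8.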

However, Regexps and some specialized string methods support at least a small subset of popular encodings, among them some Japanese encodings (e.g. Shift-JIS) and (in this example) UTF-8:

'µsec'.split('')[0] # => 'µ'
'µsec'.split(//u)[0] # => 'µ'
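Putting the two approaches together, one possible portable helper looks like the sketch below. The name `char_at` is made up for illustration and is not part of Ruby; the 1.8 branch assumes the string is UTF-8.

```ruby
# Sketch only. char_at is a hypothetical helper, not a built-in method.
def char_at(str, index)
  if str.respond_to?(:encoding)
    # Ruby 1.9+: strings are encoding-aware, so this indexes by character.
    str[index, 1]
  else
    # Ruby 1.8: split on a UTF-8-aware regexp, then pick the element.
    str.split(//u)[index]
  end
end

s = "\u00B5sec"            # 'µsec' in UTF-8
puts char_at(s, 0)         # µ
puts char_at(s, 1)         # s
```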

Jörg W Mittag 2008-12-17 11:42:06

If you are using multibyte strings in Ruby 1.8, you should really be using the appropriate character handler classes (like UTF8Handler from Active Support) instead of trying to come up with silly hacks. Because strings are byte arrays, you will have the same issues with reverse/strip/downcase/slice/etc.

Robert Gamble 2008-12-17 18:55:18
