If I save a text file containing the character б (U+0431) as an ANSI code page file, Ruby returns ord = 63.

Saving the file as UTF-8 instead returns ord = 208, 177.

Should I be explicitly telling Ruby to treat the input as encoded with a certain code page? If so, how do I do this?

+2  A: 

Is that in Ruby source code, or in a file that is read with File.open? If it's in the Ruby source code, you can (in Ruby 1.9) add this to the top of the file:

# encoding: utf-8

Or you could specify most other encodings (like iso-8859-1).

If you are reading a file with File.open, you could do something like this:

File.open("file.txt", "r:utf-8") {|f| ... }

As with the encoding comment, you can pass in different types of encodings here too.
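To make the numbers in the question concrete, here is a minimal sketch for Ruby 1.9 or later. It assumes the "ANSI" save went to a Western code page such as Windows-1252, which has no б, so the editor substituted "?" (byte 63); that is a guess about the editor's behaviour, not something the question confirms. The temporary file and the variable names are made up for the illustration.

```ruby
require "tempfile"

# Hypothetical sample data: the letter б (U+0431) written out as UTF-8 bytes.
file = Tempfile.new("encoding_demo")
file.binmode
file.write("\xD0\xB1")
file.close

# Read with an explicit external encoding, as in the File.open call above.
text = File.open(file.path, "r:utf-8") { |f| f.read }

codepoint = text.ord         # Unicode code point of the first character
raw_bytes = text.bytes.to_a  # the underlying UTF-8 bytes

# Assumption: this mimics an editor saving to a code page that lacks б,
# replacing the unmappable character with "?".
fallback = text.encode("Windows-1252", undef: :replace, replace: "?")
fallback_bytes = fallback.bytes.to_a

puts codepoint              # 1073 (0x0431)
puts raw_bytes.inspect      # [208, 177]
puts fallback_bytes.inspect # [63]

file.unlink
```

If the file really is stored in a Cyrillic code page, a mode string such as "r:windows-1251:utf-8" should ask Ruby to transcode from that code page to UTF-8 while reading.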

dvyjones
I am using File.open and I am using Ruby 1.8. The "r:utf-8" doesn't seem to work. Is that a 1.9 feature?
Maulin
Ah, yes, that is 1.9 specific. 1.8 isn't that good when it comes to other encodings. However, you could try either calling ruby with the `-KU` option, or setting the `$KCODE` variable to `"U"` and then doing `require 'jcode'`, which should make string handling UTF-8 aware. I'm not sure if this works though, and I'd recommend using Ruby 1.9 if you can.
dvyjones
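One option that is not mentioned in the comments above, and that should not depend on `$KCODE`, is to decode the bytes yourself with `String#unpack`. The sketch below assumes the file is already saved as UTF-8; the byte values stand in for what Ruby 1.8 would hand back when reading it.

```ruby
# The two bytes of б (U+0431) in UTF-8, as a raw byte string.
raw = [0xD0, 0xB1].pack("C*")

bytes      = raw.unpack("C*")  # individual bytes
codepoints = raw.unpack("U*")  # bytes decoded as UTF-8 code points

puts bytes.inspect       # [208, 177]
puts codepoints.inspect  # [1073]
```

This only recovers code points; it does not make other string methods (length, indexing, and so on) character-aware in 1.8.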