My application needs to handle some international characters, namely ä, ü, ö and ß, which are still ASCII.

When I tested how Ruby behaves with these characters, I got this error:

test.rb:1: invalid multibyte char (US-ASCII)
test.rb:1: invalid multibyte char (US-ASCII)

for this code:

puts "i like my chars: ä, ü, ö and ß!"

But the strange thing is: When using the Interactive Ruby Shell, I get no error!

EDIT: In my application, I'm retrieving the data from an external API. The above code is just an example!

+10  A: 

No, those characters aren't in ASCII. ASCII doesn't have any values above Unicode U+007F (decimal 127). See the Wikipedia ASCII entry for more details.
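This can be checked from Ruby itself. The following is only a minimal sketch, assuming Ruby 1.9 or later; the characters are written as \u escapes so the snippet contains only ASCII bytes on disk, and the variable name `chars` is purely illustrative.

```ruby
# The four characters from the question, written as Unicode escapes so this
# file itself contains only ASCII bytes.
chars = ["\u00E4", "\u00FC", "\u00F6", "\u00DF"]

chars.each do |ch|
  # ord returns the code point; ascii_only? is true only when every
  # character in the string has a code point of 127 or below.
  puts format("U+%04X  code point %d  ascii_only? %s", ch.ord, ch.ord, ch.ascii_only?)
end

# For comparison, a plain ASCII letter.
puts "a".ascii_only?
```

All four code points are above 127, which is why none of them fits into ASCII.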

I suspect the interactive Ruby shell is taking the native encoding of your shell, rather than ASCII.

Do you have a way of specifying the encoding of your .rb file? If so, use that - or change your scripts so they genuinely are ASCII.
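To see which encodings are actually in play, a small diagnostic along these lines may help. It is a sketch assuming Ruby 1.9 or later: `__ENCODING__` reports the encoding Ruby assumes for the current source file, and `Encoding.default_external` is the one Ruby derives from the environment, which is likely what the interactive shell picks up. The variable names are illustrative, and the output depends on your system, so none is shown.

```ruby
# Encoding Ruby assumes for the source file this line lives in.
source_encoding = __ENCODING__

# Encoding Ruby derived from the environment (locale); used by default for I/O.
external_encoding = Encoding.default_external

puts "source:   #{source_encoding}"
puts "external: #{external_encoding}"
```

If the two differ between the script and the interactive shell, that would explain why only one of them complains.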

Jon Skeet
Oh thanks, I always thought ASCII includes 256 characters. But that doesn't solve my problem, because I'm retrieving the text from an external API...
Vincent
@Vincent: If you're retrieving the text from an external API, why is it in your Ruby script? If it's really just a test, leave it in a text file and load it from your Ruby script instead of putting it directly into the script.
Jon Skeet
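The suggestion in the comment above, keeping the text in a file and loading it, could look roughly like this. It is a sketch under assumptions: the sample string stands in for whatever the external API returns, the data is assumed to be UTF-8, and a temporary file is used only so the snippet is self-contained.

```ruby
require "tempfile"

# Hypothetical sample data standing in for text fetched from the external API.
sample = "i like my chars: \u00E4, \u00FC, \u00F6 and \u00DF!"

text = Tempfile.create("chars") do |file|
  # Write the raw UTF-8 bytes, as a text file saved by an editor would hold.
  file.binmode
  file.write(sample)
  file.flush

  # Tell Ruby explicitly how to interpret the bytes when reading them back.
  File.read(file.path, encoding: "UTF-8")
end

puts text.encoding
puts text.valid_encoding?
```

Because the non-ASCII bytes never appear in the .rb file, the source encoding of the script no longer matters for this data.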
+1  A: 

Those are not ASCII characters… They just happen to still be encoded in one byte in some legacy, ASCII-derived character sets. Most likely what is happening is that your source file is being saved as UTF-8 because it contains non-ASCII characters, and Ruby, which (as the error message shows) is reading the source as US-ASCII, is correctly rejecting the multibyte sequences.

You're only getting away with it at the interactive prompt because your terminal is using some legacy character encoding.

Kaelin Colclasure
A: 

To escape the characters for the ASCII encoding, use Unicode escape sequences:

puts "i like my chars: \u00E4, \u00FC, \u00F6 and \u00DF!"

Ruby 1.9, anyway - I can't remember if this works in 1.8.
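As a rough illustration of what the escapes produce (assuming Ruby 1.9 or later; the variable name `text` is illustrative), the resulting string is tagged as UTF-8 even though the source line itself is pure ASCII:

```ruby
text = "i like my chars: \u00E4, \u00FC, \u00F6 and \u00DF!"

# A literal containing \u escapes is tagged as UTF-8 by Ruby.
puts text.encoding

# Each of the four escaped characters takes two bytes in UTF-8, so the byte
# count exceeds the character count by four.
puts text.length
puts text.bytesize
```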

McDowell
+1  A: 

Put the magic comment "# coding: utf-8" at the beginning of your script (on the second line if you're using a shebang):

#!/usr/local/bin/ruby
# coding: utf-8

puts "i like my chars: ä, ü, ö and ß!"

Alex