I am currently having trouble with results from the Amazon API.

The service returns a string containing Unicode characters: Learn Objective\xE2\x80\x93C on the Mac (Learn Series)

With Ruby 1.9.1, the string could not even be processed:

REXML::ParseException: #<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>

...

Exception parsing

Line: 1

Position: 1636

Last 80 unconsumed characters:

Learn Objective–C on the Mac (Learn Series)
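The error can be reproduced without REXML. What follows is a minimal sketch, assuming Ruby 2.0 or later (for String#b). REXML's own internal regexps are not shown here, so the `\u2013` pattern is only a stand-in for "a regexp with a fixed UTF-8 encoding":

```ruby
# Raw bytes as a network read might deliver them: tagged ASCII-8BIT (binary).
raw = "Learn Objective\xE2\x80\x93C on the Mac".b

# An ASCII-only pattern is not tied to an encoding, so it matches binary data.
puts(/Objective/ =~ raw) # prints 6

# A \u escape gives the regexp a fixed UTF-8 encoding; matching it against a
# binary string that contains non-ASCII bytes raises instead of guessing.
begin
  /\u2013/ =~ raw
rescue Encoding::CompatibilityError => e
  puts e.class # prints Encoding::CompatibilityError
end
```

This suggests why only some input fails: as long as the data stays ASCII, or the pattern does, Ruby never has to decide which encoding the bytes are in.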
A:

As the exception points out, your string is ASCII-8BIT encoded, so you need to change its encoding. There is a long story behind that, but if you are interested in a quick solution, just call force_encoding on the string before you do any processing:

s = "Learn Objective\xE2\x80\x93C on the Mac"
# => "Learn Objective\xE2\x80\x93C on the Mac"
s.encoding
# => #<Encoding:ASCII-8BIT>
s.force_encoding 'utf-8'
# => "Learn Objective–C on the Mac"
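Because force_encoding only changes the label on the bytes and never inspects them, it is worth checking that the bytes really are UTF-8 before trusting the result. A sketch assuming Ruby 2.0 or later; `relabel_as_utf8` is a hypothetical helper name, not part of any library:

```ruby
# Hypothetical helper: relabel a binary (ASCII-8BIT) string as UTF-8.
def relabel_as_utf8(raw)
  # dup first: force_encoding mutates its receiver and returns it.
  text = raw.dup.force_encoding(Encoding::UTF_8)
  # force_encoding does not validate the bytes, so check explicitly.
  raise ArgumentError, "bytes are not valid UTF-8" unless text.valid_encoding?
  text
end

raw = "Learn Objective\xE2\x80\x93C on the Mac".b
title = relabel_as_utf8(raw)
puts title.encoding                  # prints UTF-8
puts title.bytesize == raw.bytesize  # prints true: no bytes were changed
```

String#encode is the transcoding counterpart. Calling `raw.encode("UTF-8")` on these bytes would raise Encoding::UndefinedConversionError, because Ruby has no mapping for high bytes in ASCII-8BIT. That is why relabeling, not converting, is the right tool when the bytes are already UTF-8.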
Mladen Jablanović
Is this an issue with the response sent by the Amazon service? Should it have set a different Content-Type?
phoet
I haven't worked with AWS, so I don't know how that string was loaded, but you can set the default encoding at the (Ruby) application level, and chances are that would solve the issue; more on that at the link in the answer. BTW, I don't think there is an _issue_ at all: Ruby simply doesn't (and shouldn't) try to guess which encoding the string it receives is in.
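Setting the default encoding at the application level refers to Encoding.default_external (optionally together with Encoding.default_internal). A minimal sketch, assuming Ruby 2.1 or later (for Tempfile.create with a block). Note that this default tags strings read through IO objects without an explicit encoding, so whether it helps here depends on how the HTTP library reads the response, which is not shown:

```ruby
require "tempfile"

# Application-level default: text read from IO without an explicit
# encoding is tagged with Encoding.default_external.
Encoding.default_external = Encoding::UTF_8

title = Tempfile.create("title") do |f|
  f.binmode
  f.write("Learn Objective\xE2\x80\x93C on the Mac".b) # raw UTF-8 bytes
  f.flush
  File.read(f.path) # text-mode read, no encoding argument
end

puts title.encoding # prints UTF-8
```

A library that reads sockets in binary mode will still hand back ASCII-8BIT strings regardless of this setting, in which case force_encoding remains necessary.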
Mladen Jablanović
I used HTTParty, and I thought it would infer the encoding from the Content-Type...
phoet
Probably; that would mean HTTParty should take care of it.
Mladen Jablanović
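If the client does not tag the body itself, inferring the encoding from the Content-Type can be done by hand. This is not HTTParty's API: `apply_charset` is a hypothetical helper, and the header value below is an assumed example, not necessarily what Amazon sends:

```ruby
# Hypothetical helper: tag a raw HTTP body with the charset parameter
# of its Content-Type header value, if present and known to Ruby.
def apply_charset(body, content_type)
  charset = content_type.to_s[/charset=["']?([^;"'\s]+)/i, 1]
  return body if charset.nil? # no charset declared: leave the bytes as they are

  encoding =
    begin
      Encoding.find(charset)
    rescue ArgumentError # Ruby does not recognize this charset name
      nil
    end
  return body if encoding.nil?

  body.dup.force_encoding(encoding) # relabel only; bytes are unchanged
end

raw = "Learn Objective\xE2\x80\x93C on the Mac".b
body = apply_charset(raw, "text/xml; charset=UTF-8") # assumed header value
puts body.encoding # prints UTF-8
```

If the declared charset is wrong, the relabeled string may still fail `valid_encoding?`, so a check like that is worth keeping before parsing.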