I'm having real trouble getting accents right, and I believe this affects most Latin-script languages; in my case, Portuguese.

I have a string that comes in as a parameter, and I must take its first letter and upcase it. That should be trivial in Ruby, but here is the catch:

s1 = 'alow'; s1.size #=> 4
s2 = 'álow'; s2.size #=> 5

s1[0,1] #=> "a"
s2[0,1] #=> "\303"

s1[0,1].upcase #=> 'A'
s2[0,1].upcase #=> '\303' !!!

s1[0,1].upcase + s1[1,100] #=> "Alow" OK
s2[0,1].upcase + s2[1,100] #=> "álow" NOT OK

I'd like a generic solution. Any help?

[EDIT]
I found that Rails strings can be wrapped as multibyte character strings, as seen in ../active_support/core_ext/string/multibyte.rb, just by using:

s2.mb_chars[0,1].upcase.to_s #=> "Á"

Still, @nsdk's approach is easier to use =)

+4  A: 

The character á is not represented as a single byte in UTF-8 strings, but as the two-byte sequence C3 A1. So when you slice [0,1] from the string you get only its first byte, \xC3 (octal \303), which you can't meaningfully upcase.
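To see those bytes for yourself, here is a minimal sketch. It assumes Ruby 1.9.3 or later (where String#byteslice is available) and a UTF-8 string; the helper names utf8_hex_bytes and lone_first_byte_valid? are made up for illustration and are not part of the question.

```ruby
# Hypothetical helpers for inspecting the raw bytes of a UTF-8 string.

# Returns the bytes of the string as uppercase hex pairs.
def utf8_hex_bytes(str)
  str.bytes.map { |b| format("%02X", b) }
end

# Returns true if the first *byte* alone is a complete, valid character.
def lone_first_byte_valid?(str)
  str.byteslice(0, 1).valid_encoding?
end

accented = "\u00E1low" # "álow", written with an escape to avoid source-encoding issues

puts utf8_hex_bytes(accented).inspect  # the first two entries are the bytes of "á"
puts accented.bytesize                 # number of bytes
puts lone_first_byte_valid?(accented)  # is the first byte a character on its own?
puts lone_first_byte_valid?("alow")
```

A byte-oriented slice such as s2[0,1] on Ruby 1.8 returns exactly that lone first byte, which is why upcase has nothing sensible to work with.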

Ruby 1.8 doesn't do Unicode, so you're going to be fighting against byte strings and hitting lots of problems like this when trying to write internationally aware applications in it. See e.g. this question for some background. Ruby 1.9 finally fixes this (although I dislike its approach).
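For comparison, on newer Rubies the naive approach from the question can be made to work. The following is only a sketch under two assumptions: strings are indexed by character (Ruby 1.9+), and String#upcase handles non-ASCII letters (Ruby 2.4+). The method name upcase_first is hypothetical.

```ruby
# Hypothetical helper: upcase only the first character of a string.
# Assumes Ruby 2.4+ so that upcase maps "á" to "Á".
def upcase_first(str)
  return str if str.empty?
  # str[0, 1] is the first *character* here, not the first byte
  str[0, 1].upcase + str[1..-1]
end

puts upcase_first("alow")
puts upcase_first("\u00E1low") # "álow"
```

On Ruby 1.8 this sketch does not help, since slicing there is byte-based; that is where mb_chars or a regex-based substitution comes in.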

bobince
+1  A: 

s1.sub(/^(.)/) { |char| char.upcase }

nsdk
That works perfectly, thanks a lot!
Fabiano PS