How to perform a locale-aware uppercase operation?

The standard tr '[:lower:]' '[:upper:]' trick doesn't work:

=$ echo żółw | tr "[:lower:]" "[:upper:]"
żółW

(it should be ŻÓŁW)

I'd rather avoid having to run Perl or anything similarly heavyweight.

+2  A: 

It seems to me that if you want to avoid a heavyweight tool, you need to provide the characters yourself. Given that this is a nonstarter for a truly generic solution, you are stuck with using a heavyweight one.

One heavyweight solution

vinko@parrot:~$ more t.py
# coding: UTF-8
print unicode.upper("żółw".decode('utf-8'))
vinko@parrot:~$ python t.py
ŻÓŁW

A non-heavyweight solution that requires you to specify each character

vinko@parrot:~$ echo żół | tr "[żół]" "[ŻÓŁ]"
ŻÓŁ

EDIT: Based on the other answer and comments, Bash 4.0 IS locale sensitive and aware of wide characters. You have to set a proper locale, of course (LC_CTYPE or LC_ALL), so Bash can tell what it is supposed to do. It also seems that recent versions of tr are locale sensitive on some systems (for example, Mac OS X 10.6).

Vinko Vrsalovic
Well, Python is in the range of things I'd like to avoid. Too heavy. As for specifying each character: while it is simple for "żółw", it becomes practically impossible to make it work in general.
depesz
Yes, that's what I said in the first paragraph. As far as I know and as far as I could find, there is no Unicode-aware version of tr, nor is Bash aware of wide characters.
Vinko Vrsalovic
Actually, on OS X and probably FreeBSD, tr *is* locale-aware and the OP's example works.
Ned Deily
@Ned which versions? Are those Apple's and FreeBSD's mods or upstream?
Vinko Vrsalovic
The manpage for *tr* in Mac OS X 10.6 (http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man1/tr.1.html) certainly says it is locale sensitive, but *tr* was not locale sensitive back in 10.4.
Chris Johnsen
+2  A: 

If your bash is new enough, it may be able to do it (see bash-4.0 NEWS item 1.hh.).

bash -c 'foo="żółw"; echo ${foo^^}'

zsh, too (since 4.3.3?):

zsh -c 'foo="żółw"; echo ${(U)foo}'
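
As a rough sketch (not part of the original one-liners), the bash expansion above could be wrapped in a small helper so it can be called from any POSIX shell. The name to_upper is hypothetical, and the sketch assumes that a bash 4.0 or newer is on PATH and that LC_CTYPE or LC_ALL names a UTF-8 locale; without those, non-ASCII characters may come back unchanged.

```shell
# Hypothetical helper: uppercase the first argument via bash's ${var^^}
# expansion. The string is passed as a positional parameter ($1) rather
# than spliced into the command text, so quotes in the input are safe.
# Assumes bash >= 4.0 is installed; older versions reject ${1^^}.
to_upper() {
    bash -c 'printf "%s\n" "${1^^}"' _ "$1"
}

# Example call; the result for non-ASCII input depends on the active locale.
to_upper "żółw"
```

Passing the text as an argument instead of embedding it in the -c string keeps the helper usable for input that contains quotes or dollar signs.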
Chris Johnsen
Doesn't work for me on bash 4.0.35 :(
depesz
Mine is 4.0.35, also. What are your locale settings? You probably need to at least set LC_CTYPE to a UTF-8 variation (see `locale -a | fgrep UTF-8`). If `locale` reports that everything is just `C`, then it will not work. My LC_CTYPE is `en_US.UTF-8`.
Chris Johnsen
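
To make the locale check from the comment above a bit more concrete, here is a minimal sketch that reports whether the current locale uses UTF-8. The function name is_utf8_locale is made up for illustration, and en_US.UTF-8 in the hint is only an example taken from the comment; whether that locale is installed on a given system is an assumption (check `locale -a`).

```shell
# Hypothetical helper: succeed if the active locale's character encoding
# is UTF-8. `locale charmap` prints the codeset name of the current locale.
is_utf8_locale() {
    case "$(locale charmap 2>/dev/null)" in
        UTF-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_locale; then
    # LC_ALL overrides LC_CTYPE, which overrides LANG.
    echo "UTF-8 locale active: ${LC_ALL:-${LC_CTYPE:-${LANG:-unset}}}"
else
    echo "Not a UTF-8 locale; try e.g.: export LC_CTYPE=en_US.UTF-8" >&2
fi
```

If the check fails, case conversion in the shell will most likely only affect ASCII letters, which matches the "everything is just C" situation described above.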