Hello, I have a UTF-encoded file in uppercase and I want to change all words to lowercase.

I have tried:

`$ tr '[:upper:]' '[:lower:]' < input.txt > output.txt`

but that changes only the characters without accents.

Thanks

+1  A: 

This is because the default character classes only work on standard ASCII, which does not include most international accented characters. If you have a defined set of those characters, the easiest way is to add the mapping from each accented uppercase character to its lowercase counterpart manually:

tr 'ÄÖÜ[:upper:]' 'äöü[:lower:]'

If you only have a few accented characters, this is workable.

JeSuisse
A: 

No, the issue is that tr is not Unicode-aware.

$ grep -o '[[:upper:]]' <<< JalapeÑo
J
Ñ
$ tr '[:upper:]' '[:lower:]' <<< JalapeÑo
jalapeÑo
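
One way to see why this happens, assuming the tr in question processes its input one byte at a time (GNU tr does; other implementations may differ): in UTF-8, Ñ is stored as two bytes, so a byte-by-byte mapping never sees it as a single letter. The helper name utf8_bytes is hypothetical, and the sketch assumes the script or terminal is itself UTF-8 encoded.

```shell
# Illustration only: print the bytes of a string in hex.
# od -An -tx1 dumps each byte as two hex digits without an address column;
# tr -d removes the spacing so the bytes read as one run.
utf8_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

utf8_bytes 'Ñ'; echo   # c391 (two bytes)
utf8_bytes 'N'; echo   # 4e   (one byte)
```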

The reason to use [:upper:], etc., is to handle characters outside ASCII. Otherwise, you could just use [A-Z] and [a-z]. That's also why PCRE has a character class called [:ascii:]:

$ perl -pe 's/[[:ascii:]]//g' <<< jalapeño
ñ
Dennis Williamson
You're right! But character classes have never worked for me, neither in Unicode nor in Latin-1, so I gave up on them a long time ago and always do it manually :-(
JeSuisse
A: 

Finally, the simplest way I found is to use awk:

awk '{print tolower($0)}' < input.txt > output.txt
liborw