For the Char data-type, how do I specify that I want to use the Turkish i instead of the English i for the toLower and toUpper functions?

A: 

Maybe try setting your locale? I'm not sure.

newacct
The locale has no impact on the default `Data.Char` library.
grddev
+12  A: 

The Data.Char library in Haskell is not locale-dependent. It works for all Unicode characters, but perhaps not in the way you would expect. In the corresponding Unicode chart you can see the mappings for the "dotted" and "dotless" i's.

  • toUpper 'i' => 'I'
  • toUpper 'ı' => 'I'
  • toLower 'I' => 'i'
  • toLower 'İ' => 'i'

Thus, neither of the two transforms is reversible: both 'i' and 'ı' upper-case to 'I', and both 'I' and 'İ' lower-case to 'i'. If you want reversible handling of Turkish characters, it seems you have to either use a C library or roll your own.
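To make the round-trip problem concrete, here is a minimal sketch using only `Data.Char`. The helper names (`roundTripLower`, `survivesLower`, and so on) are made up for illustration, and the Turkish letters are written as escapes (`'\x0131'` is ı, `'\x0130'` is İ) so the snippet does not depend on source-file encoding.

```haskell
import Data.Char (toLower, toUpper)

-- Hypothetical helper: go to upper case and back to lower case.
roundTripLower :: Char -> Char
roundTripLower = toLower . toUpper

-- Hypothetical helper: go to lower case and back to upper case.
roundTripUpper :: Char -> Char
roundTripUpper = toUpper . toLower

-- True when the character comes back unchanged from the round trip.
survivesLower, survivesUpper :: Char -> Bool
survivesLower c = roundTripLower c == c
survivesUpper c = roundTripUpper c == c

-- The two Turkish letters that do not survive.
dotlessI, dottedCapitalI :: Char
dotlessI       = '\x0131'  -- ı
dottedCapitalI = '\x0130'  -- İ
```

With the mappings listed above, `survivesLower 'i'` holds, while `survivesLower dotlessI` and `survivesUpper dottedCapitalI` do not, because the round trip lands on the plain ASCII letter.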

UPDATE: The Haskell 98 report makes this quite clear, whereas the Haskell 2010 report only says that Char corresponds to a Unicode character, and does not as clearly define the semantics of toLower and toUpper.

grddev
`toLower 'I'` should give a dotless `i`.
Alexandre C.
@Alexandre: I documented how Haskell works, and what the (linked) Unicode specification says. If you want other behavior, you need to implement your own (as in jrockway's reply).
grddev
+7  A: 

A Simple Matter Of Programming:

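import qualified Data.Char as Char

-- Turkish-style lower-casing: capital I maps to dotless ı;
-- every other character falls through to Data.Char.
toLower :: Char -> Char
toLower 'I' = 'ı'
toLower x   = Char.toLower x

Then

-- Parentheses are needed: <$> and == share precedence 4 and cannot be mixed.
(toLower <$> "I AM LOWERCASE") == "ı am lowercase"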
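The same idea extends to upper-casing. The following is only a sketch under the assumption that the usual Turkish pairing is wanted (I with ı, İ with i); `toLowerTr` and `toUpperTr` are hypothetical names, not part of `Data.Char` or any library, and the non-ASCII letters are written as escapes.

```haskell
import qualified Data.Char as Char

-- Hypothetical Turkish-style lower-casing (not a library function).
toLowerTr :: Char -> Char
toLowerTr 'I'      = '\x0131'  -- I -> dotless ı
toLowerTr '\x0130' = 'i'       -- İ -> i
toLowerTr c        = Char.toLower c

-- Hypothetical Turkish-style upper-casing (not a library function).
toUpperTr :: Char -> Char
toUpperTr 'i'      = '\x0130'  -- i -> İ
toUpperTr '\x0131' = 'I'       -- ı -> I
toUpperTr c        = Char.toUpper c
```

Unlike the `Data.Char` functions, this pair round-trips on all four letters: `toUpperTr (toLowerTr 'I')` gives back `'I'`, and `toLowerTr (toUpperTr 'i')` gives back `'i'`. It still only covers per-character mappings, so it is a stopgap rather than full locale support.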
jrockway
Are you really telling me that I have to hack every library that calls Char.toLower in order to support internationalization?
Jonathan Allen
@Jonathan: Yes, because the Haskell specification only says to follow the Unicode standard, which provides the rules I gave above. Thus any library that uses `Char.toLower` is not prepared for internationalization.
grddev
@Jonathan Allen: If you don't want the standard Unicode behavior, then no, you can't use libraries that follow the Unicode standard. It's unfortunate, but pretty plainly so.
Chuck
I should clarify that this is not the best possible solution. It would be good to write a library that is more flexible than Data.Char, and the community would surely appreciate any contributions in that area.
jrockway
+1  A: 

You might check this post, which uses the Text library.

sdcvvc