I'm going out of my mind trying to simply output UTF-8-encoded data to the console.

I've managed to accomplish this using String, but now I'd like to do the same with ByteString. Is there a nice and fast way to do this?

This is what I've got so far, and it's not working:

import Prelude hiding (putStr)
import Data.ByteString.Char8 (putStr, pack)

main :: IO ()
main = putStr $ pack "čušpajž日本語"

It prints out uapaj~�,�, ugh.

Ideally I'd like an answer for the newest GHC, 6.12.1, although answers for previous versions are welcome as well.

Thanks!

Update: Simply reading and outputting the same UTF-8-encoded line of text seems to work correctly. (Using Data.ByteString.Char8, I just do a putStr =<< getLine.) But packed values from inside the .hs file, as in the above example, refuse to output properly... I must be doing something wrong?

+1  A: 

I investigated this a while ago, and found this survey of Haskell unicode support libraries. It looks like iconv may be what you want.

Aidan Cully
That's out of date: GHC now encodes/decodes to the current locale by default. See http://www.haskell.org/pipermail/cvs-libraries/2009-June/010890.html and http://ghcmutterings.wordpress.com/2009/09/30/heads-up-what-you-need-to-know-about-unicode-io-in-ghc-6-12-1/ .
ephemient
Yes, that would most probably work with earlier versions. In fact, I've managed to print out UTF-8 Strings using the utf8-string library mentioned on the very same page. I've updated the question accordingly, though: I'm more interested in the newest GHC, exactly because people seem to boast it handles UTF-8 "properly", finally. :)
liszt
A: 

This is a known ghc bug, marked "wontfix".

Justin Smith
Noooooooo. :( But, I'm puzzled... it seems to work fine with regular Strings?
liszt
Whatever this is, it's fixed now. Executing the example given on your linked page works as expected. The difference is that I'm trying to output UTF-8-encoded ByteStrings, and not UTF-8-encoded Strings, which is supposed to be more efficient. Keep in mind I'm currently using GHC 6.12.1, although I know the problem doesn't exist in GHC 6.10.4 either.
liszt
No, that's not actually the problem. GHC 6.12 does utf8 *String* IO, if the locale is set to that. Which in fact solves the above bug, which isn't the problem the OP is asking about.
Don Stewart
+5  A: 

ByteStrings are strings of bytes. When you pack a String with Data.ByteString.Char8, each character is truncated to 8 bits, as the documentation describes. You'll need to explicitly encode the text as UTF-8, for example via the utf8-string package on Hackage, which contains support for ByteStrings.
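
To make the truncation concrete, here is a minimal sketch that prints the raw bytes produced by each approach. It assumes the utf8-string package is installed and that the source file is saved as UTF-8; the two-character sample string is chosen only for illustration. It also suggests why the question's output looks the way it does: š (U+0161) keeps only its low byte 0x61, which is 'a', and ž (U+017E) becomes 0x7E, which is '~'.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.UTF8 as U  -- from utf8-string (assumed installed)

main :: IO ()
main = do
  let s = "č日"
  -- Char8.pack keeps only the low 8 bits of each code point:
  -- 'č' is U+010D -> 0x0D, '日' is U+65E5 -> 0xE5.
  print (B.unpack (C.pack s))        -- [13,229]
  -- fromString encodes each character as UTF-8 instead:
  -- 'č' -> C4 8D, '日' -> E6 97 A5.
  print (B.unpack (U.fromString s))  -- [196,141,230,151,165]
```

Once the ByteString holds proper UTF-8 bytes, writing it out with putStr passes those bytes through unchanged, so a UTF-8 terminal should display the text correctly.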

Don Stewart
Doesn't utf8-string work only with Strings, and not ByteStrings?
liszt
No, it also works with bytestrings. See http://stackoverflow.com/questions/2086842/using-haskell-to-output-a-utf-8-encoded-bytestring/2089195#2089195
Don Stewart
+7  A: 

utf8-string supports bytestrings.

import Prelude hiding (putStr)
import Data.ByteString.Char8 (putStr)
import Data.ByteString.UTF8 (fromString)

main :: IO ()
main = putStr $ fromString "čušpajž日本語"
Wei Hu
Oh well now I just feel silly. ^^ Thanks, this solves my problem!
liszt