ansaurus

Question

How do I go from String to Data.ByteString.Lazy in existing Haskell code?

Answer 1

+1 A:

You should know that a ByteString is really bad for things like iterating over its elements, but better for concatenation, etc.

If you want to work with ByteStrings, you have to convert the String to a ByteString. Start with something like

import qualified Data.ByteString.Lazy as B

and put `B.` in front of each function that works with them; most functions for String also exist for ByteString. Note that you still have to convert the Strings you use to ByteStrings explicitly.

If you use Data.ByteString.Lazy.Char8 instead, you can simply use pack, but every Char above 255 will be truncated to its low 8 bits. Also, this type is more suitable for binary data and saves memory.
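A minimal sketch of the Char8 conversion described above, assuming the bytestring package that ships with GHC. The helper name toLazyBytes is made up for illustration and is not part of any library.

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString.Lazy.Char8 as BC

-- Hypothetical helper: one byte per Char.
-- Only lossless for characters in the range 0-255.
toLazyBytes :: String -> B.ByteString
toLazyBytes = BC.pack

main :: IO ()
main = do
  let ascii = toLazyBytes "hello"
  print (B.length ascii)   -- 5
  print (B.unpack ascii)   -- [104,101,108,108,111]
  -- '\955' is the Greek lambda (U+03BB); pack keeps only the low byte 0xBB.
  print (B.unpack (toLazyBytes "\955"))   -- [187]
```

The last line shows the truncation: the original character cannot be recovered from the resulting byte.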

Edit: You should consider using the text package if you want to work with text strings. Look here for further details.
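If the data is text rather than raw bytes, one possible route is to go through the text package and encode explicitly as UTF-8. This is only a sketch: it assumes the text package is installed, and toLazyUtf8 and fromLazyUtf8 are hypothetical helper names.

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- Hypothetical helper: encode a String as UTF-8 in a lazy ByteString.
toLazyUtf8 :: String -> B.ByteString
toLazyUtf8 = TLE.encodeUtf8 . TL.pack

-- Hypothetical helper: decode it back.
-- decodeUtf8 throws an exception on invalid UTF-8 input.
fromLazyUtf8 :: B.ByteString -> String
fromLazyUtf8 = TL.unpack . TLE.decodeUtf8

main :: IO ()
main = do
  let bytes = toLazyUtf8 "\955x"
  -- U+03BB needs two bytes in UTF-8, 'x' needs one.
  print (B.unpack bytes)                   -- [206,187,120]
  print (fromLazyUtf8 bytes == "\955x")    -- True
```

Unlike Char8's pack, this round-trips characters above 255 without loss.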

FUZxxl 2010-09-15 08:04:32

Why is ByteString "really bad for things like iteration"? It boils down to a vector of 8-bit ints; exactly like a C `char *`. `map` over strict Bytestrings does a single allocation for the target vector, reads each byte from the source vector, applies the mapping functions, and puts the result into memory. Everything is unboxed. It's subject to stream fusion. I can't imagine anything better for iteration than this!

jrockway 2010-09-15 10:09:05

That's what I was taught at school.

FUZxxl 2010-09-15 12:40:15

As I understand it, it involves a lot of array copying if you're adding or removing elements (especially adding) at the head of a ByteString. But yes, I may be wrong.

FUZxxl 2010-09-15 12:50:08

Yes. If your operation is cons heavy, a linked list or Lazy bytestring is theoretically more appropriate than a strict Bytestring. If it's append-heavy, the Lazy bytestring will do the operation in constant time. It's a trade-off, though; although appends are constant time, you lose the locality of reference that you get with a pure vector. Whether that is faster or slower than an allocate / memcpy cycle depends on your app -- fortunately, it's easy to try both!

jrockway 2010-09-15 14:06:51
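The cons trade-off discussed in the comments above can be tried out with a small sketch like the following. It assumes the bytestring package; strictCons and lazyCons are illustrative names, and the complexity notes in the comments are the ones given in the bytestring documentation.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L

-- Strict cons is O(n): it allocates a new buffer and copies the input.
strictCons :: S.ByteString -> S.ByteString
strictCons = S.cons 42

-- Lazy cons is O(1): it puts a new one-byte chunk in front
-- and shares the existing chunks.
lazyCons :: L.ByteString -> L.ByteString
lazyCons = L.cons 42

main :: IO ()
main = do
  let s = S.pack [1, 2, 3]
      l = L.fromChunks [s]
  print (S.unpack (strictCons s))
  print (L.unpack (lazyCons l))
  -- Inspect how the lazy result is split into chunks.
  print (map S.length (L.toChunks (lazyCons l)))
```

Both versions produce the same bytes; only the memory layout differs, which is where the locality-of-reference cost comes from.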

Answer 2

+4 A:

The OverloadedStrings extension can be handy if you're using GHC and are converting code with a lot of string literals. Just add the following to the top of your source file:

{-# LANGUAGE OverloadedStrings #-}

And you don't have to use B.pack on any string literals in your code. You could have the following, for example:

equalsTest :: B.ByteString -> Bool
equalsTest x = x == "Test"

Without the extension this would give an error, since you can't use == on a ByteString and a [Char]. With the extension, string literals have type (IsString a) => a, and ByteString is an instance of IsString, so "Test" here is typed as a ByteString and there's no error.
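For reference, a complete file built around the snippet above might look like this. It is a sketch that assumes the qualified import of Data.ByteString.Lazy.Char8 as B; the main function is added only for demonstration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy.Char8 as B

equalsTest :: B.ByteString -> Bool
equalsTest x = x == "Test"

main :: IO ()
main = do
  -- The literal is typed as a ByteString, so no B.pack is needed.
  print (equalsTest "Test")            -- True
  -- A String that only exists at run time still needs an explicit pack.
  let runtimeString = "Other" :: String
  print (equalsTest (B.pack runtimeString))   -- False
```

The extension only affects literals; values of type String coming from elsewhere are converted as before.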

Travis Brown 2010-09-15 08:38:43

I also thought about suggesting this one, but as it is a language extension, it may break portability.

FUZxxl 2010-09-15 08:42:00

People writing software in Haskell use something other than GHC? Oh.

jrockway 2010-09-15 10:03:37

Yes, indeed. For instance, there is the Utrecht Haskell Compiler; some people told me that it can sometimes produce faster code.

FUZxxl 2010-09-15 12:41:52

This seems pretty safe as extensions go: it's convenient, it makes your code cleaner, and you can always get rid of it with one quick `sed`.

Travis Brown 2010-09-15 17:05:18
