Hi,

I have Haskell code that uses String heavily. Profiling shows that it spends a lot of memory storing lists ([]). One solution to this problem is to use Data.ByteString.Lazy instead of String, but:

What do I have to watch out for when doing this?

Which parts of the code need to be looked at carefully: fold, map, ...?

Thanks for any replies.

+1  A: 

You should know that a ByteString is really bad for things like iterating over its elements, but better for concatenation, etc.

If you want to work with ByteStrings, you have to convert your Strings to ByteStrings. Start with a qualified import, something like

import qualified Data.ByteString.Lazy as B

and put a B. prefix in front of each function that works with them. Most String functions also exist for ByteString. Please note that you still have to convert the Strings you use to ByteStrings explicitly with a conversion function.

If you use Data.ByteString.Lazy.Char8 instead, you can simply use pack, but every Char greater than 255 will be truncated to 8 bits. Also, this type is more suitable for binary data and saves memory.
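
To make the truncation concrete, here is a minimal sketch of converting with Data.ByteString.Lazy.Char8. The names toBytes, fromBytes, asciiRoundTrip and truncated are made up for illustration; only pack and unpack come from the bytestring package.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Word (Word8)

-- Hypothetical helper: String -> lazy ByteString.
-- Each Char is cut down to its low 8 bits.
toBytes :: String -> BL.ByteString
toBytes = BLC.pack

-- Hypothetical helper: lazy ByteString -> String.
-- Every byte becomes a Char in the range 0..255.
fromBytes :: BL.ByteString -> String
fromBytes = BLC.unpack

-- Plain ASCII text survives the round trip unchanged.
asciiRoundTrip :: Bool
asciiRoundTrip = fromBytes (toBytes "hello") == "hello"

-- '\955' is U+03BB (Greek lambda); only the low byte 0xBB is kept.
truncated :: [Word8]
truncated = BL.unpack (toBytes "\955")
```

So anything outside Latin-1 is silently damaged by this conversion, which is worth checking for before switching a whole code base over.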

Edit: You should consider using the text package if you want to work with textual strings. Look here for further details.
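
For comparison, a small sketch of the same kind of conversion with the text package, assuming it is installed. The names sample, charCount, utf8Bytes and roundTrips are only illustrative; pack, length, encodeUtf8 and decodeUtf8 are functions from Data.Text and Data.Text.Encoding.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Two characters: U+03BB (Greek lambda) followed by 'x'.
sample :: T.Text
sample = T.pack "\955x"

-- Text counts characters, not bytes.
charCount :: Int
charCount = T.length sample

-- Encode explicitly when bytes are needed (strict ByteString, UTF-8).
utf8Bytes :: BS.ByteString
utf8Bytes = TE.encodeUtf8 sample

-- Decoding the UTF-8 bytes gives back the original Text.
roundTrips :: Bool
roundTrips = TE.decodeUtf8 utf8Bytes == sample
```

Unlike Char8's pack, nothing is truncated here; the lambda takes two bytes in UTF-8 and is restored on decoding.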

FUZxxl
Why is ByteString "really bad for things like iteration"? It boils down to a vector of 8-bit ints, exactly like a C `char *`. `map` over a strict ByteString does a single allocation for the target vector, reads each byte from the source vector, applies the mapping function, and puts the result into memory. Everything is unboxed. It's subject to stream fusion. I can't imagine anything better for iteration than this!
jrockway
That's what I was taught at school.
FUZxxl
As I understand it, it involves a lot of array copying if you're adding or removing elements (especially adding) at the head of a ByteString. But yes, I may be wrong.
FUZxxl
Yes. If your operation is cons heavy, a linked list or Lazy bytestring is theoretically more appropriate than a strict Bytestring. If it's append-heavy, the Lazy bytestring will do the operation in constant time. It's a trade-off, though; although appends are constant time, you lose the locality of reference that you get with a pure vector. Whether that is faster or slower than an allocate / memcpy cycle depends on your app -- fortunately, it's easy to try both!
jrockway
+4  A: 

The OverloadedStrings extension can be handy if you're using GHC and are converting code with a lot of string literals. Just add the following to the top of your source file:

{-# LANGUAGE OverloadedStrings #-}

And you don't have to use B.pack on any string literals in your code. You could have the following, for example:

equalsTest :: B.ByteString -> Bool
equalsTest x = x == "Test"

Without the extension this would give an error, since you can't use == on a ByteString and a [Char]. With the extension, string literals have type (IsString a) => a, and ByteString is an instance of IsString, so "Test" here is typed as a ByteString and there's no error.
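
Put together as a complete file, a sketch could look like the following. It assumes B is the lazy Char8 module; equalsTestExplicit is a made-up name showing what you would write by hand without relying on the literal.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy.Char8 as B

-- With OverloadedStrings, the literal "Test" is typed as a ByteString here.
equalsTest :: B.ByteString -> Bool
equalsTest x = x == "Test"

-- Hypothetical equivalent with an explicit conversion; this form
-- also compiles without the extension.
equalsTestExplicit :: B.ByteString -> Bool
equalsTestExplicit x = x == B.pack "Test"
```

Both functions behave the same; the extension just removes the B.pack noise around literals.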

Travis Brown
I also thought about suggesting this, but since it is a language extension, it may hurt portability.
FUZxxl
People writing software in Haskell use something other than GHC? Oh.
jrockway
Yes, indeed. For instance, there is the Utrecht Haskell Compiler; some people have told me that it sometimes generates faster code.
FUZxxl
This seems pretty safe as extensions go: it's convenient, it makes your code cleaner, and you can always get rid of it with one quick `sed`.
Travis Brown