tags:

views:

79

answers:

2

I am trying to load a PNG file, get the uncompressed RGBA bytes, then send them to the gzip or zlib packages.

The pngload package returns image data as a (StorableArray (Int, Int) Word8), and the compression packages take lazy ByteStrings. Therefore, I am attempting to build a (StorableArray (Int, Int) Word8 -> ByteString) function.

So far, I have tried the following:

import qualified Codec.Image.PNG as PNG
import Control.Monad (mapM)
import Data.Array.Storable (withStorableArray)
import qualified Data.ByteString.Lazy as LB (ByteString, pack, take)
import Data.Word (Word8)
import Foreign (Ptr, peekByteOff)

main = do
    -- Load PNG into "image"...
    bytes <- withStorableArray 
        (PNG.imageData image)
        (bytesFromPointer lengthOfImageData)

bytesFromPointer :: Int -> Ptr Word8 -> IO LB.ByteString
bytesFromPointer count pointer = LB.pack $ 
    mapM (peekByteOff pointer) [0..(count-1)]

This causes the stack to run out of memory, so clearly I am doing something very wrong. There are more things I could try with Ptr's and ForeignPtr's, but there are a lot of "unsafe" functions in there.

Any help here would be appreciated; I'm fairly stumped.

+5  A: 

Generally, pack and unpack are a bad idea for performance. If you have a Ptr, and a length in bytes, you can generate a strict bytestring in two different ways:

Like this:

import qualified Codec.Image.PNG as PNG
import Control.Monad
import Data.Array.Storable (withStorableArray)

import Codec.Compression.GZip

import qualified Data.ByteString.Lazy   as L
import qualified Data.ByteString.Unsafe as S

import Data.Word
import Foreign

-- Pack a Ptr Word8 as a strict bytestring, then box it to a lazy one, very
-- efficiently
bytesFromPointer :: Int -> Ptr Word8 -> IO L.ByteString
bytesFromPointer n ptr = do
    s <- S.unsafePackCStringLen (castPtr ptr, n)
    return $! L.fromChunks [s]

-- Dummies, since they were not provided 
image = undefined
lengthOfImageData = 10^3

-- Load a PNG, and compress it, writing it back to disk
main = do
    bytes <- withStorableArray
        (PNG.imageData image)
        (bytesFromPointer lengthOfImageData)
    L.writeFile "foo" . compress $ bytes

I'm using the O(1) version, that just repackages the Ptr from the StorableArray. You might wish to copy it first, via "packCStringLen".

Don Stewart
This works very well. Thanks for the help!
Keith Holman
+2  A: 

The problem with your "bytesFromPointer" is that you take a packed representation, the StorableArray from pngload, and you want to convert it to another packed representation, a ByteString, going through an intermediate list. Sometimes laziness means that the intermediate list won't be constructed in memory, but that's not the case here.

The function "mapM" is the first offender. If you expand mapM (peekByteOff pointer) [0..(count-1)] you get

el0 <- peekByteOff pointer 0
el1 <- peekByteOff pointer 1
el2 <- peekByteOff pointer 2
...
eln <- peekByteOff pointer (count-1)
return [el0,el1,el2,...eln]

because these actions all occur within the IO monad, they are executed in order. This means every element of the output list must be constructed before the list is constructed and laziness never has a chance to help you.

Even if the list was constructed lazily, as Don Stewart points out the "pack" function will still ruin your performance. The problem with "pack" is that it needs to know how many elements are in the list to allocate the correct amount of memory. To find the length of a list, the program needs to traverse it to the end. Because of the necessity of calculating the length, the list will need to be entirely loaded before it can be packed into a bytestring.

I consider "mapM", along with "pack", to be a code smell. Sometimes you can replace "mapM" with "mapM_", but in this case it's better to use the bytestring creation functions, e.g. "packCStringLen".

John