views:

88

answers:

2

Profiling of some code showed that about 65% of the time I was inside the following code.

What it does is use the Data.Binary.Get monad to walk through a bytestring looking for the terminator. If it detects 0xff, it checks if the next byte is 0x00. If it is, it drops the 0x00 and continues. If it is not 0x00, then it drops both bytes and the resulting list of bytes is converted to a bytestring and returned.

Any obvious ways to optimize this? I can't see it.

parseECS = f [] False
    where
    f acc ff = do
        b <- getWord8
        if ff
            then if b == 0x00
                then f (0xff:acc) False
                else return $ L.pack (reverse acc)
            else if b == 0xff
                then f acc True
                else f (b:acc) False
+1  A: 

Bug fix

It seems there may be a bug here. An exception gets raised if you reach the end of the byte stream before an 0xff, not 0x00 sequence is found. Here's a modified version of your function:

parseECS :: Get L.ByteString
parseECS = f [] False
  where
    f acc ff = do
      noMore <- isEmpty
      if noMore
         then return $ L.pack (reverse acc)
         else do
           b <- getWord8
           if ff
              then
                if b == 0x00
                   then f (0xff:acc) False
                   else return $ L.pack (reverse acc)
              else
                if b == 0xff
                   then f acc True
                   else f (b:acc) False

Optimization

I haven't done any profiling, but this function will probably be faster. Reversing long lists is expensive. I'm not sure how lazy getRemainingLazyByteString is. If it's too strict this probably won't work for you.

parseECS2 :: Get L.ByteString
parseECS2 = do
    wx <- liftM L.unpack $ getRemainingLazyByteString
    return . L.pack . go $ wx
  where
    go []             = []
    go (0xff:0x00:wx) = 0xff : go wx
    go (0xff:_)      = []
    go (w:wx)         = w : go wx
Michael Steele
That makes sense, and I agree it is kind of a compromise. I had the idea to break out of the Get monad and search the ByteString directly, and it gave me 4x the speed. Your code essentially does the same thing but converts to list and uses patterns. Get back to you on performance. Thanks for your help.
me2
A: 

If problem is in "reverse" you can use "lookAhead" to scan position and then go back and rebuild your new string

parseECS2 :: Get L.ByteString
parseECS2 = do
    let nextWord8 = do
            noMore <- isEmpty
            if noMore then return Nothing
                      else liftM Just getWord8

    let scanChunk !n = do
            b <- nextWord8
            case b of
                Just 0xff -> return (Right (n+1))
                Just _ -> scanChunk (n+1)
                Nothing -> return (Left n)

    let readChunks = do
            c <- lookAhead (scanChunk 0)
            case c of
                Left n -> getLazyByteString n >>= \blk -> return [blk]
                Right n -> do
                    blk <- getLazyByteString n
                    b <- lookAhead nextWord8
                    case b of
                        Just 0x00 -> skip 1 >> liftM (blk:) readChunks
                        _ -> return [L.init blk]

    liftM (foldr L.append L.empty) readChunks
ony
Hi Ony. Thanks. The issue isn't really the lookahead portion though. It's that getWord8 through a long binary string is very slow.
me2