Hi,

I'm trying to read and decode a binary file strictly, which seems to work most of the time. Unfortunately, in a few cases my program fails with

"too few bytes. Failed reading at byte position 1"

I guess Binary's decode function thinks there is no data available, but I know there is, and simply rerunning the program works fine.

I've tried several solutions, but none of them solved my problem :(

  • using withBinaryFile:

    decodeFile' path = withBinaryFile path ReadMode doDecode
      where
        doDecode h = do c <- LBS.hGetContents h
                        return $! decode c
    
  • reading the whole file with strict ByteString and decoding from it:

    decodeFile' path = decode . LBS.fromChunks . return <$> BS.readFile path
    
  • adding some more strictness

    decodeFile' path = fmap (decode . LBS.fromChunks . return) $! BS.readFile path
    
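For what it's worth, a "too few bytes" style failure is what Binary reports when its input stops early, for example when another process has just truncated the file (the exact message differs between binary versions). The sketch below assumes binary 0.7 or later, which provides decodeOrFail; the names original and decodePrefix are made up for illustration. It decodes prefixes of an encoded list and returns the failure as a Left instead of throwing, which makes the condition easy to log or retry:

```haskell
import Data.Binary (decodeOrFail, encode)
import qualified Data.ByteString.Lazy as LBS
import Data.Int (Int64)

-- Hypothetical example value whose encoding is several bytes long.
original :: [Int64]
original = [1, 2, 3]

-- Decode only the first n bytes of the encoding, mimicking a reader that
-- opens the file after a writer truncated it but before it finished writing.
decodePrefix :: Int64 -> Either String [Int64]
decodePrefix n =
  case decodeOrFail (LBS.take n (encode original)) of
    Left (_, offset, msg) -> Left ("failed at byte " ++ show offset ++ ": " ++ msg)
    Right (_, _, value) -> Right value
```

decodePrefix (LBS.length (encode original)) gives Right [1,2,3], while decodePrefix 0 or any shorter prefix gives a Left.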

Any ideas what is going on here and how to solve the issue?

Thanks!

EDIT: I think I've figured out my problem. It is not about reading the file strictly. I have a number of processes that mostly read from the file, but from time to time one of them needs to write to it, which truncates the file first and then writes the new content. So for writing I need to take a file lock first, which does not seem to happen when "Binary.encodeFile" is used. (By processes I don't mean threads, but separate instances of the same program.)
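An alternative to file locks for the writer side (a different technique from the POSIX-lock solution posted here) is to never truncate the live file: write the new encoding to a temporary file in the same directory and rename it over the original. On POSIX, rename(2) replaces the target atomically, so a reader opens either the complete old file or the complete new one. A minimal sketch, assuming the directory and filepath packages; atomicEncodeFile is a hypothetical name:

```haskell
import Data.Binary (Binary, decodeFile, encode)
import qualified Data.ByteString.Lazy as LBS
import System.Directory (renameFile)
import System.FilePath (takeDirectory, takeFileName)
import System.IO (hClose, openBinaryTempFile)

-- Hypothetical helper: replace the file's contents without ever truncating it.
atomicEncodeFile :: Binary a => FilePath -> a -> IO ()
atomicEncodeFile path value = do
  -- rename is only atomic within one file system, so create the temporary
  -- file next to the target (openBinaryTempFile makes it private to the user)
  (tmpPath, h) <- openBinaryTempFile (takeDirectory path) (takeFileName path ++ ".tmp")
  LBS.hPut h (encode value)
  hClose h
  -- atomically swap the new file in; concurrent readers see old or new, never half
  renameFile tmpPath path
```

Readers can keep using plain decodeFile. A production version would also delete the temporary file if writing fails (for example with bracketOnError), and this does not prevent two writers from overwriting each other's read-modify-write updates.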

EDIT: I finally had some time to solve my problem using POSIX I/O and file locks. I haven't had any more problems since.

In case someone is interested in my current solution, or can point out errors or problems in it, I'll post it here.

Safe encoding to a file:

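safeEncodeFile path value = do
    -- create with mode 0600 if missing; trunc = True empties the file here,
    -- at open time, before the lock below has been acquired
    fd <- openFd path WriteOnly (Just 0o600) (defaultFileFlags {trunc = True})
    -- block until an exclusive (write) lock on the whole file is granted
    waitToSetLock fd (WriteLock, AbsoluteSeek, 0, 0)
    let cs = encode value
    -- write the strict chunks of the lazy encoding in order
    let outFn = LBS.foldrChunks (\c rest -> writeChunk fd c >> rest) (return ()) cs
    outFn
    closeFd fd -- closing the descriptor releases the lock
  where
    -- the byte count returned by fdWriteBuf is ignored, so a short write goes unnoticed
    writeChunk fd bs = unsafeUseAsCString bs $ \ptr ->
                         fdWriteBuf fd (castPtr ptr) (fromIntegral $ BS.length bs)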
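One thing to watch in safeEncodeFile: fcntl locks are advisory, so the truncation done by trunc = True at open time happens even while readers hold read locks. A sketch of the lock-then-truncate order follows. It swaps in GHC's own GHC.IO.Handle.Lock (base 4.10 / GHC 8.2 or later, and hLock can throw on platforms without locking support) instead of waitToSetLock; lockedEncodeFile is a hypothetical name:

```haskell
import Data.Binary (Binary, encode)
import qualified Data.ByteString.Lazy as LBS
import GHC.IO.Handle.Lock (LockMode (ExclusiveLock), hLock)
import System.IO (IOMode (ReadWriteMode), hSetFileSize, withBinaryFile)

-- Hypothetical variant of safeEncodeFile that takes the lock before truncating.
lockedEncodeFile :: Binary a => FilePath -> a -> IO ()
lockedEncodeFile path value =
  -- ReadWriteMode creates the file if it is missing but, unlike WriteMode
  -- or trunc = True, does not truncate it on open
  withBinaryFile path ReadWriteMode $ \h -> do
    hLock h ExclusiveLock     -- blocks until no other process holds a lock
    hSetFileSize h 0          -- truncate only once the lock is ours
    LBS.hPut h (encode value) -- writes every byte, so no short-write case
  -- withBinaryFile closes the handle, which releases the lock
```

Readers must use the same mechanism: depending on the platform hLock uses flock, open-file-description locks or LockFileEx, so it should not be mixed with waitToSetLock-based readers.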

And decoding a file:

safeDecodeFile def path = do
    e <- doesFileExist path
    if e
      then do fd <- openFd path ReadOnly Nothing
                           (defaultFileFlags{nonBlock=True})
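              -- block until a shared (read) lock on the whole file is granted
              waitToSetLock fd (ReadLock, AbsoluteSeek, 0, 0)
              -- lazy read; fdGetContents closes fd, releasing the lock, at EOF
              c  <- fdGetContents fd
              -- needs BangPatterns: force the decoded value before returning
              let !v = decode $! c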
              return v
      else return def

-- lazily read all remaining bytes of fd, one 4096-byte block at a time
fdGetContents fd = lazyRead
  where
    lazyRead = unsafeInterleaveIO loop

    loop = do blk <- readBlock fd
              case blk of
                Nothing -> return LBS.Empty
                Just c  -> do cs <- lazyRead
                              return (LBS.Chunk c cs)

-- read the next block; at EOF free the buffer, close fd and return Nothing
readBlock fd = do buf <- mallocBytes 4096
                  readSize <- fdReadBuf fd buf 4096
                  if readSize == 0
                    then do free buf
                            closeFd fd
                            return Nothing
                    else do bs <- unsafePackCStringFinalizer buf
                                         (fromIntegral readSize)
                                         (free buf)
                            return $ Just bs
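The read side can be sketched the same way: take a shared lock, read the whole file strictly, then decode. With the lazy fdGetContents above, the lock lives until the last block is read, and if decode never demands the final (empty) block the descriptor and lock may stay open. Reading strictly ties the lock to the read itself. Same assumptions as before (GHC.IO.Handle.Lock, binary 0.7 or later for decodeOrFail); lockedDecodeFile is a hypothetical name, and it returns Left on a decode failure instead of calling error:

```haskell
import Data.Binary (Binary, decodeOrFail)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import GHC.IO.Handle.Lock (LockMode (SharedLock), hLock)
import System.Directory (doesFileExist)
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical counterpart to safeDecodeFile using a strict, locked read.
lockedDecodeFile :: Binary a => a -> FilePath -> IO (Either String a)
lockedDecodeFile def path = do
  exists <- doesFileExist path
  if not exists
    then return (Right def)
    else do
      bytes <- withBinaryFile path ReadMode $ \h -> do
        hLock h SharedLock -- other readers may share it; writers must wait
        BS.hGetContents h  -- strict: reads to EOF (then closes) before the lock goes
      return $ case decodeOrFail (LBS.fromStrict bytes) of
        Left (_, offset, msg) -> Left ("failed at byte " ++ show offset ++ ": " ++ msg)
        Right (_, _, value) -> Right value
```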

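Besides the ByteString imports, the code uses System.Posix.IO (from the unix package), System.IO (AbsoluteSeek), Foreign.Ptr (castPtr), Foreign.Marshal.Alloc (mallocBytes, free), Data.ByteString.Unsafe (unsafeUseAsCString, unsafePackCStringFinalizer), System.IO.Unsafe (unsafeInterleaveIO), System.Directory (doesFileExist) and Data.Binary (encode, decode), plus the BangPatterns extension for let !v. The qualified imports for strict and lazy ByteStrings are: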

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import qualified Data.ByteString.Lazy.Internal as LBS

A:

It would be helpful if you could produce a minimal code snippet that runs and demonstrates the problem. Right now I am not convinced this isn't a problem with your program's tracking of which handles are open or closed, with reads and writes getting in each other's way. Here is some example test code I wrote that works fine.

import qualified Data.Trie as T
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.Binary
import System.IO

tmp = "blah"

main = do
    -- p :: Word8 (because of B.pack), so [0..] stops at 255
    let trie = T.fromList [(B.pack [p], p) | p <- [0..]]
    (file,hdl) <- openTempFile "/tmp" tmp
    B.hPutStr hdl (B.concat $ L.toChunks $ encode trie)
    hClose hdl
    putStrLn file
    t <- B.readFile file
    let trie' = decode (L.fromChunks [t])
    print (trie' == trie)
TomMD
Thanks. It is more or less similar to your code, but with reading and writing swapped: 1. read the data, 2. modify it, 3. write it back using Binary.encodeFile, which truncates the file before writing. So I think it is a race condition between many processes reading the file while one of them is overwriting it (see the "EDIT" in my post).
urso
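Given that cycle (read, modify, write back), locking the read and the write separately still leaves room for lost updates: two processes can both read the old value and then each write back their own change. A sketch that holds one exclusive lock across the whole cycle, again assuming GHC.IO.Handle.Lock (GHC 8.2 or later) and binary 0.7 or later; modifyFileLocked is a hypothetical name, and an empty or newly created file is treated as holding def:

```haskell
import Data.Binary (Binary, decodeOrFail, encode)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import GHC.IO.Handle.Lock (LockMode (ExclusiveLock), hLock)
import System.IO
  (IOMode (ReadWriteMode), SeekMode (AbsoluteSeek), hFileSize, hSeek, hSetFileSize, withBinaryFile)

-- Hypothetical helper: read, apply f and write back under a single lock.
modifyFileLocked :: Binary a => FilePath -> a -> (a -> a) -> IO a
modifyFileLocked path def f =
  withBinaryFile path ReadWriteMode $ \h -> do
    hLock h ExclusiveLock -- held until withBinaryFile closes the handle
    size <- hFileSize h
    -- read without closing the handle (BS.hGetContents would close it)
    bytes <- BS.hGet h (fromIntegral size)
    old <-
      if BS.null bytes
        then return def -- new or empty file
        else case decodeOrFail (LBS.fromStrict bytes) of
          Left (_, offset, msg) ->
            ioError (userError ("failed at byte " ++ show offset ++ ": " ++ msg))
          Right (_, _, value) -> return value
    let new = f old
    -- rewind and truncate, then write the new encoding from the start
    hSeek h AbsoluteSeek 0
    hSetFileSize h 0
    LBS.hPut h (encode new)
    return new
```

For example, modifyFileLocked "counter.bin" (0 :: Int) (+ 1) increments a counter stored in the file and returns the new value. Processes that only read still need a shared lock so they never see the file between hSetFileSize and the end of the write.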