I've written a small Haskell program to print the MD5 checksums of all files in the current directory (searched recursively), basically a Haskell version of md5deep. All is fine and dandy except when the current directory contains a very large number of files, in which case I get an error like:
<program>: <currentFile>: openBinaryFile: resource exhausted (Too many open files)
It seems that Haskell's laziness is keeping the files open, even after their corresponding lines of output have been printed.
The relevant code is below. The function of interest is getList.
```haskell
import Control.Monad (liftM)
import qualified Data.ByteString.Lazy as BS
import Data.Word (Word8)
import Text.Printf (printf)
-- 'hash :: [Word8] -> [Word8]' (MD5) comes from a digest library; see the full program.

main :: IO ()
main = putStr . unlines =<< getList "."

getList :: FilePath -> IO [String]
getList p =
    let getFileLine path = liftM (\c -> (hex $ hash $ BS.unpack c) ++ " " ++ path) (BS.readFile path)
    in mapM getFileLine =<< getRecursiveContents p

hex :: [Word8] -> String
hex = concatMap (\x -> printf "%0.2x" (toInteger x))

getRecursiveContents :: FilePath -> IO [FilePath]
-- ^ Just gets the paths to all the files in the given directory (definition omitted).
```
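A likely explanation: `mapM getFileLine` runs every `BS.readFile` (and so opens every file) before `putStr . unlines` demands any digest, and a lazy ByteString's handle is only closed once its contents have been read to end-of-file. The sketch below is one possible restructuring, not the program's actual code: it forces each checksum with `evaluate` before moving on and prints each line immediately with `mapM_`, so at most one file should be open at a time. Assumptions: MD5 is not in `base`, so a hand-written Adler-32 (`adler32`) stands in for `hash`; `fileLine` and `printChecksums` are hypothetical names; and file paths come from the command line instead of `getRecursiveContents`.

```haskell
import Control.Exception (evaluate)
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)
import System.Environment (getArgs)
import Text.Printf (printf)

-- Running state for the stand-in checksum; strict fields mean both
-- counters are evaluated whenever an Adler value is.
data Adler = Adler !Word32 !Word32

-- Adler-32, used only as a stand-in for MD5 (not available in base).
-- BL.foldl' consumes the lazy ByteString one chunk at a time.
adler32 :: BL.ByteString -> Word32
adler32 = finish . BL.foldl' step (Adler 1 0)
  where
    step (Adler a b) w =
      let a' = (a + fromIntegral w) `mod` 65521
      in Adler a' ((b + a') `mod` 65521)
    finish (Adler a b) = (b `shiftL` 16) .|. a

-- Hypothetical helper: read one file and build its output line.
-- 'evaluate' forces the checksum before this action returns, which reads
-- the lazy ByteString to end-of-file; lazy readFile closes the handle there.
fileLine :: FilePath -> IO String
fileLine path = do
  contents <- BL.readFile path
  digest <- evaluate (adler32 contents)
  return (printf "%08x" (toInteger digest) ++ " " ++ path)

-- Hypothetical helper: print each line as soon as it is ready, instead of
-- collecting every line (and every open file) with mapM first.
printChecksums :: [FilePath] -> IO ()
printChecksums = mapM_ (\p -> fileLine p >>= putStrLn)

main :: IO ()
main = getArgs >>= printChecksums
```

In the question's program, the analogous change would presumably be to force the digest inside `getFileLine` (for example by evaluating the length of the hex string) and to print with `mapM_` rather than building the list with `mapM`; this has not been checked against the original `hash` function.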
Are there any ideas on how I could solve this problem?
The entire program is available here: http://haskell.pastebin.com/PAZm0Dcb
Edit: I have plenty of files that don't fit into RAM, so I am not looking for a solution that reads the entire file into memory at once.
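Given that constraint, one alternative that avoids lazy I/O entirely is to read each file in fixed-size strict chunks inside `withBinaryFile`, which closes the handle as soon as hashing finishes (or fails), while memory use stays bounded by the chunk size. This is a sketch under assumptions: Adler-32 again stands in for MD5, and `chunkSize`, `update`, `finish`, `hashHandle` and `hashFile` are hypothetical names chosen for illustration.

```haskell
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as B
import Data.Word (Word32)
import System.Environment (getArgs)
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)
import Text.Printf (printf)

-- Hypothetical chunk size: it bounds memory use, not correctness.
chunkSize :: Int
chunkSize = 64 * 1024

-- Running Adler-32 state (stand-in for MD5); strict fields avoid thunk build-up.
data Adler = Adler !Word32 !Word32

-- Fold one strict chunk into the running state.
update :: Adler -> B.ByteString -> Adler
update = B.foldl' step
  where
    step (Adler a b) w =
      let a' = (a + fromIntegral w) `mod` 65521
      in Adler a' ((b + a') `mod` 65521)

finish :: Adler -> Word32
finish (Adler a b) = (b `shiftL` 16) .|. a

-- Read fixed-size chunks until hGetSome returns an empty chunk (end of file).
-- 'seq' forces the state after each chunk so no unevaluated work piles up.
hashHandle :: Handle -> IO Word32
hashHandle h = go (Adler 1 0)
  where
    go acc = do
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        then return (finish acc)
        else let acc' = update acc chunk in acc' `seq` go acc'

-- withBinaryFile closes the handle when hashHandle returns or throws.
hashFile :: FilePath -> IO Word32
hashFile path = withBinaryFile path ReadMode hashHandle

main :: IO ()
main = do
  paths <- getArgs
  mapM_ printOne paths
  where
    printOne p = do
      d <- hashFile p
      putStrLn (printf "%08x" (toInteger d) ++ " " ++ p)
```

Because the state is folded chunk by chunk, splitting a file into different chunk sizes gives the same checksum; a real MD5 library would need an incremental (init/update/finalize) interface to be used the same way.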