Last week user Masse asked a question about recursively listing files in a directory in Haskell. My first thought was to try using monadic lists from the List
package to avoid building the entire list in memory before the printing can start. I implemented this as follows:
module Main where
import Prelude hiding (filter)
import Control.Applicative ((<$>))
import Control.Monad (join)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.ListT (ListT)
import Data.List.Class (cons, execute, filter, fromList, mapL)
import System (getArgs)
import System.Directory (getDirectoryContents, doesDirectoryExist)
import System.FilePath ((</>))
main = execute . mapL putStrLn . listFiles =<< head <$> getArgs
listFiles :: FilePath -> ListT IO FilePath
listFiles path = liftIO (doesDirectoryExist path) >>= listIfDir
where
valid "." = False
valid ".." = False
valid _ = True
listIfDir False = return path
listIfDir True
= cons path
$ join
$ listFiles
<$> (path </>)
<$> (filter valid =<< fromList <$> liftIO (getDirectoryContents path))
This works beautifully in that it starts printing immediately and uses very little memory. Unfortunately it's also dozens of times slower than a comparable FilePath -> IO [FilePath]
version.
What am I doing wrong? I've never used the List
package's ListT
outside of toy examples like this, so I don't know what kind of performance to expect, but 30 seconds (vs. a fraction of a second) to process a directory with ~40,000 files seems much too slow.