tags:

views:

324

answers:

3

I am a Java programmer who learns Haskell. I've written a small program that searches files for words with a particular suffix.

I'd like to read your criticism. What would you suggest to make this code more compact and readable?

module Main where

import Control.Monad
import Data.String.Utils
import Data.List
import Data.Char
import System.Directory
import System.FilePath
import System.IO
import System.IO.HVFS.Utils
import Text.Regex

alphaWords :: String -> [String]
alphaWords = words . map (\c -> if isAlpha c then c else ' ') -- by ephemient
-- was:
-- words2 s =  case dropWhile isSpace2 s of
--     "" -> []
--     ss -> w : words2 sss
--         where (w, sss) = break isSpace2 ss
--     where isSpace2 = not . isAlpha

findFiles :: FilePath -> IO [FilePath]
findFiles path = do
    cur_path <- getCurrentDirectory
    files <- recurseDir SystemFS $ normalise $ combine cur_path path
    filterM doesFileExist files

wordsWithSuffix :: String -> String -> [String]
wordsWithSuffix suffix text =
    let tokens = (nub . alphaWords) text
        endswithIgnoringCase = endswith suffix . map toLower
    in filter endswithIgnoringCase tokens

searchWords :: String -> String -> [String] -> IO [String]
searchWords suffix path exts = do
    let isSearchable = (`elem` exts) . takeExtension -- by yairchu
    --was let isSearchable s = takeExtension s `elem` exts

    --files <- filterM (fmap isSearchable) $ findFiles path -- by ephemient (compile error)
    files <- liftM (filter isSearchable) $ findFiles path

    wordsPerFile <- forM files $ fmap (wordsWithSuffix suffix) . readFile -- by ephemient
    -- was: wordsPerFile <- forM files (\x -> liftM (wordsWithSuffix suffix) (readFile x))

    return . sort . nub $ concat wordsPerFile -- by ephemient
    -- was: return $ (sort . nub . concat) wordsPerFile

main = do
    words <- searchWords "tick" "/path/to/src" [".as", ".java", ".mxml"]
    print $ length words
    putStrLn $ unlines words

UPDATE: I've fixed 2 verbose spots found using "hlint", thanks @yairchu
UPDATE 2: More fixes. Thanks @ephemient
UPDATE 3: One little fix. Thanks @yairchu, can't use all of your code - too hard for a Java dev's mind

+1  A: 

First of all, start by asking hlint.

It would give you a few useful suggestions and lessons, like:

homework.hs:46:1: Error: Use print
Found:
  putStrLn $ show $ length words
Why not:
  print (length words)

so we see that print = putStrLn . show

etc

yairchu
hlint rocks. Thanks. How would you simplify "searchWords" function?
oshyshko
+1  A: 

I don't like variable names.

Hence here's a shorter searchWords:

searchWords :: String -> String -> [String] -> IO [String]
searchWords suffix path exts =
  fmap (sort . nub . concatMap (wordsWithSuffix suffix)) .
  mapM readFile . filter ((`elem` exts) . takeExtension)
  =<< findFiles path
yairchu
Shorter, yes; more readable, depends on the reader :-D Looks fine to me, anyhow. (Digression: why use `Control.Monad.liftM` when `Prelude.fmap` will do?)
ephemient
@ephemient: that's a nasty habit. from writing generic code for Monads. since Monad doesn't imply Functor automatically..
yairchu
It's more a nasty wart in Haskell than a nasty habit. Monad really should imply both Applicative and Functor.
Alasdair
+1  A: 

Don't import System.FilePath.Posix if you don't need to. System.FilePath exports System.FilePath.Posix or System.FilePath.Windows according to the platform you're compiling on.

Your words2 implementation is fine, but lacks any explanation as to why it does what it does. This is more self-explanatory, and the efficiency difference is not significant.

alphaWords = words . map (\c -> if isAlpha c then c else ' ')

Little improvements in searchWords:

-    wordsPerFile <- forM files (\x ->
-        liftM (wordsWithSuffix suffix) (readFile x))
+    wordsPerFile <- forM files $ fmap (wordsWithSuffix suffix) . readFile
-    return $ (sort . nub . concat) wordsPerFile
+    return . sort . nub $ concat wordsPerFile

Type annotations inside let constructs are not common unless the typechecker really needs assistance... though if I were paying attention to them I wouldn't have made my earlier mistake of trying to move isSearchable :)

Also, in main, I would change this:

-    putStrLn $ unlines words
+    mapM_ putStrLn words

I'm not familiar with the modules exposed by MissingH; is System.IO.HVFS.Utils.recurseDir lazy? If not, adding System.IO.Unsafe.unsafeInterleaveIO may help with memory consumption when traversing a large directory tree.

ephemient
Problem here: "files <- filterM isSearchable $ findFiles path"the type is "filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]"but expected to be "filterM :: (Monad m) => (a -> Bool) -> m [a] -> m [a]"
oshyshko
Ah, I didn't compile-test it. It would need to be `filterM (fmap isSearchable)`, which makes it as preferable.
ephemient
Edited to "files <- filterM (fmap isSearchable) $ findFiles path" -- got "Couldn't match expected type `[m FilePath]' against inferred type `IO [FilePath]'"
oshyshko
Argh. I typed that up real quick before dinner -- now that I've got an interpreter in front of me, I see what you're trying to do there... I'll fix my post.
ephemient