If you're using String IO, you can do the following:
import System.IO
import Control.Monad
-- | Process 100 lines
process100 :: [String] -> MyData
-- whatever this function does
loop :: [String] -> [MyData]
loop lns = go [] lns
where
go acc [] = reverse acc
go acc lns = let (this, next) = splitAt 100 lns in go (process100 this:acc) next
processFile :: FilePath -> IO [MyData]
processFile f = withFile f ReadMode (fmap (loop . lines) . hGetContents)
Note that this function will silently process the last chunk even if it isn't exactly 100 lines.
Packages like bytestring and text generally provide functions like lines
and hGetContents
so you should be able to easily adapt this function to any of them.
It's important to know what you're doing with the results of processing each slice, because you don't want to hold on to that data for longer than necessary. Ideally, after each slice is calculated the data would be entirely consumed and could be gc'd. Generally either the separate results get combined into a single data structure (a "fold"), or each one is dealt with separately (maybe outputting a line to a file or something similar). If it's a fold, you should change "loop" to look like this:
loopFold :: [String] -> MyData -- assuming there is a Monoid instance for MyData
loopFold lns = go mzero lns
where
go !acc [] = acc
go !acc lns = let (this, next) = splitAt 100 lns in go (process100 this `mappend` acc) next
The loopFold
function uses bang patterns (enabled with "LANGUAGE BangPatterns" pragma) to force evaluation of the "MyData". Depending on what MyData is, you may need to use deepseq
to make sure it's fully evaluated.
If instead you're writing each line to output, leave loop
as it is and change processFile
:
processFileMapping :: FilePath -> IO ()
processFileMapping f = withFile f ReadMode pf
where
pf = mapM_ (putStrLn . show) <=< fmap (loop . lines) . hGetContents
If you're interested in enumerator/iteratee style processing, this is a pretty simple problem. I can't give a good example without knowing what sort of work process100
is doing, but it would involve enumLines
and take
.
Is it necessary to process exactly 100 lines at a time, or do you just want to process in chunks for efficiency? If it's the latter, don't worry about it. You'd most likely be better off processing one line at a time, using either an actual fold function or a function similar to processFileMapping.