ansaurus

Question

How to make my Haskell code use Lazyness and Garbage collector

Answer 1

+1 A:

If it's not needed to process your file in slices, avoid this. Haskell does this automatically! In Haskell, you think of IO as a stream. Data is read from input, as soon as it's needed and discarded, as soon as it's unused. So for instance, this is an easy file-copying programm:

main = interact id

interact has the signature interact :: (String -> String) -> IO (), and feeds the input into a function which handles it and produces some output, which is written to stdout. This program is more efficient then most C-implementations, as the runtime automatically buffers the input and output.

If you want to understand laziness, you have to forget all the wisdom you learned as a imperative programmer and have to think about a program as a description to modify data, not as a set of instructions - data is only processed when needed!

The key point, why your data may be handled the wrong way is the multiple traversion of the list. Your function makeDags traverses the transposed the slices list one by one, so the elements of the original list may not be discarded. What you should try, is to write your function in a way like this:

sliceFile :: FilePath -> [[String]]
sliceFile fp = do
  f <- readFile fp
  let l = lines fp
      slice [] = []
      slice x  = ll : slice ls where (ll,ls) = splitAt 100 x
  return slice l

sliceFirstRow :: [[String]] -> ([String],[[String]])
sliceFirstRow list = unzip $ map (\(x:xs) -> (x,xs)) list

makeDags :: [[String]] -> [Dag]
makeDags [[]] = []
makeDags list = makeDag firstRow : makeDags restOfList where
 (firstRow,restOfList) = sliceFirstRow list

This function may be a solution, since the first row is no longer referenced, when it's done. But in the most places, this is a result of laziness, so you could probably try to use seq to force building the Dags and allowing the IO data to be garbage-collected. (If you don't force building the dags, the data can't be garbage collected).

But anyway, I could provide a more helpfull answer, if you give some informations about what these dags are.

FUZxxl 2010-09-17 12:49:02

Pretty good answer, but the information about ByteStrings is incorrect. `Data.ByteString.Lazy` is for binary data, and operations act on `Word8`, the Haskell type for an 8-bit word. `Data.ByteString.Lazy.Char8` is for ASCII text, operations act on (a subset of) `Char`s. Both take the same memory for storage.

John 2010-09-17 16:01:11

Indeed. @FUZxxl, can you stop answering ByteString questions until you actually use them? This is like the 10th answer I've seen of yours that is incorrect. All strict bytestrings are 8-bit word vectors. Same exact underlying data, same Haskell type. The functions in .Char8, though, simply convert to and from Char as necessary. (For example, `pack` packs Chars instead of Word8s, and `map` applies a Char to the mapping function instead of applying a Word8.) You can use the .Char8 and normal functions on the same data, the functions are just *views* onto the same underlying data.

jrockway 2010-09-18 00:55:22

thanks FUZxxl for your code it really helps me. but how are you dealing with many files, I've try too make it walk over many file without any succes ?

Leonzo Constantini 2010-09-19 09:39:49

@Leonzo Constantini: It would be very interesting to know, how you're dealing with the dags. What is there use? Why do you need slices of 100 lines? I can provide you further information then.

FUZxxl 2010-09-19 09:56:14

Answer 2

+1 A:

A few points:

1) Have you considered using fgl? It's probably more efficient than your own Dag implementation. If you really need to use Dag, you could construct your graphs with fgl then convert them to Dag when they're complete.

2) It seems like you don't actually use the slices when constructing your graphs, rather they control how many graphs you have. If so, how about something like this:

dagFromHandles :: [Handle] -> IO Dag
dagFromHandles = fmap makeDags . mapM hGetLine

allDags :: [FilePath] -> IO [Dag]
allDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM 100 (dagFromHandles handles)

This assumes that each file has at least 100 lines, and any extra lines will be ignored. Even better would be if you had a function that would consume a Dag, then you could do

useDag :: Dag -> IO ()

runDags :: [FilePath] -> IO ()
runDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM_ 100 (dagFromHandles handles >>= useDag)

This should make more efficient use of garbage collection.

Of course this assumes that I understand the problem properly, and I'm not certain that I do. Note that concat (map sliceFile) should be a no-op (sliceFile would need to be in IO as you've defined the type, but ignoring that for now), so I don't see why you're bothering with it at all.

John 2010-09-17 17:02:27

ansaurus

tags:

views:

answers:

How to make my Haskell code use Lazyness and Garbage collector

related questions