I have a Haskell program which processes a text file and builds a Map
(with several million elements). The whole thing can run for 2-3 minutes. I found that tweaking the -H and -A options makes a big difference in running time.
There is documentation about this functionality of the RTS, but it's a hard read for me since I don't know the algorithms and terms from GC theory. I'm looking for a less technical explanation, preferably specific to Haskell/GHC. Are there any references about choosing sensible values for these options?
EDIT: That's the code, it builds a trie for a given list of words.
buildTrie :: [B.ByteString] -> MyDFA
buildTrie l = fst3 $ foldl' step (emptyDFA, B.empty, 1) $ sort $ map B.reverse l where
step :: (MyDFA , B.ByteString, Int) -> B.ByteString -> (MyDFA , B.ByteString, Int)
step (dfa, lastWord, newIndex) newWord = (insertNewStates, newWord, newIndex + B.length newSuffix) where
(pref, lastSuffix, newSuffix) = splitPrefix lastWord newWord
branchPoint = transStar dfa pref
--new state labels for the newSuffix path
newStates = [newIndex .. newIndex + B.length newSuffix - 1]
--insert newStates
insertNewStates = (foldl' (flip insertTransition) dfa $ zip3 (branchPoint:init newStates) (B.unpack newSuffix) newStates)