views: 272
answers: 7

I have a function that takes a parameter and produces a result. Unfortunately, the function takes quite a long time to produce that result. It is called quite often with the same input, so it would be convenient if I could cache the results. Something like

let cachedFunction = createCache slowFunction
in (cachedFunction 3.1) + (cachedFunction 4.2) + (cachedFunction 3.1)

I was looking into Data.Array, and although the array is lazy, I need to initialize it with a list of pairs (using listArray), which is impractical. If the key is e.g. the 'Double' type, I cannot initialize it at all, and even if I could theoretically assign an Integer to every possible input, I have several tens of thousands of possible inputs and only actually use a handful. I would need to initialize the array (or, preferably, a hash table, as only a handful of results will be used) using a function instead of a list.

Update: I am reading the memoization articles, and as far as I understand it, MemoTrie could work the way I want. Maybe. Could somebody try to produce the 'cachedFunction'? Preferably for a slow function that takes two Double arguments? Or, alternatively, one that takes a single Int argument in a domain of roughly [0..1 billion] and wouldn't eat all my memory?

+3  A: 

See memoization

Jeff Foster
+10  A: 

Well, there's Data.HashTable. Hash tables don't tend to play nicely with immutable data and referential transparency, though, so I don't think it sees a lot of use.

For a small number of values, stashing them in a search tree (such as Data.Map) would probably be fast enough. If you can put up with doing some mangling of your Doubles, a more robust solution would be to use a trie-like structure, such as Data.IntMap; these have lookup times proportional primarily to key length, and roughly constant in collection size. If Int is too limiting, you can dig around on Hackage to find trie libraries that are more flexible in the type of key used.
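
As a package-free sketch of that trie idea (the names `Tree`, `index`, and `memoInt` are all made up here): an infinite, lazily unfolded binary tree keyed on the bits of a non-negative Int gives lookups proportional to key length, with no up-front initialization of the domain.

```haskell
-- An infinite binary tree, unfolded lazily as branches are demanded.
data Tree a = Tree (Tree a) a (Tree a)

instance Functor Tree where
  fmap f (Tree l v r) = Tree (fmap f l) (f v) (fmap f r)

-- Descend by the key's binary digits: O(log key) per lookup, and only
-- the branches actually visited are ever forced.
index :: Tree a -> Int -> a
index (Tree _ v _) 0 = v
index (Tree l _ r) n = case (n - 1) `divMod` 2 of
  (q, 0) -> index l q
  (q, _) -> index r q

-- The tree of all naturals, laid out to match 'index'.
nats :: Tree Int
nats = go 0 1
  where
    go n s = Tree (go (n + s) s') n (go (n + s + s) s')
      where s' = 2 * s

-- Memoize any function on non-negative Ints; a billion-entry domain
-- costs nothing up front because the tree is built lazily.
memoInt :: (Int -> a) -> Int -> a
memoInt f = index table
  where table = fmap f nats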

As for how to cache the results, I think what you want is usually called "memoization". If you want to compute and memoize results on demand, the gist of the technique is to define an indexed data structure containing all possible results, in such a way that when you ask for a specific result it forces only the computations needed to get the answer you want. Common examples usually involve indexing into a list, but the same principle should apply for any non-strict data structure. As a rule of thumb, non-function values (including infinite recursive data structures) will often be cached by the runtime, but not function results, so the trick is to wrap all of your computations inside a top-level definition that doesn't depend on any arguments.
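
The list-indexing example mentioned above might look like this (Fibonacci is purely a stand-in for a slow function; note that `!!` is a linear traversal, which is the cost the trie structures above avoid):

```haskell
-- 'table' conceptually holds the result for every n, but each entry is a
-- thunk that is computed only when it is first demanded. Because 'memoFib'
-- is a top-level definition taking no arguments, the table is shared
-- across all calls.
memoFib :: Int -> Integer
memoFib = (table !!)
  where
    table = map fib [0 ..]
    fib 0 = 0
    fib 1 = 1
    fib n = memoFib (n - 1) + memoFib (n - 2)
```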

Edit: MemoTrie example ahoy!

This is a quick and dirty proof of concept; better approaches may exist.

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
import Data.MemoTrie
import Data.Binary
import Data.ByteString.Lazy hiding (map)

-- Encode a Double as a list of bytes so the existing trie for [Int]
-- can be reused for Double keys.
mangle :: Double -> [Int]
mangle = map fromIntegral . unpack . encode

unmangle :: [Int] -> Double
unmangle = decode . pack . map fromIntegral

-- Piggyback on the HasTrie instance for [Int] via the mangling above.
instance HasTrie Double where
    data Double :->: a = DoubleTrie ([Int] :->: a)
    trie f = DoubleTrie $ trie $ f . unmangle
    untrie (DoubleTrie t) = untrie t . mangle

slow :: Double -> Integer
slow x
    | x < 1 = 1
    | otherwise = slow (x / 2) + slow (x / 3)

memoSlow :: Double -> Integer
memoSlow = memo slow

Do note the GHC extensions used by the MemoTrie package; hopefully that isn't a problem. Load it up in GHCi and try calling slow vs. memoSlow with something like (10^6) or (10^7) to see it in action.

Generalizing this to functions taking multiple arguments or whatnot should be fairly straightforward. For further details on using MemoTrie, you might find this blog post by its author helpful.

camccann
+1 for memoization
Marcus Lindblom
The key domain is about 1.8 billion. I have no way to _initialize_ any data structure as this would eat all my available memory.
ondra
That's why the idea is *lazy* initialization; theoretically the data structure contains the entire key space, but non-strict evaluation allows only the parts you actually use to get initialized. It's the same idea as infinite lists, except that you'll need something that avoids linear traversal.
camccann
That seems to work. I think I will be able to adapt it to my needs :) Thanks.
ondra
I did some tests, and unfortunately it is unusable in practice, mostly because of Haskell's implementation of Double (the encoding is too long and eats too much memory). I did some tests with Word64, which should hopefully resemble an 8-byte Double, and I got about 40 MB per 100,000 results, i.e. roughly 400 bytes per record. Quite a lot. Anyway, Haskell's implementation of Double is horrendous; eventually I tried to implement it in C and got it twice as slow just by moving the function from Haskell to C :(
ondra
Probably there are ways to improve efficiency through strategic use of unboxing and strictness, but that's getting well out of my depth as far as optimizing Haskell code goes. Sorry...
camccann
A: 

I don't know Haskell specifically, but how about keeping existing answers in some hashed data structure (might be called a dictionary, or hashmap)? You can wrap your slow function in another function that first checks the map and only calls the slow function if it hasn't found an answer.

You could make it fancier by limiting the map to a certain size and, when it reaches that limit, throwing out the least recently used entry. For this you would additionally need to keep a map of keys to timestamps.
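
In Haskell, a sketch of this wrapper (without the LRU part) needs mutable state, so it lives in IO; `memoizeIO` is a made-up name here, and Data.Map stands in for the hash map:

```haskell
import qualified Data.Map as M
import Data.IORef

-- Wrap a slow function with a mutable Map cache: look the argument up
-- first, and call the slow function only on a miss.
memoizeIO :: Ord k => (k -> v) -> IO (k -> IO v)
memoizeIO slow = do
  cacheRef <- newIORef M.empty
  return $ \k -> do
    cache <- readIORef cacheRef
    case M.lookup k cache of
      Just v  -> return v                       -- cache hit
      Nothing -> do                             -- miss: compute and store
        let v = slow k
        modifyIORef' cacheRef (M.insert k v)
        return v
```

A caller would do `f <- memoizeIO slowFunction` once and then use `f` in IO; repeated calls with the same key reuse the stored result.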

abc
This is a fine way to do it given mutable data structures and impure functions, but in Haskell it's preferred (where possible) to retain referential transparency and avoid mutable state.
camccann
A: 

You can write the slow function as a higher-order function that returns a function itself. That way, all the preprocessing happens inside the slow function, and the part that differs between computations happens in the returned (hopefully fast) function. An example could look like this (SML code, but the idea should be clear):

fun computeComplicatedThing (x:float) (y:float) = (* ... some very complicated computation *)
val computeComplicatedThingFast = computeComplicatedThing 3.14 (* provide x, do computation that needs only x *)
val result1 = computeComplicatedThingFast 2.71 (* provide y, do computation that needs x and y *)
val result2 = computeComplicatedThingFast 2.81
val result3 = computeComplicatedThingFast 2.91
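
For comparison, the same idea in Haskell, where binding the partial application at the top level shares the x-only work across calls (the squaring is a made-up stand-in for the complicated computation):

```haskell
computeComplicatedThing :: Double -> Double -> Double
computeComplicatedThing x =
  let xPart = x * x          -- stand-in for the expensive x-only work
  in \y -> xPart + y         -- the cheap part that still needs y

-- Because 'fast' is a top-level binding with no arguments, 'xPart' is
-- computed at most once and shared by every call.
fast :: Double -> Double
fast = computeComplicatedThing 3.14
```
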
swegi
+1  A: 

I have several tens of thousands possible inputs and I only actually use a handful. I would need to initialize the array ... using a function instead of a list.

I'd go with listArray (start, end) (map func [start..end])

  • func doesn't really get called above. Haskell is lazy and creates thunks which will be evaluated only when the value is actually required.
  • When using a normal array you always need to initialize its values, so the work required for creating these thunks is necessary anyhow.
  • Several tens of thousands is far from a lot. If you had trillions, then I would suggest a hash table, yada yada.
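
A minimal sketch of this approach (with `slowFunc` as a hypothetical placeholder; factorial stands in for the slow computation):

```haskell
import Data.Array

-- Placeholder for the expensive call.
slowFunc :: Int -> Integer
slowFunc n = product [1 .. fromIntegral n]

-- listArray stores one thunk per index; nothing is actually computed
-- until an element is demanded with (!), after which it stays cached.
cachedFunc :: Int -> Integer
cachedFunc = (table !)
  where table = listArray (0, 100000) (map slowFunc [0 .. 100000])
```
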
yairchu
So, to put it differently: I have 60,000 points and what I am interested in is the distance between those points. So the domain is actually 60,000^2, something like 3 billion... I could attach the distance function to every point, but that does not help with space complexity, and it is very wasteful considering that I would mostly need to cache about 100 values per point.
ondra
@ondra: Ok - for 3 billion I wouldn't use an array :)
yairchu
A: 

I will add my own solution, which seems to be quite slow as well. The first parameter is a function that returns an Int32, which serves as a unique identifier of the argument. If you want to identify arguments by different means (e.g. by 'id'), you have to change the second parameter of H.new to a different hash function. I will try to find out how to use Data.Map and test whether I get faster results.

import qualified Data.HashTable as H
import Data.Int
import System.IO.Unsafe

cache :: (a -> Int32) -> (a -> b) -> (a -> b)
cache ident f = unsafePerformIO createfunc
    where
        createfunc = do
            -- keys are already Int32, so 'id' serves as the hash function
            storage <- H.new (==) id
            return (doit storage)

        doit storage = unsafePerformIO . comp
            where
                comp x = do
                    look <- H.lookup storage (ident x)

                    case look of
                        Just res -> return res
                        Nothing -> do
                            let result = f x
                            H.insert storage (ident x) result
                            return result
ondra
+1  A: 

There are a number of tools in GHC's runtime system explicitly to support memoization.

Unfortunately, memoization isn't really a one-size-fits-all affair, so there are several different approaches that we need to support in order to cope with different user needs.

You may find the original 1999 writeup useful as it includes several implementations as examples:

Stretching the Storage Manager: Weak Pointers and Stable Names in Haskell by Simon Peyton Jones, Simon Marlow, and Conal Elliott

Edward Kmett