views:

472

answers:

4

I am looking for a mutable (balanced) tree/map/hash table in Haskell or a way how to simulate it inside a function. I.e. when I call the same function several times, the structure is preserved. So far I have tried Data.HashTable (which is OK, but somewhat slow) and tried Data.Array.Judy but I was unable to make it work with GHC 6.10.4. Are there any other options?

+4  A: 
Norman Ramsey
The problem is that I cannot use (or I don't know how to) use a non-mutable type. I am trying to build a 'caching' function and so far the different solutions have been pretty bad. I tried the HashTable approach outlined here: http://stackoverflow.com/questions/2217289/haskell-caching-results-of-a-function I have no idea how to write the same with non-mutable data structures.
ondra
@ondra: I've tried to add some guidelines to my answer, but it would really help to know more about the problem. I saw your other question, and memoization with floating-point keys could be agonizingly painful. If you can say more about the larger context of the problem, you might get more useful help.
Norman Ramsey
I am still trying to solve the same. I have modified the problem so that I know a unique Int32 for every 'object' acting as a parameter to the function. This allows me to cache the things reasonably efficiently but I am not sure if it allows me to use memoization on Ints.The problem I am trying to solve is: optimization problem that works on sphere. I am computing distances between the points - I can 'lazily compute' some goniometric operations, but not all. Only about 5% of the computations are unique, so I could save much time if I could cache the distances between points.
ondra
@ondra: Can you afford to discretize the sphere, introducing some quantization? If so, write your code to take the entire solution as answer, then find a way to compute it lazily. Also, a question: are you doing just the surface of the sphere or the interior as well? I have the vague idea that you can do this for the cost of a cosine and a couple of multiply operations, which is chump change compared to chasing pointers in a big cache...
Norman Ramsey
I need just a surface. That can be computed with one cos() and one acos() (all other 4 necessary sin/cos operations can be lazily computed). So far it has paid to me to do quite a lot of integer operations to avoid doing that.
ondra
@ondra: Fascinating. I wonder if this is a flaw in GHC. On modern hardware GHC should generate the `cos()` instruction for you but I can't remember if `acos()` is implemented in hardware. Can you put a grid on the surface?
Norman Ramsey
acos is implemented in hardware I think using atan. However, first I am not sure if it is used by Haskell (as the 'Double' thing in Haskell is pretty weird) - but it surely can be, and it is not 'that' slow. At least I wasn't able to make faster using these 'caching approaches'. However, most approaches look even slower compared to computing the cos()/acos() directly :)I already put a kind of grid on the surface, but I need a precise result which even then requires quite a lot of computations. I will have to put up with my current solution.
ondra
+5  A: 

Building on @Ramsey's answer, I also suggest you reconceive your function to take a map and return a modified one. Then code using good ol' Data.Map, which is pretty efficient at modifications. Here is a pattern:

import qualified Data.Map as Map

-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here

-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
    where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
    -- this starts with an initial map, uses successive versions of the map
    -- on each iteration, and returns a tuple of the results, and the final map

-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
    -- same as above, but starts with an empty map, and ignores the final map

It is easy to abstract this pattern and make mapFuncWithMap generic over functions that use maps in this way.

MtnViewMark
Perfect! +1 And note that the function type `Map.Map k v -> (r, Map.Map k v)` is equivalent to the state monad! With type `MonadState (Map.Map k v) r` from `Control.Monad.State`.
Norman Ramsey
I am using an optimization function that is working on a tree itself.. I have many "points" and if I had a mutable structure I could attach it to every point - which would be more efficient then having all points in one huge tree. The resulting Map would have about 500.000 points, while the individual ones attached to every 'point' will have only about 100-1000. I might try this approach to see if it improves the speed.
ondra
A: 

If I read your comments right, then you have a structure with possibly ~500k total values to compute. The computations are expensive, so you want them done only once, and on subsequent accesses, you just want the value without recomputation.

In this case, use Haskell's laziness to your advantage! ~500k is not so big: Just build a map of all the answers, and then fetch as needed. The first fetch will force computation, subsequent fetches of the same answer will reuse the same result, and if you never fetch a particular computation - it never happens!

You can find a small implementation of this idea using 3D point distances as the computation in the file PointCloud.hs. That file uses Debug.Trace to log when the computation actually gets done:

> ghc --make PointCloud.hs 
[1 of 1] Compiling Main             ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...

> ./PointCloud 
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0
MtnViewMark
I need ~500K computations, however the domain is about 3 billions and I don't know which 500K I will need.
ondra
Ah well, perhaps just a little too big for most systems to attack it this way then. No harm, it was fun developing the code for PointCloud.hs anyway. Thanks for the intriguing problem!
MtnViewMark
A: 

If you want mutable state, you can have it. Just keep passing the updated map around, or keep it in a state monad (which turns out to be the same thing).

import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef
memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
memoize f = do
    mc <- newSTRef Map.empty
    return $ \k -> do
        c <- readSTRef mc
        case Map.lookup k c of
            Just a -> return a
            Nothing -> do a <- f k
                       writeSTRef mc (Map.insert k a c) >> return a

You can use this like so. (In practice, you might want to add a way to clear items from the cache, too.)

import Control.Monad
main :: IO ()
main = do
    fib <- stToIO $ fixST $ \fib -> cache $ \n ->
        if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
    mapM_ (print <=< stToIO . fib) [1..10000]

At your own risk, you can unsafely escape from the requirement of threading state through everything that needs it.

import System.IO.Unsafe
unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
    f' <- stToIO $ memoize f
    return $ unsafePerformIO . stToIO . f'

fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)

main :: IO ()
main = mapM_ (print . fib) [1..1000]
ephemient
I still don't understand the way this thing works :) But I did roughly the same with IO monad and IOref. Surprisingly, it was pretty fast, but it wasn't faster then computing the distance goniometric functions :) At least I have learnt some Haskell.
ondra