views:

131

answers:

2

When using immutable dictionaries in F# , how much overhead is there when adding / removing entries?

Will it treat entire buckets as immutable and clone those and only recreate the bucket whos item has changed?

Even if that is the case, it seems like there is alot of copying that needs to be done in order to create the new dictionary(?)

+1  A: 

A lot of the tree structure can be reused. I don't know the algorithmic complexity offhand, I would guess on average there's only like amortized logN 'waste'...

Why not try to write a program to measure? (We'll see if I can get motivated tonight to try it myself.)

EDIT

Ok, here is something I hacked. I haven't decided if there's any useful data here or not.

open System

let rng = new Random()
let shuffle (array : _[]) =
    let n = array.Length
    for x in 1..n do
        let i = n-x
        let j = rng.Next(i+1)
        let tmp = array.[i]
        array.[i] <- array.[j]
        array.[j] <- tmp

let TryTwoToThe k =
    let N = pown 2 k

    GC.Collect()

    let a = Array.init N id

    let makeRandomTreeAndDiscard() =
        shuffle a
        let mutable m = Map.empty
        for i in 0..N-1 do
            m <- m.Add(i,i)

    for i in 1..20 do
        makeRandomTreeAndDiscard()
    for i in 1..20 do
        makeRandomTreeAndDiscard()
    for i in 1..20 do
        makeRandomTreeAndDiscard()

#time
// run these as separate interactions
printfn "16"
TryTwoToThe 16

printfn "17"
TryTwoToThe 17

printfn "18"
TryTwoToThe 18

When I run this in FSI on my box, I get

--> Timing now on

> 
16
Real: 00:00:08.079, CPU: 00:00:08.062, GC gen0: 677, gen1: 30, gen2: 1
> 
17
Real: 00:00:17.144, CPU: 00:00:17.218, GC gen0: 1482, gen1: 47, gen2: 4
> 
18
Real: 00:00:37.790, CPU: 00:00:38.421, GC gen0: 3400, gen1: 1059, gen2: 17

which suggests the memory may be scaling super-linearly but not too badly. I am presuming that the gen0 collections are roughly a good proxy for the 'waste' of rebalancing the tree. But it is late so I am not sure if I have thought this through well enough. :)

Brian
So a dictionary in F# is some form of a binary tree rather than a hashtable? that would make sense I guess.If so, is it self balancing?
Roger Alsing
Yes, and yes. You can look at the implementation in the CTP in `map.fs`.
Brian
+5  A: 

I looked at the implementation of the F# Map<K,V> type and I think it is implemented as a functional AVL tree. It stores the values in the inner nodes of the tree as well as in the leafs and for each node, it makes sure that |height(left) - height(right)| <= 1.

            A 
         /     \
        B       C
      /   \
     D     E

I think that the both average and worst-case complexities are O(log(n)):

  • Insert we need to clone all nodes on the path from the root to the newly inserted element and the height of the tree is at most O(log(n)). On the "way back", the tree may need to rebalance each node, but that's also only O(log(n))

  • Remove is similar - we find the element and then clone all nodes from the root to that element (rebalancing nodes on the way back to the root)

Note that other data-structures that don't need to rebalance all nodes from the root to the current one on insertion/deletion won't be really useful in the immutable scenario, because you need to create new nodes for the entire path anyway.

Tomas Petricek