ansaurus

Question

Answer 1

+1 A:

A lot of the tree structure can be reused. I don't know the algorithmic complexity offhand, I would guess on average there's only like amortized logN 'waste'...

Why not try to write a program to measure? (We'll see if I can get motivated tonight to try it myself.)

EDIT

Ok, here is something I hacked. I haven't decided if there's any useful data here or not.

open System

let rng = new Random()
let shuffle (array : _[]) =
    let n = array.Length
    for x in 1..n do
        let i = n-x
        let j = rng.Next(i+1)
        let tmp = array.[i]
        array.[i] <- array.[j]
        array.[j] <- tmp

let TryTwoToThe k =
    let N = pown 2 k

    GC.Collect()

    let a = Array.init N id

    let makeRandomTreeAndDiscard() =
        shuffle a
        let mutable m = Map.empty
        for i in 0..N-1 do
            m <- m.Add(i,i)

    for i in 1..20 do
        makeRandomTreeAndDiscard()
    for i in 1..20 do
        makeRandomTreeAndDiscard()
    for i in 1..20 do
        makeRandomTreeAndDiscard()

#time
// run these as separate interactions
printfn "16"
TryTwoToThe 16

printfn "17"
TryTwoToThe 17

printfn "18"
TryTwoToThe 18

When I run this in FSI on my box, I get

--> Timing now on

> 
16
Real: 00:00:08.079, CPU: 00:00:08.062, GC gen0: 677, gen1: 30, gen2: 1
> 
17
Real: 00:00:17.144, CPU: 00:00:17.218, GC gen0: 1482, gen1: 47, gen2: 4
> 
18
Real: 00:00:37.790, CPU: 00:00:38.421, GC gen0: 3400, gen1: 1059, gen2: 17

which suggests the memory may be scaling super-linearly but not too badly. I am presuming that the gen0 collections are roughly a good proxy for the 'waste' of rebalancing the tree. But it is late so I am not sure if I have thought this through well enough. :)

Brian 2010-05-08 05:54:48

So a dictionary in F# is some form of a binary tree rather than a hashtable? that would make sense I guess.If so, is it self balancing?

Roger Alsing 2010-05-08 06:11:01

Yes, and yes. You can look at the implementation in the CTP in `map.fs`.

Brian 2010-05-08 06:17:59

Answer 2

+5 A:

I looked at the implementation of the F# Map<K,V> type and I think it is implemented as a functional AVL tree. It stores the values in the inner nodes of the tree as well as in the leafs and for each node, it makes sure that |height(left) - height(right)| <= 1.

I think that the both average and worst-case complexities are O(log(n)):

Insert we need to clone all nodes on the path from the root to the newly inserted element and the height of the tree is at most O(log(n)). On the "way back", the tree may need to rebalance each node, but that's also only O(log(n))
Remove is similar - we find the element and then clone all nodes from the root to that element (rebalancing nodes on the way back to the root)

Note that other data-structures that don't need to rebalance all nodes from the root to the current one on insertion/deletion won't be really useful in the immutable scenario, because you need to create new nodes for the entire path anyway.

Tomas Petricek 2010-05-08 12:28:33

ansaurus

tags:

views:

answers:

Immutable Dictionary overhead?

related questions