views: 714

answers: 5
In purely functional languages, data is immutable. With reference counting, creating a reference cycle would require changing already-created data. It seems like purely functional languages could use reference counting without worrying about the possibility of cycles. Am I right? If so, why don't they?

I understand that reference counting is slower than GC in many cases, but at least it reduces pause times. It would be nice to have the option to use reference counting in cases where pause times are bad.

A: 

Reference counting is MUCH slower than tracing GC because the constant count updates are hard on the CPU and its caches. A tracing GC, on the other hand, can usually wait for idle time and can also run concurrently (on another thread). So that's the problem: GC is the least evil, and lots of attempts have shown that.

Mash
+12  A: 

Your question is based on a faulty assumption. It's perfectly possible to have circular references and immutable data. Consider the following C# example which uses immutable data to create a circular reference.

class Node { 
  public readonly Node other;   // never reassigned after construction
  public Node() { 
    other = new Node(this);     // the partner node points straight back at us
  }
  public Node(Node node) {
    other = node;
  }
}

This type of trick can be done in many functional languages and hence any collection mechanism must deal with the possibility of circular references. I'm not saying a ref counting mechanism is impossible with a circular reference, just that it must be dealt with.

Edit by ephemient

In response to the comment... this is trivial in Haskell

data Node a = Node { other :: Node a }
recursiveNode = Node { other = recursiveNode }

and barely any more effort in SML.

datatype 'a node = NODE of unit -> 'a node
val recursiveNode : unit node =
    let fun mkRecursiveNode () = NODE mkRecursiveNode
    in mkRecursiveNode () end

No mutation required.

JaredPar
Your example still relies on modifying already created data (albeit in the constructor). I was not aware that this was possible in any purely functional languages though. Can you give a simple example in Haskell (if possible)?
Zifre
@ephemient, thanks for the example. I'm kind of new to Haskell so forgive me for such a trivial question.
Zifre
@ephemient, yes thanks.
JaredPar
`Your example still relies on modifying already created data (albeit in the constructor).`... false claim. There is still no mutation in that example.
trinithis
+10  A: 

There are a few things, I think.

  • There are cycles: "let rec" in many languages does allow "circular" structures to be created. Apart from this, immutability does usually imply no cycles, but this breaks the rule.
  • Ref-counts are bad at lists: I don't know that reference-counted collection works well with e.g. the long singly-linked-list structures you often find in FP (e.g. naively freeing a dead list takes time proportional to its length, and the freeing code needs to be tail-recursive to avoid blowing the stack, ...)
  • Other strategies have benefits: As you allude to, other GC strategies are still usually better for memory locality

(Once upon a time I think I maybe really 'knew' this, but now I am trying to remember/speculate, so don't take this as any authority.)

Brian
Ref counting works fine with lists as long as you do it lazily: when an object's ref count goes to zero, it goes onto the free list, but you don't decrement the things it points to until it comes off the free list. That way every operation is constant time (bounded by the size of the largest object, not the length of the longest list).
Norman Ramsey
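Norman's lazy scheme above can be sketched as follows. This is a minimal Python simulation, not code from any real collector; `Obj`, `release`, `allocate`, and `free_list` are illustrative names.

```python
class Obj:
    """A heap object with an explicit reference count and child pointers."""
    def __init__(self, children=()):
        self.refcount = 0
        self.children = list(children)
        for c in self.children:
            c.refcount += 1

free_list = []

def release(obj):
    """Drop one reference. A dead object's children are NOT touched yet."""
    obj.refcount -= 1
    if obj.refcount == 0:
        free_list.append(obj)   # deferred: children still hold their counts

def allocate(children=()):
    """Reuse a dead object, paying for its outgoing references only now."""
    if free_list:
        dead = free_list.pop()
        for c in dead.children:  # O(out-degree) work, not O(list length)
            release(c)
    return Obj(children)
```

Releasing the head of a long list puts only the head cell on the free list; each later allocation frees at most one more cell, so no single operation ever walks the whole list.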
+8  A: 

Relative to other managed languages like Java and C#, purely functional languages allocate like crazy. They also allocate objects of different sizes. The fastest known allocation strategy is to allocate from contiguous free space (sometimes called a "nursery") and to reserve a hardware register to point to the next available free space. Allocation from the heap becomes as fast as allocation from a stack.

Reference counting is fundamentally incompatible with this allocation strategy. Ref counting puts objects on free lists and takes them off again, and it also carries substantial overhead for updating ref counts as pointers are copied and new objects are created (which, as noted above, pure functional languages do like crazy).

Reference counting tends to do really well in situations like these:

  • Almost all heap memory is used to hold live objects.
  • Allocation and pointer assignment are infrequent relative to other operations.
  • References can be managed on another processor or computer.

To understand how the best high-performance ref-counting systems work today, look up the work of David Bacon and Erez Petrank.

Norman Ramsey
Actually it is possible to add a nursery to a reference counting collector: http://cs.anu.edu.au/~Steve.Blackburn/pubs/papers/urc-oopsla-2003.pdf
Luke Quinane
Hybrid collectors are definitely the future, and if anybody can make this work, Steve Blackburn can (especially when he has help from Perry Cheng). That said, I'm not sure I'd be willing to extrapolate their Java results to a modern functional language. Java's not too hard on the memory system.
Norman Ramsey
+1: Very interesting but why do you say "Hybrid collectors are definitely the future"? What advantages does reference counting bring?
Jon Harrop
+1  A: 

Consider this allegory told about David Moon, an inventor of the Lisp Machine:

One day a student came to Moon and said: "I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons."

Moon patiently told the student the following story:

"One day a student came to Moon and said: 'I understand how to make a better garbage collector...

Crashworks