views:

763

answers:

8

I've been reading up on garbage collection looking for features to include in my programming language and I came across "weak pointers". From here:

Weak pointers are like pointers, except that references from weak pointers do not prevent garbage collection, and weak pointers must have their validity checked before they are used.

Weak pointers interact with the garbage collector because the memory to which they refer may in fact still be valid, but containing a different object than it did when the weak pointer was created. Thus, whenever a garbage collector recycles memory, it must check to see if there are any weak pointers referring to it, and mark them as invalid (this need not be implemented in such a naive way).

I've never heard of weak pointers before. I would like to support many features in my language, but I cannot for the life of me think of a case where this would be useful. What would one use a weak pointer for?

+6  A: 

A really big one is caching. Let's think through how a cache would work:

The idea behind a cache is to store objects in memory until memory pressure becomes so great that some of the objects need to be pushed out (or are explicitly invalidated of course). So your cache repository object must hold on to these objects somehow. By holding onto them via weak reference, when the garbage collector goes looking for things to consume because memory is low, the items referred to only by weak reference will appear as candidates for garbage collection. Items in the cache that are currently being used by other code will have hard references still active, so those items will be protected from garbage collection.

In most situations you won't be rolling your own caching mechanism, but it is common to use a cache. Let's suppose you want to have a property which refers to an object in cache, and that property stays in scope for a long time. You would prefer to fetch the object from cache, but if it's not available, you can get it from persisted storage. You also don't want to force that particular object to stay in memory if pressure gets too high. So you can use a weak reference to that object, which will allow you to fetch it if it is available but also allow it to fall out of cache.
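To make the second scenario concrete, here is a minimal sketch in Python using the standard `weakref` module. `Report`, `load_report`, and `Page` are made-up names for illustration, and `load_report` merely pretends to read from persisted storage.

```python
import gc
import weakref


class Report:
    """Stand-in for an object that is expensive to load (hypothetical)."""

    def __init__(self, name):
        self.name = name


load_count = 0  # how many times we had to go back to "storage"


def load_report(name):
    """Pretend to fetch the object from persisted storage."""
    global load_count
    load_count += 1
    return Report(name)


class Page:
    """Long-lived object that only holds a weak handle on its report."""

    def __init__(self, name):
        self._name = name
        self._ref = None  # weakref.ref to the report, once it has been loaded

    @property
    def report(self):
        obj = self._ref() if self._ref is not None else None
        if obj is None:
            # Never loaded, or already collected: reload and remember it weakly.
            obj = load_report(self._name)
            self._ref = weakref.ref(obj)
        return obj


page = Page("q3")
first = page.report            # loads from storage
again = page.report            # no reload while `first` keeps the object alive
same_object = again is first
del first, again               # drop every strong reference
gc.collect()                   # CPython frees on `del`; other runtimes may need this
reloaded = page.report         # the weak reference is dead, so this reloads
```

The property never keeps the report alive by itself: as long as something else holds it, the lookup is free, and once nothing does, the next access pays for a reload.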

Rex M
This makes a lot of sense. Thank you.
Imagist
I tend to agree with asaph on this though; I've implemented caches before and it seems like there might be a better way to do this.
Imagist
@Imagist more sophisticated caches can be built other ways, certainly. However, I've found many situations where my second example comes in handy when interacting with a very robust caching system.
Rex M
In Java, you should use SoftReferences and not WeakReferences for caching (WeakReferences are cleared more aggressively than SoftReferences are). WeakReferences should be used for things like additional object attributes as per the accepted answer.
Keith Randall
I agree with Keith. A perfect GC would instantly (and at no overhead) deallocate any weakly referenced object that is not at the same time strongly referenced. The implementors would work really hard to provide you with this GC. But this ideal GC would render your "cache" pretty much useless, with entries falling out of it as if stored in a sieve. The GC doesn't know anything about the program's speed/memory consumption trade-offs. The programmer knows (or should know). Ergo the programmer should decide how much memory is used for caching.
Pascal Cuoq
A: 

Weak references can for example be used in caching scenarios - you can access data through weak references, but if you don't access the data for a long time or there is high memory pressure, the GC can free it.

Daniel Brückner
+1  A: 

Use them when you want to keep a cached list of objects but not prevent those objects from being garbage collected once the "real" owner of the object is done with it.

A web browser might have a history object that keeps references to image objects that the browser loaded elsewhere and saved in the history/disk cache. The web browser might expire one of those images (user cleared the cache, the cache timeout elapsed, etc) but the page would still have the reference/pointer. If the page used a weak reference/pointer the object would go away as expected and the memory would be garbage collected.

BradC
A valid use of weak pointers (I made up this criterion for valid uses: "it's a valid use when the design explicitly separates the cases where there is also a strong pointer to the objects from the cases where there isn't").
Pascal Cuoq
+4  A: 

A typical use case is storage of additional object attributes. Suppose you have a class with a fixed set of members, and, from the outside, you want to add more members. So you create a dictionary object -> attributes, where the keys are weak references. Then, the dictionary doesn't prevent the keys from being garbage collected; removal of the object should also trigger removal of the values in the WeakKeyDictionary (e.g. by means of a callback).
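As a small sketch of this, Python's standard `weakref.WeakKeyDictionary` can act as the side table. The `Point` class and the `labels` table are made up for illustration. `__slots__` is used here only to mimic a class whose members are fixed.

```python
import gc
import weakref


class Point:
    """A class with a fixed set of members (hypothetical)."""

    # No per-instance __dict__, but still weakly referenceable.
    __slots__ = ("x", "y", "__weakref__")

    def __init__(self, x, y):
        self.x = x
        self.y = y


# Side table mapping object -> extra attribute. Keys are held weakly.
labels = weakref.WeakKeyDictionary()

p = Point(1, 2)
labels[p] = "start of route"      # attach an attribute from the outside
label_while_alive = labels[p]
size_while_alive = len(labels)

del p                             # last strong reference to the key goes away
gc.collect()                      # not needed on CPython, harmless elsewhere
size_after = len(labels)          # the entry (key and value) is gone
```

The dictionary removes the entry by itself once the key is collected, so the side table never leaks attributes for dead objects.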

Martin v. Löwis
Is this sort of like how extension methods work in C#?
Imagist
@Imagist: exactly.
Martin v. Löwis
extension methods have *absolutely nothing* to do with GC and weak references.
lubos hasko
@lubos My guess is that once an extension method is loaded into memory, it stays around, but one *could* load it as a weak reference to reduce its impact on memory usage. If you think of methods as a specific kind of object attribute, then this is *exactly* the use case Martin described.
Imagist
@lubos: implementation-wise, they don't have to have anything to do with each other. However, extension methods allow you to extend a class with methods from the outside, the same way that the approach I describe allows extension of a class with additional attributes.
Martin v. Löwis
+2  A: 

Another example... not quite caching, but similar: Suppose an I/O library provides an object which wraps a file descriptor and permits access to the file. When the object is collected, the file descriptor is closed. It is desired to be able to list all currently opened files. If you use strong pointers for this list, then files are never closed.
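A rough sketch of such a registry in Python, using `weakref.WeakSet` for the list and `weakref.finalize` to stand in for closing the descriptor. `TrackedFile`, `list_open`, and `closed_paths` are hypothetical names and no real files are opened.

```python
import gc
import weakref

closed_paths = []  # records "closed" files, standing in for real close() calls


class TrackedFile:
    """Hypothetical wrapper standing in for an object that owns a file descriptor."""

    # Registry of live wrappers. A WeakSet does not keep its members alive.
    open_files = weakref.WeakSet()

    def __init__(self, path):
        self.path = path
        TrackedFile.open_files.add(self)
        # Runs when the wrapper is collected. The callback must not refer to
        # `self`, or the wrapper would be kept alive forever.
        weakref.finalize(self, closed_paths.append, path)


def list_open():
    """List the paths of all wrappers that are still alive."""
    return sorted(f.path for f in TrackedFile.open_files)


a = TrackedFile("a.log")
b = TrackedFile("b.log")
before = list_open()

del a            # the program is done with a.log
gc.collect()     # not needed on CPython, harmless elsewhere
after = list_open()
```

With a plain `set` instead of the `WeakSet`, `a` would stay reachable through the registry and its finalizer would never run.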

A: 

Just because a programming language (any language) includes a specific feature doesn't make it a good idea. Almost every language has misfeatures in it that should be avoided. Examples: multiple inheritance in C++ and global variables in PHP, just to name a couple. It sounds like weak pointers might fall into this category. Others have suggested caching as a possible use, which I suppose is valid. However, there are probably more elegant ways to handle caching. IMO, language designers should take a minimalist approach to new languages. In the words of Albert Einstein: "Make things as simple as possible, but not simpler."

For an example of a new language design taking a new approach, check out Google's noop. I highly doubt they'll be including weak pointers.

Asaph
I knew it would be risky to poo-poo the weak pointers concept in this question, but nevertheless, would the down-voters care to leave a comment?
Asaph
I've already looked at noop and I'm really unimpressed. Their stated goals are reducing Java boilerplate but almost all the code examples I've seen *increase* boilerplate. I didn't downvote you, though.
Imagist
Didn't downvote but I believe that there *is* a time and place for every "evil" feature (namely, when it's the least evil choice). Often when you're writing code that you're going to run *once*, and then throw out, coding priorities are totally different to usual. If a maintenance-nightmare feature saves you 20 minutes on such a 3 hour task, why not use it?
Artelius
+3  A: 

If your language's garbage collector is incapable of collecting circular data structures, then you can use weak references to enable it to do so. Normally, if you have two objects which have references to each other, but no other outside object has a reference to those two, they would be candidates for garbage collection. But, a naïve garbage collector wouldn't collect them, since they contain references to each other.

To fix this, you make it so one object has a strong reference to the second, but the second has a weak reference to the first. Then, when the last outside reference to the first object goes away, the first object becomes a candidate for garbage collection, followed shortly thereafter by the second, since now its only reference is weak.
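Here is a sketch of that arrangement in Python. CPython does have a cycle collector, so the example switches it off with `gc.disable()` to imitate a collector that only counts references. That trick, and the immediate freeing it relies on, are CPython-specific assumptions. `Node` is a made-up class.

```python
import gc
import weakref


class Node:
    """Tree node: strong references point down, a weak reference points up."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []  # strong: a parent owns its children
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # Returns None for the root, or if the parent has been collected.
        return self._parent() if self._parent is not None else None


gc.disable()  # imitate a pure reference-counting collector (CPython-specific)
try:
    root = Node("root")
    child = Node("child", parent=root)
    parent_name = child.parent.name

    # Probes that let us observe collection without keeping anything alive.
    root_probe = weakref.ref(root)
    child_probe = weakref.ref(child)

    del child   # still alive: root.children holds it strongly
    child_alive_via_root = child_probe() is not None

    del root    # last outside reference; there is no strong cycle to get stuck on
    both_freed = root_probe() is None and child_probe() is None
finally:
    gc.enable()
```

If `_parent` were a plain strong reference, the two nodes would keep each other's counts above zero and, with the cycle collector off, would never be freed.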

Adam Rosenfield
This seems like a poor language design choice to me. Garbage collection that requires the programmer to manage their own memory isn't worth much in my book.
Imagist
A: 

The reason for garbage collection at all is that in a language like C, where memory management is totally under explicit control of the programmer, avoiding memory leaks and dangling pointers can become very hard when object ownership is passed around, especially between threads or, even harder, between processes sharing memory. If that weren't hard enough, you also have to deal with the need to have access to more objects than will fit in memory at one time: you need a way to free up some objects for a while so that other objects can be in memory.

So, some languages (e.g., Perl, Lisp, Java) provide a mechanism where you can just stop "using" an object and the garbage collector will eventually discover this and free up memory used for the object. It does this correctly without the programmer worrying about all the ways they can get it wrong (albeit there are lots of ways programmers can screw this up).

If you conceptually multiply the number of times you access an object by the time that it takes to compute the value of an object, and possibly multiply again by the cost of not having the object readily available or by the size of an object since keeping a large object around in memory can prevent keeping several smaller objects around, you could classify objects into three categories.

Some objects are so important that you want to explicitly manage their existence—they will not be managed by the garbage collector or they must never be collected until explicitly freed. Some objects are cheap to compute, are small, are not accessed frequently or have similar characteristics that allow them to be garbage collected at any time.

The third class is objects which are expensive to recompute but could be recomputed, are accessed somewhat frequently (perhaps for a short burst of time), are large, and so on. You'd like to keep them in memory as long as possible because they might be reused, but you don't want to run out of memory needed for critical objects. These are candidates for weak references.

You want these objects kept around as long as possible if they aren't conflicting with critical resources, but they should be dropped if memory is needed for a critical resource, because they can be recomputed when needed. These are what weak pointers are for.

An example of this might be pictures. Say you have a photo web page with thousands of pictures to display. You need to know how many pictures to lay out and maybe you have to do a database query to get the list. The memory to hold a list of a few thousand items is probably very small. You want to do the query once and keep it around.

You can only physically show perhaps a few dozen pictures at a time, though, in a pane of a web page. You don't need to fetch the bits for the pictures that the user can't be looking at. When the user scrolls the page, you'll gather the actual bits for the pictures visible. Those pictures could require many megabytes to show them. If the user scrolls back and forth between a few scroll positions, you'd like not to have to refetch those megabytes over and over again. But you can't keep all the pictures in memory all the time. So you use weak pointers.

If the user just looks at a few pictures over and over again, they may stay in cache and you don't have to refetch them. But if they scroll enough, you need to free up some memory so the visible pictures can be fetched. With a weak reference, you check the reference just before you use it. If it's still valid, you use it. If it's not, you make the expensive calculation (fetch) to get it.
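The check-then-fetch pattern might look like this in Python, with `weakref.WeakValueDictionary` as the cache. `Picture`, `fetch_picture`, and `get_picture` are hypothetical names, and the "fetch" just allocates dummy bytes.

```python
import gc
import weakref


class Picture:
    """Hypothetical holder for decoded image bytes."""

    def __init__(self, pic_id, data):
        self.pic_id = pic_id
        self.data = data


fetches = []  # records every expensive fetch, for illustration


def fetch_picture(pic_id):
    """Stand-in for the expensive download or decode step."""
    fetches.append(pic_id)
    return Picture(pic_id, bytes(1024))


# Values are held weakly: an entry vanishes once its picture is collected.
cache = weakref.WeakValueDictionary()


def get_picture(pic_id):
    pic = cache.get(pic_id)   # None if never cached or already collected
    if pic is None:
        pic = fetch_picture(pic_id)
        cache[pic_id] = pic
    return pic


visible = [get_picture(i) for i in range(3)]     # pictures 0..2 are on screen
get_picture(1)                                   # hit: no new fetch
visible = [get_picture(i) for i in range(2, 5)]  # user scrolls to 2..4
gc.collect()                                     # not needed on CPython
get_picture(0)                                   # 0 was dropped, so it is refetched
```

One caveat: on CPython an entry disappears as soon as the last strong reference is dropped, not only under memory pressure, so whether scrolled-away pictures linger depends on the runtime's collector.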

MikeW
Just an observation to help you provide better answers: The question was "Why are weak pointers useful?" not "What is caching all about and how can weak pointers help?". Try to stay to the point, be succinct, and if the question begs a detailed answer, take advantage of the formatting markup to make your answer easier to read.
Artelius