views:

792

answers:

11

Most times I see people try to use linked lists, it seems to me like a poor (or very poor) choice. Perhaps it would be useful to explore the circumstances under which a linked list is or is not a good choice of data structure.

Ideally, answers would expound on the criteria to use in selecting a data structure, and which data structures are likely to work best under specified circumstances.

Edit: I must say, I'm quite impressed by not only the number, but the quality of answers. I can only accept one, but there are two or three more I'd have to say would have been worth accepting if something a bit better hadn't been there. Only a couple (especially the one I ended up accepting) pointed to situations where a linked list provided a real advantage. I do think Steve Jessop deserves some sort of honorable mention for coming up with not just one, but three different answers, all of which I found quite impressive. Of course, even though it was posted only as a comment, not an answer, I think Neil's blog entry is well worth reading as well -- not only informative, but quite entertaining as well.

A: 

I have used linked lists (even doubly-linked lists) in the past in a C/C++ application. This was before .NET and even the STL.

I probably wouldn't use a linked list now in a .NET language, because all the traversal code you need is provided for you via the LINQ extension methods.

ChrisF
+1  A: 

They're useful when you need high-speed push, pop and rotate, and don't mind O(n) indexing.

Ignacio Vazquez-Abrams
Have you ever bothered to time C++ linked lists in comparison with (say) a deque?
anon
@Neil: Can't say that I have.
Ignacio Vazquez-Abrams
@Neil: if C++ has deliberately sabotaged its linked list class in order to make it slower than any other container (which is not far from the truth), what has that to do with a language-agnostic question? An intrusive linked list is still a linked list.
Steve Jessop
@Steve C++ is a language. I can't see how it can have volition. If you are suggesting that members of the C++ Committee somehow sabotaged linked lists (which must logically be slow for many operations), then name the guilty men!
anon
It's not really sabotage - external list nodes have their advantages, but performance is not one of them. However, surely everyone was aware when making the trade-off of the same thing you're aware of, which is that it's quite difficult to come up with a good use for `std::list`. An intrusive list just doesn't fit with the C++ philosophy of minimum requirements on container elements.
Steve Jessop
+7  A: 

Linked lists are very flexible: With the modification of one pointer, you can make a massive change, where the same operation would be very inefficient in an array list.

Chris Lercher
+1 I think this is their key advantage. But it is one that few apps actually take advantage of.
anon
+3  A: 

Under what circumstances are linked lists useful?

When learning how to implement basic data structures using pointers in a low-level C/C++ course.

Pat
+7  A: 

They can be useful for concurrent data structures. (There is now a non-concurrent real-world usage sample below - that would not be there if @Neil hadn't mentioned FORTRAN. ;-)

For example, ConcurrentDictionary<TKey, TValue> in .NET 4.0 RC uses linked lists to chain items that hash to the same bucket.

The underlying data structure for ConcurrentStack<T> is also a linked list.

ConcurrentStack<T> is one of the data structures that serve as the foundation for the new Thread Pool (with the local "queues" implemented as stacks, essentially). (The other main supporting structure is ConcurrentQueue<T>.)

The new Thread Pool in turn provides the basis for the work scheduling of the new Task Parallel Library.

So they can certainly be useful - a linked list is currently serving as one of the main supporting structures of at least one great new technology.

(A singly-linked list makes a compelling lock-free - but not wait-free - choice in these cases, because the main operations can be carried out with a single CAS (+ retries). In a modern GC'd environment - such as Java or .NET - the ABA problem can easily be avoided: just wrap the items you add in freshly created nodes and do not reuse those nodes - let the GC do its work. The page on the ABA problem also provides the implementation of a lock-free stack that actually works in .NET (and Java) with a (GC'd) Node holding the items.)

Edit: @Neil: actually, what you mentioned about FORTRAN reminded me that the same kind of linked lists can be found in probably the most used and abused data structure in .NET: the plain .NET generic Dictionary<TKey, TValue>.

Not one, but many linked lists are stored in an array.

  • It avoids doing many small (de)allocations on inserts/deletes.
  • Initial loading of the hash table is pretty fast, because the array is filled sequentially (plays very nice with CPU cache).
  • Not to mention that a chaining hash table is expensive in terms of memory - and this "trick" cuts "pointer sizes" in half on x64.

Essentially, many linked lists are stored in an array (one for each bucket used). A free list of reusable nodes is "interwoven" between them (if there have been deletes). The array is allocated at the start/on rehash, and the nodes of the chains are kept in it. There is also a free pointer - an index into the array - that tracks deletes. ;-) So - believe it or not - the FORTRAN technique still lives on, in one of the most commonly used .NET data structures. ;-)

andras
In case you have missed it, here is Neil's comment: "People (including me, I'm sorry to say) used to implement linked lists without pointers in languages like FORTRAN IV (which had no notion of pointers), much as they did trees. You used arrays instead of "real" memory."
andras
I should add that the "linked lists in an array" approach in the case of the `Dictionary` saves significantly more in .NET: otherwise each node would require a separate object on the heap - and every object allocated on the heap has some overhead. (http://en.csharp-online.net/Common_Type_System%E2%80%94Object_Layout)
andras
+11  A: 

Linked lists are very useful when you need to do a lot of insertions and removals, but not too much searching, on a list of arbitrary (unknown at compile-time) length.

Splitting and joining (bidirectionally-linked) lists is very efficient.

You can also combine linked lists - e.g. tree structures can be implemented as "vertical" linked lists (parent/child relationships) connecting together horizontal linked lists (siblings).

Using an array based list for these purposes has severe limitations:

  • Adding a new item means the array must be reallocated (or you must allocate more space than you need to allow for future growth and reduce the number of reallocations)
  • Removing items leaves wasted space or requires a reallocation
  • Inserting items anywhere except the end involves (possibly reallocating and) copying lots of the data up one position
Jason Williams
So the question reduces to, when *do* you need to do a lot of insertions and removals at the middle of a sequence, but not very many lookups in the list by ordinal? Traversing a linked list is typically as or more expensive than copying an array, so everything you say about removing and inserting items in arrays is just as bad for random-access in lists. LRU cache is one example I can think of, you need to remove at the middle a lot, but you never need to walk the list.
Steve Jessop
Adding to a list involves memory allocation for every element you add. This may involve a system call which will be very expensive. Adding to an array only requires such a call if the array must be grown. In fact, in most languages (for exactly these reasons) the array is the preferred data structure and lists are hardly used at all.
anon
"This may involve a system call" elsewhere you appeared to criticise someone else for assuming a bad array implementation (failed to amortise exponential reallocation). Why now make scary noises about a bad list implementation (fails to use a decent allocation strategy for nodes)? For example in Java, memory allocation is astonishingly fast, much faster than a typical C implementation even once you've accounted for the time cost in Java of GC.
Steve Jessop
@Steve I also suggested that such a call *may* be needed for an array.
anon
So nothing to choose between arrays and lists on that score then - either might require a system call to add an element. Probably with about the same frequency, assuming equivalent growth strategies for both. Sorry, I do mostly agree with you: linked lists and especially `std::list` are pretty rubbish choices almost all the time. I just don't feel anyone is trying to present specific cases for linked lists other than me, it's mostly quite vague "well, if you want the big-O characteristics of a linked list, use a linked list"...
Steve Jessop
@Steve I would not like to assume that - presumably as you do assume it you have some figures to back it up?
anon
Assume which? That allocation is astonishingly fast is evident - usually requires adding the object size to a pointer. That total overhead for GC is low? Last time I tried measuring it on a real app, the key point was that Java was doing all the work when the processor was otherwise idle anyway, so naturally it didn't affect visible performance much. In a busy-CPU benchmark it was easy to upset Java, and get very bad worst case allocation time. This was many years ago, though, and generational garbage collection has markedly reduced the total cost of GC since.
Steve Jessop
Anyway, the point wasn't intended to be that malloc/free is typically rubbish (although of course you can't count on it to be good). The point was intended to be that the difference between "requires memory allocation" and "uses a bit of space at the end of an existing array, if there's any left" is, with the right memory allocator, a distinction without a difference. So to say, "lists are rubbish if your memory allocator is rubbish" isn't IMO much of a criticism of lists, since the exact same technique that `std::vector` uses can be used by a list to limit calls to the problematic allocator.
Steve Jessop
@Steve: You're wrong about allocation being "the same" between lists and arrays. Each time you need to allocate memory for a list, you simply allocate a small block - O(1). For an array you must allocate a new block large enough for the whole list, and then copy the whole list - O(n). To insert into a known location in a list you update a fixed number of pointers - O(1), but to insert into an array and copy any later items up one position to make room for the insertion - O(n). There are many cases where arrays are therefore much less efficient than LLs.
Jason Williams
@Jason: I think his point was that for an array, you typically grow it in chunks, so you don't do an allocation every time. In fact, you usually grow it geometrically, so the complexity of addition is amortized constant.
Jerry Coffin
@Jerry: I understand. My point is that a large part of the cost of reallocating the array is not *allocating memory*, it is the need to *copy the entire array contents* to the new memory. To insert into item 0 of an array you must *copy the entire array contents* up one position in memory. I'm not saying arrays are bad; just that there are situations where random access is not needed, and where the genuinely constant-time insert/delete/relink of LLs are preferable.
Jason Williams
If you read back, I didn't say they were "the same". I said that they're the same "on that score" (the score being the number of system calls needed for memory allocation, that Neil was worried about) and that they can use "the same technique" (allocating large blocks from the expensive allocator and using them a little at a time). What you do after making the call varies between the two, as you say, but Neil's particular objection to lists in comment 2 of this thread, that I disagreed with, wasn't about that, it was about system calls. Of course there are other objections too.
Steve Jessop
+3  A: 

A singly-linked list is a good choice for the free list in a cell allocator or object pool:

  1. You only need a stack, so a singly-linked list is sufficient.
  2. Everything is divided into nodes already. There is no allocation overhead for an intrusive list node, provided the cells are large enough to contain a pointer.
  3. A vector or deque would impose an overhead of one pointer per block. This is significant given that when you first create the heap, all cells are free, so it's an up-front cost. In the worst case it doubles the memory requirement per cell.
Steve Jessop
Well, agreed. But how many programmers are actually creating such things? Most are simply re-implementing what std::list etc. give you. And actually "intrusive" normally has a slightly different meaning than you have given it - that each possible list element contains a pointer separate from the data.
anon
How many? More than 0, less than a million ;-) Was Jerry's question "give good uses of lists", or "give good uses of lists which every programmer uses on a daily basis", or something in between? I don't know any other name than "intrusive" for a list node which is contained within the object that's a list element - whether as part of a union (in C terms) or not. Point 3 only applies in languages that let you do it - C, C++, assembler good. Java bad.
Steve Jessop
+3  A: 

A doubly-linked list is a good choice to define the ordering of a hashmap which also defines an order on the elements (LinkedHashMap in Java), especially when ordered by last access:

  1. More memory overhead than an associated vector or deque (2 pointers instead of 1), but better insert/remove performance.
  2. No allocation overhead, since you need a node for a hash entry anyway.
  3. Locality of reference is no additional problem compared with a vector or deque of pointers, since you'd have to pull each object into memory either way.

Sure, you can argue about whether an LRU cache is a good idea in the first place, compared with something more sophisticated and tuneable, but if you're going to have one, this is a fairly decent implementation. You do not want to perform a delete-from-middle-and-add-to-the-end on a vector or deque at every read access, but moving a node to the tail is typically fine.

Steve Jessop
+4  A: 

Arrays are the data structures to which linked lists are usually compared.

Normally linked lists are useful when you have to make a lot of modifications to the list itself, while arrays perform better than lists on direct element access.

Here's a list of operations that can be performed on lists and arrays, compared with the relative operation cost (n = list/array length):

  • Adding an element:
    • on lists you just need to allocate memory for the new element and redirect pointers. O(1)
    • on arrays you have to reallocate the array. O(n)
  • Removing an element:
    • on lists you just redirect pointers. O(1)
    • on arrays you spend O(n) time shifting the elements if the element to remove is not the first or the last one; otherwise you can simply move the pointer to the start of the array or decrease the array length
  • Getting an element at a known position:
    • on lists you have to walk the list from the first element to the element at that position. Worst case: O(n)
    • on arrays you can access the element immediately. O(1)

This is a very low-level comparison of these two popular and basic data structures, and you can see that lists perform better in situations where you have to make a lot of modifications to the list itself (removing or adding elements). On the other hand, arrays perform better than lists when you have to directly access the elements of the array.

From the point of view of memory allocation, lists are better because there's no need to have all the elements next to each other. On the other hand, there's the (small) overhead of storing the pointers to the next (or even the previous) element.

Knowing these differences is important for developers choosing between lists and arrays in their implementations.

Note that this is a comparison of plain lists and arrays. There are good solutions to the problems reported here (e.g. skip lists, dynamic arrays, etc.). In this answer I've taken into account only the basic data structures every programmer should know about.

Andrea Zilio
This is somewhat true for a good implementation of lists and an awful implementation of arrays. Most array implementations are far more sophisticated than you give them credit for. And I don't think you understand how expensive dynamic memory allocation can be.
anon
This answer is not supposed to cover the syllabus of a university data structures course. This is a comparison written taking into account linked lists and arrays as they are implemented in the way you, I, and most people know them. Geometrically expanding arrays, skip lists, etc. are solutions that I know, use and study, but they would need a deeper explanation that would not fit in a Stack Overflow answer.
Andrea Zilio
+2  A: 

Singly-linked lists are the obvious implementation of the common "list" data type in functional programming languages:

  1. Adding to the head is fast, and (append (list x) (L)) and (append (list y) (L)) can share almost all of their data. No need for copy-on-write in a language with no writes. Functional programmers know how to take advantage of this.
  2. Adding to the tail is unfortunately slow, but so would be any other implementation.

By comparison, a vector or deque would typically be slow to add at either end, requiring (at least in my example of two distinct appends) that a copy be taken of the entire list (vector), or the index block and the data block being appended to (deque). Actually, there may be something to be said there for deque on large lists which do need to add at the tail for some reason, I'm not sufficiently informed about functional programming to judge.

Steve Jessop
+1  A: 

From my experience: implementing sparse matrices and Fibonacci heaps. Linked lists give you more control over the overall structure of such data structures. Though I'm not sure if sparse matrices are best implemented using linked lists - probably there is a better way, but it really helped to learn the ins and outs of sparse matrices using linked lists in undergrad CS :)

Zaki