views:

755

answers:

9

I need a data structure that holds a list of elements of the same type. Needed features are

  • Add
  • GetEnumerator
  • (may be) Clear

An indexed access, sorting, searching, removing of elements are not required. What is the best collection class for it? The following aspects should be taken in consideration: performance, memory usage, behavior of the garbage collector.

My current candidates are List<T> and LinkedList<T>

+20  A: 

Unless you are dealing with a massive structure or you plan on iterating over this thing a trillion times, it doesn't matter. Just pick one and start coding. If your app slows to a crawl later, figure out why then and change as necessary.

(Seriously, it does. not. matter. In the slightest. Every minute you spend on finding the answer to this question is a minute closer you could've been to having working code).

If anyone has gotten to the point where they need to know the difference, LinkedList is faster than List and could be used if you only need non-random, forward-only reading and appending functionality.

Rex M
+1: Yes, don't sweat the small stuff unless it matters. The nearly identical interfaces should make subsequent replacement trivial. If it isn't, then you know you already picked the right datastructure b/c you're using its unique features. OP should learn the differences for future reference, tho
Greg D
-1: Sorry, it's not what I'm asking for.
Michael Damatov
Man, -1 for explaining how to be a better developer? Harsh :)
Rex M
+1 for explaining how to be a better developer, as Albert said, stop reading and do.
Tom Anderson
+1 Sometimes the correct answer is not the one you were looking for.
Ed Swangren
While the answer is likely valid it would be a more useful answer if it also then stated what the differences are. perhaps someone will stumble upon this question because they are infact getting awful performance using a List and the particulars of their algorithm mean LinkedList is appropriate...
ShuggyCoUk
as of now http://www.google.co.uk/search?q=site%3Astackoverflow.com+list+linkedlist this is the third hit
ShuggyCoUk
Good point Shuggy, added.
Rex M
This answer reminds me one of the developers on my team. I work at a high-frequency trading firm and he's firmly in the "only profile when you actually have a performance problem" camp. Problem is, every time he adds something new, we have a new performance problem, because he doesn't concern himself with questions like this until after the fact (as you have essentially recommended). I'm not saying it's not good advice in general, but I really think it's premature to say things like "it doesn't matter" without understanding the context. Sometimes performance is a very important issue.
Dan Tao
@Dan that's what a staging environment is for.
Rex M
+2  A: 

If indexed access is not required, you're probably better off with a LinkedList<T>. There's not much difference, but the LinkedList can be faster for the appends.

Joel Coehoorn
Actually, not the case (unless you care about symptomatic performance - "Worst Case" - for large data sets), the heap alloc, linking is much worse than the List's assign to inner array. *Prepends*, maybe, but again, only for medium-to-large data sets.
Simon Buchan
Like Simon says the _amortized_ cost will be much better for the List<T>. if you want *stable* append cost the linked list will be better though but that's very particular.
ShuggyCoUk
+6  A: 

I would use List<T> because all of your data will be stored sequentially if they're value types and still do well with reference types (internally, List<T> manages an array that it grows by a factor of two each time you run out of space).

LinkedList<T> used to make a lot more sense when things weren't IO bound. People will often quote its seeming "O(1)" nature. However, this discounts the true cost that comes with increased chances of page faults getting the nodes.

If you can get contiguous memory regions with an array or List<T> and avoid the potential of a page fault, you're much better off with modern processor and main memory caching lines.

If you know how many elements in advance you'll have, use an array. If you have a good idea of how many elements, use a List<T> (and in the constructor pass the likely upper bound to potentially avoid a reallocation).

The only time I'd use a LinkedList<T> is if you need to bump items in a list by one value constantly. For example, if you're implementing a least recently used cache algorithm and need to add something to the front and take something off the end.

For small items, it really won't make a difference. The generational garbage collector will compact scattered heap items together over time so the linked list won't get too bad.

I'd pick a List<T> and run with it unless you noticed problems (via profiling)

Jeff Moser
I'd avoid arrays in C#, they have terrible type safety - setting .Capacity beforehand is just as good, perf-wise.
Simon Buchan
and LRU cache will liekyl be better with a CircularBuffer anyway...
ShuggyCoUk
+5  A: 

Unless you're dealing with hundreds of thousands or millions of entries, and you've profiled your program to see that there's a major problem, you probably won't notice the difference between the two.

Other than that:

LinkedList<T> provides separate nodes of type LinkedListNode<T>, so insertion and removal are O(1) operations.

from here.

Robert P
i ran out of votes for today, otherwise +1
Tim
A: 

A LinkedList will carry more memory overhead (for the nodes), but is better if you are inserting many elements into the middle. A linked list will require more memory allocations too, as LinkedListNode is a class and is allocated dynamically. If you are not inserting elements (just appending), I would use a List. A list should be as fast or faster for appending, although it will occasionally have to enlarge.

Jordan Miner
A: 

If your array is going to be large enough to matter, you need to roll your own ... otherwise just use a List.

Andrew
Why? List is fast.
Simon Buchan
+1  A: 

Given your requirements for the interface, you should be using ICollection<T> everywhere in your code instead of referring to a particular concrete class.

The reason is that if you use List then you may write a bunch of code that uses l[0] to get the first item. On the other hand, if you use LinkedList then you might spread calls to AddLast throughout your code. To preserve the choice to switch, you need to avoid becoming accidentally dependent on features you don't really need. You need to use an interface that both containers can support.

So write a class like this:

public static class Collections
{
    public static ICollection<T> Make<T>()
    {
        return new List<T>();
    }
}

Then whenever you need a collection, do this:

ICollection<int> c = Collections.Make<int>();

Isolate your choice of container from the rest of your program. Now when you profile and change your mind, you just have to edit the Make<T> method.

Daniel Earwicker
Surely IEnumerable? :)
Simon Buchan
No, he needs Add and possibly Clear.
Daniel Earwicker
I should have said "s/he needs..." of course!
Daniel Earwicker
I don't agree; unless you are writing a library or have strong architectural reasons in a large system. Pick one and use it like what it is. If you need to change it, change it and let the compiler tell you what needs fixing. Premature Abstraction is almost as bad as Premature Optimization.
Euro Micelli
Premature Abstraction is bad because of the cost. Picking the simplest data type for a variable costs nothing.
Daniel Earwicker
Earwicker (on simplest data type): I agree with you there. But IEnumerable is not a data type but a least-common-denominator interface. I don't think it's quite the same. I want to expand on this, so I posted http://stackoverflow.com/questions/613652 to elicit some healthy discussion around it.
Euro Micelli
+2  A: 

I think this is a good question to clarify the existence of two similar classes. I opened up Reflector and looked at how they work:

  • List implementation is in mscorlib while LinkedList is in System.dll. I'm not sure why there is such a distinction but it's there.
  • List keeps a dynamically growing array of objects while LinkedList keeps "LinkedListNode" object chain.
  • Every instance of LinkedListNode adds 4 int fields, which can consume up to 36 bytes of additional memory per node on 64-bit. That is 36MB more memory consumption for 1 million items.
  • On the other hand List object starts with 4 items in the array and doubles up the list capacity whenever array limit is reached. Adding 1 million items without specifying an initial capacity would cause 19 array growth operations causing 1,048,576 items to be allocated. Corresponding up to 388608 bytes surplus allocation.
  • The growth operation List uses itself is an Array.Copy to the new array. This means every time List grows the growth will become exponentially slower. To do the 19th growth the List object has to copy up to 4MB of memory. The previous array is discarded to GC's own discretion.

To summarize LinkedList would be very performant in Add operations and would not trigger GC cleanup when growing. List on the other hand has less surplus memory than LinkedList in the end. My take is your "Add & Iterate" code would be faster with LinkedList if number of items are around millions. Otherwise both should perform equally well.

Note that if you know how many items you'll be adding, List with initial capacity would outperform anything on any use.

ssg
+5  A: 

Short answer
default to using List<T> in almost all cases.

Slightly longer answer
LinkedList<T> is only going to be better if you are doing a lot of adding and removing values in comparison to enumerating over those values and the size of the list is large. This should only be a factor in your choice if, subsequent to profiling you find using List<T> is a problem.

Longer answer

Assumption, you have identified the usage of one or the other as being a problem for performance.

It you do a lot of random access then List<T> will almost always be vastly faster no matter what. If you enumerate a lot and rarely insert (or almost always insert near/at the end) the List<T> will almost always be faster. If you are constantly inserting/removing in random locations but while iterating over the list so are already at or near the relevant node and will have at least several thousand elements you may want to try LinkedList<T>

Working out what values/usage translate to better performance are very much dependent on your usage profile. Microbenchmarks can be very misleading here since they brush under the carpet aspects of the linked lists behaviour like the nodes in in being distributed about in memory rather than nicely adjacent if they happen to get allocated all at once in your tests. Likewise pre-creating the List<T> with the right size can make a big difference.

As to computer science style reasoning and big O notation (which really need big N's to be meaningful in this case)

  • operation
    • cost List<T>
    • cost LinkedList<T>
  • insert at end
    • O(1) (amortized cost, allocation to double size as needed)
    • O(1) allocation each time
  • insert at start
    • O(N) (though done as fast memory move so somewhat complex running time behaviour)
    • O(1)
  • insert in location x (and remove as well)
    • O(N-x) (see comment for insert at end)
    • O(1)
  • forward enumeration
    • O(N) (though cache misses minimized)
    • O(N) (though heavily dependent on cache locality)
  • reverse enumeration
    • O(N)
    • O(N) (the LinkedList<T> implementation is doubly linked)
  • random access
    • O(1)
    • O(N)

Memory usage is complex since List can have at most Count - 1 excess cells at any point but LinkedList<T> will consume a LinkedListNode<T> for each cell which is an additional 3 references (4/8 bytes a pop) plus the usual object overhead. In normal usage the List is likely to win but again this should only be something you worry about if you find the memory consumption is actually a problem.

ShuggyCoUk