views:

268

answers:

3

I was looking through my code and I found a couple of extension methods that I wrote in order to remove items from a System.Collections.Generic.Stack. I was curious, so I looked at the source of Stack with Reflector and I can see that they implemented it as an array instead of a linked list and I'm just wondering why? With a linked list there would be no need to resize an internal array...

Here are my extensions, any criticisms or suggestions are welcome. Thanks.

public static Stack<T> Remove<T>(this Stack<T> stack, T item)
{
    Stack<T> newStack = new Stack<T>();

    EqualityComparer<T> eqc = EqualityComparer<T>.Default;

    foreach( T newItem in stack.Reverse() )
    {
     if( !eqc.Equals(newItem, item) )
     {
      newStack.Push(newItem);
     }
    }

    return newStack;
}
/// <summary>
/// Returns a new Stack{T} with one or more items removed, based on the given predicate.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="stack"></param>
/// <param name="fnRemove"></param>
/// <returns>The new stack.</returns>
/// <remarks>
/// We have to turn tricks in order to save the LIFO order of the pool
/// since there is no built-in remove method since they implement a Stack internally
/// as an array instead of a linked list. Maybe they have a good reason, I don't know...
/// 
/// So, to fix this I'm just using a LINQ extension method to enumerate in reverse.
/// </remarks>
public static Stack<T> RemoveWhere<T>(this Stack<T> stack, Predicate<T> fnRemove)
{
    Stack<T> newStack = new Stack<T>();

    foreach( T newItem in stack.Reverse() )
    {
     /// Check using the caller's method.
     if( fnRemove(newItem) )
     {
      /// It's not allowed in the new stack.
      continue;
     }
     newStack.Push(newItem);
    }

    return newStack;
}
+3  A: 

When you run benchmarks, you end up almost always end up seeing huge performance benefit in Array because of locality. The ArrayList puts everything into an array which is stored in a big contiguous block of memory. Linked lists store their data through links, and so end up with tons more memory allocations and also tons more jumping around in memory.

commondream
It would have to be Stack with a large number of items for this to be a big issue.
Mitch Wheat
Agreed, but I'm sure that the research on the topic directs the Framework library developers when they're writing stuff like this, and it says that Arrays end up benchmarking faster than Linked Lists.
commondream
That blanket statement "Arrays end up benchmarking faster than Linked Lists" is so flawed. What about inserting an element in the middle of a linked list and array? I hardly think the array will be faster!
Mitch Wheat
-unless it was directed to just Stacks...
Mitch Wheat
@Mitch: Yes, except this is a Stack (and not a List), which is a LIFO queue; items are pushed onto the end and pulled off the end - stacks don't (typically) insert into the middle. It's the stack behavior that dictates the choice of implementation.
Software Monkey
Yeah... sorry about not clarifying. We're talking stacks here and I took it for granted. Although, you know, if we limited it to inserting a single element to the middle of a 2 element list I think an array may be able to give the linked list a run for its money.
commondream
@Software Monkey: I realise that a Stack is a LIFO queue (I have implemented quite a few!)
Mitch Wheat
+10  A: 

A LIFO queue (a stack) is typically most efficient with an array, because you are pushing items onto the end of the array and pulling items off the same end of the array. So an array works well, without the memory and allocation overhead of a linked list. The cost of making and resizing an array is offset by not needing to create and garbage collect the list item wrapper objects.

Software Monkey
have you actually benchmarked this?
Mitch Wheat
Yep, but it was in Java and a long time ago.
Software Monkey
BTW, you don't need to Garbage collect the wrapper objects explicitly. This will happen automatically at some undetermined time.
Mitch Wheat
@Mitch: The work to GC the objects is still work.
Software Monkey
@Software Monkey: Yes but at best it might occur only when you shutdown your application.
Mitch Wheat
+5  A: 

Consider the size of a Stack<byte> with 1,000,000 items.

  • With an array, size ~= 1,000,000 bytes. Usually more due to spare capacity, but probably not more than double.
  • With a linked list, each entry needs its own object, the data itself, and at least one reference (for a singly-linked list, which is probably all that's needed for a stack). With padding to 4 bytes, total size is likely to be ~16,000,000 bytes. Ouch.

Add to that the locality of reference and the reduced GC pressure, and I think it makes perfect sense to use arrays as the underlying data structure for stacks, queues and deques - the last two would use the array as a circular buffer, of course. The only obvious downsides are the wasted "spare capacity" and the need to copy everything when that capacity runs out.

Jon Skeet
Makes sense. Thanks.
wizlb
Actually, on a 32 bit system the typical object size for a linked list is 16 (base) + prev/next links (8) + payload ref (4), and those double for a 64 bit system. So the reality is worse. I've measured these via a profiler.
Software Monkey