tags:

views:

269

answers:

5

Given a collection, is there a way to get the last N elements of that collection? If there isn't a method in the framework, what would be the best way to write an extension method to do this?

+6  A: 
coll.Reverse().Take(N).Reverse().ToList();


public static IEnumerable<T> TakeLast(this IEnumerable<T> coll, int N)
{
    return coll.Reverse().Take(N).Reverse();
}
James Curran
+8  A: 
collection.Skip(Math.Max(0, collection.Count() - N)).Take(N);
kbrimington
+1, as this works in Linq to Entities/SQL. I'm guessing it's also more performant in Linq to Objects than James Curran's strategy.
StriplingWarrior
Depends on the nature of collection. Count() might be O(N).
James Curran
@James: Absolutely correct. If dealing strictly with IEnumerable collections, this could be a two-pass query. I'd be very interested in seeing a guaranteed 1-pass algorithm. It might be useful.
kbrimington
I would make two such methods, one for `IQueryable<T>`, and one for `IEnumerable<T>`, using different methods. I suspect this method, and the double-reverse method, would both run nicely on Linq-To-SQL and Linq-To-Entities.
Lasse V. Karlsen
I would imagine that this would be an inefficient way to do this for LINQ-to-SQL. The Reverse and Take method probably uses the very efficient TOP N SORT query plan. http://msdn.microsoft.com/en-us/library/ms189054.aspx. I don't think this method will.
Mark Byers
Did some benchmarks. It turns out LINQ to Objects performs some optimizations based on the type of collection you're using. Using arrays, `List`s, and `LinkedList`s, James's solution tends to be faster, though not by an order of magnitude. If the IEnumerable is calculated (via Enumerable.Range, e.g.), James's solution takes longer. I can't think of any way to guarantee a single pass without either knowing something about the implementation or copying values to a different data structure.
StriplingWarrior
@Mark: The Reverse method is not supported by LINQ to SQL or LINQ to Entities. This strategy, on the other hand, produces a `Select ... where row_number between ... and ...` query in LINQ to SQL, which is probably pretty quick. Of course, if you know what data type you're dealing with, you could do an OrderBy, Take, and OrderByDescending, which might be quicker.
StriplingWarrior
@StriplingWarrior: Nice. Thanks for the footwork! How large were the collections you did your benchmarks in?
kbrimington
@Stripling: I had never tried it. I would do it by reversing the order by condition. If you have `OrderBy(x => x.Foo)` I'd change it to `OrderByDescending(x => x.Foo)`. Obviously this won't work if you receive an IQueryable ad you can't modify its source code.
Mark Byers
@kbrimington: 1000000 elements in the collection, and I took the last 10000 elements from it, ten times. Any more, and I started running into memory issues which make the tests unreliable. @Mark: That makes sense, assuming you know what you're ordering by (which is a reasonable assumption since you can't Take in Linq to SQL or Entities without first performing OrderBy). I was actually thinking of that and edited my comment before I saw your response.
StriplingWarrior
@StriplingWarrior: Interesting that you mention memory issues. I have posted an algorithm that was exactly designed to avoid memory issues. :)
Mark Byers
+3  A: 

Note: I missed your question title which said Using Linq, so my answer does not in fact use Linq.

If you want to avoid caching a non-lazy copy of the entire collection, you could write a simple method that does it using a linked list.

The following method will add each value it finds in the original collection into a linked list, and trim the linked list down to the number of items required. Since it keeps the linked list trimmed to this number of items the entire time through iterating through the collection, it will only keep a copy of at most N items from the original collection.

It does not require you to know the number of items in the original collection, nor iterate over it more than once.

Usage:

IEnumerable<int> sequence = Enumerable.Range(1, 10000);
IEnumerable<int> last10 = sequence.Last(10);
...

Extension method:

public static class Extensions
{
    public static IEnumerable<T> Last<T>(this IEnumerable<T> collection, int n)
    {
        if (collection == null)
            throw new ArgumentNullException("collection");
        if (n < 0)
            throw new ArgumentOutOfRangeException("n", "n must be 0 or greater");

        LinkedList<T> temp = new LinkedList<T>();

        foreach (var value in collection)
        {
            temp.AddLast(value);
            if (temp.Count > n)
                temp.RemoveFirst();
        }

        return temp;
    }
}
Lasse V. Karlsen
Nice, clean and much better performance.
Kyle Rozendo
I still think you have a good, valid answer even if it isn't technically using Linq, so I still give you a +1 :)
mgroves
+5  A: 

Here's a method that works on any enumerable but uses only O(N) temporary storage:

public static class TakeLastExtension
{
    public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int n)
    {
        if (source == null) { throw new ArgumentNullException("source"); }
        if (n < 0) { throw new ArgumentOutOfRangeException("must not be negative", "n"); }
        if (n == 0) { yield break; }

        T[] result = new T[n];
        int i = 0;
        int count = 0;
        foreach (T t in source)
        {
            result[i] = t;
            i = (i + 1) % n;
            count++;
        }

        if (count < n)
        {
            n = count;
            i = 0;
        }

        for (int j = 0; j < n; ++j)
        {
            yield return result[(i + j) % n];
        }
    }
}

Usage:

List<int> l = new List<int> {4, 6, 3, 6, 2, 5, 7};
List<int> lastElements = l.TakeLast(3).ToList();

It works by using a ring buffer of size N to store the elements as it sees them, overwriting old elements with new ones. When the end of the enumerable is reached the ring buffer contains the last N elements.

Mark Byers
+1: This should have better performance than mine, but you should make sure it does the right thing when the collection contains fewer elements than `n`.
Lasse V. Karlsen
@Lasse V. Karlsen: +1 Good point. I guess I should probably also test that n is a positive integer and correctly handle the case when n is zero, since someone might use this code in production.
Mark Byers
Well, most of the time I assume people will take care when copying code from SO for production use to add such things themselves, it might not be a problem. If you're going to add it, consider checking the collection variable for null as well. Otherwise, excellent solution :) I was considering using a ring-buffer myself, because a linked list will add GC-pressure, but it's been a while since I did one and I didn't want to hassle with test-code to figure out if I did it right. I must say I'm falling in love with LINQPad though :) http://www.linqpad.net/
Lasse V. Karlsen
+2  A: 

If you don't mind dipping into Rx as part of the monad, you can use TakeLast:

IEnumerable<int> source = Enumerable.Range(1, 10000);

IEnumerable<int> lastThree = source.AsObservable().TakeLast(3).AsEnumerable();
Richard Szalay