views:

915

answers:

12

I have a list of numbers e.g. 21,4,7,9,12,22,17,8,2,20,23

I want to be able to pick out sequences of sequential numbers (minimum 3 items in length), so from the example above it would be 7,8,9 and 20,21,22,23.

I have played around with a few ugly sprawling functions but I am wondering if there is a neat LINQ-ish way to do it.

Any suggestions?

UPDATE:

Many thanks for all the responses, much appriciated. Im am currently having a play with them all to see which would best integrate into our project.

+17  A: 

It strikes me that the first thing you should do is order the list. Then it's just a matter of walking through it, remembering the length of your current sequence and detecting when it's ended. To be honest, I suspect that a simple foreach loop is going to be the simplest way of doing that - I can't immediately think of any wonderfully neat LINQ-like ways of doing it. You could certainly do it in an iterator block if you really wanted to, but bear in mind that ordering the list to start with means you've got a reasonably "up-front" cost anyway. So my solution would look something like this:

var ordered = list.OrderBy(x => x);
int count = 0;
int firstItem = 0; // Irrelevant to start with
foreach (int x in ordered)
{
    // First value in the ordered list: start of a sequence
    if (count == 0)
    {
        firstItem = x;
        count = 1;
    }
    // Skip duplicate values
    else if (x == firstItem + count - 1)
    {
        // No need to do anything
    }
    // New value contributes to sequence
    else if (x == firstItem + count)
    {
        count++;
    }
    // End of one sequence, start of another
    else
    {
        if (count >= 3)
        {
            Console.WriteLine("Found sequence of length {0} starting at {1}",
                              count, firstItem);
        }
        count = 1;
        firstItem = x;
    }
}
if (count >= 3)
{
    Console.WriteLine("Found sequence of length {0} starting at {1}",
                      count, firstItem);
}

EDIT: Okay, I've just thought of a rather more LINQ-ish way of doing things. I don't have the time to fully implement it now, but:

  • Order the sequence
  • Use something like SelectWithPrevious (probably better named SelectConsecutive) to get consecutive pairs of elements
  • Use the overload of Select which includes the index to get tuples of (index, current, previous)
  • Filter out any items where (current = previous + 1) to get anywhere that counts as the start of a sequence (special-case index=0)
  • Use SelectWithPrevious on the result to get the length of the sequence between two starting points (subtract one index from the previous)
  • Filter out any sequence with length less than 3

I suspect you need to concat int.MinValue on the ordered sequence, to guarantee the final item is used properly.

EDIT: Okay, I've implemented this. It's about the LINQiest way I can think of to do this... I used null values as "sentinel" values to force start and end sequences - see comments for more details.

Overall, I wouldn't recommend this solution. It's hard to get your head round, and although I'm reasonably confident it's correct, it took me a while thinking of possible off-by-one errors etc. It's an interesting voyage into what you can do with LINQ... and also what you probably shouldn't.

Oh, and note that I've pushed the "minimum length of 3" part up to the caller - when you have a sequence of tuples like this, it's cleaner to filter it out separately, IMO.

using System;
using System.Collections.Generic;
using System.Linq;

static class Extensions
{
    public static IEnumerable<TResult> SelectConsecutive<TSource, TResult>
        (this IEnumerable<TSource> source,
         Func<TSource, TSource, TResult> selector)
    {
        using (IEnumerator<TSource> iterator = source.GetEnumerator())
        {
           if (!iterator.MoveNext())
           {
               yield break;
           }
           TSource prev = iterator.Current;
           while (iterator.MoveNext())
           {
               TSource current = iterator.Current;
               yield return selector(prev, current);
               prev = current;
           }
        }
    }
}

class Test
{
    static void Main()
    {
        var list = new List<int> {  21,4,7,9,12,22,17,8,2,20,23 };

        foreach (var sequence in FindSequences(list).Where(x => x.Item1 >= 3))
        {
            Console.WriteLine("Found sequence of length {0} starting at {1}",
                              sequence.Item1, sequence.Item2);
        }
    }

    private static readonly int?[] End = { null };

    // Each tuple in the returned sequence is (length, first element)
    public static IEnumerable<Tuple<int, int>> FindSequences
         (IEnumerable<int> input)
    {
        // Use null values at the start and end of the ordered sequence
        // so that the first pair always starts a new sequence starting
        // with the lowest actual element, and the final pair always
        // starts a new one starting with null. That "sequence at the end"
        // is used to compute the length of the *real* final element.
        return End.Concat(input.OrderBy(x => x)
                               .Select(x => (int?) x))
                  .Concat(End)
                  // Work out consecutive pairs of items
                  .SelectConsecutive((x, y) => Tuple.Create(x, y))
                  // Remove duplicates
                  .Where(z => z.Item1 != z.Item2)
                  // Keep the index so we can tell sequence length
                  .Select((z, index) => new { z, index })
                  // Find sequence starting points
                  .Where(both => both.z.Item2 != both.z.Item1 + 1)
                  .SelectConsecutive((start1, start2) => 
                       Tuple.Create(start2.index - start1.index, 
                                    start1.z.Item2.Value));
    }
}
Jon Skeet
+1 neat programming
Mario The Spoon
@Jon thanks for your quick response, I ideally want to return a list of sequences found from the function. But yor code is a great starting point.. cheers
Dve
+1 for nice explanation
saurabh
@Dve: Given that a sequence is basically just a starting point and a count, why not represent it that way? You can always use `SelectMany` to convert that later on.
Jon Skeet
Your method has a bug. Given the input `[ 1, 2, 3, 2 ]`, it won’t find the sequence `1, 2, 3`.
Timwi
Another bug: it won’t find any sequences that start at the beginning of the sorted list unless it starts with 0, e.g. `[ 5, 1, 2, 3 ]` won’t find `1, 2, 3`.
Timwi
@Timwi: Thanks, have fixed. I changed implementation track half way through, and forgot about the initial values :(
Jon Skeet
Still has the first bug, and also doesn’t overlapping sequences (mine does).
Timwi
@Timwi: Dammit, I must have screwed something else up when experimenting. It fixed the first problem a minute ago...
Jon Skeet
@Timwi: Fixed by a call to `Distinct()`.
Jon Skeet
@Timwi: Now fixed *without* the call to Distinct(), just by detecting the "same value as before" case.
Jon Skeet
+1 nice work! This has got to be some of the coolest code I've seen in a while :-)
DoctaJonez
@DoctaJonez: Where "cool" almost always means "unsuitable for a production codebase" :)
Jon Skeet
Wouldn't `Cast<int?>()` work instead of `Select(x => (int?)x)`?
configurator
@configurator: I can never remember offhand whether Cast includes non-nullable to nullable conversions. But yes, it quite possibly would :)
Jon Skeet
Jon: See my answer (http://stackoverflow.com/questions/3844611/c-detecting-sequence-of-at-least-3-sequential-numbers-from-a-given-list/3847505#3847505) for another way to handle the extra first element without using a `null` value.
Gabe
@Jon: Darn tooting! It's fun to play like this in our spare time though :)
DoctaJonez
+1  A: 

Here is how to solve the problem in a "LINQish" way:

int[] arr = new int[]{ 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
IOrderedEnumerable<int> sorted = arr.OrderBy(x => x);
int cnt = sorted.Count();
int[] sortedArr = sorted.ToArray();
IEnumerable<int> selected = sortedArr.Where((x, idx) =>
    idx <= cnt - 3 && sortedArr[idx + 1] == x + 1 && sortedArr[idx + 2] == x + 2);
IEnumerable<int> result = selected.SelectMany(x => new int[] { x, x + 1, x + 2 }).Distinct();

Console.WriteLine(string.Join(",", result.Select(x=>x.ToString()).ToArray()));

Due to the array copying and reconstruction, this solution - of course - is not as efficient as the traditional solution with loops.

fmunkert
+1 Yuck! But it's LINQ-ish.
John K
Same bug as Jon’s: Given input `[ 1, 2, 3, 2 ]`, it won’t find `1, 2, 3`. In your case, the fix is simpler: just add `.Distinct()` after `.OrderBy()` (and change `IOrderedEnumerable` to `IEnumerable`, or even better, change everything to `var`).
Timwi
Hm, another problem with this approach is that you only get a single list at the end, not a properly delineated set of sequences.
Timwi
+3  A: 

I think my solution is more elegant and simple, and therefore easier to verify as correct:

/// <summary>Returns a collection containing all consecutive sequences of
/// integers in the input collection.</summary>
/// <param name="input">The collection of integers in which to find
/// consecutive sequences.</param>
/// <param name="minLength">Minimum length that a sequence should have
/// to be returned.</param>
static IEnumerable<IEnumerable<int>> ConsecutiveSequences(
    IEnumerable<int> input, int minLength = 1)
{
    var results = new List<List<int>>();
    foreach (var i in input.OrderBy(x => x))
    {
        var existing = results.FirstOrDefault(lst => lst.Last() + 1 == i);
        if (existing == null)
            results.Add(new List<int> { i });
        else
            existing.Add(i);
    }
    return minLength <= 1 ? results :
        results.Where(lst => lst.Count >= minLength);
}

Benefits over the other solutions:

  • It can find sequences that overlap.
  • It’s properly reusable and documented.
  • I have not found any bugs ;-)
Timwi
Could you give an example of what you mean by "overlapping sequences"? Do you mean where a value occurs more than once?
Jon Skeet
I can't agree with your claim that it's simpler. I'm finding it fairly hard to understand at the moment.
Jon Skeet
Okay, I've understood it now, I think... but I still don't find it as simple as mine.
Jon Skeet
Out of interest, it doesn't look to me like you *actually* need to convert to an array in this case - you're only ever using `i` for `arr[i]` unless I've missed something... so would you be able to just use a foreach loop? That would definitely simplify things IMO.
Jon Skeet
@Timwi: I'd be interested to hear your thoughts on the "LINQy" approach I've used as the second half of my answer. I don't think it's actually *nice*, but it's interesting.
Jon Skeet
Any reason you chose not to use an iterator-method with a `List<T>` local that you kept yielding out?
Ani
@Jon: “overlapping sequences”: Input `[1,2,3,4,5,2,3,4]` yields `[[1,2,3,4,5],[2,3,4]]`. Of course, “simpler” is subjective — I felt it was simpler because it has less “if” and “+/−/==” magic. I’ve changed it to a `foreach` loop like you suggested, thanks. I’ve looked at your LINQy approach and it’s clever, but I think Ani’s is even cleverer, sorry :). @Ani: I suppose you could do that, but it would make the logic more complex, and I wanted to keep it simple.
Timwi
http://programmers.stackexchange.com/questions/4693/why-are-so-many-programmers-arrogant
spender
+5  A: 

Jon Skeet's / Timwi's solutions are the way to go.

For fun, here's a LINQ query that does the job (very inefficiently):

var sequences = input.Distinct()
                     .GroupBy(num => Enumerable.Range(num, int.MaxValue - num + 1)
                                               .TakeWhile(input.Contains)
                                               .Last())  //use the last member of the consecutive sequence as the key
                     .Where(seq => seq.Count() >= 3)
                     .Select(seq => seq.OrderBy(num => num)); // not necessary unless ordering is desirable inside each sequence.

The query's performance can be improved slightly by loading the input into a HashSet (to improve Contains), but that will still not produce a solution that is anywhere close to efficient.

The only bug I am aware of is the possibility of an arithmetic overflow if the sequence contains negative numbers of large magnitude (we cannot represent the count parameter for Range). This would be easy to fix with a custom static IEnumerable<int> To(this int start, int end) extension-method. If anyone can think of any other simple technique of dodging the overflow, please let me know.

EDIT: Here's a slightly more verbose (but equally inefficient) variant without the overflow issue.

var sequences = input.GroupBy(num => input.Where(candidate => candidate >= num)
                                          .OrderBy(candidate => candidate)
                                          .TakeWhile((candidate, index) => candidate == num + index)
                                          .Last())
                     .Where(seq => seq.Count() >= 3)
                     .Select(seq => seq.OrderBy(num => num));
Ani
Interesting. I think I understand it...
Jon Skeet
Re. negative numbers, even `-1` will fail with an overflow, and in fact so will `0`, because of the call `Enumerable.Range(num, int.MaxValue - num + 1)`
configurator
Why not just use `int.MaxValue` there?
configurator
@configurator: You were right about the negative values. Edited. As for using `int.MaxValue` for `count`, it's not possible since it would then choke on positive numbers. Any other suggestions? I think `Enumerable.Range` will always have this bug since it's not possible to represent the range of Int32 as an Int32.
Ani
+1  A: 

Not 100% Linq but here's a generic variant:

static IEnumerable<IEnumerable<TItem>> GetSequences<TItem>(
        int minSequenceLength, 
        Func<TItem, TItem, bool> areSequential, 
        IEnumerable<TItem> items)
    where TItem : IComparable<TItem>
{
    items = items
        .OrderBy(n => n)
        .Distinct().ToArray();

    var lastSelected = default(TItem);

    var sequences =
        from startItem in items
        where startItem.Equals(items.First())
            || startItem.CompareTo(lastSelected) > 0
        let sequence =
            from item in items
            where item.Equals(startItem) || areSequential(lastSelected, item)
            select (lastSelected = item)
        where sequence.Count() >= minSequenceLength
        select sequence;

    return sequences;
}

static void UsageInt()
{
    var sequences = GetSequences(
            3,
            (a, b) => a + 1 == b,
            new[] { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 });

    foreach (var sequence in sequences)
        Console.WriteLine(string.Join(", ", sequence.ToArray()));
}

static void UsageChar()
{
    var list = new List<char>(
        "abcdefghijklmnopqrstuvwxyz".ToCharArray());

    var sequences = GetSequences(
            3,
            (a, b) => (list.IndexOf(a) + 1 == list.IndexOf(b)),
            "PleaseBeGentleWithMe".ToLower().ToCharArray());

    foreach (var sequence in sequences)
        Console.WriteLine(string.Join(", ", sequence.ToArray()));
}
it depends
+1  A: 

What about sorting the array then create another array that is the difference between each element the previous one


sortedArray = 8, 9, 10, 21, 22, 23, 24, 27, 30, 31, 32
diffArray   =    1,  1, 11,  1,  1,  1,  3,  3,  1,  1
Now iterate through the difference array; if the difference equlas 1, increase the count of a variable, sequenceLength, by 1. If the difference is > 1, check the sequenceLength if it is >=2 then you have a sequence of at at least 3 consecutive elements. Then reset sequenceLenght to 0 and continue your loop on the difference array.

sh_kamalh
A: 

Here is a solution I knocked up in F#, it should be fairly easy to translate this into a C# LINQ query since fold is pretty much equivalent to the LINQ aggregate operator.

#light

let nums = [21;4;7;9;12;22;17;8;2;20;23]

let scanFunc (mainSeqLength, mainCounter, lastNum:int, subSequenceCounter:int, subSequence:'a list, foundSequences:'a list list) (num:'a) =
        (mainSeqLength, mainCounter + 1,
         num,
         (if num <> lastNum + 1 then 1 else subSequenceCounter+1),
         (if num <> lastNum + 1 then [num] else subSequence@[num]),
         if subSequenceCounter >= 3 then
            if mainSeqLength = mainCounter+1
                then foundSequences @ [subSequence@[num]]
            elif num <> lastNum + 1
                then foundSequences @ [subSequence]
            else foundSequences
         else foundSequences)

let subSequences = nums |> Seq.sort |> Seq.fold scanFunc (nums |> Seq.length, 0, 0, 0, [], []) |> fun (_,_,_,_,_,results) -> results
Oenotria
A: 

Linq isn't the solution for everything, sometimes you're better of with a simple loop. Here's a solution, with just a bit of Linq to order the original sequences and filter the results

void Main()
{
    var numbers = new[] { 21,4,7,9,12,22,17,8,2,20,23 };
    var sequences =
        GetSequences(numbers, (prev, curr) => curr == prev + 1);
        .Where(s => s.Count() >= 3);
    sequences.Dump();
}

public static IEnumerable<IEnumerable<T>> GetSequences<T>(
    IEnumerable<T> source,
    Func<T, T, bool> areConsecutive)
{
    bool first = true;
    T prev = default(T);
    List<T> seq = new List<T>();
    foreach (var i in source.OrderBy(i => i))
    {
        if (!first && !areConsecutive(prev, i))
        {
            yield return seq.ToArray();
            seq.Clear();
        }
        first = false;
        seq.Add(i);
        prev = i;
    }
    if (seq.Any())
        yield return seq.ToArray();
}
Thomas Levesque
A: 

I thought of the same thing as Jon: to represent a range of consecutive integers all you really need are two measly integers! So I'd start there:

struct Range : IEnumerable<int>
{
    readonly int _start;
    readonly int _count;

    public Range(int start, int count)
    {
        _start = start;
        _count = count;
    }

    public int Start
    {
        get { return _start; }
    }

    public int Count
    {
        get { return _count; }
    }

    public int End
    {
        get { return _start + _count - 1; }
    }

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = 0; i < _count; ++i)
        {
            yield return _start + i;
        }
    }

    // Heck, why not?
    public static Range operator +(Range x, int y)
    {
        return new Range(x.Start, x.Count + y);
    }

    // skipping the explicit IEnumerable.GetEnumerator implementation
}

From there, you can write a static method to return a bunch of these Range values corresponding to the consecutive numbers of your sequence.

static IEnumerable<Range> FindRanges(IEnumerable<int> source, int minCount)
{
    // throw exceptions on invalid arguments, maybe...

    var ordered = source.OrderBy(x => x);

    Range r = default(Range);

    foreach (int value in ordered)
    {
        // In "real" code I would've overridden the Equals method
        // and overloaded the == operator to write something like
        // if (r == Range.Empty) here... but this works well enough
        // for now, since the only time r.Count will be 0 is on the
        // first item.
        if (r.Count == 0)
        {
            r = new Range(value, 1);
            continue;
        }

        if (value == r.End)
        {
            // skip duplicates
            continue;
        }
        else if (value == r.End + 1)
        {
            // "append" consecutive values to the range
            r += 1;
        }
        else
        {
            // return what we've got so far
            if (r.Count >= minCount)
            {
                yield return r;
            }

            // start over
            r = new Range(value, 1);
        }
    }

    // return whatever we ended up with
    if (r.Count >= minCount)
    {
        yield return r;
    }
}

Demo:

int[] numbers = new[] { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };

foreach (Range r in FindConsecutiveRanges(numbers, 3))
{
    // Using .NET 3.5 here, don't have the much nicer string.Join overloads.
    Console.WriteLine(string.Join(", ", r.Select(x => x.ToString()).ToArray()));
}

Output:

7, 8, 9
20, 21, 22, 23
Dan Tao
A: 

Here's my LINQ-y take on the problem:

static IEnumerable<IEnumerable<int>>
    ConsecutiveSequences(this IEnumerable<int> input, int minLength = 3)
{
    int order = 0;
    var inorder = new SortedSet<int>(input);
    return from item in new[] { new { order = 0, val = inorder.First() } }
               .Concat(
                 inorder.Zip(inorder.Skip(1), (x, val) =>
                         new { order = x + 1 == val ? order : ++order, val }))
           group item.val by item.order into list
           where list.Count() >= minLength
           select list;
}
  • uses no explicit loops, but should still be O(n lg n)
  • uses SortedSet instead of .OrderBy().Distinct()
  • combines consecutive element with list.Zip(list.Skip(1))
Gabe
A: 

Here's a solution using a Dictionary instead of a sort... It adds the items to a Dictionary, and then for each value increments above and below to find the longest sequence.
It is not strictly LINQ, though it does make use of some LINQ functions, and I think it is more readable than a pure LINQ solution..

static void Main(string[] args)
    {
        var items = new[] { -1, 0, 1, 21, -2, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
        IEnumerable<IEnumerable<int>> sequences = FindSequences(items, 3);

        foreach (var sequence in sequences)
        {   //print results to consol
            Console.Out.WriteLine(sequence.Select(num => num.ToString()).Aggregate((a, b) => a + "," + b));
        }
        Console.ReadLine();
    }

    private static IEnumerable<IEnumerable<int>> FindSequences(IEnumerable<int> items, int minSequenceLength)
    {
        //Convert item list to dictionary
        var itemDict = new Dictionary<int, int>();
        foreach (int val in items)
        {
            itemDict[val] = val;
        }
        var allSequences = new List<List<int>>();
        //for each val in items, find longest sequence including that value
        foreach (var item in items)
        {
            var sequence = FindLongestSequenceIncludingValue(itemDict, item);
            allSequences.Add(sequence);
            //remove items from dict to prevent duplicate sequences
            sequence.ForEach(i => itemDict.Remove(i));
        }
        //return only sequences longer than 3
        return allSequences.Where(sequence => sequence.Count >= minSequenceLength).ToList();
    }

    //Find sequence around start param value
    private static List<int> FindLongestSequenceIncludingValue(Dictionary<int, int> itemDict, int value)
    {
        var result = new List<int>();
        //check if num exists in dictionary
        if (!itemDict.ContainsKey(value))
            return result;

        //initialize sequence list
        result.Add(value);

        //find values greater than starting value
        //and add to end of sequence
        var indexUp = value + 1;
        while (itemDict.ContainsKey(indexUp))
        {
            result.Add(itemDict[indexUp]);
            indexUp++;
        }

        //find values lower than starting value 
        //and add to start of sequence
        var indexDown = value - 1;
        while (itemDict.ContainsKey(indexDown))
        {
            result.Insert(0, itemDict[indexDown]);
            indexDown--;
        }
        return result;
    }
markt
A: 

Here's my shot at it:

public static class SequenceDetector
{
    public static IEnumerable<IEnumerable<T>> DetectSequenceWhere<T>(this IEnumerable<T> sequence, Func<T, T, bool> inSequenceSelector)
    {
        List<T> subsequence = null;
        // We can only have a sequence with 2 or more items
        T last = sequence.FirstOrDefault();
        foreach (var item in sequence.Skip(1))
        {
            if (inSequenceSelector(last, item))
            {
                // These form part of a sequence
                if (subsequence == null)
                {
                    subsequence = new List<T>();
                    subsequence.Add(last);
                }
                subsequence.Add(item);
            }
            else if (subsequence != null)
            {
                // We have a previous seq to return
                yield return subsequence;
                subsequence = null;
            }
            last = item;
        }
        if (subsequence != null)
        {
            // Return any trailing seq
            yield return subsequence;
        }
    }
}

public class test
{
    public static void run()
    {
        var list = new List<int> { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
        foreach (var subsequence in list
            .OrderBy(i => i)
            .Distinct()
            .DetectSequenceWhere((first, second) => first + 1 == second)
            .Where(seq => seq.Count() >= 3))
        {
            Console.WriteLine("Found subsequence {0}", 
                string.Join(", ", subsequence.Select(i => i.ToString()).ToArray()));
        }
    }
}

This returns the specific items that form the sub-sequences and permits any type of item and any definition of criteria so long as it can be determined by comparing adjacent items.

Jacob O'Reilly