tags:

views:

146

answers:

3

I'm trying to select a subgroup of a list where items have contiguous dates, e.g.

ID  StaffID  Title              ActivityDate
--  -------  -----------------  ------------
 1       41  Meeting with John    03/06/2010
 2       41  Meeting with John    08/06/2010
 3       41  Meeting Continues    09/06/2010
 4       41  Meeting Continues    10/06/2010
 5       41  Meeting with Kay     14/06/2010
 6       41  Meeting Continues    15/06/2010

I'm using a pivot point each time, so take the example pivot item as 3, I'd like to get the following resulting contiguous events around the pivot:

ID  StaffID  Title              ActivityDate
--  -------  -----------------  ------------
 2       41  Meeting with John    08/06/2010
 3       41  Meeting Continues    09/06/2010
 4       41  Meeting Continues    10/06/2010

My current implementation is a laborious "walk" into the past, then into the future, to build the list:

var activity = // item number 3: Meeting Continues (09/06/2010)

var orderedEvents = activities.OrderBy(a => a.ActivityDate).ToArray();

// Walk into the past until a gap is found
var preceedingEvents = orderedEvents.TakeWhile(a => a.ID != activity.ID);
DateTime dayBefore;
var previousEvent = activity;
while (previousEvent != null)
{
    dayBefore = previousEvent.ActivityDate.AddDays(-1).Date;
    previousEvent = preceedingEvents.TakeWhile(a => a.ID != previousEvent.ID).LastOrDefault();
    if (previousEvent != null)
    {
        if (previousEvent.ActivityDate.Date == dayBefore)
            relatedActivities.Insert(0, previousEvent);
        else
            previousEvent = null;
    }
}


// Walk into the future until a gap is found
var followingEvents = orderedEvents.SkipWhile(a => a.ID != activity.ID);
DateTime dayAfter;
var nextEvent = activity;
while (nextEvent != null)
{
    dayAfter = nextEvent.ActivityDate.AddDays(1).Date;
    nextEvent = followingEvents.SkipWhile(a => a.ID != nextEvent.ID).Skip(1).FirstOrDefault();
    if (nextEvent != null)
    {
        if (nextEvent.ActivityDate.Date == dayAfter)
            relatedActivities.Add(nextEvent);
        else
            nextEvent = null;
    }
}

The list relatedActivities should then contain the contiguous events, in order.

Is there a better way (maybe using LINQ) for this?

I had an idea of using .Aggregate() but couldn't think how to get the aggregate to break out when it finds a gap in the sequence.

+1  A: 

Somehow, I don't think LINQ was truly meant to be used for bidirectional-one-dimensional-depth-first-searches, but I constructed a working LINQ using Aggregate. For this example I'm going to use a List instead of an array. Also, I'm going to use Activity to refer to whatever class you are storing the data in. Replace it with whatever is appropriate for your code.

Before we even start, we need a small function to handle something. List.Add(T) returns null, but we want to be able to accumulate in a list and return the new list for this aggregate function. So all you need is a simple function like the following.

private List<T> ListWithAdd<T>(List<T> src, T obj)
{
    src.Add(obj);
    return src;
}

First, we get the sorted list of all activities, and then initialize the list of related activities. This initial list will contain the target activity only, to start.

List<Activity> orderedEvents = activities.OrderBy(a => a.ActivityDate).ToList();
List<Activity> relatedActivities = new List<Activity>();
relatedActivities.Add(activity);

We have to break this into two lists, the past and the future just like you currently do it.

We'll start with the past, the construction should look mostly familiar. Then we'll aggregate all of it into relatedActivities. This uses the ListWithAdd function we wrote earlier. You could condense it into one line and skip declaring previousEvents as its own variable, but I kept it separate for this example.

var previousEvents = orderedEvents.TakeWhile(a => a.ID != activity.ID).Reverse();
relatedActivities = previousEvents.Aggregate<Activity, List<Activity>>(relatedActivities, (items, prevItem) => items.OrderBy(a => a.ActivityDate).First().ActivityDate.Subtract(prevItem.ActivityDate).Days.Equals(1) ? ListWithAdd(items, prevItem) : items).ToList();

Next, we'll build the following events in a similar fashion, and likewise aggregate it.

var nextEvents = orderedEvents.SkipWhile(a => a.ID != activity.ID);
relatedActivities = nextEvents.Aggregate<Activity, List<Activity>>(relatedActivities, (items, nextItem) => nextItem.ActivityDate.Subtract(items.OrderBy(a => a.ActivityDate).Last().ActivityDate).Days.Equals(1) ? ListWithAdd(items, nextItem) : items).ToList();

You can properly sort the result afterwards, as now relatedActivities should contain all activities with no gaps. It won't immediately break when it hits the first gap, no, but I don't think you can literally break out of a LINQ. So it instead just ignores anything which it finds past a gap.

Note that this example code only operates on the actual difference in time. Your example output seems to imply that you need some other comparison factors, but this should be enough to get you started. Just add the necessary logic to the date subtraction comparison in both entries.

ccomet
"Your example output seems to imply that you need some other comparison factors" as I state in the question, there is a starting point: ID 3. The result should only have events that are contiguous from item 3.
Codesleuth
LukeH just cleared this up for me, sorry. I've changed the question to only include meetings now.
Codesleuth
@Codesleuth Well, as long as the only requirement is getting all activities with a sequential gapless point of time originating from a specified index (at the start, at the end, or anywhere in the middle), both LukeH and my solutions will do the trick perfectly. But I stand by my statement that LukeH's is more elegant, and also only requires one traversal (mine requires 2).
ccomet
@ccomet: Unfortunately, yours and LukeH's code doesn't match the result I require. Both will now (after I've cleared up the sample) return only the first element (ID 1).
Codesleuth
@Codesleuth I ran both of our codes using your new sample. With `activity` being an element with an ID of 3, I got identical output for both methods, and they were the 3 elements in your expected output (IDs 2, 3, and 4). I'm not sure where we're being different here. One slight difference is that in my initial implementation, I defined a property of `Date` instead of `ActivityDate` for my Activity class (I'm updating my snippets here to be consistent with your code now), but this wouldn't affect LukeH's code.
ccomet
@ccomet: My fault again, when I tested LukeH's code I was forgetting to put in the type filter (argh!) and you're quite right, it works like a charm. I'm going to mark that as the answer as it's what I've got using now, thanks for all your help :)
Codesleuth
Hah, no worries, I'm just glad everything's working for you, @Codesleuth.
ccomet
+1  A: 

In this case I think that a standard foreach loop is probably more readable than a LINQ query:

var relatedActivities = new List<TActivity>();
bool found = false;

foreach (var item in activities.OrderBy(a => a.ActivityDate))
{
    int count = relatedActivities.Count;
    if ((count > 0) && (relatedActivities[count - 1].ActivityDate.Date.AddDays(1) != item.ActivityDate.Date))
    {
        if (found)
            break;

        relatedActivities.Clear();
    }

    relatedActivities.Add(item);
    if (item.ID == activity.ID)
        found = true;
}

if (!found)
    relatedActivities.Clear();

For what it's worth, here's a roughly equivalent -- and far less readable -- LINQ query:

var relatedActivities = activities
    .OrderBy(x => x.ActivityDate)
    .Aggregate
    (
        new { List = new List<TActivity>(), Found = false, ShortCircuit = false },
        (a, x) =>
        {
            if (a.ShortCircuit)
                return a;

            int count = a.List.Count;
            if ((count > 0) && (a.List[count - 1].ActivityDate.Date.AddDays(1) != x.ActivityDate.Date))
            {
                if (a.Found)
                    return new { a.List, a.Found, ShortCircuit = true };

                a.List.Clear();
            }

            a.List.Add(x);
            return new { a.List, Found = a.Found || (x.ID == activity.ID), a.ShortCircuit };
        },
        a => a.Found ? a.List : new List<TActivity>()
    );
LukeH
As ccomet also mentions, the example results shown in your question don't match your spec or code. My code should generate the same results as yours (that is, they should match your spec). You'll probably need some additional logic if you want the results to match your example results.
LukeH
This is much, much cleaner than mine. Probably much faster, and it definitely looks fancier.
ccomet
@LukeH "the example results shown in your question don't match your spec or code" well I've been using it and it works perfectly. I'm not sure what you mean by this.
Codesleuth
@Codesleuth: Your example source data shows a continuous run of dates from 07/06/2010 to 11/06/2010. If you only group on `ActivityDate` then your results will contain records 1 to 5 (if the pivot ID is 3). Your example results only contain records 2, 3 and 4 - to achieve that you would need to be grouping on something else too. (`Type` maybe?)
LukeH
@LukeH: Ahhh I see now. Sorry, I shouldn't have left the `Type` in the question - I've already filtered those out by the time it gets to this point. I'll restructure the question a little bit to make more sense.
Codesleuth
+1  A: 

Here's an implementation:

public static IEnumerable<IGrouping<int, T>> GroupByContiguous(
  this IEnumerable<T> source,
  Func<T, int> keySelector
)
{
   int keyGroup = Int32.MinValue;
   int currentGroupValue = Int32.MinValue;
   return source
     .Select(t => new {obj = t, key = keySelector(t))
     .OrderBy(x => x.key)
     .GroupBy(x => {
       if (currentGroupValue + 1 < x.key)
       {
         keyGroup = x.key;
       }
       currentGroupValue = x.key;
       return keyGroup;
     }, x => x.obj);
}

You can either convert the dates to ints by means of subtraction, or imagine a DateTime version (easily).

David B