What I'd like to do is construct a LINQ query that retrieves a few values from some DataRows whenever one of the fields changes. Here's a contrived example to illustrate:

Observation   Temp  Time
------------- ----  ------
Cloudy        15.0  3:00PM
Cloudy        16.5  4:00PM
Sunny         19.0  3:30PM
Sunny         19.5  3:15PM
Sunny         18.5  3:30PM
Partly Cloudy 16.5  3:20PM
Partly Cloudy 16.0  3:25PM
Cloudy        16.0  4:00PM
Sunny         17.5  3:45PM

I'd like to retrieve only the entries when the Observation changed from the previous one. So the results would include:

Cloudy        15.0  3:00PM
Sunny         19.0  3:30PM
Partly Cloudy 16.5  3:20PM
Cloudy        16.0  4:00PM
Sunny         17.5  3:45PM

Currently, the code iterates through the DataRows, does the comparisons, and constructs the results, but I was hoping to use LINQ to accomplish this.

What I'd like to do is something like this:

var weatherStuff = from row in ds.Tables[0].AsEnumerable()
                   where row.Field<string>("Observation") != weatherStuff.ElementAt(weatherStuff.Count() - 1) )
                   select row;

But that doesn't work: it doesn't even compile, since it tries to use the variable 'weatherStuff' before it is declared.

Can what I want to do be done with LINQ? I didn't see another question like it here on SO, but could have missed it.

+2  A: 

You could use the overload of the Where extension method whose predicate also receives the element's index.

var all = ds.Tables[0].AsEnumerable();
var weatherStuff = all.Where( (w,i) => i == 0 || w.Field<string>("Observation") != all.ElementAt(i-1).Field<string>("Observation") );
tvanfosson
Ah - good answer, I hadn't thought of that. One caveat though is that if your `IEnumerable` doesn't actually have indexed access like `List<T>`, then I think the performance will be O(N²).
Aaronaught
Noted -- I'm not sure what the EnumerableRowCollection's underlying storage mechanism is. I'd suspect that it's array-based, though.
tvanfosson
Thanks for the response. Wouldn't element 0 NOT make the list if it and element 1 were different? It would seem that it would be skipped over as written. Not complaining, mind you, just trying to understand. I've still got a lot to learn about LINQ. It isn't a "natural" thing for me at this point. Your idea didn't seem to work for me as written - not sure why but I'm still messing with it at this point. Thanks again.
itsmatt
I read your question as asking only for "changed" values, that is, those different from the previous one. If you want to include element 0, then the condition is just different. I'll update.
tvanfosson
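
To address the O(N²) concern raised in the comments, here is a minimal sketch that avoids ElementAt altogether by pairing each element with its predecessor through Zip (available from .NET 4.0 onwards). The names AdjacentDemo and WhereKeyChanged are hypothetical and do not come from this thread. Note that the source sequence is enumerated more than once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AdjacentDemo
{
    // Hypothetical helper (not from the thread): keeps the first element and
    // every element whose key differs from the key of the element before it.
    public static IEnumerable<T> WhereKeyChanged<T, K>(
        this IEnumerable<T> source, Func<T, K> keySelector)
    {
        var comparer = EqualityComparer<K>.Default;

        // Pair each element with its predecessor; Zip stops when the
        // shorter sequence (source.Skip(1)) runs out.
        var changed = source
            .Zip(source.Skip(1), (prev, cur) => new { prev, cur })
            .Where(p => !comparer.Equals(keySelector(p.prev), keySelector(p.cur)))
            .Select(p => p.cur);

        // The first element has no predecessor, so it is always included.
        return source.Take(1).Concat(changed);
    }
}
```

Assuming the same DataTable as in the question, it could be called as `ds.Tables[0].AsEnumerable().WhereKeyChanged(r => r.Field<string>("Observation"))`.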
A: 

This is one of those instances where the iterative solution is actually better than the set-based solution in terms of both readability and performance. All you really want Linq to do is filter and pre-sort the list if necessary to prepare it for the loop.

It is possible to write a query in SQL Server (or various other databases) using windowing functions (ROW_NUMBER), if that's where your data is coming from, but very difficult to do in pure Linq without making a much bigger mess.


If you're just trying to clean the code up, an extension method might help:

// Note: extension methods must be declared inside a static class.
public static IEnumerable<T> Changed<T>(this IEnumerable<T> items,
    Func<T, T, bool> equalityFunc)
{
    if (equalityFunc == null)
    {
        throw new ArgumentNullException("equalityFunc");
    }
    T last = default(T);
    bool first = true;
    foreach (T current in items)
    {
        if (first || !equalityFunc(current, last))
        {
            yield return current;
        }
        last = current;
        first = false;
    }
}

Then you can call this with:

var changed = rows.Changed((r1, r2) =>
    r1.Field<string>("Observation") == r2.Field<string>("Observation"));
Aaronaught
Here we go again. How was this answer wrong/misleading/unhelpful?
Aaronaught
Thanks for the idea here. I'm going to try it out and see how your idea works. I do agree with your initial statement at least about the readability part (I can't speak intelligently on the performance aspects at this point). LINQ to me is still one of those things that feels a bit odd and awkward to write. I suspect that is mostly due to my lack of experience with it.
itsmatt
A: 

I think what you are trying to accomplish is not possible using the query "syntactic sugar". However, it should be possible using the overload of the Select extension method that passes the index of the item being evaluated. You could then use that index to compare the current item with the previous one (index - 1).
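
As a rough sketch of the idea above (the names IndexedDemo and FirstOfEachRun are hypothetical, not part of any library), the sequence can be copied into a List first so that looking up the previous item by index is cheap:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IndexedDemo
{
    // Hypothetical helper (not from the thread): uses the Select overload that
    // supplies each element's index, then compares with the element at index - 1.
    public static List<T> FirstOfEachRun<T, K>(
        IEnumerable<T> source, Func<T, K> keySelector)
    {
        // Materialize once so that items[index - 1] is a constant-time lookup.
        List<T> items = source.ToList();
        var comparer = EqualityComparer<K>.Default;

        return items
            .Select((item, index) => new { item, index })
            .Where(x => x.index == 0 ||
                        !comparer.Equals(keySelector(x.item),
                                         keySelector(items[x.index - 1])))
            .Select(x => x.item)
            .ToList();
    }
}
```

The trade-off is the extra copy made by ToList, which is usually acceptable for an in-memory DataTable.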

Carlos Loth
+2  A: 

Here is one more general thought that may be interesting. It's more complicated than what @tvanfosson posted, but in a way, I think it's more elegant :-). The operation you want to do is to group your observations using the first field, but you want to start a new group each time the value changes. Then you want to select the first element of each group.

This sounds almost like LINQ's group by, but it is a bit different, so you can't really use the standard group by. However, you can write your own version (that's the wonder of LINQ!). You can either write your own extension method (e.g. GroupByMoving), or you can write an extension method that changes the type from IEnumerable to an interface of your own and then define GroupBy for that interface. The resulting query will look like this:

var weatherStuff = 
  from row in ds.Tables[0].AsEnumerable().AsMoving()
  group row by row.Field<string>("Observation") into g
  select g.First();

The only thing that remains is to define AsMoving and implement GroupBy. This is a bit of work, but it is quite a generally useful thing that can be used to solve other problems too, so it may be worth doing :-). The summary of my post is that the great thing about LINQ is that you can customize how the operators behave to get quite elegant code.

I haven't tested it, but the implementation should look like this:

// Interface & simple implementation so that we can change GroupBy
interface IMoving<T> : IEnumerable<T> { }
class WrappedMoving<T> : IMoving<T> {
  public IEnumerable<T> Wrapped { get; set; }
  public IEnumerator<T> GetEnumerator() { 
    return Wrapped.GetEnumerator(); 
  }
  // Explicit non-generic implementation (needs using System.Collections)
  IEnumerator IEnumerable.GetEnumerator() { 
    return ((IEnumerable)Wrapped).GetEnumerator(); 
  }
}

// Important bits:
static class MovingExtensions { 
  public static IMoving<T> AsMoving<T>(this IEnumerable<T> e) {
    return new WrappedMoving<T> { Wrapped = e };
  }

  // This is (an ugly & imperative) implementation of the 
  // group by as described earlier (you can probably implement it
  // more nicely using other LINQ methods)
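  // Takes IMoving<T> so that it is preferred over Enumerable.GroupBy
  public static IEnumerable<IEnumerable<T>> GroupBy<T, K>(this IMoving<T> source, 
       Func<T, K> keySelector) {
    // Generic keys cannot be compared with !=, so use the default comparer
    var comparer = EqualityComparer<K>.Default;
    List<T> elementsSoFar = new List<T>();
    IEnumerator<T> en = source.GetEnumerator();
    if (en.MoveNext()) {
      K lastKey = keySelector(en.Current);
      do { 
        K newKey = keySelector(en.Current);
        if (!comparer.Equals(newKey, lastKey)) { 
          // Key changed: emit the finished run and start a new one
          yield return elementsSoFar;
          elementsSoFar = new List<T>();
          lastKey = newKey;
        }
        elementsSoFar.Add(en.Current);
      } while (en.MoveNext());
      yield return elementsSoFar;
    }
  }
}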
Tomas Petricek
Thanks, Tomas. That is an interesting approach, though it certainly is longer (and seemingly more complex) than the way it is currently implemented. I haven't tried it out yet, but will and appreciate your taking the time to post your idea.
itsmatt