views:

145

answers:

1

This is my first real-world LINQ-to-SQL query. I was wondering if I am making any large, obvious mistakes.

I have a medium-large sized (2M+ records, adding 13k a day) table with data, dataTypeID, machineID, and dateStamp. I'd like to get the average, min, and max of data from all machines and of a specific dataType within a 4 hour period, going back for 28 days.

E.g

DateTime            Avg   Min   Max
1/1/10 12:00AM  74.2  72.1  75.7
1/1/10 04:00AM  74.5  73.1  76.2
1/1/10 08:00AM  73.7  71.5  74.2
1/1/10 12:00PM  73.2  71.2  76.1
etc..
1/28/10 12:00AM  73.1  71.3  75.5

So far I have only been able to group the averages by 1 hour increments, but I could probably deal with that if the alternatives are overly messy.

Code:

var q =
    from d in DataPointTable
    where d.dateStamp > DateTime.Now.AddDays(-28) && (d.dataTypeID == (int)dataType + 1)
    group d by new {
        d.dateStamp.Year,
        d.dateStamp.Month,
        d.dateStamp.Day,
        d.dateStamp.Hour
    } into groupedData
    orderby groupedData.Key.Year, groupedData.Key.Month, groupedData.Key.Day, groupedData.Key.Hour ascending
    select new {
        date = Convert.ToDateTime(
            groupedData.Key.Year.ToString() + "-" +
            groupedData.Key.Month.ToString() + "-" +
            groupedData.Key.Day.ToString() + " " + 
            groupedData.Key.Hour.ToString() + ":00"
            ),
        avg = groupedData.Average(d => d.data),
        max = groupedData.Max(d => d.data),
        min = groupedData.Min(d => d.data)
    };
+2  A: 

If you want 4 hour increments divide the hour by 4 (using integer division) and then multiply by 4 when creating the new datetime element. Note you can simply use the constructor that takes year, month, day, hour, minute, and second instead of constructing a string and converting it.

var q = 
    from d in DataPointTable 
    where d.dateStamp > DateTime.Now.AddDays(-28) && (d.dataTypeID == (int)dataType + 1) 
    group d by new { 
        d.dateStamp.Year, 
        d.dateStamp.Month, 
        d.dateStamp.Day, 
        Hour = d.dateStamp.Hour / 4
    } into groupedData 
    orderby groupedData.Key.Year, groupedData.Key.Month, groupedData.Key.Day, groupedData.Key.Hour ascending 
    select new { 
        date = new DateTime(
            groupedData.Key.Year, 
            groupedData.Key.Month, 
            groupedData.Key.Day, 
            (groupedData.Key.Hour * 4),
            0, 0), 
        avg = groupedData.Average(d => d.data), 
        max = groupedData.Max(d => d.data), 
        min = groupedData.Min(d => d.data) 
    };

To improve efficiency you might want to consider adding an index on the dateStamp column. Given that you're only selecting a potentially small range of the dates, using an index should be a significant advantage. I would expect the query plan to do an index seek for the first date, making it even faster.

tvanfosson
+1 for the index on `dateStamp`
lc
Thanks for the suggestions.I had tried "d.dateStamp.Hour / 4", but it is not legal.VS2008 is saying: "Invalid anonymous type member declarator. Anonymous type members must be declared with a member assignment, simple name, or member access."
PHiZiX
Sorry -- it needs a name...
tvanfosson
nice. thank you! great answer.
PHiZiX