I have a SQL table that contains data of the form:

Id            int
EventTime     datetime
CurrentValue  int

The table may have multiple rows for a given id that represent changes to the value over time (the EventTime identifying the time at which the value changed).

Given a specific point in time, I would like to be able to calculate the count of distinct Ids for each given Value.

Right now, I am using a nested subquery and a temporary table, but it seems it could be much more efficient.

SELECT [Id],
    (
        SELECT TOP 1 [CurrentValue]
        FROM [ValueHistory]
        WHERE [Ids].[Id] = [ValueHistory].[Id]
            AND [EventTime] < @StartTime
        ORDER BY [EventTime] DESC
    ) AS [LastValue]
INTO #temp
FROM [Ids]

SELECT [LastValue], COUNT([LastValue])
FROM #temp
GROUP BY [LastValue]

DROP TABLE #temp

+1  A: 

Here is my first go:

select ids.Id, count( distinct currentvalue)
from ids
join valuehistory vh on ids.id = vh.id
where vh.eventtime < @StartTime
group by ids.id

However, I am not sure I understand your table model very clearly, or the specific question you are trying to solve.

This would give, for each Id, the count of distinct CurrentValues in ValueHistory before a certain date.

Is that what you are looking for?

Nathan Feger
+1  A: 

I think I understand your question.

You want to get the most recent value for each id, group by that value, and then see how many ids have that same value? Is this correct?

If so, here's my first shot:

declare @StartTime datetime
set @StartTime = '20090513'

select ValueHistory.CurrentValue, count(ValueHistory.id)
from
(
    select id, max(EventTime) as LatestUpdateTime
    from ValueHistory
    where EventTime < @StartTime
    group by id
) CurrentValues
inner join ValueHistory on CurrentValues.id = ValueHistory.id
and CurrentValues.LatestUpdateTime = ValueHistory.EventTime
group by ValueHistory.CurrentValue

No guarantee that this is actually faster though - for this to work with any decent speed you'll need an index on EventTime.
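As a rough sketch of the kind of index meant here: the query above looks up the latest EventTime per id and then joins back on (id, EventTime), so a composite index may serve it better than one on EventTime alone. The index name is hypothetical, the column names come from the question, and the INCLUDE clause assumes SQL Server 2005 or later. Whether it actually helps depends on your data, so check the plan before and after.

```sql
-- Hypothetical index name; table and column names are taken from the question.
-- Assumes SQL Server 2005+ (INCLUDE is not available in earlier versions).
CREATE NONCLUSTERED INDEX IX_ValueHistory_Id_EventTime
    ON ValueHistory (Id, EventTime)   -- supports MAX(EventTime) per Id and the join back
    INCLUDE (CurrentValue);           -- lets the query read CurrentValue from the index
```

Like any extra index, this one adds some cost to every insert and update on ValueHistory.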

Dan Fuller
You could also group on ValueHistory.EventTime here, and display it in the select.
Andomar
A: 

Let us keep in mind that, because the SQL language describes what you want and not how to get it, there are many ways of expressing a query that will eventually be turned into the same query execution plan by a good query optimizer. Of course, the level of "good" depends on the database you're using.

In general, subqueries are just a syntactically different way of describing joins. The query optimizer is going to recognize this and determine the best way, to the best of its knowledge, to execute the query. Temporary tables may be created as needed. So in many cases, reworking the query is going to do nothing for your actual execution time: it may come out to the same query execution plan in the end.
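To illustrate, here is one way the correlated subquery from the question could be written in join form, without the temporary table. This is only a sketch: it assumes SQL Server 2005 or later (for CROSS APPLY), the aliases lv and IdCount are made up for the example, and the cutoff date is a placeholder. Whether the optimizer produces the same plan for both forms is something you would have to check on your own data.

```sql
DECLARE @StartTime datetime;
SET @StartTime = '20090513';  -- example cutoff, an assumption

SELECT lv.[CurrentValue] AS [LastValue],
       COUNT(*)          AS [IdCount]
FROM [Ids]
CROSS APPLY
(
    -- Latest value for this Id strictly before the cutoff
    SELECT TOP 1 [CurrentValue]
    FROM [ValueHistory]
    WHERE [ValueHistory].[Id] = [Ids].[Id]
      AND [EventTime] < @StartTime
    ORDER BY [EventTime] DESC
) AS lv
GROUP BY lv.[CurrentValue];
```

One difference from the original: CROSS APPLY drops Ids that have no history before @StartTime, whereas the temp-table version keeps them as a NULL group with a count of 0.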

If you're going to attempt to optimize, you need to examine the query plan (for example with DESCRIBE or EXPLAIN, or whatever plan viewer your database provides). Make sure it's not doing full-table scans against large tables, and is picking the appropriate indices where possible. If, and only if, it is making sub-optimal choices here, should you attempt to manually optimize the query.
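The TOP and #temp syntax in the question suggests SQL Server, so, assuming that is the case, a minimal way to capture the plan and the I/O cost of a query looks something like the following. The cutoff date is a placeholder, and the query shown is just the inner part of the one from the question.

```sql
-- Assumes SQL Server 2005 or later.
SET STATISTICS IO ON;    -- reports logical and physical reads per table
SET STATISTICS TIME ON;  -- reports parse, compile and execution times
SET STATISTICS XML ON;   -- returns the actual execution plan as XML

DECLARE @StartTime datetime;
SET @StartTime = '20090513';  -- example cutoff, an assumption

SELECT [Id],
    (
        SELECT TOP 1 [CurrentValue]
        FROM [ValueHistory]
        WHERE [Ids].[Id] = [ValueHistory].[Id]
            AND [EventTime] < @StartTime
        ORDER BY [EventTime] DESC
    ) AS [LastValue]
FROM [Ids];

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

In the plan, table or clustered index scans on ValueHistory are the thing to look for; seeks on an index that covers Id and EventTime are usually what you want to see instead.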

Now, having said all that, the query you pasted isn't entirely compatible with your stated goal of "calculat[ing] the count of distinct Ids for each given Value". So forgive me if I don't quite answer your need, but here's something to perf-test against your current query. (Syntax is approximate, sorry -- away from my desk).

SELECT ids.[Id], vh1.[CurrentValue], COUNT(vh2.[CurrentValue])
FROM [Ids] AS ids
    JOIN [ValueHistory] AS vh1 ON ids.[Id] = vh1.[Id]
    JOIN [ValueHistory] AS vh2 ON vh1.[CurrentValue] = vh2.[CurrentValue]
GROUP BY ids.[Id], vh1.[CurrentValue];

Note that you'll probably see bigger performance gains from adding indices that make those joins efficient than from reworking the query, assuming you're willing to take the performance hit on update operations.

Brad B
you should wrap your sql in a 'code' tag.
Nathan Feger
Thanks, Nathan! Forgive me, I'm new here :-)
Brad B
No Problem. Welcome.
Nathan Feger
Good query, but I don't think "subqueries are just a syntactically different way of describing joins".
Andomar