views:

305

answers:

5

Example

There is an application that measures temperature in each town of the world. Each measurement is taken every 5 minutes and written in to the Measurement table.

CREATE TABLE [dbo].[Measurement](
    [MeasurementID] [int] IDENTITY(1,1) NOT NULL,
    [Town] [varchar](50) NOT NULL,
    [Date] [datetime] NOT NULL,
    [Temp] [int] NOT NULL,
CONSTRAINT [PK_Measurement] PRIMARY KEY CLUSTERED 
(
    [MeasurementID] ASC
)) ON [PRIMARY]

Question

What is the most efficient query to get a list of the towns and their current temperature?

Assume there are 100k towns and 10 million records

NOTE: I have added a couple of possible answers, but there are probably other options.

A: 
select s.*
from Measurement s
where exists ( 
   select 1
   from Measurement s1
   where s.Town = s1.Town
   group by s1.Town
   having max( s1.Date )= s.Date)
   order by s.Town
Andrew Rimmer
A: 
select *
from
(
    select distinct *, --Keyword,Total,CreatedOn,EngineInstanceID,
    Rank() over (PARTITION by Town order by Date DESC) as DateOrder
    from Measurement
    where Town is not null
) CurrentMeasurement
where DateOrder = 1
Andrew Rimmer
+3  A: 

Here are a couple that should work:

SELECT
m1.Town, m1.Temp
FROM
Measurement AS m1
LEFT JOIN
Measurement AS m2
ON
m1.Town = m2.Town
AND m1.Date < m2.Date
WHERE
m2.MeasurementID IS NULL
ORDER BY m1.Town


You'll need an index on Town and Date.

This technique is especially useful for early versions of MySQL, which couldn't handle the more obvious

SELECT Town, Temp
FROM Measurement AS m1
WHERE NOT EXISTS (
SELECT 1 FROM Measurement
WHERE Town = m1.Town
AND Date > m1.date
)
ORDER BY Town

le dorfier
Interesting approach. Do you know if it is more/less expensive that joining to a subquery that grabs the max date for each town?
Rick
I generally choose my query design by testing alternatives, and the NULL test always seems as fast as any. Then NOT EXISTS. Aggregate functions (like MAX) can sometimes give the query optimizer an excuse for reading multiple records which in this case aren't needed (with good indexing).
le dorfier
A: 
select m.town, m.temperature, m.date
from Measurement m
where m.date = (select max(m2.date) from Measurement m2 where m2.town = m.town)
order by 1
Gordon Bell
You're missing a parenthesis.
le dorfier
Not - I didn't see the scroll.
le dorfier
+1  A: 

Good to see so many ways to skin this cat. Here's one using a CTE (you can also nest the query for more ANSI-ism, but I find CTEs great to avoid a lot of indenting and declaring things up front makes it pretty readable up top and down below):

WITH LastMeasurements AS (
    SELECT [Town], MAX([Date]) AS LastMeasurementDate
    FROM [Measurement]
    GROUP BY [Town]
)
SELECT [Measurement].Town, [Measurement].[Date], [Measurement].Temp
FROM [Measurement]
INNER JOIN LastMeasurements
    ON [Measurement].[Town] = LastMeasurements.[Town]
    AND [Measurement].[Date] = LastMeasurements.LastMeasurementDate

What I like about the explicit seeking back technique is that it easily gives you access to all the information in the top row selected for the group and is very flexible in changing the grouping and low on repeating yourself.

The optimizer tends to perform these pretty quickly on SQL Server - like most solutions, if you have an index on Town, Date, Temp this will be covering and will run super fast. Even if it's just on Town, Date, the bulk of the work in the GROUP BY can be done super fast anyway.

Cade Roux