views:

81

answers:

1

Alright, this problem is a little complicated, so bear with me.

I have a table full of data. One of the table columns is an EntryDate. There can be multiple entries per day. However, I want to select all rows that are the latest entry on their respective days, and I want to select all the columns of said table.

One of the columns is a unique identifier column, but it is not the primary key (I have no idea why it's there; this is a pretty old system). For purposes of demonstration, say the table looks like this:

create table ExampleTable (
    ID int identity(1,1) not null,
    PersonID int not null,
    StoreID int not null,
    Data1 int not null,
    Data2 int not null,
    EntryDate datetime not null
)

The primary key is on PersonID and StoreID, which logically defines uniqueness.

Now, like I said, I want to select all the rows that are the latest entries on that particular day (for each Person-Store combination). This is pretty easy:

--Figure 1
select PersonID, StoreID, max(EntryDate)
from ExampleTable
group by PersonID, StoreID, dbo.dayof(EntryDate)

Where dbo.dayof() is a simple function that strips the time component from a datetime. However, doing this loses the rest of the columns! I can't simply include the other columns, because then I'd have to group by them, which would produce the wrong results (especially since ID is unique).

I have found a dirty hack that will do what I want, but there must be a better way -- here's my current solution:

select
    cast(null as int) as ID,
    PersonID,
    StoreID,
    cast(null as int) as Data1,
    cast(null as int) as Data2,
    max(EntryDate) as EntryDate
into #StagingTable
from ExampleTable
group by PersonID, StoreID, dbo.dayof(EntryDate)

update Target set
    ID = Source.ID,
    Data1 = Source.Data1,
    Data2 = Source.Data2,
from #StagingTable as Target
inner join ExampleTable as Source
    on Source.PersonID = Target.PersonID
   and Source.StoreID = Target.StoreID
   and Source.EntryDate = Target.EntryDate

This gets me the correct data in #StagingTable but, well, look at it! Creating a table with null values, then doing an update to get the values back -- surely there's a better way to do this? A single statement that will get me all the values the first time?

It is my belief that the correct join on that original select (Figure 1) would do the trick, like a self-join or something... but how do you do that with the group by clause? I cannot find the right syntax to make the query execute.

I am pretty new with SQL, so it's likely that I'm missing something obvious. Any suggestions?

(Working in T-SQL, if it makes any difference)

+2  A: 

There is no really "elegant" way. When you have group Group By queries like this, you're going to have either sub-queries or temporary tables.

This will work:

Select ID, A.PersonID, A.StoreID, Data1, Data2, A.EntryDate
From ExampleTable As A
Inner Join
    (select PersonID, StoreID, max(EntryDate) As EntryDate
    from ExampleTable
    group by PersonID, StoreID, dbo.dayof(EntryDate)) As B
  On ExampleTable.PersonID = B.PersonID 
    And ExampleTable.StoreID = B.StoreID 
    And ExampleTable.EntryDate = B.EntryDate

You should not be too down on the solution you came up with though. Using temporary tables never looks elegant, but it is efficient; I would not be surprised if your original two-step solution is actually faster than my one-step solution. (you'll have to test to know for sure.)

Patrick Karcher
Thanks for the heads up about performance. I tried it both ways, and you're right -- the temp table method finished 1 second faster (83 vs 84 seconds total)! But I'll take the conciseness of your solution over that boost (almost) any day...
Ian Henry