ansaurus

Question

Selecting date intervals, doing it fast, and always returning the latest entry with the result.

Answer 1

A:

If you can wait for PostgreSQL 8.4, you might be able to make use of window functions:

http://www.postgresql.org/docs/8.4/static/tutorial-window.html

http://www.postgresql.org/docs/8.4/static/functions-window.html
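As a rough illustration of the window-function idea, here is a minimal sketch. It assumes the balance and period test tables from the period-table answer on this page (with the `#` prefix dropped), and it runs on SQLite 3.25+ through Python's sqlite3 module only so that it is self-contained. The `row_number() over (partition by ... order by ...)` syntax is also what PostgreSQL 8.4 documents, but this has not been run against PostgreSQL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table balance (id integer primary key, balance real, date text, aid integer);
    create table period (id integer primary key, startdt text, enddt text);
    insert into balance (balance, date, aid) values
        (4,'2009-01-01',1), (5,'2009-01-10',1), (6,'2009-01-10',1), (7,'2009-01-16',1),
        (2,'2009-01-01',2), (3,'2009-01-10',2), (4,'2009-01-10',2), (5,'2009-01-16',2);
    insert into period (startdt, enddt) values
        ('2009-01-01','2009-01-06'), ('2009-01-06','2009-01-11'),
        ('2009-01-11','2009-01-16'), ('2009-01-16','2009-01-21');
""")

# Number the balances of each (period, account) pair from newest to oldest,
# breaking ties on equal dates by id, then keep only the newest one (rn = 1).
rows = conn.execute("""
    select aid, startdt, balance
    from (
        select b.aid, p.startdt, b.balance,
               row_number() over (
                   partition by p.id, b.aid
                   order by b.date desc, b.id desc
               ) as rn
        from period p
        join balance b on b.date <= p.enddt
    ) as ranked
    where rn = 1
    order by aid, startdt
""").fetchall()

for row in rows:
    print(row)
```

This returns one row per account per period, carrying the latest balance dated on or before the period's end.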

mikelikespie 2009-04-24 09:48:17

Answer 2

+1 A:

You can do this in a relatively straightforward way by creating a period table, which you can join with the accounts table to create one row per account per period.

Here's an example. Let's set up some temporary tables:

create table #balance (
    id int identity,
    balance float,
    date datetime,
    aid int
)

create table #period (
    id int identity,
    startdt datetime,
    enddt datetime
)

Enter some test data:

insert into #balance (balance, date, aid) values (4,'2009-01-01',1)
insert into #balance (balance, date, aid) values (5,'2009-01-10',1)
insert into #balance (balance, date, aid) values (6,'2009-01-10',1)
insert into #balance (balance, date, aid) values (7,'2009-01-16',1)
insert into #balance (balance, date, aid) values (2,'2009-01-01',2)
insert into #balance (balance, date, aid) values (3,'2009-01-10',2)
insert into #balance (balance, date, aid) values (4,'2009-01-10',2)
insert into #balance (balance, date, aid) values (5,'2009-01-16',2)

insert into #period (startdt, enddt) values ('2009-01-01','2009-01-06')
insert into #period (startdt, enddt) values ('2009-01-06','2009-01-11')
insert into #period (startdt, enddt) values ('2009-01-11','2009-01-16')
insert into #period (startdt, enddt) values ('2009-01-16','2009-01-21')
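The four period rows are evenly spaced, so for a longer range they could be generated rather than typed by hand. A minimal sketch in Python, assuming fixed-length, back-to-back periods; `make_periods` is a hypothetical helper and not part of the original answer:

```python
from datetime import date, timedelta

def make_periods(start, count, days):
    """Return (startdt, enddt) ISO-date pairs for `count` back-to-back periods."""
    periods = []
    for i in range(count):
        begin = start + timedelta(days=days * i)
        end = begin + timedelta(days=days)
        periods.append((begin.isoformat(), end.isoformat()))
    return periods

# Reproduces the four test periods inserted above.
periods = make_periods(date(2009, 1, 1), count=4, days=5)
for startdt, enddt in periods:
    print(startdt, enddt)
```

Each pair can then be fed to a parameterized insert into the period table.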

Now let's query all periods:

from #period p

Add one row for each balance before the end of the period:

left join #balance b1 on 
    b1.date <= p.enddt

Search for later balances of the same account that fall between the balance from the first join and the end of the period:

left join #balance b2 on 
    b2.aid = b1.aid
    and b1.id < b2.id
    and b2.date <= p.enddt

Then filter out the rows that are not the last balance for their period.

where
    b2.aid is null

The b2 join basically looks for an "in-between" row, and by requiring b2.aid to be null you say that no in-between row exists, so b1 is the last balance before the end of the period. The final query looks like this:

select 
    b1.aid
,   p.startdt
,   b1.balance
from #period p
left join #balance b1 on 
    b1.date <= p.enddt
left join #balance b2 on 
    b2.aid = b1.aid
    and b1.id < b2.id
    and b2.date <= p.enddt
where
    b2.aid is null
order by b1.aid, p.startdt

Note: the queries assume a balance with a later date always has a larger id. If you never have two balances with exactly the same date for the same account, you can replace "b1.id < b2.id" with "b1.date < b2.date".
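To check the final query against the test data without a SQL Server instance, here is a minimal sketch that runs it on SQLite through Python's sqlite3 module. The assumptions are mine: the temporary tables become ordinary tables named balance and period, `int identity` becomes `integer primary key`, and dates are stored as ISO text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table balance (id integer primary key, balance real, date text, aid integer);
    create table period (id integer primary key, startdt text, enddt text);
    insert into balance (balance, date, aid) values
        (4,'2009-01-01',1), (5,'2009-01-10',1), (6,'2009-01-10',1), (7,'2009-01-16',1),
        (2,'2009-01-01',2), (3,'2009-01-10',2), (4,'2009-01-10',2), (5,'2009-01-16',2);
    insert into period (startdt, enddt) values
        ('2009-01-01','2009-01-06'), ('2009-01-06','2009-01-11'),
        ('2009-01-11','2009-01-16'), ('2009-01-16','2009-01-21');
""")

# b1: every balance dated on or before the end of the period.
# b2: a later balance (larger id) of the same account, also within the period end.
# Keeping only rows where no b2 exists leaves the last balance per account and period.
rows = conn.execute("""
    select b1.aid, p.startdt, b1.balance
    from period p
    left join balance b1 on
        b1.date <= p.enddt
    left join balance b2 on
        b2.aid = b1.aid
        and b1.id < b2.id
        and b2.date <= p.enddt
    where
        b2.aid is null
    order by b1.aid, p.startdt
""").fetchall()

for row in rows:
    print(row)
```

With this data the query yields eight rows, one per account per period. Because the comparison is "b1.date <= p.enddt", a balance dated exactly on a period's end date counts towards that period.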

Andomar 2009-04-24 10:11:18

Answer 3

+2 A:

I would use Andomar's Period table idea, but I would try a slightly different final query. This assumes that your Account_Balances table has a PK on aid and date. If you ended up with two balances for the same account for the same exact date and time then you would get some duplicate rows.

SELECT
     P.start_date,
     P.end_date,
     AB1.aid,
     AB1.balance
FROM
     Periods P
LEFT OUTER JOIN Account_Balances AB1 ON
     AB1.date <= P.end_date
LEFT OUTER JOIN Account_Balances AB2 ON
     AB2.aid = AB1.aid AND
     AB2.date > AB1.date AND
     AB2.date <= P.end_date
WHERE
     AB2.aid IS NULL

If the account has no rows before or during the given period you will not get a row back for it.
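The duplicate-row caveat can be seen in a small experiment. This is a minimal sketch on SQLite through Python's sqlite3 module, assuming a single period, the column name aid throughout, and dates stored as ISO text; the `run` helper and the sample rows are made up for illustration.

```python
import sqlite3

QUERY = """
    SELECT P.start_date, AB1.aid, AB1.balance
    FROM Periods P
    LEFT OUTER JOIN Account_Balances AB1 ON
         AB1.date <= P.end_date
    LEFT OUTER JOIN Account_Balances AB2 ON
         AB2.aid = AB1.aid AND
         AB2.date > AB1.date AND
         AB2.date <= P.end_date
    WHERE AB2.aid IS NULL
    ORDER BY AB1.aid, P.start_date, AB1.balance
"""

def run(balances):
    """Load one period plus the given (aid, date, balance) rows, then run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Periods (start_date TEXT, end_date TEXT)")
    conn.execute("CREATE TABLE Account_Balances (aid INTEGER, date TEXT, balance REAL)")
    conn.execute("INSERT INTO Periods VALUES ('2009-01-06', '2009-01-11')")
    conn.executemany("INSERT INTO Account_Balances VALUES (?, ?, ?)", balances)
    return conn.execute(QUERY).fetchall()

# Two balances with exactly the same date: neither is "later", so both survive.
tied = run([(1, "2009-01-01", 4), (1, "2009-01-10", 5), (1, "2009-01-10", 6)])

# Same day but different times: only the latest balance survives.
distinct = run([(1, "2009-01-01", 4), (1, "2009-01-10 09:00", 5), (1, "2009-01-10 17:00", 6)])

print(tied)
print(distinct)
```

The first call returns two rows for the account (balances 5 and 6), the second returns only the 17:00 balance.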

Tom H. 2009-04-24 14:22:41

Cool, looks better than mine: you're not doing the "distinct aid", and the no-one-in-between join is probably faster. Is it OK if I change my query based on yours?

Andomar 2009-04-24 21:54:57

@Andomar: Yep, feel free. Although sometimes one method might be faster than another depending on the data. Most of the time I find the LEFT OUTER JOIN to be faster though.

Tom H. 2009-04-24 22:25:16

Thanks, edited. Except for the date check, which didn't work for the test data because it has multiple balances with the same date.

Andomar 2009-04-25 19:13:34

Just to clarify, if there were a time portion to the dates, it would still work by using the latest balance for the given day. If two dates are EXACTLY the same though then SQL doesn't know which one is really wanted without clearer business rules.

Tom H. 2009-04-26 01:52:48
