I have a Postgres database with a user table (userid, firstname, lastname) and a usermetadata table (userid, code, content, created datetime). I store various information about each user in the usermetadata table, keyed by code, and keep a full history. So, for example, a user (userid 15) has the following metadata:

15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'
15, 'QHE', '8', '2008-08-24 12:07:08.660519-04'
15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'
15, 'QHE', '10', '2008-08-24 08:47:57.672058-04'
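To make the rest of the thread concrete, here is one possible DDL for the two tables, with the sample rows above loaded for user 15. The column types are assumptions: the question only calls created a datetime, and since the sample values carry a UTC offset, timestamptz is used. The user table is named users here because user is a reserved word in PostgreSQL, and the first and last names are left NULL rather than invented.

```sql
-- Assumed schema; column types are guesses from the question's description.
-- "user" is reserved in PostgreSQL, so the user table is called users here.
CREATE TABLE users (
    userid    integer PRIMARY KEY,
    firstname text,
    lastname  text
);

-- One row per stored value; history is kept by inserting a new row per change.
CREATE TABLE usermetadata (
    userid  integer     NOT NULL REFERENCES users (userid),
    code    text        NOT NULL,
    content text,
    created timestamptz NOT NULL DEFAULT now()
);

-- The names are not given in the question, so they stay NULL.
INSERT INTO users (userid, firstname, lastname) VALUES (15, NULL, NULL);

-- The sample history for user 15 from the question.
INSERT INTO usermetadata (userid, code, content, created) VALUES
    (15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'),
    (15, 'QHE', '8',  '2008-08-24 12:07:08.660519-04'),
    (15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'),
    (15, 'QHE', '10', '2008-08-24 08:47:57.672058-04');
```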

I need to fetch a list of all my users along with the most recent value of each of several usermetadata codes. I did this programmatically and it was, of course, godawfully slow. The best I could come up with in SQL was to join sub-selects, which were also slow, and I had to write one for each code.

+1  A: 

I suppose you're not willing to modify your schema, so I'm afraid my answer might not be of much help, but here goes...

One possible solution would be to leave the time field empty until the row is replaced by a newer value, and to store the 'deprecation date' there at that point. Another way is to extend the table with an 'active' column, but that would introduce some redundancy.

The classic solution would be to have both 'Valid-From' and 'Valid-To' fields, where 'Valid-To' stays blank until some other entry becomes valid. This can be handled easily with triggers or similar. Using constraints to make sure that only one item of each type is valid at a time ensures data integrity.

Common to these is that there is a single way of determining the set of current values: you simply select all entries for the user that have a NULL 'Valid-To' or 'deprecation date', or a true 'active'.
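A minimal sketch of the Valid-From/Valid-To variant in PostgreSQL, assuming the question's columns with created split into valid_from and valid_to. The table, index, function and trigger names are hypothetical. A partial unique index plays the role of the constraint (at most one currently valid row per user and code), and a BEFORE INSERT trigger closes the superseded row, so callers only ever insert. The sketch assumes new values arrive in time order.

```sql
-- Hypothetical variant of usermetadata with Valid-From/Valid-To columns.
-- valid_to stays NULL while the row is the current value for (userid, code).
CREATE TABLE usermetadata_temporal (
    userid     integer     NOT NULL,
    code       text        NOT NULL,
    content    text,
    valid_from timestamptz NOT NULL DEFAULT now(),
    valid_to   timestamptz
);

-- The integrity constraint: at most one current row per user and code.
CREATE UNIQUE INDEX usermetadata_temporal_one_current
    ON usermetadata_temporal (userid, code)
    WHERE valid_to IS NULL;

-- Before a new value goes in, close the row it supersedes.
-- Assumes values are inserted in time order.
CREATE FUNCTION close_previous_value() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    UPDATE usermetadata_temporal
       SET valid_to = NEW.valid_from
     WHERE userid = NEW.userid
       AND code = NEW.code
       AND valid_to IS NULL;
    RETURN NEW;
END;
$$;

-- EXECUTE FUNCTION needs PostgreSQL 11+; older versions spell it EXECUTE PROCEDURE.
CREATE TRIGGER usermetadata_temporal_close_previous
    BEFORE INSERT ON usermetadata_temporal
    FOR EACH ROW EXECUTE FUNCTION close_previous_value();

-- The question's sample rows, inserted oldest first.
INSERT INTO usermetadata_temporal (userid, code, content, valid_from)
    VALUES (15, 'QHE', '10', '2008-08-24 08:47:57.672058-04');
INSERT INTO usermetadata_temporal (userid, code, content, valid_from)
    VALUES (15, 'QHS', '21', '2008-08-24 09:44:44.39354-04');
INSERT INTO usermetadata_temporal (userid, code, content, valid_from)
    VALUES (15, 'QHE', '8', '2008-08-24 12:07:08.660519-04');
INSERT INTO usermetadata_temporal (userid, code, content, valid_from)
    VALUES (15, 'QHS', '20', '2008-08-24 13:36:33.465567-04');

-- Current values: only the rows that have not been closed.
SELECT userid, code, content
FROM usermetadata_temporal
WHERE valid_to IS NULL;
```

The trigger does the bookkeeping, but the partial unique index is what actually guards integrity: if two sessions insert a value for the same user and code at the same time, it should make one of them fail with a unique violation rather than leave two current rows.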

You might be interested in taking a look at the Wikipedia entry on temporal databases and the article A consensus glossary of temporal database concepts.

Henrik Gustafsson
A: 

@henrik-gustafsson I didn't want to use an active flag because I'd have to make sure (programmatically) that there was only one active row for each code. But I could just add a trigger, so that I only insert a new row and the DB handles marking it active and marking all other rows with the same code and userid inactive. Then I can select just the active rows and know I'll get one per code (unless someone or something evil has manually mucked with the active flags).
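A sketch of that trigger plan in PostgreSQL, assuming the question's columns plus a boolean active column; all object names are hypothetical. A partial unique index on active rows addresses the "someone evil" caveat for one direction: it rejects any insert or update that would leave two rows active for the same user and code, although it cannot stop a manual edit that leaves none active.

```sql
-- Hypothetical variant of usermetadata with an active flag.
CREATE TABLE usermetadata_active (
    userid  integer     NOT NULL,
    code    text        NOT NULL,
    content text,
    created timestamptz NOT NULL DEFAULT now(),
    active  boolean     NOT NULL DEFAULT true
);

-- Database-enforced "at most one active row per user and code".
CREATE UNIQUE INDEX usermetadata_active_one_per_code
    ON usermetadata_active (userid, code)
    WHERE active;

-- On insert: deactivate the row currently active for the same user and code,
-- then make the incoming row the active one.
CREATE FUNCTION deactivate_older_values() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    UPDATE usermetadata_active
       SET active = false
     WHERE userid = NEW.userid
       AND code = NEW.code
       AND active;
    NEW.active := true;
    RETURN NEW;
END;
$$;

-- EXECUTE FUNCTION needs PostgreSQL 11+; older versions spell it EXECUTE PROCEDURE.
CREATE TRIGGER usermetadata_active_deactivate_older
    BEFORE INSERT ON usermetadata_active
    FOR EACH ROW EXECUTE FUNCTION deactivate_older_values();

-- Callers just insert; the question's sample rows go in oldest first.
INSERT INTO usermetadata_active (userid, code, content, created)
    VALUES (15, 'QHE', '10', '2008-08-24 08:47:57.672058-04');
INSERT INTO usermetadata_active (userid, code, content, created)
    VALUES (15, 'QHS', '21', '2008-08-24 09:44:44.39354-04');
INSERT INTO usermetadata_active (userid, code, content, created)
    VALUES (15, 'QHE', '8', '2008-08-24 12:07:08.660519-04');
INSERT INTO usermetadata_active (userid, code, content, created)
    VALUES (15, 'QHS', '20', '2008-08-24 13:36:33.465567-04');

-- One row per code for each user.
SELECT userid, code, content FROM usermetadata_active WHERE active;
```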

adambox
+4  A: 

This is actually not that hard to do in PostgreSQL because it has the "DISTINCT ON" clause in its SELECT syntax (DISTINCT ON isn't standard SQL).

SELECT DISTINCT ON (code) code, content, created
FROM usermetadata
WHERE userid = 15
ORDER BY code, created DESC;

That keeps only the first row for each distinct code, and because the rows are sorted by creation time descending within each code, that first row is the newest one.
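To get the newest value of every code for every user in one query, DISTINCT ON can take several expressions. A sketch, assuming the question's table and column names (usermetadata, created) and a user table named users: adding userid to both DISTINCT ON and the leading ORDER BY keys yields one row per user and code, the newest one. The temporary tables and the latest_metadata view are only there to keep the example self-contained; against the real schema only the SELECT is needed.

```sql
-- Throwaway copies of the question's tables (types assumed) with its sample rows.
CREATE TEMP TABLE users (userid integer PRIMARY KEY, firstname text, lastname text);
CREATE TEMP TABLE usermetadata (
    userid integer, code text, content text, created timestamptz
);
INSERT INTO users VALUES (15, NULL, NULL);
INSERT INTO usermetadata VALUES
    (15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'),
    (15, 'QHE', '8',  '2008-08-24 12:07:08.660519-04'),
    (15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'),
    (15, 'QHE', '10', '2008-08-24 08:47:57.672058-04');

-- One row per (userid, code): the ORDER BY puts the newest row of each group
-- first, and DISTINCT ON keeps only that first row. The DISTINCT ON
-- expressions must match the leftmost ORDER BY expressions.
CREATE TEMP VIEW latest_metadata AS
SELECT DISTINCT ON (m.userid, m.code)
       u.userid, u.firstname, u.lastname, m.code, m.content, m.created
FROM users AS u
JOIN usermetadata AS m ON m.userid = u.userid
ORDER BY m.userid, m.code, m.created DESC;

SELECT * FROM latest_metadata;
```

Users with no metadata rows drop out because of the inner join; switching to a LEFT JOIN and using u.userid in both DISTINCT ON and ORDER BY would keep them, with NULL metadata columns.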

Neall
A: 

@neall, will that work when I want the latest of each code for all 1000 users? And how do I know which row of a given code it decides to display? With GROUP BY, you have to resolve that ambiguity by using only aggregate functions. Is the ORDER BY guaranteed to happen before the DISTINCT ON?

adambox
Yes. The ORDER BY is applied first, and DISTINCT ON keeps the first row of each group in that order, so you always get the row for each code with the newest creation time.
Neall
A: 

A subselect is the standard way of doing this sort of thing. You just need a Unique Constraint on UserId, Code, and Date - and then you can run the following:

SELECT m.*
FROM usermetadata AS m
JOIN (
   SELECT userid, code, MAX(created) AS lastdate
   FROM usermetadata
   GROUP BY userid, code
) AS latest ON
   m.userid = latest.userid
   AND m.code = latest.code
   AND m.created = latest.lastdate
WHERE
   m.userid = 15;  -- one user; drop this filter to get every user
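The unique constraint matters because the join matches on the timestamp: if two rows for the same user and code shared the newest created value, both would come back. A minimal sketch of that constraint in PostgreSQL, assuming the question's column names; the constraint name is made up, and the temporary table only keeps the example self-contained (on the real table it would be an ALTER TABLE ... ADD CONSTRAINT, shown in the comment).

```sql
-- Demo table (temporary; types assumed). On an existing table the same rule is:
--   ALTER TABLE usermetadata
--       ADD CONSTRAINT usermetadata_user_code_created_key UNIQUE (userid, code, created);
CREATE TEMP TABLE usermetadata (
    userid  integer     NOT NULL,
    code    text        NOT NULL,
    content text,
    created timestamptz NOT NULL,
    -- At most one value per user and code at any given instant, so
    -- MAX(created) identifies exactly one row per (userid, code).
    CONSTRAINT usermetadata_user_code_created_key UNIQUE (userid, code, created)
);

-- The question's sample rows.
INSERT INTO usermetadata VALUES
    (15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'),
    (15, 'QHE', '8',  '2008-08-24 12:07:08.660519-04'),
    (15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'),
    (15, 'QHE', '10', '2008-08-24 08:47:57.672058-04');
```

PostgreSQL backs a unique constraint with a unique B-tree index on the same columns, which the planner may also be able to use for the GROUP BY/MAX subselect; whether it does depends on the data and the plan.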
Mark Brackett
A: 

@mark-brackett, that's pretty much how I'm doing it now, and the query takes 10 seconds or more. The problem is that it doesn't scale with our growing user base; I need something whose running time doesn't grow in proportion to the size of the user base.

adambox