Hey, I have two tables in PostgreSQL:

1 - documents: id, title
2 - updates: id, document_id, date

and some data:

documents:

| 1 | Test Title |

updates:

| 1 | 1 | 2006-01-01 |
| 2 | 1 | 2007-01-01 |
| 3 | 1 | 2008-01-01 |

So all updates point to the same document, but each has a different date.

What I am trying to do is select from the documents table and also include each document's latest update, based on the date.

What should a query like this look like? This is the one I currently have, but it lists all updates rather than only the latest one, which is the one I need:

SELECT * FROM documents,updates WHERE documents.id=1 AND documents.id=updates.document_id ORDER BY date

To add: the reason I need this in the query is that I want to order by the date from the updates table!

Edit: This script is heavily simplified, so I should be able to create a query that returns any number of results, each including the latest update date. I was thinking of using an inner join or left join or something like that!?

+1  A: 

Off the top of my head:

ORDER BY date DESC LIMIT 1

If you really want only id 1, you can use this query:

SELECT * FROM documents,updates 
    WHERE documents.id=1 AND updates.document_id=1 
    ORDER BY date DESC LIMIT 1

http://www.postgresql.org/docs/8.4/interactive/queries-limit.html

EricSchaefer
A: 

This should also work

SELECT * FROM documents, updates 
    WHERE documents.id=1 AND updates.document_id=1
    AND updates.date = (SELECT MAX (date) From updates)
David Harris
No, as written at present, this will not work. Your subquery returns the most recent "date" from the entire updates table irrespective of document, so this will only yield results if the documents.id you pick happens to have been updated on the most recent update date.
pilcrow
You are correct. The subquery should read (SELECT MAX(date) FROM updates WHERE document_id = 1)
David Harris
+2  A: 

You may create a derived table which contains only the most recent "updates" records per document_id, and then join "documents" against that:

SELECT d.id, d.title, u.update_id, u."date"
FROM documents d
LEFT JOIN
-- JOIN "documents" against the most recent update per document_id
(
SELECT recent.document_id, id AS update_id, recent."date"
FROM updates
INNER JOIN
(SELECT document_id, MAX("date") AS "date" FROM updates GROUP BY 1) recent
ON updates.document_id = recent.document_id
WHERE
  updates."date" = recent."date"
) u
ON d.id = u.document_id;

This will handle "un-updated" documents, like so:

pg=> select * from documents;
 id | title 
----+-------
  1 | foo
  2 | bar
  3 | baz
(3 rows)

pg=> select * from updates;
 id | document_id |    date    
----+-------------+------------
  1 |           1 | 2009-10-30
  2 |           1 | 2009-11-04
  3 |           1 | 2009-11-07
  4 |           2 | 2009-11-09
(4 rows)

pg=> SELECT d.id ...
 id | title | update_id |    date    
----+-------+-----------+------------
  1 | foo   |         3 | 2009-11-07
  2 | bar   |         4 | 2009-11-09
  3 | baz   |           | 
(3 rows)
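
To try the derived-table query without creating any tables, the sample rows above can be supplied inline. This is only a sketch: it assumes PostgreSQL 8.4 or later for the WITH clause, and the CTE names simply mirror the tables in this answer.

```sql
-- Inline copies of the sample tables shown above (assumption: WITH is available).
WITH documents (id, title) AS (
    VALUES (1, 'foo'), (2, 'bar'), (3, 'baz')
),
updates (id, document_id, "date") AS (
    VALUES (1, 1, DATE '2009-10-30'),
           (2, 1, DATE '2009-11-04'),
           (3, 1, DATE '2009-11-07'),
           (4, 2, DATE '2009-11-09')
)
SELECT d.id, d.title, u.update_id, u."date"
FROM documents d
LEFT JOIN (
    -- Keep only the update rows whose date equals the per-document maximum.
    SELECT updates.document_id, updates.id AS update_id, updates."date"
    FROM updates
    INNER JOIN (
        SELECT document_id, MAX("date") AS "date"
        FROM updates
        GROUP BY document_id
    ) recent
        ON updates.document_id = recent.document_id
       AND updates."date" = recent."date"
) u ON d.id = u.document_id
ORDER BY d.id;
```

Note that if two updates for the same document share the same latest date, both rows match and that document appears twice.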
pilcrow
A: 
select *
from documents
left join updates
  on updates.document_id=documents.id
  and updates.date=(select max(date) from updates where document_id=documents.id)
where documents.id=?;

It has some advantages over the previous answers:

  • you only have to write the document id in one place, which is convenient;
  • you can omit the where clause and you'll get a table of all documents and their latest updates;
  • you can use broader selection criteria, for example where documents.id in (1,2,3).
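
As a sketch of the second point, and of the ordering the question asks for, dropping the where clause and sorting on the joined date might look like the following. The inline rows are sample data only (the question's rows plus a hypothetical second document with no updates), latest_update is a made-up alias, and putting un-updated documents last via NULLS LAST is an assumption about what is wanted.

```sql
-- Sample data supplied inline; the second document is hypothetical.
WITH documents (id, title) AS (
    VALUES (1, 'Test Title'), (2, 'Untouched Title')
),
updates (id, document_id, date) AS (
    VALUES (1, 1, DATE '2006-01-01'),
           (2, 1, DATE '2007-01-01'),
           (3, 1, DATE '2008-01-01')
)
SELECT documents.id, documents.title, updates.date AS latest_update
FROM documents
LEFT JOIN updates
  ON updates.document_id = documents.id
  -- The correlated subquery picks the newest date for the current document.
 AND updates.date = (SELECT MAX(date)
                     FROM updates
                     WHERE document_id = documents.id)
-- Most recently updated documents first, un-updated ones at the end.
ORDER BY latest_update DESC NULLS LAST;
```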

You can also avoid the subselect by using group by, but then you'll have to list all fields of documents in the group by clause:

select documents.*, max(date) as max_date
  from documents
  left join updates on documents.id=document_id
  where documents.id=1
  group by documents.id, title;
Tometzky
+1  A: 

Use the PostgreSQL extension DISTINCT ON:

SELECT  DISTINCT ON (documents.id) *
FROM    documents
JOIN    updates
ON      updates.document_id = documents.id
ORDER BY
        documents.id, updates.date DESC

This will take the first row from each group of rows sharing the same documents.id, in ORDER BY order.

Test script to check:

SELECT  DISTINCT ON (documents.id) *
FROM    (
        VALUES
        (1, 'Test Title'),
        (2, 'Test Title 2')
        ) documents (id, title)
JOIN    (
        VALUES
        (1, 1, '2006-01-01'::DATE),
        (2, 1, '2007-01-01'::DATE),
        (3, 1, '2008-01-01'::DATE),
        (4, 2, '2009-01-01'::DATE),
        (5, 2, '2010-01-01'::DATE)
        ) updates (id, document_id, date)
ON      updates.document_id = documents.id
ORDER BY
        documents.id, updates.date DESC
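
Because JOIN is an inner join, documents that have no updates are dropped from the result. A hedged variant, assuming such documents should still appear and that the final result should be sorted by latest update as the question asks, could wrap the DISTINCT ON query in a subquery. The third document and the names latest and latest_update are hypothetical.

```sql
SELECT *
FROM (
    SELECT DISTINCT ON (documents.id)
           documents.id, documents.title, updates.date AS latest_update
    FROM (
        VALUES (1, 'Test Title'), (2, 'Test Title 2'), (3, 'No Updates Yet')
    ) documents (id, title)
    LEFT JOIN (
        VALUES (1, 1, '2006-01-01'::DATE),
               (2, 1, '2007-01-01'::DATE),
               (3, 1, '2008-01-01'::DATE),
               (4, 2, '2009-01-01'::DATE),
               (5, 2, '2010-01-01'::DATE)
    ) updates (id, document_id, date)
    ON updates.document_id = documents.id
    -- DISTINCT ON keeps the first row per documents.id in this order.
    ORDER BY documents.id, updates.date DESC NULLS LAST
) latest
-- Re-sort the one-row-per-document result by its latest update.
ORDER BY latest_update DESC NULLS LAST;
```

The inner ORDER BY has to start with documents.id because DISTINCT ON requires its expressions to match the leftmost ORDER BY expressions, which is why the final sort happens in the outer query.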
Quassnoi