tags:

views:

828

answers:

5

I have 2 tables, a "document" table and a "content" table. They look like this (simplified):

document table:
docID
docTitle

content table:
contentID
docID
dateAdded
content

For every content change, a new record is inserted into the "content" table. This way there is a complete history of all changes. I would like to get a list of all the documents, with the latest content joined. It should return the docID, docTitle, and the associated content record with the newest "dateAdded" value. My brain is failing me right now, how would I create this join?

A: 

Could you not just do a simple join, and order by date added, and grab only the first record?

SELECT docTable.docId, docTable.docTitle from docTable
INNER JOIN content ON content.docID = docTable.contentID
WHERE docTable.docId = <some id>
ORDER BY content.dateAdded DESC
Zoidberg
I thnk you want the order by to be DESC so the newest version is first. Also if you add "LIMIT 1" you will get a single row.
Craig
Your right about the ordering. As for the limit, your also right, I just left that out because i was unsure of the database technology. That command may vary (as far as I know).
Zoidberg
A: 

This is a 2 query solution:

First query:

select docID, max(dateadded) from [TblContent] group by docID

Second query:

select [TblDocument].* from [TblDocument]  
inner join [TblDocument] on [TblDocument].[Docid]=[FirstQuery].[DocID]
PowerUser
This would require a separate query for each row, not just two queries.
Christian Oudard
A: 

try this:

select d.id, d.docTitle, MAX(c.dateAdd)
from document_table d
left join content_table c
on d.id = c.docId
group by d.id

Here is the thinking behind it: suppose document table has record A related to content(1, 2, 3, 4) and B related to (5, 6, 7, 8)

document     content

A      1
   2
                 3
   4

B      5
   6
   7
   8

a inner join with max(dateadded) will give you

document     content  max(dateadded)

A      1   1-1-2009...

A      2   1-1-2009...

A      3   1-1-2009...

A      4   1-1-2009...

B      5   2-1-2009...

B      6   2-1-2009...

B      7   2-1-2009...

B      8   2-1-2009...

after group by document id you will get

document    content  max(dateadded)

A     1   1-1-2009...

B     5   2-1-2009...

note: content id does not necessary match the id of the max dateadded

ez
Is it possible to return documents that have no associated content records yet by changing this to an outer join, or will that change the behaviour? Also, can this return the contentID and content fields as well by including them in the select, or will that mess it up?
Jon Tackabury
Is this My Sql? Because if it isn't, depending on the database, you may not be able to select anything that is not in the group by clause.
Zoidberg
you can do left join, but you can't put the contentID column there.
ez
this solution does not "return ... the associated content record" as requested.
longneck
+5  A: 

This can be done with a subquery:

SELECT d.docID, docTitle, c.dateAdded, c.content
FROM document d LEFT JOIN content c ON c.docID = d.docID
WHERE dateAdded IS NULL
    OR dateAdded = (
        SELECT MAX(dateAdded)
        FROM content c2
        WHERE c2.docID = d.docID
    )

This is known as a "groupwise maximum" query

Edit: Made the query return all document rows, with NULLs if there is no related content.

Christian Oudard
Thanks for the name. Whenever I have a SQL question, I never know what to Google for, which makes it more difficult to find a solution. :)
Jon Tackabury
Would this query work when there are no content records for a document? I would still like it to list the document, and just return null for the dateAdded and content fields.
Jon Tackabury
updated the query to return all document rows
Christian Oudard
Saved me hours of grief, thanks!
Matt
+1  A: 

Use:

SELECT t.docid, 
       t.docTitle, 
       mc.dateAdded, 
       mc.content
  FROM DOCUMENT t
  JOIN (SELECT c.docid,
               c.content,
               MAX(c.dateAdded)
          FROM CONTENT c
      GROUP BY c.docid, c.content) mc ON mc.docid = t.docid 
                                     AND mc.dateadded = t.dateadded

This should be faster than a correlated subquery.

Alternative for when there are no content records for a document:

   SELECT t.docid, 
          t.docTitle, 
          mc.dateAdded, 
          mc.content
     FROM DOCUMENT t
LEFT JOIN (SELECT c.docid,
                  c.content,
                  MAX(c.dateAdded)
             FROM CONTENT c
         GROUP BY c.docid, c.content) mc ON mc.docid = t.docid 
                                     AND mc.dateadded = t.dateadded
OMG Ponies