Hi everyone!

I'd like to know if there's a good way to optimize this kind of DB model:

Here are my tables:

[category]
idCategory
name

[postCategory] (a post can be in more than 1 category)
idCategory
idPost

[post]
idPost
post

[comment]
idComment
idPost
inputDate
comment
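
For reference, the schema above might look like this in SQL. This is only a sketch: the original post doesn't give column types or constraints, so the types here (INT keys, DATETIME dates, a composite primary key on the junction table) are assumptions.

```sql
CREATE TABLE category (
    idCategory INT PRIMARY KEY,
    name       VARCHAR(100)
);

CREATE TABLE post (
    idPost INT PRIMARY KEY,
    post   TEXT
);

-- A post can belong to more than one category.
CREATE TABLE postCategory (
    idCategory INT,
    idPost     INT,
    PRIMARY KEY (idCategory, idPost)
);

CREATE TABLE comment (
    idComment INT PRIMARY KEY,
    idPost    INT,
    inputDate DATETIME,
    comment   TEXT
);
```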

I'm going to have to display all the posts from a specific category within a specific time range (the time comes from the comments). The time range is fixed (1 day, 1 week, 1 month, 1 year). Here's what I came up with:

SELECT DISTINCT post.idPost, post.post
FROM post
INNER JOIN comment ON post.idPost = comment.idPost
INNER JOIN postCategory ON postCategory.idPost = post.idPost
WHERE postCategory.idCategory = <myCategoryId>
    AND comment.inputDate >= <today - time range>
Let's say that I want to support 10k posts and 500k comments... Is there a way to optimize this (besides adding indexes)? Would you use a stored proc, a query with temp tables, add "precalculated" fields somewhere...?

Thanks a lot! :)

A: 

I would compute your <today - time range> portion client side, before even connecting to the database.
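
As an illustration (a sketch, assuming MySQL-style syntax and a one-week range; the category id and cutoff date below are made-up placeholder values), the idea is to avoid making the database evaluate the date arithmetic and instead pass a literal the application computed once:

```sql
-- Instead of an expression evaluated by the database, e.g.:
--   WHERE comment.inputDate >= NOW() - INTERVAL 7 DAY
-- compute the cutoff once, client side, and send it as a plain literal:
SELECT DISTINCT post.idPost, post.post
FROM post
INNER JOIN comment ON post.idPost = comment.idPost
INNER JOIN postCategory ON postCategory.idPost = post.idPost
WHERE postCategory.idCategory = 42                 -- example category id
  AND comment.inputDate >= '2009-02-01 00:00:00';  -- cutoff computed client side
```

A constant literal also tends to play more nicely with query-plan and result caching than a call to a non-deterministic function like NOW().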

Beyond that, it's going to depend on what indexes you have, what load your server has (what it keeps cached in memory), and the amount of data in each table (how many comments per typical post, how many posts per category, etc). In other words, you need to profile. Assuming all that is moot (it's not!), a decent query optimizer should be able to pretty much take care of anything else.

There are a couple of things I'd do differently out of habit, but without knowing more about your system I can't say whether they'd matter here. The main one, though, is that I'd think about volume.

In general, I like to express my queries such that if the joins are done in order, the result set is kept as small as possible for as long as possible. In this case, that would likely mean listing the postCategory join above the comment join, and moving the `= <MyCategoryID>` condition up into the join expression.
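
Concretely, that reordering might look like this (a sketch; `<myCategoryId>` and `<today - time range>` stay placeholders, as in the original query):

```sql
SELECT DISTINCT post.idPost, post.post
FROM post
INNER JOIN postCategory
        ON postCategory.idPost = post.idPost
       AND postCategory.idCategory = <myCategoryId>  -- filter as part of the join
INNER JOIN comment
        ON comment.idPost = post.idPost
WHERE comment.inputDate >= <today - time range>
```

For inner joins a good optimizer will usually pick the same plan either way, but writing the query in the order you want the result set narrowed makes the intent explicit.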

Joel Coehoorn
Mmm ok... Is it a good idea to try the model with dummy data inserted by an automated script or something? Or would I just get different results from real-life data anyway?
mrmuggles
Dummy data is better than no data, especially if you can be reasonably confident that it's somewhat representative of the actual data.
Joel Coehoorn
I'm going to try that way and ask some help if I see a problem. Thanks!
mrmuggles
Well, take anything I say with a grain of salt. My experience is mostly with SQL Server databases. Most of this _should_ apply across systems, but sometimes things don't work as you'd expect.
Joel Coehoorn