ansaurus

Question

Efficient way to model aggregate data of a many-to-one relationship (e.g. votes count on a stackoverflow question)

Answer 1

+1 A:

It's unlikely that a join will be too slow in this case, especially if you have an index on (question) in the Votes table.
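As a rough illustration of the plain join approach, here is a minimal sketch that uses Python's sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere). The tables questions and votes and the column votes.question follow the names used elsewhere in this answer; the index name, the helper function and the sample rows are made up.

```python
import sqlite3


def vote_counts(conn):
    """Return {question_id: vote_count} computed with a plain join."""
    rows = conn.execute(
        """
        SELECT q.id, COUNT(v.question)
        FROM questions q
        LEFT JOIN votes v ON v.question = q.id
        GROUP BY q.id
        ORDER BY q.id
        """
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, question INTEGER NOT NULL);
    -- Hypothetical index name; it lets the join look up a question's votes
    -- without scanning the whole votes table.
    CREATE INDEX ix_votes_question ON votes (question);
    INSERT INTO questions (id, title) VALUES (1, 'first'), (2, 'second');
    INSERT INTO votes (question) VALUES (1), (1), (1);
    """
)

counts = vote_counts(conn)
print(counts)  # question 2 has no votes and still appears, with a count of 0
```

Whether this is fast enough on real data is something only measurement on the actual database can tell.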

If it is REALLY too slow, you can cache the vote count in the Question table:

 id - title - votecount

You can update the votecount whenever you record a vote. For example, from a stored procedure or directly from your application code.
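One way to do this from application code is to insert the vote and bump the cached count inside a single transaction, so the two cannot drift apart when one statement fails. The sketch below again uses Python's sqlite3 module as a stand-in for SQL Server (an assumption); record_vote and the sample data are hypothetical.

```python
import sqlite3


def record_vote(conn, question_id):
    """Insert a vote and increment the cached count in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("INSERT INTO votes (question) VALUES (?)", (question_id,))
        conn.execute(
            "UPDATE questions SET votecount = votecount + 1 WHERE id = ?",
            (question_id,),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE questions (
        id INTEGER PRIMARY KEY,
        title TEXT,
        votecount INTEGER NOT NULL DEFAULT 0  -- the cached aggregate
    );
    CREATE TABLE votes (id INTEGER PRIMARY KEY, question INTEGER NOT NULL);
    INSERT INTO questions (id, title) VALUES (1, 'first');
    """
)

record_vote(conn, 1)
record_vote(conn, 1)

# The cached value should match a fresh count over the votes table.
cached = conn.execute("SELECT votecount FROM questions WHERE id = 1").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM votes WHERE question = 1").fetchone()[0]
print(cached, actual)
```

The price is the extra write per vote that the question is worried about.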

Keeping the cached count in sync is tricky, but since you're not that worried about consistency, I guess it's OK if the vote count is sometimes not exactly right. To fix any errors, you can periodically regenerate all cached counts like:

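 -- An aggregate cannot appear directly in the SET list, so count in a derived table first.
 UPDATE q
 SET votecount = c.cnt
 FROM questions q
 JOIN (SELECT q2.id, count(v.question) AS cnt
       FROM questions q2
       LEFT JOIN votes v ON v.question = q2.id
       GROUP BY q2.id) c ON c.id = q.id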

The aggregate count(v.question) returns 0 if no vote was found for a question, as opposed to count(*), which would return 1 because the LEFT JOIN still produces one row (with NULL vote columns) for that question.

If locks are an issue, consider using "with (nolock)" or "set transaction isolation level read uncommitted" to bypass locks (again, on the assumption that data integrity is a low priority).

As an alternative to nolock, consider "read committed snapshot", which is meant for databases with heavy read and less write activity. You can turn it on with:

ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON;

It is available in SQL Server 2005 and higher. This is how Oracle works by default, and it's what Stack Overflow itself uses. There's even a Coding Horror blog entry about it.

Andomar 2009-05-28 21:57:10

Right. I addressed the suggestion of materializing the vote count in the question. I was wondering if there is another way, as this doubles the writes (locking out any reads). I know that with proper indexing it should be all right, but if I'm retrieving a lot of questions and perhaps have several many-to-many relationships (e.g. votes and comment count), the joins become nasty.

nategood 2009-05-28 22:17:09

Post edited. Be careful that you're not doing premature optimization; there has to be hard proof, backed by numbers, of performance issues before I'd move away from the normal join.

Andomar 2009-05-28 22:26:04

Answer 2

+1 A:

I used indexed views in SQL Server 2005 all over the place for this kind of thing on a social networking site. Our load had a high ratio of reads to writes, so it worked well for us.

2009-05-28 22:08:33

I agree with hainstech. Create an indexed view of the Votes Table and have it aggregated by question and count.

JD 2009-05-29 00:39:32

Answer 3

A:

I would suggest keeping the vote count in memory for the lifetime of the application. Why hit the db on every request for something as simple as a count, when at some point you will already have loaded the item once and know what the initial amount was? It also has a lot to do with how you are implementing repositories: if your question object lazy loads votes but eager loads the count of votes, you can speed up the process without having to worry about keeping the votes themselves in memory. Still keep the votes in the db; just maintain the count in your application.
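A minimal sketch of what maintaining the count in the application could look like, in Python. VoteCountCache, load_count and save_vote are hypothetical names standing in for whatever repository layer the application uses, and a plain dict plays the part of the database.

```python
import threading


class VoteCountCache:
    """Keeps per-question vote counts in process memory.

    load_count and save_vote are hypothetical callables standing in for
    the repository/database layer.
    """

    def __init__(self, load_count, save_vote):
        self._load_count = load_count
        self._save_vote = save_vote
        self._counts = {}
        self._lock = threading.Lock()

    def get(self, question_id):
        with self._lock:
            if question_id not in self._counts:
                # First request for this question: read the count once.
                self._counts[question_id] = self._load_count(question_id)
            return self._counts[question_id]

    def vote(self, question_id):
        with self._lock:
            self._save_vote(question_id)  # the vote itself is still persisted
            if question_id in self._counts:
                self._counts[question_id] += 1  # keep the cached count in step


# Stand-in "database": stored vote counts plus a log of count reads.
stored = {1: 5}
loads = []


def load_count(question_id):
    loads.append(question_id)
    return stored.get(question_id, 0)


def save_vote(question_id):
    stored[question_id] = stored.get(question_id, 0) + 1


cache = VoteCountCache(load_count, save_vote)
before = cache.get(1)
cache.vote(1)
after = cache.get(1)
print(before, after, len(loads))
```

This only holds together while a single process serves all requests; with several application servers each would keep its own count, and the counts would drift until reloaded.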

Brandon Grossutti 2009-05-28 22:18:47
