I'm using ASP.net and an SQL database. I have a blog like system where a number of comments are made against a post and I want to display the number of those comments next to the post. To get that number I could either hold it in the post record and add/subtrack when a comment is added or deleted or I could use the SQL to calculate the number of comments using a query each time a user hits the page. The latter seems to be a bad idea as its going to hit my SQL database harder however holding the number against the record feels like it could be error prone. What do you think is best coding practice in this case?
I think the SQL statement should be fine. The other is duplication of data you already have. A count query should be quick.
Always start with a normalized database (your second option). Only denormalize if you have an absolute necessity for performance reasons. Designing it in the denormalized way (which is error-prone as you guessed) is premature optimization. With proper indexes it should be fine calculating the number on the fly.
Don't optimize prematurely. Use the simple solution and pagefault in optimizations only when they're needed.
I would query the database each time you want the information. I would revisit it later if you find that performance is lacking (optimize later). For the traffic most blog type applications will get, that should be sufficient.
Depending on how your data model looks...Don't add the total post count to the main thread record, it is error prone, you should calculate the comment count when needed based on the thread ID, IMHO
Caching the pages and updating that cache as comments are added/removed would be a good option a long with the SQL count query if you are that worried about the number of queries happening against the db..
Perhaps get the count back as part of the main thread query so as to limit the number of hits on the actual DB from the webserver. But I would always query the actual count and not try and keep it in a field, data will eventually get out of sync as that is reality.
I usually use an indexed view for this kind of thing. This allows you to denormalize the data for quick retrieval, but there is no way for it to get out of sync. Folks will also not be confused and think the view is the master of the data. I have mostly used the standard sku of SS2K5, so I have to specify the (noexpand) hint to get it to actually use the index on the view (enterprise will do it automatically). So for standard sku, I always create a wrapper view that everyone hits so I know the hint is always in place.
Coding this on the web page, so hopefully no syntax errors ;)
create view postCount__
as
select
threadId
,postCount=count_big(*)
from thread
group by threadId
go
create unique clustered index postCount__xpk_threadid on postCount__(threadId)
go
create view postCount
as
select
threadId
,postCount=cast(postCount as int)
from postCount__ with (noexpand)
go
So I use a nomenclature on the actual indexed view to let everyone know not to query it directly. Instead they look for the associated wrapper view that enforces the noexpand hint. Using an indexed view forces you to do count_big, so I often cast down to int in the wrapper view to be able to keep our asp.net code lazily using 32 bit ints. It would be better to omit the cast, but it hasn't been of any significant impact for me.
EDIT - I can tell you that forum software always denormalizes the post count to the thread table. It kills the DB to continually count the post count on every page view if you have an active forum. I love that mssql has indexed views so you can define the denormalization declaratively rather than maintain it yourself.