I need to show the number of results for a given category, and hide any categories that give no results.

Example: This Yahoo!Jobs page shows the number of results in categories like City, Job Category, Experience etc.

I work in C#/ASP.NET, and fear that our server will choke without some serious caching and SQL optimization.

How would you go about building a solution like this?

A: 

I think you will have to go into a heck of a lot more detail about your set-up. Are you using an ORM? Is the data already in memory? How is the request coming in? What specifically is your concern? Have you done any profiling to determine this is indeed a problem? I cannot stress that last one enough. Ensure that what you're afraid of is an actual problem before you try to find ways around it.

To attempt an answer anyway: if you do something like this

_categoryRepository
  .Where(category => category.Matches(query))
  .Select(category => category.Items.Count())

Count() will only be executed for categories where category.Matches(query) returns true.

George Mauer
I'm building on a 7-year-old solution with no ORM, no objects, just plain old datasets. The total number of rows is large (10K), so it's not an option to keep all the data in memory. Refactoring/rebuilding to use DDD/LINQ would be great, but there's no time for it. So, as far as I can see, I need to add a query for each category item in each category (select count(*) where city = '[city]'). There will literally be hundreds of those count queries for each main query, and that's my big worry. The site is slow enough today, so even without profiling I can guess the outcome.
comichael
@comichael - Where there are few options, there are few solutions. If the ONLY thing you are willing to work with is SQL, then the ONLY solution available is caching. At the very least, build a category repository class that hides the raw SQL; it will be easier to cache at that level. Also, you should profile even if the site is slow: are you certain that the database is the bottleneck? In ASP.NET applications it's usually something else.
George Mauer
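One way to avoid the hundreds of per-value count queries discussed in the comments above is to run a single GROUP BY per facet, so each facet costs one query and values with no matching rows never appear in the result. The following is a minimal sketch, not a drop-in solution: it uses Python with an in-memory SQLite database purely to be self-contained, and the `jobs` table, its columns, the sample rows, and the `facet_counts` helper are all hypothetical. The SQL itself should carry over to other databases with little change.

```python
import sqlite3

# Hypothetical schema: one `jobs` table with one column per facet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (title TEXT, city TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        ("C# developer", "Oslo", "IT"),
        ("SQL developer", "Oslo", "IT"),
        ("Web developer", "Bergen", "IT"),
        ("Accountant", "Bergen", "Finance"),
    ],
)

# Column names cannot be bound as parameters, so only these
# whitelisted names are ever interpolated into the SQL text.
FACETS = ("city", "category")


def facet_counts(conn, facet, title_pattern):
    """Return (value, count) pairs for one facet, restricted to the main query."""
    if facet not in FACETS:
        raise ValueError(f"unknown facet: {facet}")
    sql = (
        f"SELECT {facet}, COUNT(*) FROM jobs "
        f"WHERE title LIKE ? GROUP BY {facet} ORDER BY {facet}"
    )
    return conn.execute(sql, (title_pattern,)).fetchall()


# One query per facet instead of one query per facet value.
city_counts = facet_counts(conn, "city", "%developer%")
category_counts = facet_counts(conn, "category", "%developer%")
```

Because GROUP BY only emits groups that contain at least one matching row, the "Finance" category is simply absent from `category_counts` here, which covers the requirement to hide empty categories. The per-facet results are also small, which makes them natural candidates for caching at the repository level.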
+1  A: 

Relational databases are not suited for that kind of stuff. It's just the wrong tool for the job. Instead, take a look at Lucene, Solr, Sphinx, etc.

I personally recommend Solr, it's very easy to get started and with SolrNet you can write a faceted ASP.NET app in a few lines of code.

Disclaimer: I'm the author of said library.

Mauricio Scheffer
A: 

The typical way these counts are done is via caching. The model is to have a trigger on insert into the relevant tables, then update a calculated field with 'CurrentCount'.

You can add further levels of caching into the app itself, so that counts only get incremented every few hours, or similar.

It is significantly better than actually counting all the rows each time, but it does mean you need to know in advance what sorts of data you will be counting, so you can pre-calculate it.
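The trigger-maintained counter described above could look roughly like the sketch below. It again uses Python with in-memory SQLite only so that it runs as-is; the `jobs` and `city_counts` tables and the trigger names are hypothetical, and the delete trigger is an added assumption (the answer mentions only inserts). Trigger syntax differs between database engines, so treat the SQL as illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (title TEXT, city TEXT);

-- Hypothetical summary table holding the pre-calculated 'CurrentCount'.
CREATE TABLE city_counts (
    city TEXT PRIMARY KEY,
    current_count INTEGER NOT NULL
);

-- Keep the summary in step with inserts.
CREATE TRIGGER jobs_after_insert AFTER INSERT ON jobs
BEGIN
    INSERT OR IGNORE INTO city_counts (city, current_count)
        VALUES (NEW.city, 0);
    UPDATE city_counts SET current_count = current_count + 1
        WHERE city = NEW.city;
END;

-- Assumption: rows can also be deleted, so decrement as well.
CREATE TRIGGER jobs_after_delete AFTER DELETE ON jobs
BEGIN
    UPDATE city_counts SET current_count = current_count - 1
        WHERE city = OLD.city;
END;
""")

conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [("a", "Oslo"), ("b", "Oslo"), ("c", "Bergen")],
)
conn.execute("DELETE FROM jobs WHERE title = 'c'")

# Reading the counts is now a lookup, not a scan; zero counts are hidden.
counts = conn.execute(
    "SELECT city, current_count FROM city_counts "
    "WHERE current_count > 0 ORDER BY city"
).fetchall()
```

Note that these counts cover the whole table rather than the result set of a particular search, which is the "know in advance" limitation the answer points out.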

Noon Silk