views:

72

answers:

2

I'm using ASP.NET and SQL Server. I have SQL queries that select subsets of rows, and I frequently need the COUNT(*) of each subset.

Of course I could run a SELECT COUNT(*) for each of these queries on every round trip, but that will soon become too slow.

How do you make it really fast?

+1  A: 

Are you experiencing a problem that can't be solved by adding another index to your table? COUNT(*) operations are usually O(log n) in terms of total rows, and O(n) in terms of returned rows.

Edit: What I mean is (in case I misunderstood your question)

Given this structure:

CREATE TABLE emails (
    id INT,
    .... OTHER FIELDS
)

CREATE TABLE filters (
    filter_id int,
    filter_expression nvarchar(max) -- Or whatever...
)

Then create a table recording which emails match which filters:

CREATE TABLE email_filter_matches (
    filter int,
    email int,
    CONSTRAINT pk_email_filter_matches PRIMARY KEY(filter, email)
)

The data in this table would have to be updated every time a filter is updated, or when a new email is received.

Then, a query like

SELECT COUNT(*) FROM email_filter_matches WHERE filter = @filter_id

should be O(log n) with regard to total number of filter matches, and O(n) in regard to number of matches for this particular filter. Since your example shows only a small number of matches (which seems realistic when it comes to email filters), this could very well be OK.

If you really want to, of course you could create a trigger on the email_filter_matches table to keep a cached value in the filters table in sync, but that can be done the day you hit performance issues. It's not trivial to get these kinds of things right in concurrent systems.
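Such a trigger might look roughly like this. This is only a sketch: it assumes a cached_count column has been added to filters, handles only INSERT and DELETE on the match table, and glosses over the concurrency concerns mentioned above.

-- Assumes: ALTER TABLE filters ADD cached_count int NOT NULL DEFAULT 0
CREATE TRIGGER trg_email_filter_matches_count
ON email_filter_matches
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Add newly inserted matches to the cached counts
    UPDATE f
    SET cached_count = f.cached_count + i.cnt
    FROM filters f
    JOIN (SELECT filter, COUNT(*) AS cnt FROM inserted GROUP BY filter) i
        ON i.filter = f.filter_id;

    -- Subtract deleted matches
    UPDATE f
    SET cached_count = f.cached_count - d.cnt
    FROM filters f
    JOIN (SELECT filter, COUNT(*) AS cnt FROM deleted GROUP BY filter) d
        ON d.filter = f.filter_id;
END

With this in place, reading the count for a filter is a single-row lookup on filters instead of a scan of email_filter_matches.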

erikkallen
As I mentioned, the filters are defined by users, so the WHERE clauses are unpredictable and can be complex and time-consuming for the server. There are also custom fields that make the WHERE clauses even less predictable. And if you run a couple of COUNT(*) WHERE ... queries against a lot of rows on every round trip, it will cause performance problems regardless of indexes, etc. An index would do if the query were predictable, or at least simple.
A: 

Here are a few ideas for speeding up count(*) at the data tier:

  1. Keep the table and the clustered index as narrow as possible, so that more rows fit per page
  2. Keep the filtering criteria as simple as possible, so the counting goes fast
  3. Do what you can to make sure the rows to be counted are in memory before you start to count them (perhaps using pre-caching)
  4. Make sure your hardware is optimized (enough RAM, fast enough disks, etc)
  5. Consider caching results in separate tables
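For idea 5, SQL Server can sometimes maintain the cached counts for you via an indexed view, which the engine keeps up to date automatically as rows change. A rough sketch, assuming a match table like the email_filter_matches(filter, email) proposed in the other answer (the view and index names here are hypothetical):

CREATE VIEW dbo.filter_counts
WITH SCHEMABINDING
AS
SELECT filter, COUNT_BIG(*) AS match_count  -- indexed views require COUNT_BIG, not COUNT
FROM dbo.email_filter_matches
GROUP BY filter
GO

-- Materializes the view; counts are then read from the index
CREATE UNIQUE CLUSTERED INDEX ix_filter_counts ON dbo.filter_counts (filter)
GO

The trade-off is extra write cost: every insert or delete on the base table also updates the view's index.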

As an alternative, if only the filters change frequently and not the data itself, you might consider building a cube using Analysis Services, and run your queries against that.

RickNZ