ansaurus

Question

SQL Server aggregates for very large tables

Answer 1

A:

Without specifics on your table structures and physical environment and (non)clustered indexes etc. the first place I would look for bottlenecks is "Show Execution plan" for the query, also, Database Engine Tuning Advisor and SQL profiler. Hope this helps.

Ash Machine 2009-10-02 20:19:24

Answer 2

+2 A:

I think this is a perfect candidate for an olap cube. I have fact data with 100s of millions of rows. I was doing the kind of queries you described above and queries were coming back in minutes. I moved this into an OLAP cube and queries are now almost instantaneous. There is a bit of a learning curve for olap. I'd strongly suggest you find a tutorial on some simple cube building just to get your head around it. DBA colleagues had been telling me about cubes for years and I never quite got it. Now I don't know why I went so long without it.

In addition to OLAP, you may also want to research indexed views but if you are slicing the data in several ways, that may not be feasible.

Matt Wrock 2009-10-02 20:21:13

Correct me if I'm wrong, but OLAP cubes needs to be refreshed (or recompiled, I don't remember the correct word) for the data to be updated. So if Jody Powlette needs realtime data it might not be the best solution.

MaxiWheat 2009-10-02 20:25:46

Loading the data in a star schema could be thought of as the "poor man's OLAP".

Philip Kelley 2009-10-02 20:27:11

I agree with Matt on this one. Yes they do need to be reprocessed and redeployed with updates but if this table Jody speaks of is being used for reports they could schedule these specific updates as often as needed without interrupting service and access.

ajdams 2009-10-02 20:29:44

Also, for data this large, you want to make sure and partition the cube (possibly by month). That way when you do add data every day, you only have to reprocess the current month.

Matt Wrock 2009-10-02 20:43:00

Answer 3

A:

Depends on your index and schema

Anyway, your index for this query should be one of

CREATE INDEX IX_foo ON table (shapeID, price, colorID) INCLUDE (brandID, sizeID)
CREATE INDEX IX_foo ON table (shapeID, price, colorID, brandID, sizeID)

However, you added "additional complex where clause here" which mitigates against a good answer

My thinking:

The WHERE clause is important because this reduces the row counts
The ORDER BY is less important than the aggregate/WHERE
Cover the query to remove key lookups

Extra things:

The colorID in (1,2,3) is bad because it's an OR
Ensure parameter datatypes match column datatypes exactly to avoid implicit conversions
You can swap shapeID, Price and colorID around a wee bit to see what's best (or create a few indexes and see which one it uses)
DO you have a server bottleneck (eg running on SQL Express etc)?

gbn 2009-10-02 20:21:36

Answer 4

A:

If you're using SQL 2008 and have some specific often-used filtering, consider using filtered indexes (possibly in combination with INCLUDE indexes as gbn suggested).

Say you have only five sizeID values. You could break your current indexes into multiple filtered indexes (e.g., "WHERE sizeID = 1").

Using filtering in combination with INCLUDE can make your queries return much faster.

Reference: Exploring SQL Server 2008’s Filtered Indexes

Rob Garrison 2009-10-02 21:56:54

ansaurus

tags:

views:

answers:

SQL Server aggregates for very large tables

related questions