ansaurus

Question

SQL Server - Partitioned Tables vs. Clustered Index?

Answer 1

A:

If you are using the partitions in the select statements (that is, filtering on the partitioning column), then you can gain some speed.

If you are not using them, and only run "standard" selects, then you get no benefit.

On your original problem: I would recommend option #1, with the non-clustered index on id included.

Biri 2008-09-23 12:57:42

Answer 2

+2 A:

A clustered index will give you performance benefits for queries when localising the I/O. Date is a traditional partitioning strategy as many D/W queries look at movements by date.

A rule of thumb for a partitioned table suggests that partitions should be around 10m rows in size.
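One way to see how an existing table compares with that rule of thumb is to read the per-partition row counts from the catalog. This is only a minimal sketch: dbo.SalesByDate is a hypothetical table name that does not come from the question.

```sql
-- dbo.SalesByDate is a hypothetical table name; substitute your own.
SELECT p.partition_number,
       p.rows AS approx_row_count   -- sys.partitions reports an approximate count
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.SalesByDate')
  AND p.index_id IN (0, 1)          -- 0 = heap, 1 = clustered index
ORDER BY p.partition_number;
```

Filtering on index_id 0 or 1 counts each row once, since every non-clustered index has its own set of partition rows in this view.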

It would be somewhat unusual to see much performance gain from a clustered index on a diverse analytic workload. The query optimiser will use a technique called 'Index Intersection' to select rows without even hitting the fact table. See here for a post I wrote on another question that explains this in more depth, with some links. A clustered index may or may not participate in the index intersection, so you may find that it gains you relatively little on a general query workload.

You may find circumstances in loading where clustered indexes give you some gain, particularly if you have derived calculations (such as Earned Premium) that are computed within the ETL process. In this case you may get some benefits. If you have a specific query that you know will be executed all the time it might make sense to use clustered indexes for this. Options #2 and #3 are only going to significantly benefit you if you expect this type of query to be the overwhelming majority of the work done by the application.

For a flexible system, a simple date range partition with an index on the ID (and date, if the partitions hold a range) would probably get you as good a performance as any. You might get some benefit from clustering the index in limited circumstances. You might also get some mileage from building a cube over the data and ensuring that the aggregations are set up correctly for this query.
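As a rough illustration of the date range partition described above, the sketch below sets up monthly partitions with a non-clustered index on the ID and date. All object names, the boundary values, and the single-filegroup mapping are assumptions made for the example, and the column types are borrowed from the three-column table discussed in this thread.

```sql
-- Hypothetical names throughout; boundaries are example months only.
-- RANGE RIGHT means each boundary value starts a new partition.
CREATE PARTITION FUNCTION pfSalesByMonth (SMALLDATETIME)
AS RANGE RIGHT FOR VALUES ('20080101', '20080201', '20080301');

-- Map every partition to one filegroup to keep the sketch simple.
CREATE PARTITION SCHEME psSalesByMonth
AS PARTITION pfSalesByMonth ALL TO ([PRIMARY]);

CREATE TABLE dbo.SalesByDate
(
    [id]    INT           NOT NULL,
    [date]  SMALLDATETIME NOT NULL,
    [sales] FLOAT         NULL
) ON psSalesByMonth ([date]);

-- With no ON clause, the index is placed on the table's partition scheme.
CREATE NONCLUSTERED INDEX IX_SalesByDate_id_date
ON dbo.SalesByDate ([id], [date]);

-- A query that filters on the partitioning column gives the optimiser
-- the chance to touch only the matching partition.
SELECT SUM([sales]) AS february_sales
FROM dbo.SalesByDate
WHERE [date] >= '20080201' AND [date] < '20080301';
```

Whether partition elimination actually happens for a given query is best confirmed from the execution plan rather than assumed.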

ConcernedOfTunbridgeWells 2008-09-23 13:01:23

Answer 3

A:

I would do the following:

Non-Clustered Index on [Id]
Clustered Index on [Date]
Convert the [sales] datatype to numeric instead of float

GateKiller 2008-09-23 13:03:21

Your last point is interesting. What kind of performance benefit would you expect from converting to numeric from float?

David Kreps 2008-09-23 13:42:40

You can be more precise about the data you are storing: the numeric data type is an exact number, whereas a float is an approximate number.
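The difference between exact and approximate storage can be seen with a small experiment. This is a minimal sketch; the variable names and the NUMERIC(10, 2) precision are assumptions chosen for the example, not a recommendation for the real [sales] column.

```sql
DECLARE @f FLOAT, @n NUMERIC(10, 2), @i INT;
SET @f = 0;
SET @n = 0;
SET @i = 0;

-- Add 0.1 ten times to each variable.
WHILE @i < 10
BEGIN
    SET @f = @f + 0.1;
    SET @n = @n + 0.1;
    SET @i = @i + 1;
END;

-- The float sum carries binary rounding error, so it does not compare
-- equal to 1; the numeric sum is exactly 1.00.
SELECT CASE WHEN @f = 1 THEN 'equal' ELSE 'not equal' END AS float_check,
       CASE WHEN @n = 1 THEN 'equal' ELSE 'not equal' END AS numeric_check;
```

This shows a correctness difference rather than a speed difference, which is in line with the reply above.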

GateKiller 2008-09-23 18:04:46

Answer 4

+2 A:

This table is awesomely narrow. If the real table will be this narrow, you should be happy to have table scans instead of index->lookups.

I would do this:

CREATE TABLE Narrow
(
  [id] INT NOT NULL,
  [date] SMALLDATETIME NOT NULL,
  [sales] FLOAT NULL,
  PRIMARY KEY(id, date)  --EDIT, just noticed your id is not unique.
)

CREATE INDEX CoveringNarrow ON Narrow(date, id, sales)

This handles point queries with seeks and wide-range queries with limited scans against date criteria and id criteria. There is no per-record lookup from index. Yes, I've doubled the write time (and space used) but that's fine, imo.

If there's some need for a specific piece of data (and that need is demonstrated by profiling!!), I'd create a clustered view targeting that section of the table.

CREATE VIEW Narrow200801
AS
SELECT * FROM Narrow WHERE '2008-01-01' <= [date] AND [date] < '2008-02-01'
--There is some command that I don't have at my finger tips to make this a clustered view.
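For reference, SQL Server's documentation calls this feature an indexed view, and the missing step is a unique clustered index created on a schema-bound view. The sketch below is one way it might look, assuming the Narrow table above lives in the dbo schema; the index name is hypothetical. An indexed view cannot use SELECT *, must use two-part object names, and needs date literals converted with a deterministic style.

```sql
CREATE VIEW dbo.Narrow200801
WITH SCHEMABINDING   -- required before the view can be indexed
AS
SELECT [id], [date], [sales]
FROM dbo.Narrow
WHERE [date] >= CONVERT(SMALLDATETIME, '20080101', 112)
  AND [date] <  CONVERT(SMALLDATETIME, '20080201', 112);
GO

-- Materialises the view's rows; (id, date) is unique because it is
-- the primary key of the base table. The index name is hypothetical.
CREATE UNIQUE CLUSTERED INDEX IX_Narrow200801
ON dbo.Narrow200801 ([id], [date]);
GO
```

GO is the batch separator used by tools such as sqlcmd and Management Studio, needed here because CREATE VIEW must be the first statement in its batch. Note also that automatic use of an indexed view for queries against the base table depends on the SQL Server edition; on other editions the view has to be queried by name with the NOEXPAND hint.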

Clustered views can be used in queries by name, or the optimizer will choose to use the clustered views when the FROM and WHERE clause are appropriate. For example, this query will use the clustered view. Note that the base table is referred to in the query.

SELECT SUM(sales) FROM Narrow WHERE '2008-01-01' <= [date] AND [date] < '2008-02-01'

An index lets you make specific columns conveniently accessible. A clustered view lets you make specific rows conveniently accessible.

David B 2008-09-23 13:11:13

Thanks for the response. I am not familiar with clustered views. No clear results returned when I googled it. Can you provide / point me to some more information?

David Kreps 2008-09-23 13:57:45

Sure, here's MSDN: http://msdn.microsoft.com/en-us/library/aa933148.aspx

The big requirement is the schemabinding (which locks out changes to the dependent structures while this structure exists).

David B 2008-09-23 14:03:07

Answer 5

A:

Partition the table by date. Several horizontal partitions will be more performant than one large table with that many rows.

Thomas Wagner 2008-09-23 13:50:16

Answer 6

A:

A clustered index on the date column isn't a good choice if rows will be inserted faster than the datetime resolution of 3.33 ms. If they are, you'll get two keys with the same value, and the index will need an internal uniquifier for them, which will increase its size.
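The resolution limit can be seen directly. In this minimal sketch (the variable names and timestamps are made up for illustration), two timestamps one millisecond apart are stored as the same datetime value, which is the situation that produces duplicate clustered keys:

```sql
DECLARE @a DATETIME, @b DATETIME;
SET @a = '2008-09-23T12:00:00.000';
SET @b = '2008-09-23T12:00:00.001';  -- rounds to .000 when stored as datetime

-- Both variables now hold the same value, so two rows inserted with
-- these timestamps would share one key in a clustered index on the column.
SELECT CASE WHEN @a = @b THEN 'same key value'
            ELSE 'different key values' END AS collision_check;
```

The effect is only relevant when the clustered index is not declared unique; a unique key would reject the second row instead of adding a uniquifier.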

I'd go with #2 of your options.

Mladen 2008-09-23 22:13:18
