ansaurus

Question

SQL Server table partitioning based on a modulus function?

Answer 1

A:

From the documentation, it seems like you have to give values to the function:

To create 4 partitions...

CREATE PARTITION FUNCTION myRangePF1 (int)
AS RANGE LEFT FOR VALUES (1, 100, 1000);

Couldn't you just do your computations above this call and find the proper values to split on? Substitute the values into the call? Or am I missing why you want to use the modulus? Based on the possibility of your ID's having gaps, you may need to use some statistics math to find out where to partition.

CREATE PARTITION FUNCTION myRangePF1 (int)
AS RANGE LEFT FOR VALUES (@low, @Med, @High);

Sam 2010-02-01 18:19:54

Answer 2

+2 A:

Hash partitioning is not available in SQL Server 2005/2008. You must use range partitioning.

That being said, you should be aware that partitioning is primarily a storage option, see Partitioned Table and Index Concepts:

Partitioning makes large tables or indexes more manageable, because partitioning enables you to manage and access subsets of data quickly and efficiently, while maintaining the integrity of a data collection. By using partitioning, an operation such as loading data from an OLTP to an OLAP system takes only seconds, instead of the minutes and hours the operation takes in earlier versions of SQL Server. Maintenance operations that are performed on subsets of data are also performed more efficiently because these operations target only the data that is required, instead of the whole table.

As you can see, the introduction of partitioning in MSDN focuses on maintenance, manageability and data load. In my experience partitioning gives, at best, 0 performance gain. Specially in SQL 2005. Usualy it gives performance degradation. To improve performance you should use a correct clustered index and properly designed non-clustered indexes.

In SQL 2008 there are improvements in the parallel operators in regard to partitions if they are properly distributed from an IO point of view, see Designing Partitions to Improve Query Performance. Their benefit are marginal though and overshadowed by the benefits of a properly designed set of clustered and non-clustered indexes. Case in point a clustered index in (id, topic_id) where id is an identity is usefull solely for single item lookup by id. On the other hand a clustered index by (topic_id, id) would benefit any queries that look for specific topic(s). I don't know your system requirements and the queries you run, but 10M rows performance problems on such a narrow table smell like indexing ands querying issue, no partitioning issue.

Remus Rusanu 2010-02-01 20:36:02

Thanks! Changing the index from (id, topic_id) to (topic_id, id) sped up the queries noticeably. I wish I could give two accepted answers here, but Matt's answer was just what I was looking for.

_rusty 2010-02-03 15:30:55

Answer 3

A:

10 million rows isn't that many for SQL server to handle; regular index design would probably solve this without the need for partitioning. As has been noted, try clustering on different sets of columns; clustering on topicid, id seems like something to test out, especially if most queries have topicid as a criterion. A clustered index like that has approximately the same effect as paritioning, at least in that it groups the related rows of data together on disk and allows a range scan to fetch them quickly.

If that design works, all you have to worry about is fragmentation from inserts, but that's manageable. After getting the indexing right, make sure you have enough RAM, and that you don't have a disk bottleneck.

onupdatecascade 2010-02-02 00:30:15

Answer 4

+1 A:

You just need to create your modulus column as a PERSISTED computed column.

Blue Peter style, here's one I made earlier (although I'm not 100% sure I have the partition values clause right):

CREATE PARTITION FUNCTION [PF_PartitonFour] (int)
AS RANGE RIGHT
FOR VALUES (
  0,
  1,
  2)
GO

CREATE PARTITION SCHEME [PS_PartitionFourScheme]
AS PARTITION [PF_PartitonFour]
TO ([TestPartitionGroup1],
    [TestPartitionGroup2],
    [TestPartitionGroup3],
    [TestPartitionGroup4])
GO

CREATE TABLE [my_data] (
  [id] [int] IDENTITY(1,1) NOT NULL,
  [topic_id] [int] NULL,
  [data_value] [decimal](19, 5) NULL
  [PartitionElement] AS [topic_id] % 4 PERSISTED,
) ON [PS_PartitionFourScheme] (PartitionElement);
GO

Matt Whitfield 2010-02-02 01:12:50

Thanks! This is exactly what I was looking for.

_rusty 2010-02-03 15:29:10

Technically correct, but doesn't this miss the point of the OP's question (performance)?

onupdatecascade 2010-02-05 20:19:18

No, it's simply answering the question being asked, not the question that *I think* should be asked. Plus, somebody finding this question from google will not want to find a load of preaching, they will want to find how to partition a table based on a modulus function, which, correct me if i'm wrong, is the answer I gave.

Matt Whitfield 2010-02-05 22:10:58

Based on the OP's response, I see no point missed ;)

Rob Cooper 2010-02-06 06:46:04

OK, fair enough. I read "... I'm looking into partitioning the table to squeeze out some query performance." and went straight to, "well, partitioning's not what you want to do to fix that problem." :-)

onupdatecascade 2010-02-11 21:52:46

Answer 5

A:

Hi, How does partitioning improve query performance? Is it just enough that we include the partitioning column in the where clause or is there something more to be done to enhance query performance on a partitioned table? Thanks in advance..

Developer 2010-09-29 10:10:58

Ask this as a new question. Not an answer to somebody elses!

Martin Smith 2010-09-29 10:21:32

ansaurus

tags:

views:

answers:

SQL Server table partitioning based on a modulus function?

related questions