Hi, I am looking for a feature that will let me partition tables horizontally, i.e. whenever I refer to a table, the context is only a subset of the entire set of records in it. This is a typical scenario in a SaaS model, since the data of one account is of no significance to another. Let us say there is an account id attached to a transaction table. Once I log into my account, the account id is set, and the searches always end with AND accountid = 25. But the query execution plan will consider the millions of records in the table that belong to many accounts, which hurts query performance. Is there some way of saying that the table-level operations of the execution plan should be carried out only within the partition defined by accountid = 25, so that a table scan covers only the 100 records that qualify under accountid = 25?

A: 

For Microsoft SQL Server, have a look at the CREATE PARTITION FUNCTION documentation.
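As a rough illustration of what a partition function defines, the Python sketch below (a model, not SQL Server code) maps a partitioning value such as an account id to a 1-based partition number from a sorted list of boundary values, following the RANGE LEFT / RANGE RIGHT boundary semantics. It is meant to mirror what $PARTITION.<function>(value) would report; the boundary values and the function name partition_number are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def partition_number(value: int, boundaries: Sequence[int], range_right: bool = False) -> int:
    """Return the 1-based partition a value falls into (hypothetical helper).

    Models the boundary semantics of a SQL Server partition function:
    - RANGE LEFT: each boundary value belongs to the partition on its left,
      so partition 1 holds value <= boundaries[0].
    - RANGE RIGHT: each boundary value belongs to the partition on its right,
      so partition 1 holds value < boundaries[0].
    `boundaries` must be sorted ascending; n boundaries give n + 1 partitions.
    """
    if range_right:
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1


if __name__ == "__main__":
    # Hypothetical boundaries splitting account ids into three partitions.
    bounds = [100, 200]
    for account_id in (25, 100, 150, 200, 250):
        left = partition_number(account_id, bounds)
        right = partition_number(account_id, bounds, range_right=True)
        print(f"AccountId={account_id}: RANGE LEFT -> {left}, RANGE RIGHT -> {right}")
```

The only difference between the two variants is which side a value exactly equal to a boundary lands on (AccountId=100 goes to partition 1 under RANGE LEFT but partition 2 under RANGE RIGHT).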

Chris Bednarski
+5  A: 

It sounds to me like you're less in need of partitioning and more in need of an index on your accountid column. If your queries that filter on accountid are scanning entire tables, then you're most likely missing the relevant indexes.
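One way to see the effect is to compare query plans before and after adding the index. The sketch below uses Python's built-in sqlite3 module, with SQLite standing in for SQL Server (plan wording and optimizer differ, but the scan-versus-seek distinction is the same idea). The table layout, the 1,000 accounts with 100 rows each, and the index name ix_Transactions_AccountId are all hypothetical.

```python
import sqlite3


def plan_details(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Transactions ("
        " TransactionId INTEGER PRIMARY KEY,"
        " TransactionDate TEXT NOT NULL,"
        " Amount REAL NOT NULL,"
        " AccountId INTEGER NOT NULL)"
    )
    # Many accounts share one table: 1,000 accounts x 100 rows each.
    conn.executemany(
        "INSERT INTO Transactions (TransactionDate, Amount, AccountId) VALUES (?, ?, ?)",
        [("2010-01-01", 1.0, i % 1000) for i in range(100_000)],
    )
    return conn


QUERY = "SELECT * FROM Transactions WHERE AccountId = ?"

if __name__ == "__main__":
    conn = build_demo()
    print("before:", plan_details(conn, QUERY, (25,)))  # a full SCAN of the table
    conn.execute("CREATE INDEX ix_Transactions_AccountId ON Transactions (AccountId)")
    print("after: ", plan_details(conn, QUERY, (25,)))  # SEARCH ... USING INDEX
    print(len(conn.execute(QUERY, (25,)).fetchall()), "rows for account 25")
```

Before the index exists the only option is to read every row; afterwards the plan searches the index on AccountId and touches only the 100 qualifying rows.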

Will A
Agreed, this is an indexing issue, not a partitioning one.
gbn
+3  A: 

You need to modify your table(s) to have account_id as the first column in the clustered index. Simply adding a non-clustered index on account_id will not suffice, because queries will reach the index tipping point and the optimizer will ignore the index. Also, partitioning the table on account_id will not help on its own. Partitioning is a storage and ETL solution, not a performance one.
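The tipping point can be made concrete with a deliberately crude cost model. This is an assumption for illustration only, not SQL Server's actual costing: a nonclustered seek that needs a key lookup per matching row costs about one random page read per row, a full scan reads every page once but sequentially (assumed here to be a hypothetical 4x cheaper per page), and a clustered range scan reads only the contiguous pages holding the account's rows.

```python
import math

# Toy cost model in units of "random page reads" (hypothetical, NOT the
# real optimizer's formulas).
RANDOM_IO_PENALTY = 4.0  # hypothetical random-vs-sequential cost ratio


def lookup_cost(matching_rows: int) -> float:
    # Nonclustered seek + one key lookup (random read) per matching row.
    return float(matching_rows)


def scan_cost(total_rows: int, rows_per_page: int) -> float:
    # Full scan: every page once, read sequentially, hence discounted.
    pages = math.ceil(total_rows / rows_per_page)
    return pages / RANDOM_IO_PENALTY


def clustered_range_cost(matching_rows: int, rows_per_page: int) -> float:
    # Rows of one account are contiguous in the clustered index, so only
    # their pages are read, sequentially.
    return math.ceil(matching_rows / rows_per_page) / RANDOM_IO_PENALTY


def optimizer_choice(matching_rows: int, total_rows: int, rows_per_page: int) -> str:
    """Pick the cheaper plan for WHERE AccountId = @id under this toy model."""
    if lookup_cost(matching_rows) < scan_cost(total_rows, rows_per_page):
        return "nonclustered seek + lookups"
    return "full scan"


if __name__ == "__main__":
    total, per_page = 10_000_000, 100  # hypothetical: 100,000 pages
    for rows in (100, 10_000, 50_000):
        print(rows, optimizer_choice(rows, total, per_page),
              "| clustered range cost:", clustered_range_cost(rows, per_page))
```

With these made-up numbers the tipping point is 25,000 matching rows, only 0.25% of the table, so a large account can fall off the nonclustered index, while the clustered range cost stays proportional to the pages that account actually occupies.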

So if you currently have a table named Transactions defined as:

create table Transactions (
  TransactionId int not null primary key,
  TransactionDate datetime not null,
  Amount money not null,
  AccountId int not null,
  constraint FKAccountId
     foreign key (AccountId)
     references Accounts(AccountId));

It would have to be changed so that the primary key is nonclustered and the clustered index is on (AccountId, TransactionId):

create table Transactions (
  TransactionId int not null,
  TransactionDate datetime not null,
  Amount money not null,
  AccountId int not null,
  constraint FKAccountId
     foreign key (AccountId)
     references Accounts(AccountId),
  constraint PKTransactionId
     primary key nonclustered (TransactionId));
create clustered index cdxTransactions
  on Transactions (AccountId, TransactionId);

This is just an example; I can't claim to model your correct data model out of the blue yonder. But the idea is that if your prevalent access pattern always filters by a column, that column usually needs to be part of the clustered index, in the leftmost position. Only this way can the query do a range scan that limits all the data read to the relevant account.
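SQLite has no separate clustered index, but a WITHOUT ROWID table stores its rows in primary-key order, which makes it a reasonable stand-in for the layout above. The sketch below (Python's sqlite3 module, SQLite substituted for SQL Server, so plan wording differs) keys the table on (AccountId, TransactionId) and keeps TransactionId unique through a separate index, mirroring the nonclustered primary key. The index name uxTransactionId and the sample data are hypothetical.

```python
import sqlite3


def plan_details(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def build_clustered() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Accounts (AccountId INTEGER PRIMARY KEY)")
    # Rows are physically ordered by (AccountId, TransactionId), so all rows
    # of one account sit together, like the clustered index above.
    conn.execute(
        "CREATE TABLE Transactions ("
        " TransactionId INTEGER NOT NULL,"
        " TransactionDate TEXT NOT NULL,"
        " Amount REAL NOT NULL,"
        " AccountId INTEGER NOT NULL REFERENCES Accounts(AccountId),"
        " PRIMARY KEY (AccountId, TransactionId)"
        ") WITHOUT ROWID"
    )
    # TransactionId stays unique on its own, like the nonclustered PK.
    conn.execute("CREATE UNIQUE INDEX uxTransactionId ON Transactions (TransactionId)")
    conn.executemany("INSERT INTO Accounts VALUES (?)", [(a,) for a in range(1000)])
    conn.executemany(
        "INSERT INTO Transactions VALUES (?, ?, ?, ?)",
        [(i, "2010-01-01", 1.0, i % 1000) for i in range(100_000)],
    )
    return conn


QUERY = "SELECT TransactionDate, Amount FROM Transactions WHERE AccountId = ?"

if __name__ == "__main__":
    conn = build_clustered()
    print(plan_details(conn, QUERY, (25,)))  # SEARCH ... USING PRIMARY KEY (AccountId=?)
    print(len(conn.execute(QUERY, (25,)).fetchall()), "rows for account 25")
```

Because AccountId is the leftmost key column, the filter becomes a range search on the table's own storage order rather than a scan followed by lookups.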

Remus Rusanu
A: 

You can review a few things:
1. Filtered indexes, a new feature in SQL Server 2008.
2. Partition your table by accountId/clientId and place each partition on a separate filegroup, which in turn lets you supply more spindles (i.e. disks) to bigger accounts.
P.S. Note that there is a maximum number of partitions per table: 1,000 (SQL Server 2012 and later raise this to 15,000).
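Filtered indexes are SQL Server's counterpart to what SQLite calls partial indexes. As a sketch, again with SQLite standing in for SQL Server via Python's sqlite3 module, an index restricted to one account covers only that account's rows, and the planner can use it only when the query's WHERE clause implies the index's filter. The account id 25, the index name ixTransactions_Account25 and the sample data are hypothetical.

```python
import sqlite3


def plan_details(conn: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Transactions ("
        " TransactionId INTEGER PRIMARY KEY,"
        " TransactionDate TEXT NOT NULL,"
        " Amount REAL NOT NULL,"
        " AccountId INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO Transactions (TransactionDate, Amount, AccountId) VALUES (?, ?, ?)",
        [(f"2010-01-{i % 28 + 1:02d}", 1.0, i % 1000) for i in range(100_000)],
    )
    # Partial (filtered) index: only rows with AccountId = 25 are indexed.
    conn.execute(
        "CREATE INDEX ixTransactions_Account25 ON Transactions (TransactionDate)"
        " WHERE AccountId = 25"
    )
    return conn


def account_query(account_id: int) -> str:
    # A literal is used on purpose: SQLite matches the partial-index filter
    # against the query text, and a bound parameter may not be recognised.
    return (
        "SELECT Amount FROM Transactions"
        f" WHERE AccountId = {int(account_id)} AND TransactionDate = '2010-01-15'"
    )


if __name__ == "__main__":
    conn = build_demo()
    print(plan_details(conn, account_query(25)))  # uses ixTransactions_Account25
    print(plan_details(conn, account_query(26)))  # filter not implied: full scan
```

The same query for account 26 cannot use the index, which is the trade-off of filtered indexes: each one serves only the predicate it was built for.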

Nikhil S