views:

1258

answers:

1

Microsoft in its MSDN entry about altering SQL 2005 partitions, listed a few possible approaches:

  • Create a new partitioned table with the desired partition function, and then insert the data from the old table into the new table by using an INSERT INTO...SELECT FROM statement.
  • Create a partitioned clustered index on a heap
  • Drop and rebuild an existing partitioned index by using the Transact-SQL CREATE INDEX statement with the DROP EXISTING = ON clause.
  • Perform a sequence of ALTER PARTITION FUNCTION statements.

Any idea what will be the most efficient way for a large scale DB (millions of records) with partitions based on the dates of the records (something like monthly partitions), where data spreads over 1-2 years?

Also, if I mostly access (for reading) recent information, will it make sense to keep a partition for the last X days, and all the rest of the data will be another partition? Or is it better to partition the rest of the data too (for any random access based on date range)?

+2  A: 

I'd recommend the first approach - creating a new partitioned table and inserting into it - because it gives you the luxury of comparing your old and new tables. You can test query plans against both styles of tables and see if your queries are indeed faster before cutting over to the new table design. You may find there's no improvement, or you may want to try several different partitioning functions/schemes before settling on your final result. You may want to partition on something other than date range - date isn't always effective.

I've done partitioning with 300-500m row tables with data spread over 6-7 years, and that table-insert approach was the one I found most useful.

You asked about how to partition - the best answer is to try to design your partitions so that your queries will hit a single partition. If you tend to concentrate queries on recent data, AND if you filter on that date field in your where clauses, then yes, have a separate partition for the most recent X days.

Be aware that you do have to specify the partitioned field in your where clause. If you aren't specifying that field, then the query is probably going to hit every partition to get the data, and at that point you won't have any performance gains.

Hope that helps! I've done a lot of partitioning, and if you want to post a few examples of table structures & queries, that'll help you get a better answer for your environment.

Brent Ozar