views:

1160

answers:

2

Hello folks,

I have some questions about partitioning and strategy, how and when to use it. As I understood from Partitioned Tables and Indexes in SQL Server 2005, partitioning is not used only for manageability but also to improve performance on very large database tables (VLDB). We have a table with millions of records. This table store performance data, for example how many times was specified item be clicked and so on. We need evaluate this data for current month in real application on daily basis. What I want to prevent is to move data from one table to another, only because of performance. My idea was: I create partition on this VLDB by month for current year. Then I would create clustered index on date field of table and partition scheme. I don’t know if I understand it correctly, but what I except is, that index will be created per partition separately.

Generally how my code looks like in development phase.

/*
Maybe it looks like complicated, but instead of static upper bound, 
I use function to determine end of the month for the current year.

So for example (executed in year 2009)
    SELECT DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1901)*12)+1-1,31-1)) 
returns 
    2008-01-30 23:59:59.997
*/
CREATE PARTITION FUNCTION PartitionMonthlyCurrentYear(DATETIME) AS
RANGE LEFT FOR VALUES
(
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1901)*12)+1-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+1-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+2-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+3-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+4-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+5-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+6-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+7-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+8-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+9-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+10-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+11-1,31-1)),
    DATEADD(ms, -2, DATEADD(month,((DATEPART(YEAR, GetDate())-1900)*12)+12-1,31-1))
);

/*
Create scheme on the primary file group. I'm aware about performance issues of this.
*/
CREATE PARTITION SCHEME SchemeMonthlyCurrentYear 
AS PARTITION PartitionMonthlyCurrentYear
ALL TO ([PRIMARY]);

/*
Create clustered index on table and scheme
*/
CREATE CLUSTERED INDEX [IX_Log_Seiten_archive_Datum] ON [dbo].[Log_Seiten_archiv] 
(
    [Datum] DESC
)ON SchemeMonthlyCurrentYear(Datum)
GO

My questions:

  1. What you think about this approach?
  2. How can I remove table from specified partition scheme? If I drop index, still can’t drop scheme and function because there is still dependency on table *Log_Seiten_archiv*
  3. How can I assign table to use different partition function? Because of development I need often change how partitions function is defined, or create new one. How can I do this for existing table? For example I want change year on function I showed before.

Regards Anton Kalcik

+1  A: 
  1. You can't create SQL Server partition functions with non-deterministic functions like getdate(). If you populated your database with data spanning 10 years, and then simply stopped and watched the SQL Server run for a few years, you would see SQL Server moving that data from one partition to another as dates changed. SQL Server partitioning doesn't work that way. When the partition function is called, the data is placed into a specific partition, and that's the end of it. It's not moved again unless you want to manually move it.

  2. To remove partitioning on a table, apply a new clustered index that doesn't use a partition scheme. The data will be moved. (Be aware that it'll take a long time depending on the size of your data and the speed of your disks - for a multi-terabyte data warehouse, we're probably talking hours if not days.)

  3. To assign a different partition function, you apply a new clustered index using the new partition function. Again, though, if you change your partition functions often, it's going to take a long time to rebuild, and partitioning probably isn't the answer you're looking for.

Partitioning is an awesome solution to >100 million row databases with separate sets of drive arrays, but it sounds like you're just putting everything on the same set of drives, same filegroup. You're not going to see a big performance increase there.

Also, be aware that partitioning is only available in SQL Server Enterprise Edition, not Standard. It's available in Dev, but you can't use that in production due to licensing restrictions.

Brent Ozar
A: 

Hello Brent, thank you for your quick answer.

  1. You are true that developer is responsible how data will be transported into partitioned table. There are many approaches how to this and every of them has advantages and disadvantages. I found excellent article about this on Technet.

I understood now, how can I remove or reassign partition function.

Here example:

Create partition on existing table:

CREATE CLUSTERED INDEX [IX_Log_Seiten_archive_Datum_Kanzleinr] ON [dbo].[Log_Seiten_archiv_Daily] 
(
    [Datum] DESC,
    [kanzleinr] ASC
)ON SchemeDailyCurrentYear(Datum) 
GO

Drop partition on existing table:

DROP INDEX [IX_Log_Seiten_archive_Datum_Kanzleinr] ON [dbo].[Log_Seiten_archiv_Daily];

CREATE CLUSTERED INDEX [IX_Log_Seiten_archive_Datum_Kanzleinr] ON [dbo].[Log_Seiten_archiv_Daily] 
(
    [Datum] DESC,
    [kanzleinr] ASC
)ON [PRIMARY]
GO

Thanks for you help AKa

AKa