I have an application that records activity in a table (Oracle 10g). The logging records should be kept for at least 30 days. I expect about 20 million rows to be added to this table every month.

The DBA suggested that the table be split in partitions containing one week of data. The weekly maintenance script would then delete the oldest partition (leaving only 4 weeks of data in the table).

What would be the best way of partitioning this logging table?

A: 

20 million rows every month and you only have to keep 30 days of data? (That's about a month's worth.)

Even with 12 months' worth of data it wouldn't be hard to query this table (as one big table) with the correct index. Inserting is no problem either, whether the logging table holds 1 row or 20 million.
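
A minimal sketch of that "one big table" approach, assuming a date column (the table and column names here are made up):

CREATE INDEX app_log_date_ix ON app_log (log_date);

-- the 30-day retention query becomes a simple index range scan
SELECT *
  FROM app_log
 WHERE log_date >= SYSDATE - 30;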

Partitioning in Oracle is also a feature that has to be paid for, if I'm correct (it's an extra-cost option on top of Enterprise Edition), so it's costly too if you don't already have a license.

DvE
It's a requirement by the DBA
Philippe Leybaert
+2  A: 

Partitioning a table isn't hard - it appears that you will be removing the data on a weekly basis, so the partition clauses will look like

PARTITION "P2009_45"  VALUES LESS THAN 
(TO_DATE(' 2009-11-02 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
 PARTITION "P2009_46"  VALUES LESS THAN 
(TO_DATE(' 2009-11-09 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
... etc

where your partitioning column is your date column of interest in the table.
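
For context, a minimal sketch of the full DDL those clauses would sit in (the table and column names are made up for illustration):

CREATE TABLE app_log (
  log_id    NUMBER,
  log_date  DATE NOT NULL,
  message   VARCHAR2(4000)
)
PARTITION BY RANGE (log_date)
(
  PARTITION "P2009_45" VALUES LESS THAN
    (TO_DATE('2009-11-02 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
  PARTITION "P2009_46" VALUES LESS THAN
    (TO_DATE('2009-11-09 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
);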

Additional comments:

  • If you can upgrade to 11g you can take advantage of interval partitioning, which is similar to this range partitioning, except that Oracle manages the creation of new partitions for you (see the first sketch after this list).
  • If you're going to routinely drop partitions, I would advise making all indexes on the table locally partitioned, to avoid the rebuilds that global indexes would need after partition operations.
  • If you have a good idea of the number of log entries per month, and it stays relatively constant, you might consider using a sequence (as a primary key) that is capped at this number and then cycles back to 0. Your logging statements then become "MERGE INTO ..." statements that either create a new row or overwrite the existing row with that key. This only guarantees that you retain the number of rows allowed by the sequence's maximum value, NOT a certain time interval, but it can be an alternative to partitioning (which, as DvE points out, is an extra-cost option); see the second sketch after this list.
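
A sketch of the 11g interval variant from the first bullet, using the same made-up table (this syntax requires 11g or later):

CREATE TABLE app_log (
  log_id    NUMBER,
  log_date  DATE NOT NULL,
  message   VARCHAR2(4000)
)
PARTITION BY RANGE (log_date)
INTERVAL (NUMTODSINTERVAL(7, 'DAY'))  -- Oracle creates each new weekly partition on demand
(
  PARTITION p_first VALUES LESS THAN (TO_DATE('2009-11-02', 'YYYY-MM-DD'))
);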
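
And a sketch of the capped-sequence alternative from the last bullet (the 20M cap and all names are assumptions; note that on 10g the sequence value has to be fetched with SELECT INTO rather than assigned directly in PL/SQL):

-- ring-buffer key: wraps back to 0 after roughly one month of entries
CREATE SEQUENCE log_seq
  MINVALUE 0
  MAXVALUE 19999999
  START WITH 0
  INCREMENT BY 1
  CYCLE
  CACHE 100;

DECLARE
  v_id NUMBER;
BEGIN
  SELECT log_seq.NEXTVAL INTO v_id FROM dual;

  -- insert a new row, or overwrite the old row occupying this slot
  MERGE INTO app_log t
  USING (SELECT v_id AS log_id FROM dual) s
  ON (t.log_id = s.log_id)
  WHEN MATCHED THEN
    UPDATE SET t.log_date = SYSDATE, t.message = 'some activity'
  WHEN NOT MATCHED THEN
    INSERT (log_id, log_date, message)
    VALUES (s.log_id, SYSDATE, 'some activity');
END;
/
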
dpbradley
+1  A: 

Hi Philippe,

The most likely partitioning scheme would be to range-partition your data on the creation date. Each week you would create a new partition and drop the oldest one. The impact will depend on how this table is used / indexed.
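
The weekly maintenance then boils down to two DDL statements, roughly like this (partition and table names are illustrative):

ALTER TABLE app_log ADD PARTITION p2009_47
  VALUES LESS THAN (TO_DATE('2009-11-16', 'YYYY-MM-DD'));

ALTER TABLE app_log DROP PARTITION p2009_43;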

Since it is a logging table, perhaps it is not indexed; in that case dropping a partition will have little impact: referencing objects won't be invalidated, and the drop will just require a brief partition lock (the oldest partition shouldn't be receiving inserts at that point anyway).

If the table is indexed, you will have to decide whether your indexes will be global or partitioned. Global indexes will have to be rebuilt when you drop a partition (which takes time, although 20M rows is still manageable). Alternatively, you can use the UPDATE GLOBAL INDEXES clause to keep the indexes valid after the partition drop.
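
For example (names are illustrative):

ALTER TABLE app_log DROP PARTITION p2009_43 UPDATE GLOBAL INDEXES;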

Local indexes will be partitioned like the table and may be less efficient than global indexes (an index range scan will have to probe each local index partition instead of one common index if you do not filter by date). On the other hand, they need no maintenance after a partition drop.
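
A local index is just the LOCAL keyword on the index DDL; Oracle then partitions it along with the table (hypothetical names again):

CREATE INDEX app_log_user_ix ON app_log (user_id) LOCAL;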

Vincent Malgrat