I am designing a new laboratory database.

My primary data tables will have at least id (PK NUMBER) and created_on (DATE). Also, for any two entries, the entry with a higher id will have a later created_on date.

I plan to partition by created_on to improve performance on recently entered data. Since the columns increase together, the table would implicitly be partitioned by id as well. However, Oracle wouldn't know about this implied partitioning by id, so it couldn't take advantage of the partitioning for table joins on id.

Two questions:

  1. How do I enforce both columns increasing together?

  2. How can I take advantage of this implicit partitioning for table joins?

+2  A: 

In my opinion, the decision to partition should be based more on the need for table maintenance activities (purging, archiving, etc.) than on performance. In your case I'm guessing you'll probably be performing index range scans on the samples for a date range, so make sure the date index is locally (rather than globally) partitioned as well. This also eliminates the need to rebuild the index if you truncate a partition. I'd also guess that the joins on the PK will be lookups by rowid; those happen after the index range scan, so there's no way partitioning can affect them.
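For illustration, a rough sketch of that layout (the table, column, index, and partition names here are invented, not taken from the question):

    CREATE TABLE samples (
        id          NUMBER        NOT NULL,
        created_on  DATE          NOT NULL,
        payload     VARCHAR2(100)
    )
    PARTITION BY RANGE (created_on) (
        PARTITION p2010 VALUES LESS THAN (DATE '2011-01-01'),
        PARTITION p2011 VALUES LESS THAN (DATE '2012-01-01'),
        PARTITION pmax  VALUES LESS THAN (MAXVALUE)
    );

    -- LOCAL equipartitions the index with the table, so truncating or dropping
    -- a table partition doesn't leave a global index to rebuild.
    CREATE INDEX samples_created_on_ix ON samples (created_on) LOCAL;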

[Edit]

With regard to relating the PK and CREATED_ON columns, I work with a couple of systems that construct the numeric key from a sequence that is prefixed with YYYYMMDD and that works pretty well. You'll have to:

  • Liberally estimate the number of samples you'll have per day

  • Define a sequence that has this as a maximum value and then cycles back to 0

  • Have a function that returns YYYYMMDD || {sequence value left-padded with
    zeros to the appropriate fixed length} that is called from a trigger or application code when the key is needed

Some would disagree with embedding meaning in the key, but in practice it is useful to look at a sample ID and have an idea of when it was processed.
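A rough sketch of those steps (the sequence name, function name, and six-digits-per-day capacity are assumptions for illustration, not anything from those systems):

    -- Liberal estimate: at most 999,999 samples per day; cycles back to 0
    CREATE SEQUENCE sample_seq
        MINVALUE 0
        MAXVALUE 999999
        CYCLE
        NOCACHE;

    CREATE OR REPLACE FUNCTION next_sample_id RETURN NUMBER IS
        v_seq NUMBER;
    BEGIN
        SELECT sample_seq.NEXTVAL INTO v_seq FROM dual;
        -- e.g. 15-Jan-2010 with sequence value 42 -> 20100115000042
        RETURN TO_NUMBER(TO_CHAR(SYSDATE, 'YYYYMMDD')
                         || LPAD(TO_CHAR(v_seq), 6, '0'));
    END next_sample_id;
    /

The function would then be called from a BEFORE INSERT trigger or from the application code that inserts the sample.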

dpbradley
Hmmm, well I profoundly disagree with that. You only need high-performance purging (or loading) on a record once, but you need select performance every time you need to find it. If I had to choose one or the other, I'd partition for query performance.
David Aldridge
Oh, I wrote something a while ago along those lines: http://oraclesponge.wordpress.com/2005/11/14/choosing-partitioning-keys-for-etl-or-reporting/
David Aldridge
Yeah, you're right, I'll edit to make clear it's an opinion... still believe it however...:-)
dpbradley
@David - good article, thanks, but doesn't this apply more to denormalized DW systems?
dpbradley
@dpbradley - partitioning is generally more relevant to data warehouses than it is to OLTP systems.
APC
@APC - Right, I agree - I support a few mostly-OLTP systems that use partitioning so older transactional data can roll off without too much operational impact. Partitioning generally doesn't help our performance, and actually creates some interesting performance problems when data starts going into a new partition before the statistics have "caught up"
dpbradley
+1  A: 

It's pretty tricky, to be honest. Multicolumn partitioning is one option, whereby you create range-based partitions on more than one column. In 11g you can implement this as partitioning on Column A with subpartitioning on Column B, but in 10g you have to partition by range on the two columns together. I think the tricky part is knowing what boundaries to partition on, because you probably want the two partitioning schemes to stay in sync.
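For illustration, a minimal 11g range-range composite sketch (the table name, boundary dates, and id boundaries are invented; in practice the id boundaries would have to be chosen so they line up with the date boundaries):

    CREATE TABLE samples (
        id          NUMBER NOT NULL,
        created_on  DATE   NOT NULL
    )
    PARTITION BY RANGE (created_on)
    SUBPARTITION BY RANGE (id) (
        -- only the ids expected for this date range; a row whose id falls
        -- outside its date partition's id range will fail to insert
        PARTITION p2010h1 VALUES LESS THAN (DATE '2010-07-01') (
            SUBPARTITION p2010h1_ids VALUES LESS THAN (500000)
        ),
        PARTITION p2010h2 VALUES LESS THAN (DATE '2011-01-01') (
            SUBPARTITION p2010h2_ids VALUES LESS THAN (MAXVALUE)
        )
    );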

David Aldridge
+1  A: 

In this case, to speed up joins on "table_id", you should also store the corresponding "created_on" in the tables you will mostly be joining. If you do that you can always join on both "table_id" and "created_on", so your "PARTITION RANGE ALL" turns into "PARTITION RANGE SINGLE". You can measure the speed gains and weigh them against the additional storage cost.
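For example (the table and column names below are made up for illustration):

    -- Carrying created_on into the child table and joining on both columns
    -- lets the optimizer prune to a single partition instead of scanning all.
    SELECT s.id, r.result_value
    FROM   samples s
           JOIN results r
             ON  r.sample_id         = s.id
             AND r.sample_created_on = s.created_on
    WHERE  s.created_on >= DATE '2010-06-01'
    AND    s.created_on <  DATE '2010-07-01';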

Edit:

How to keep both fields increasing together:

ALTER TABLE my_table MODIFY created_on DEFAULT SYSDATE;

And fill the ID from a sequence in all your inserts.
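A minimal sketch (the sequence name and the non-key column are assumptions):

    -- created_on is omitted, so it picks up its DEFAULT of SYSDATE
    INSERT INTO my_table (id, some_column)
    VALUES (my_table_seq.NEXTVAL, 'some value');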

jva
Joining tables on a DATE column doesn't feel right, but I can't put my finger on the exact reason.
Steven
In some RDBMS (Sybase, maybe), DATE types are actually fuzzy and joining them is unpredictable. I'd have the same aversion to joining dates as @Steven, even if there's no rational reason for avoiding it any more.
skaffman
In Oracle, DATE joins work as well as they would with NUMBER. We are doing it on some pretty large tables and it isn't causing any problems. Think of that join as simply a partition pointer.
jva
A: 

How do I enforce both columns increasing together?

  1. Assuming it is a bulk load and the id is sequence-generated at the time of the bulk load, you could ALTER SEQUENCE between loads to get more control over the range of sequence values used for each partition. If the sequence and created_on are assigned prior to the bulk load, you may need a stage in your ETL process to work out what the min/max id is for each created date.

  2. Range-partition on created_on, range-subpartition on id. Each partition should have only one subpartition.

  3. Assuming that, since this is a new DB, you'll have 11g, how about check constraints on virtual columns? A virtual column date_partition:

    CASE WHEN created_on BETWEEN ... AND ... THEN 'PARTITION_1' WHEN created_on BETWEEN ... AND ... THEN 'PARTITION_2' ... END

A similar virtual column id_partition, though you'd have to query to get the minimum/maximum PK for each partition. That should be quick since, being the primary key, there's an index on it.

Then you add a constraint such that id_partition = date_partition.
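A rough sketch of option 3 (the table name, boundary dates, and id boundaries are placeholders; the id boundaries would come from querying the min/max PK per date range):

    ALTER TABLE samples ADD (
        date_partition VARCHAR2(20) GENERATED ALWAYS AS (
            CASE
                WHEN created_on < DATE '2010-07-01' THEN 'PARTITION_1'
                WHEN created_on < DATE '2011-01-01' THEN 'PARTITION_2'
            END
        ) VIRTUAL,
        id_partition VARCHAR2(20) GENERATED ALWAYS AS (
            CASE
                WHEN id < 500000  THEN 'PARTITION_1'
                WHEN id < 1000000 THEN 'PARTITION_2'
            END
        ) VIRTUAL
    );

    -- Rejects any row whose id falls in a different bucket than its created_on
    ALTER TABLE samples ADD CONSTRAINT samples_partition_sync_chk
        CHECK (id_partition = date_partition);

Note that rows beyond the last listed boundary make both virtual columns NULL, so the check doesn't constrain them until new buckets are added.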

Gary
+2  A: 

Hi Steven,

The really important question is: will you ever need to query by a range of IDs? It is unlikely you will need to build a query with ID BETWEEN :A AND :B. Therefore, Oracle wouldn't benefit from a correlated partition scheme. For that matter, you could use a GUID for the primary key and you would get better scalability for INSERTs.

Vincent Malgrat
No. However, I will join on the id (PK) all the time, so it would be useful to quickly know what partition a particular id is in to avoid searching through them all.
Steven
@Steven: The PK index will be global (not partitioned), therefore Oracle will only have to look in one place (the index) to find the rowid. The rowid contains the partition information, and Oracle won't have to scan through all the partitions to find a row -- it will only look in the partition pointed to by the rowid.
Vincent Malgrat
Simply, partition by date range and forget about the rest. Is that what you're saying?
Steven
If it only adds complexity and doesn't help performance, yes, don't bother =)
Vincent Malgrat