ansaurus

Question

How should I range partition an index with a varchar2 column in Oracle? Is it a bad idea?

Answer 1

A:

Is the table is being searched by the partitioning key in addition to the SOUNDEX value? Or is it being searched just by the SOUNDEX column?

If you are just trying to achieve an even distribution of data among partitions, have you considered using hash partitions rather than range partitions? Assuming you choose a power of 2 for the number of partitions, that should give you a pretty even distribution of data between partitions.

Justin Cave 2009-08-16 04:19:51

The table is being searched by several different columns, but the column on which the table is partitioned is the one that is searched on most often and provides the most natural way to break up the data. However, my users still need to be able to search on the soundex column which is why I am trying to create a good partitioned index.I was going to try a hash partition but I read on Ask Tom that the hash values need to be "fairly unique". In the case of my table the number of distinct soundex values is about 25% of the total number of rows (meaning many non-unique values).

Kevin Babcock 2009-08-16 04:51:08

I'm not a great partitioning expert - but if there's only about 4 rows for each soundex value I would describe that as "fairly unique" - hash partitioning may be a good way to go, especially if the queries are usually on a particular soundex value (e.g. WHERE sndx='A12345'). If queries are on ranges (e.g. WHERE sndx LIKE 'A1234%') then range partitioning would probably be better.

Jeffrey Kemp 2009-08-16 06:19:43

Yes, queries are usually on ranges. I tried creating a hash partition, but some querying on values with a large number of corresponding rows still require full table scans. I think range partitioning is definitely the right choice.

Kevin Babcock 2009-08-16 08:39:56

Answer 2

+3 A:

What exactly is your question?

Don't you know how you can split your table in n equal parts to avoid skew?

You can do that with analytic function percentile_disc().

Here an SQL PLUS example with n=100, I admit that it isn't very sophisticated but it will do the job.

set pages 0
set lines 200

drop table random_strings;

create table random_strings 
as 
select upper(dbms_random.string('A', 12)) rndmstr
from dual 
connect by level < 1000;


spool parts

select 'select '||level||'/100,percentile_disc('||level||
       '/100) within group (order by RNDMSTR) from random_strings;' 
       sql_statement
from dual 
connect by level <= 100
/

spool off

This will output in file parts.lst:

select 1/100,percentile_disc(1/100) within group (order by RNDMSTR) from random_strings;                                                                                                                
select 2/100,percentile_disc(2/100) within group (order by RNDMSTR) from random_strings;                                                                                                                
select 3/100,percentile_disc(3/100) within group (order by RNDMSTR) from random_strings;                                                 
...
select 100/100,percentile_disc(100/100) within group (order by RNDMSTR) from random_strings;

Now you can run script parts.lst to get the partition values. Each partition will contain 1% of the data initially.

Script parts.lst will output:

   ,01 AJUDRRSPGMNP
   ,02 AOMJZQPZASQZ
   ,03 AWDQXVGLLUSJ
   ,04 BIEPUHAEMELR
   .... 
   ,99 ZTMHDWTXUJAR
   1   ZYVJLNATVLOY

tuinstoel 2009-08-16 06:44:42

+1 I'm certainly an Oracle beginner. I didn't know about percentile_disc...many many thanks for the tip, its saved me some headache and frustration trying to figure out the data distribution. To answer the first line of your response, I guess what I'm asking is: is my approach to indexing the table I described appropriate?

Kevin Babcock 2009-08-16 08:37:48

You query ranges so hash partioing doesn't help you because after hashing nearness is lost. So your approach to partitioning is appropriate. You can use PARTITION ... VALUES LESS THAN (MAXVALUE)) for the last partition to store values like 'ZZZYY233'.

tuinstoel 2009-08-16 09:18:56

Answer 3

A:

Talk to me! Can you tell me what your reason is for partitioning this table? It sounds like it is an OLTP table and may not need to be partition. We don’t want to partition just to say we are partitioned. Tell me what you are trying to accomplish by partitioning this table and I can help you pick a correct partitioning scheme. Partitioning does not equal faster queries. It actually can cause your queries to be slower in some cases.

I see some of your additional thoughts above and I don’t believe you need to partition your table. If your queries are going to be doing aggregates on entire partitions then you may want to partition. If you are going to have hundreds of millions of rows of data you may want to partition to help with DBA maintenance. If you just want you queries to run fast then the primary key index will suffice. Please let me know

Just create a global index on your desired columns.

Tom Clark AaiCanHelp.com 2009-08-18 19:36:56

The table is partitioned by a different column. It has almost 100M rows, so we partitioned it for maintainability. The table is queried using a variety of columns. One column that is heavily queried is the column I discuss here, and to improve query performance we were trying to partition that column's index. The thought being that if the index - with ~100M entries - is partitioned then our queries (which are by range) will be faster b/c Oracle only has to search a known subset of the values in the index.

Kevin Babcock 2009-08-19 16:21:04

ansaurus

tags:

views:

answers:

How should I range partition an index with a varchar2 column in Oracle? Is it a bad idea?

related questions