views:

86

answers:

6

I have a table with millions of IP range records (start_num, end_num respectively) which I need to query via a single IP address in order to return all ranges which overlap that point. The query is essentially:

SELECT start_num
       , end_num
       , other_data_col 
FROM ip_ranges 
WHERE :query_ip BETWEEN start_num and end_num;

The table has 8 range partitions on start_num and has a local composite index on (start_num, end_num). Call it UNQ_RANGE_IDX. Statistics have been gathered on the table and index.

The query does an index range scan on the UNQ_RANGE_IDX index as expected and in some cases performs very well. The cases where it performs well are toward the bottom of the IP address space (i.e. something like 4.4.10.20) and performance is poor when at the upper end. (i.e. 200.2.2.2) I'm sure that the problem resides in the fact that on the lower end, the optimizer can prune all the partitions above the one that contains the applicable ranges due to the range partitioning on start_num providing the information necessary to prune. When querying on the top end of the IP spectrum, it can't prune the lower partitions and therefore it incurs the I/O of reading the additional index partitions. This can be verified via the number of CR_BUFFER_GETS when tracing the execution.

In reality, the ranges satisfying the query won't be in any partition but the one the query_ip is located in or the one immediately below or above it as the range size won't be greater than an A class and each partition covers many A classes each. I can make Oracle use that piece of information by specifying it in the where clause, but is there a way to convey this type of information to Oracle via stats, histograms, or a custom/domain index? It seems that there would be a common solution/approach to this sort of problem when searching for date ranges that cover a specific date as well.

I'm looking for solutions that use Oracle and its functionality to tackle this problem, but other solution types are appreciated. I've thought of a couple methods outside the scope of Oracle that would work, but I'm hoping for a better means of indexing, statistics gathering, or partitioning that will do the trick.

Requested Info:

CREATE TABLE IP_RANGES (
    START_NUM NUMBER NOT NULL, 
    END_NUM   NUMBER NOT NULL,
    OTHER     NUMBER NOT NULL,
    CONSTRAINT START_LTE_END CHECK (START_NUM <= END_NUM)
)
PARTITION BY RANGE(START_NUM)
(
    PARTITION part1 VALUES LESS THAN(1090519040) TABLESPACE USERS,
    PARTITION part2 VALUES LESS THAN(1207959552) TABLESPACE USERS
    ....<snip>....
    PARTITION part8 VALUES LESS THAN(MAXVALUE) TABLESPACE USERS
);

CREATE UNIQUE INDEX IP_RANGES_IDX ON IP_RANGES(START_NUM, END_NUM, OTHER) LOCAL NOLOGGING;

ALTER TABLE IP_RANGES ADD CONSTRAINT PK_IP_RANGE 
PRIMARY KEY(START_NUM, END_NUM, OTHER) USING INDEX IP_RANGES_IDX;

There is nothing special about the cutoff values selected for the range partitions. They are simply A class addresses where the number of ranges per partition would equate to about 1M records.

+1  A: 

I've had a similar problem in the past; the advantage I had was that my ranges were distinct. I've got several IP_RANGES tables, each for a specific context, and the largest is ~10 million or so records, unpartitioned.

Each of the tables I have is index-organized, with the primary key being (END_NUM, START_NUM). I've also got a unique index on (START_NUM, END_NUM), but it's not used in this case.

Using a random IP address (1234567890), your query takes about 132k consistent gets.

The query below returns in between 4-10 consistent gets (depending on IP) on 10.2.0.4.

select *
  from ip_ranges outr
 where :ip_addr between outr.num_start and outr.num_end
   and outr.num_end = (select /*+ no_unnest */
                              min(innr.num_end)
                             from ip_ranges innr
                            where innr.num_end >= :ip_addr);
---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |     1 |    70 |     6   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN             | IP_RANGES_PK      |     1 |    70 |     3   (0)| 00:00:01 |
|   2 |   SORT AGGREGATE              |                   |     1 |     7 |            |          |
|   3 |    FIRST ROW                  |                   |   471K|  3223K|     3   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN (MIN/MAX)| IP_RANGES_PK      |   471K|  3223K|     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OUTR"."NUM_END"= (SELECT /*+ NO_UNNEST */ MIN("INNR"."NUM_END") FROM
              "IP_RANGES" "INNR" WHERE "INNR"."NUM_END">=TO_NUMBER(:IP_ADDR)) AND
              "OUTR"."NUM_START"<=TO_NUMBER(:IP_ADDR))
       filter("OUTR"."NUM_END">=TO_NUMBER(:IP_ADDR))
   4 - access("INNR"."NUM_END">=TO_NUMBER(:IP_ADDR))


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        968  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The NO_UNNEST hint is key; it tells Oracle to run that subquery once, not once for each row, and it gives an equality test for the index to use in the outer query.

Adam Musch
Thank you for your response, but this does not work as written in my case. Queries that return 4 rows in the original query return none in your proposal. It's because with overlapping ranges your inner query may return an end_num that isn't associated with a range in the outer. For example an IP (say 4.4.4.4) may be in a huge a-class range 4.0.0.0-4.255.255.255 but have a smaller range of 4.4.5.0-255 ahead of it which would be returned by your inner query. Your query would never return multiple rows unless the end_nums were all the same
RC
I put this out there in case someone else ran into the problem I had, which does solve the tuning problem of finding **a** matching row quickly if one exists for non-overlapping ranges. A thought I just had was that some of the options in Spatial (specifically, using SDO_POINT / SDO_LINE, an R-Tree index, and the COVERS operator) may help, but Spatial costs money and I don't have it licensed.
Adam Musch
A: 

The problem I see is Local Partitioned Index and as you said looks like Oracle doesn't do prune the partition list efficiently. Can you try with Global Index? Local partitioned index doesn't scale well for OLTP queries. In our environment we don't use any Local partitioned index.

Baski
The performance behavior is the same as with a locally partitioned index.
RC
Did you mean Global partitioned index?
Baski
The global and local partitioned indexes offer the same type of performance. Whether the partitions are local or global, the issue is still that the optimizer can't prune partitions based on :query_ip between start_num and end_num.
RC
Baski
Can you try the query as SELECT start_num , end_num , other_data_col FROM ip_ranges WHERE :query_ip BETWEEN start_num and end_num AND start_num >= :query_ip
Baski
I'm looking for ranges that my query ip falls between. If I query 4.4.4.4 and the range in the table is 4.0.0.0 - 4.255.255.255, you're query won't find it as start_num >= :query_ip will be false.
RC
A: 

Would you please indicate if there are any uniform or ordered characteristics of your IP ranges? For example, I would normally expect IP ranges to lie on power-of-2 boundaries. Is that the case here, so we can assume that all ranges have an implicit net mask that starts with m ones followed by n zeroes where m + n = 32?

If so, there should be a way to exploit this knowledge and "step" into the ranges. Would it be possible to add an index on a calculated value with the count of the masked bits (0-32) or maybe the block size (1 to 2^32)?

32 seeks from masks 0 to 32 using just the start_num would be faster than a scan using BETWEEN start_num AND end_num.

Also, have you considered bit arithmetic as a possible means of checking for matches (again only if the ranges represent evenly-positioned chunks in power-of-2 sizes).

Emtucifor
A: 

Your existing partitioning doesn't work, because Oracle is accessing the table's local index partitions by start_num, and it's got to check each one where there could be a match.

A different solution, assuming no ranges span a class A, would be to list partition by trunc(start_num / power(256,3)) -- the first octet. It may be worth breaking it out into a column (populated via trigger) and adding that as a filtering column to your query.

Your ~10m rows would then be, assuming an even distribution, be spread out into about 40k rows, which might be a lot faster to read through.


I ran the use case discussed below, assuming that no range spans a class A network.

create table ip_ranges
 (start_num         number           not null, 
  end_num           number           not null, 
  start_first_octet number           not null,
   ...
  constraint start_lte_end check (start_num <= end_num), 
  constraint check_first_octet check (start_first_octet = trunc(start_num / 16777216) )
)
partition by list ( start_first_octet )
(
partition p_0 values (0),
partition p_1 values (1),
partition p_2 values (2),
...
partition p_255 values (255)
);

-- run data population script, ordered by start_num, end_num

create index ip_ranges_idx01 on ip_ranges (start_num, end_num) local;

begin 
  dbms_stats.gather_table_stats (ownname => user, tabname => 'IP_RANGES', cascade => true);
end;
/

Using the base query above still performs poorly, as it's unable to do effective partition elimination:

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                 | 25464 |  1840K|   845   (1)| 00:00:05 |       |       |
|   1 |  PARTITION LIST ALL                |                 | 25464 |  1840K|   845   (1)| 00:00:05 |     1 |   256 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| IP_RANGES       | 25464 |  1840K|   845   (1)| 00:00:05 |     1 |   256 |
|*  3 |    INDEX RANGE SCAN                | IP_RANGES_IDX01 |   825 |       |   833   (1)| 00:00:05 |     1 |   256 |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("END_NUM">=TO_NUMBER(:IP_ADDR) AND "START_NUM"<=TO_NUMBER(:IP_ADDR))
       filter("END_NUM">=TO_NUMBER(:IP_ADDR))


Statistics
----------------------------------------------------------
         15  recursive calls
          0  db block gets
     141278  consistent gets
      94469  physical reads
          0  redo size
       1040  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

However, if we add the condition to allow Oracle to focus on a single partition, it makes a huge difference:

SQL> select * from ip_ranges
  2   where :ip_addr between start_num and end_num
  3     and start_first_octet = trunc(:ip_addr / power(256,3));

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                 |   183 | 13542 |   126   (2)| 00:00:01 |       |       |
|   1 |  PARTITION LIST SINGLE             |                 |   183 | 13542 |   126   (2)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| IP_RANGES       |   183 | 13542 |   126   (2)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                | IP_RANGES_IDX01 |     3 |       |   322   (1)| 00:00:02 |   KEY |   KEY |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("END_NUM">=TO_NUMBER(:IP_ADDR) AND "START_NUM"<=TO_NUMBER(:IP_ADDR))
       filter("END_NUM">=TO_NUMBER(:IP_ADDR))


Statistics
----------------------------------------------------------
         15  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       1040  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
Adam Musch
Can't we just create global partitioned function index on first octet?
Baski
Adam Musch
(Darned 600 character limit!) The OP says "There is nothing special about the cutoff values selected for the range partitions. They are simply A class addresses where the number of ranges per partition would equate to about 1M records." Why not fix the partitioning scheme if the ranges don't span class A records rather than band-aid it by forcing the index to do it?
Adam Musch
I don't think partitioning by A-class does much except make the partitions smaller. That would be better if the optimizer wasn't scanning all the partitions anyway on upper IPs. If I query an IP of 205.x.x.x, all the partitions whether on start_num or the A-class of start num "below" the partition where the start_num of 205.x.x.x would fall COULD have an end_num beyond my query IP and thus would need to be returned. I can add an additional filter on start_num to restrict how far back it can look, but if somewhere down the line an abnormally large range is inserted it may not be found.
RC
It would be nice if Oracle knew based on stats or something that partitions 1-5 have no end_nums great enough to have records that match the query, but I'm not sure if it's possible to accomplish this. Spatial may work, but I don't think we have that as an option either.
RC
A: 

Firstly, what is your performance requirement ?

Your partitions have a definite start value and end values which can be determined from ALL_PARTITIONS (or hard-coded) and used in a function (concept below but you'd need to amend it to go one partition forward/back).

You should then be able to code

SELECT * FROM ip_ranges
WHERE :query_ip BETWEEN start_num and end_num
AND start_num between get_part_start(:query_ip) and get_part_end(:query_ip);

Which should be able to lock it down to specific partition(s). However if, as you suggest, you can only lock it down to three out of eight partitions, that is still going to be a BIG scan. I'm posting another, more radical answer as well which may be more appropriate.

create or replace function get_part_start (i_val in number) 
                              return number deterministic is
  cursor c_1 is 
    select high_value from all_tab_partitions
    where table_name = 'IP_RANGES'
    order by table_owner, table_name;
  type tab_char is table of varchar2(20) index by pls_integer;
  type tab_num is table of number index by pls_integer;
  t_char  tab_char;
  t_num   tab_num;
  v_ind   number;
begin
  open c_1;
  fetch c_1 bulk collect into t_char;
  close c_1;
  --
  for i in 1..t_char.last loop
    IF t_char(i) != 'MAXVALUE' THEN
      t_num(to_number(t_char(i))) := null;
    END IF;
  end loop;
  --
  IF i_val > t_num.last then
    return t_num.last;
  ELSIF i_val < t_num.first then
    return 0;
  END IF;
  v_ind := 0;
  WHILE i_val >= t_num.next(v_ind) loop
    v_ind := t_num.next(v_ind);
    exit when v_ind is null;
  END LOOP;
  return v_ind;
end;
/
Gary
+1  A: 

I suggest you turn your 8 million row table into a bigger table. Google's IP (for me, at the moment) is coming up as

"66.102.011.104"

You store one record as "66.102.011" with the respective range(s) that it falls in. In fact you store at least one record for every "aaa.bbb.ccc". You'll probably end up with a table maybe five times as big, but one you can can pin-point the relevant record with just a few logical IOs each time rather than the hundreds/thousands for a partition scan.

I suspect any data you have is going to be a little out of date anyway (as various authorities around the world issue/re-issue ranges), so regenerating adjustments for that table on a daily/weekly basis shouldn't be a big problem.

Gary
This is basically something I had though about doing too. I can't really change the way the table is formatted as there are too many things linked too it, but I thought about using a similar method to make a table that essentially serves as an index into the existing table. Basically, create a non-unique index containing the c-class and all the records from the existing table that overlap that c-class. Then change the SQL to query this table for all the ranges that overlap the query_ip c-class and then filter out any that do not overlap that specific IP from that small subset.
RC
My only down-side is I would have to manage this table as ranges are added and deleted from the existing table. This is something I will give a go if no other solution is provided that allows Oracle to do all the work. Thanks. +1
RC