ansaurus

Question

Efficient retrieval of overlapping IP range records via a single point.

Answer 1

+1 A:

I've had a similar problem in the past; the advantage I had was that my ranges were distinct. I've got several IP_RANGES tables, each for a specific context, and the largest is ~10 million or so records, unpartitioned.

Each of the tables I have is index-organized, with the primary key being (END_NUM, START_NUM). I've also got a unique index on (START_NUM, END_NUM), but it's not used in this case.

Using a random IP address (1234567890), your query takes about 132k consistent gets.

The query below returns in between 4-10 consistent gets (depending on IP) on 10.2.0.4.

select *
  from ip_ranges outr
 where :ip_addr between outr.num_start and outr.num_end
   and outr.num_end = (select /*+ no_unnest */
                              min(innr.num_end)
                             from ip_ranges innr
                            where innr.num_end >= :ip_addr);
---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |     1 |    70 |     6   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN             | IP_RANGES_PK      |     1 |    70 |     3   (0)| 00:00:01 |
|   2 |   SORT AGGREGATE              |                   |     1 |     7 |            |          |
|   3 |    FIRST ROW                  |                   |   471K|  3223K|     3   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN (MIN/MAX)| IP_RANGES_PK      |   471K|  3223K|     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OUTR"."NUM_END"= (SELECT /*+ NO_UNNEST */ MIN("INNR"."NUM_END") FROM
              "IP_RANGES" "INNR" WHERE "INNR"."NUM_END">=TO_NUMBER(:IP_ADDR)) AND
              "OUTR"."NUM_START"<=TO_NUMBER(:IP_ADDR))
       filter("OUTR"."NUM_END">=TO_NUMBER(:IP_ADDR))
   4 - access("INNR"."NUM_END">=TO_NUMBER(:IP_ADDR))


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        968  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The NO_UNNEST hint is key; it tells Oracle to run that subquery once, not once for each row, and it gives an equality test for the index to use in the outer query.

Adam Musch 2010-09-07 19:54:45

Thank you for your response, but this does not work as written in my case. Queries that return 4 rows in the original query return none in your proposal. It's because with overlapping ranges your inner query may return an end_num that isn't associated with a range in the outer. For example an IP (say 4.4.4.4) may be in a huge a-class range 4.0.0.0-4.255.255.255 but have a smaller range of 4.4.5.0-255 ahead of it which would be returned by your inner query. Your query would never return multiple rows unless the end_nums were all the same

RC 2010-09-07 20:21:05

I put this out there in case someone else ran into the problem I had, which does solve the tuning problem of finding **a** matching row quickly if one exists for non-overlapping ranges. A thought I just had was that some of the options in Spatial (specifically, using SDO_POINT / SDO_LINE, an R-Tree index, and the COVERS operator) may help, but Spatial costs money and I don't have it licensed.

Adam Musch 2010-09-07 20:59:41

Answer 2

A:

The problem I see is Local Partitioned Index and as you said looks like Oracle doesn't do prune the partition list efficiently. Can you try with Global Index? Local partitioned index doesn't scale well for OLTP queries. In our environment we don't use any Local partitioned index.

Baski 2010-09-07 20:22:26

The performance behavior is the same as with a locally partitioned index.

RC 2010-09-07 20:58:26

Did you mean Global partitioned index?

Baski 2010-09-07 22:32:13

The global and local partitioned indexes offer the same type of performance. Whether the partitions are local or global, the issue is still that the optimizer can't prune partitions based on :query_ip between start_num and end_num.

RC 2010-09-08 11:56:40

Baski 2010-09-09 07:32:02

Can you try the query as SELECT start_num , end_num , other_data_col FROM ip_ranges WHERE :query_ip BETWEEN start_num and end_num AND start_num >= :query_ip

Baski 2010-09-09 07:39:55

I'm looking for ranges that my query ip falls between. If I query 4.4.4.4 and the range in the table is 4.0.0.0 - 4.255.255.255, you're query won't find it as start_num >= :query_ip will be false.

RC 2010-09-15 12:11:13

Answer 3

A:

Would you please indicate if there are any uniform or ordered characteristics of your IP ranges? For example, I would normally expect IP ranges to lie on power-of-2 boundaries. Is that the case here, so we can assume that all ranges have an implicit net mask that starts with m ones followed by n zeroes where m + n = 32?

If so, there should be a way to exploit this knowledge and "step" into the ranges. Would it be possible to add an index on a calculated value with the count of the masked bits (0-32) or maybe the block size (1 to 2^32)?

32 seeks from masks 0 to 32 using just the start_num would be faster than a scan using BETWEEN start_num AND end_num.

Also, have you considered bit arithmetic as a possible means of checking for matches (again only if the ranges represent evenly-positioned chunks in power-of-2 sizes).

Emtucifor 2010-09-07 21:19:19

Answer 4

A:

Your existing partitioning doesn't work, because Oracle is accessing the table's local index partitions by start_num, and it's got to check each one where there could be a match.

A different solution, assuming no ranges span a class A, would be to list partition by trunc(start_num / power(256,3)) -- the first octet. It may be worth breaking it out into a column (populated via trigger) and adding that as a filtering column to your query.

Your ~10m rows would then be, assuming an even distribution, be spread out into about 40k rows, which might be a lot faster to read through.

I ran the use case discussed below, assuming that no range spans a class A network.

create table ip_ranges
 (start_num         number           not null, 
  end_num           number           not null, 
  start_first_octet number           not null,
   ...
  constraint start_lte_end check (start_num <= end_num), 
  constraint check_first_octet check (start_first_octet = trunc(start_num / 16777216) )
)
partition by list ( start_first_octet )
(
partition p_0 values (0),
partition p_1 values (1),
partition p_2 values (2),
...
partition p_255 values (255)
);

-- run data population script, ordered by start_num, end_num

create index ip_ranges_idx01 on ip_ranges (start_num, end_num) local;

begin 
  dbms_stats.gather_table_stats (ownname => user, tabname => 'IP_RANGES', cascade => true);
end;
/

Using the base query above still performs poorly, as it's unable to do effective partition elimination:

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                 | 25464 |  1840K|   845   (1)| 00:00:05 |       |       |
|   1 |  PARTITION LIST ALL                |                 | 25464 |  1840K|   845   (1)| 00:00:05 |     1 |   256 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| IP_RANGES       | 25464 |  1840K|   845   (1)| 00:00:05 |     1 |   256 |
|*  3 |    INDEX RANGE SCAN                | IP_RANGES_IDX01 |   825 |       |   833   (1)| 00:00:05 |     1 |   256 |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("END_NUM">=TO_NUMBER(:IP_ADDR) AND "START_NUM"<=TO_NUMBER(:IP_ADDR))
       filter("END_NUM">=TO_NUMBER(:IP_ADDR))


Statistics
----------------------------------------------------------
         15  recursive calls
          0  db block gets
     141278  consistent gets
      94469  physical reads
          0  redo size
       1040  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

However, if we add the condition to allow Oracle to focus on a single partition, it makes a huge difference:

SQL> select * from ip_ranges
  2   where :ip_addr between start_num and end_num
  3     and start_first_octet = trunc(:ip_addr / power(256,3));

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                 |   183 | 13542 |   126   (2)| 00:00:01 |       |       |
|   1 |  PARTITION LIST SINGLE             |                 |   183 | 13542 |   126   (2)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| IP_RANGES       |   183 | 13542 |   126   (2)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                | IP_RANGES_IDX01 |     3 |       |   322   (1)| 00:00:02 |   KEY |   KEY |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("END_NUM">=TO_NUMBER(:IP_ADDR) AND "START_NUM"<=TO_NUMBER(:IP_ADDR))
       filter("END_NUM">=TO_NUMBER(:IP_ADDR))


Statistics
----------------------------------------------------------
         15  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       1040  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Adam Musch 2010-09-07 21:43:40

Can't we just create global partitioned function index on first octet?

Baski 2010-09-07 22:34:02

Adam Musch 2010-09-08 04:43:39

(Darned 600 character limit!) The OP says "There is nothing special about the cutoff values selected for the range partitions. They are simply A class addresses where the number of ranges per partition would equate to about 1M records." Why not fix the partitioning scheme if the ranges don't span class A records rather than band-aid it by forcing the index to do it?

Adam Musch 2010-09-08 04:47:57

I don't think partitioning by A-class does much except make the partitions smaller. That would be better if the optimizer wasn't scanning all the partitions anyway on upper IPs. If I query an IP of 205.x.x.x, all the partitions whether on start_num or the A-class of start num "below" the partition where the start_num of 205.x.x.x would fall COULD have an end_num beyond my query IP and thus would need to be returned. I can add an additional filter on start_num to restrict how far back it can look, but if somewhere down the line an abnormally large range is inserted it may not be found.

RC 2010-09-08 12:15:41

It would be nice if Oracle knew based on stats or something that partitions 1-5 have no end_nums great enough to have records that match the query, but I'm not sure if it's possible to accomplish this. Spatial may work, but I don't think we have that as an option either.

RC 2010-09-08 12:21:30

Answer 5

A:

Firstly, what is your performance requirement ?

Your partitions have a definite start value and end values which can be determined from ALL_PARTITIONS (or hard-coded) and used in a function (concept below but you'd need to amend it to go one partition forward/back).

You should then be able to code

SELECT * FROM ip_ranges
WHERE :query_ip BETWEEN start_num and end_num
AND start_num between get_part_start(:query_ip) and get_part_end(:query_ip);

Which should be able to lock it down to specific partition(s). However if, as you suggest, you can only lock it down to three out of eight partitions, that is still going to be a BIG scan. I'm posting another, more radical answer as well which may be more appropriate.

create or replace function get_part_start (i_val in number) 
                              return number deterministic is
  cursor c_1 is 
    select high_value from all_tab_partitions
    where table_name = 'IP_RANGES'
    order by table_owner, table_name;
  type tab_char is table of varchar2(20) index by pls_integer;
  type tab_num is table of number index by pls_integer;
  t_char  tab_char;
  t_num   tab_num;
  v_ind   number;
begin
  open c_1;
  fetch c_1 bulk collect into t_char;
  close c_1;
  --
  for i in 1..t_char.last loop
    IF t_char(i) != 'MAXVALUE' THEN
      t_num(to_number(t_char(i))) := null;
    END IF;
  end loop;
  --
  IF i_val > t_num.last then
    return t_num.last;
  ELSIF i_val < t_num.first then
    return 0;
  END IF;
  v_ind := 0;
  WHILE i_val >= t_num.next(v_ind) loop
    v_ind := t_num.next(v_ind);
    exit when v_ind is null;
  END LOOP;
  return v_ind;
end;
/

Gary 2010-09-08 00:31:45

Answer 6

+1 A:

I suggest you turn your 8 million row table into a bigger table. Google's IP (for me, at the moment) is coming up as

"66.102.011.104"

You store one record as "66.102.011" with the respective range(s) that it falls in. In fact you store at least one record for every "aaa.bbb.ccc". You'll probably end up with a table maybe five times as big, but one you can can pin-point the relevant record with just a few logical IOs each time rather than the hundreds/thousands for a partition scan.

I suspect any data you have is going to be a little out of date anyway (as various authorities around the world issue/re-issue ranges), so regenerating adjustments for that table on a daily/weekly basis shouldn't be a big problem.

Gary 2010-09-08 00:42:36

This is basically something I had though about doing too. I can't really change the way the table is formatted as there are too many things linked too it, but I thought about using a similar method to make a table that essentially serves as an index into the existing table. Basically, create a non-unique index containing the c-class and all the records from the existing table that overlap that c-class. Then change the SQL to query this table for all the ranges that overlap the query_ip c-class and then filter out any that do not overlap that specific IP from that small subset.

RC 2010-09-08 12:42:53

My only down-side is I would have to manage this table as ranges are added and deleted from the existing table. This is something I will give a go if no other solution is provided that allows Oracle to do all the work. Thanks. +1

RC 2010-09-08 12:44:31

ansaurus

tags:

views:

answers:

Efficient retrieval of overlapping IP range records via a single point.

related questions