ansaurus

Question

Selecting surrounding rows in Oracle SQL

Answer 1

A:

If you just want the latest :n rows for 1234:

select timestamp, activity_code
from
( select timestamp, activity_code
  from   log
  where  reference_id=1234
  order by timestamp desc
)
where rownum <= :n;

Tony Andrews 2009-11-09 13:45:01

That's not quite what I want - I want all rows (whatever the reference number) that are close to my query row in time.

Greg Reynolds 2009-11-09 13:53:22

Is the "reference number" a unique identifier? Doesn't appear to be from your example.

Tony Andrews 2009-11-09 14:06:12

It's not unique no, there can be multiple rows with the same reference number (imagine it as an order number or customer ID or something like that)

Greg Reynolds 2009-11-09 14:31:02

Answer 2

+1 A:

Here's a conceptual decomposition of a way to do it.

Annotate your ordered logs with the row number:

WITH ordered_logs AS (
    SELECT ROWNUM r, log_table.*
    FROM log_table 
    ORDER BY timestamp
)
SELECT * FROM ordered_logs;

Look up the row number for the central information we're looking for:

SELECT r r0 FROM ordered_logs
WHERE reference_id = 1234; -- or whatever uniquely identifies your problem

Browse a few rows around it:

SELECT * FROM ordered_logs, sought WHERE r BETWEEN r0 - 5 AND r0 + 5;

Putting it all back together:

WITH
  ordered_logs AS (
    SELECT ROWNUM r, log_table.*
    FROM log_table 
    ORDER BY timestamp
  ),
  sought AS (
    SELECT r r0
    FROM ordered_logs
    WHERE reference_id = 1234
  )
SELECT *
FROM ordered_logs, sought
WHERE r BETWEEN r0 - 5 AND r0 + 5;

A lot of optimization is possible, but this is the simplest way of describing it I could find.

JB 2009-11-09 14:07:59

Answer 3

A:

create table grep_like (
  id  number,
  dt  date,
  txt varchar2(10)
);


insert into grep_like values(10, sysdate -  9/24/60/60, 'foo');
insert into grep_like values(30, sysdate -  8/24/60/60, 'bar');
insert into grep_like values(39, sysdate -  2/24/60/60, 'baz');
insert into grep_like values(22, sysdate -  5/24/60/60, '***');
insert into grep_like values(87, sysdate -  7/24/60/60, '###');
insert into grep_like values(57, sysdate -  4/24/60/60, '!!!');
insert into grep_like values(32, sysdate +  1/24/60/60, '---');
insert into grep_like values(99, sysdate - 12/24/60/60, '...');
insert into grep_like values(18, sysdate -  1/24/60/60, 'noo');
insert into grep_like values(20, sysdate - 10/24/60/60, 'moo');
insert into grep_like values(81, sysdate -  0/24/60/60, 'huh');


select p.dt, p.txt
from (
  select r.dt, r.txt, r.r,
         max(case when r.id = 57 then r.r else 0 end) over () p
  from (
    select dt, txt, id,
           row_number() over (order by dt)  r
      from grep_like
  ) r
) p
where 
  p.r - p.p between -1 and 1
;

René Nyffenegger 2009-11-09 14:23:11

Thanks - this is almost what I want, but it does a full table scan, which is not really what I want to be doing (it's a large table).

Greg Reynolds 2009-11-09 14:49:47

Answer 4

A:

Another idea would be to grab a few minutes worth' of entries either way like this:

WITH ts as (
    SELECT timestamp
    FROM log_table
    WHERE reference_id = 1234)
SELECT *
FROM log_table join ts
WHERE timestamp > ts.timestamp - 5 minutes
AND timestamp < ts.timestamp + 5 minutes

Of course, the '+/- 5 minutes' has to be implemented according to your db-system.

This may be a bit easier than getting a rownumber and using that to define your "window", but it may not satisfy your requirements.

IronGoofy 2009-11-09 14:23:50

Answer 5

+1 A:

Interesting question.

You could possibly use Analytic Functions to give you a range of times that you might be interested in and then use this range to select from your logging table.

(Haven't had a chance to run this SQL but it should give you the idea).

This gives a TIME for 2 rows before and 2 rows after:

select
    l.code,
    l.reference,
    l.time,
    min(l.time) over (
        order by l.time 
        rows between 2 preceding and current row) 
            preceding_time,
    max(l.time) over (
        order by l.time 
        rows beween current row and 2 following) 
            following_time
from
    log_table l;

You can then use these "time boxes" to select a range in the driving table.

with timebox as
    (
    select
        l.code,
        l.reference,
        l.time,
        min(l.time) over (
            order by l.time 
            rows between 2 preceding and current row) 
                preceding_time,
        max(l.time) over (
            order by l.time 
            rows beween current row and 2 following) 
                following_time
    from
        log_table l
    )
select
    *
from
    log_table a
where
    exists
        (
        select 1 from
            timebox t
        where
            t.reference = 'MYREF1'
        and a.time between t.preceding_time and t.following_time
        );

Is that close to what you're after?

Nick Pierpoint 2009-11-09 14:59:10

This is almost it. Frustratingly, when I use the between clause I get a full table scan on my data table, whereas if I just use an = following_time (or similar) clause I get good performance. I have no idea why this is happening, but it's making this solution not work.

Greg Reynolds 2009-11-09 17:10:42

You're going to have a full scan in there somewhere as your effectively joining your log_table to your log_table. There will be a full table scan to get the timebox but you should be able to get an index search on a.time. Are you seeing two full table scans?

Nick Pierpoint 2009-11-09 20:57:04

If you can't afford any full scan, you'll need an indexed column on which you can rely on its "order by" (so you've already got your 'timebox'). If each row in the real table has a sequential unique ID then you could use that, otherwise you might have to fall back to using a +/- x minutes instead of +/- x rows.

Nick Pierpoint 2009-11-09 21:01:16

ansaurus

tags:

views:

answers:

Selecting surrounding rows in Oracle SQL

related questions