views:

475

answers:

8

I need to generate a report from a table with the structure and data as given below.

Table Ticket has data as given below.

ID         Assigned_To
100       raju
101       raju
102       raju
103       anil
104       anil
105       sam
106       raju
107       raju
108       anil

The Oracle SELECT should generate the below report

From_Id            To_Id    Assigned_To
100                  102      raju
103                  104      anil
105                  105      sam
106                  107      raju
108                  108      anil

Can someone please help me with building a query..?

Thanks in advance, Mathew.

+1  A: 

OK, this isn't pretty, but it works. And nobody else has contributed anything prettier yet, so maybe this is the way to do it.

select min(from_id), to_id, assigned_to from
(
select from_id, max(to_id) as to_id, assigned_to from
(
select t1.id as from_id, t2.id as to_id, t1.assigned_to
from       ticket t1
inner join ticket t2 on t1.assigned_to = t2.assigned_to and t2.id >= t1.id
where not exists
    (
    select * from ticket t3
    where  t3.ID > t1.ID
    and t3.ID < t2.ID
    and t3.assigned_to != t1.assigned_to
    )
) x
group by from_id, assigned_to
) y
group by to_id, assigned_to
;

I'm using mysql; there may well be some oracle goodness that makes this nicer - as there may well be some more elegant plain sql. But at least it's a start.

Carl Manaster
A: 

I believe you want one of the Oracle Analytic functions. You're in luck if you're using Oracle because other RDBMSs don't have this functionality. They let you write SQL that allows you to query data in relation to adjacent rows, such as calculating moving averages. I don't have an Oracle DB here to play with, but I think it would be something like this:

SELECT MIN(ID) AS From_Id, MAX(ID) AS To_Id, Assigned_To
FROM Ticket
PARTITION BY Assigned_To
ORDER BY From_Id
Greg
This is syntactically wrong but more importantly also semantically. Each assignee can be present several times in the desired result set but here he wouldn't... I tried to give a solution in a separate answer.
Michal Pravda
+1  A: 

You can bend over backwards trying to achieve this in pure SQL or you can create something a bit lengthier but both much easier to understand and more performance efficient - use a pipelined function

In essence the function would accept a ref cursor which would have to be pre-ordered by the ID, and then pipe rows only when a contiguous block of records has ended.

CREATE TABLE assignment
(
  a_id         NUMBER,
  assigned_to  VARCHAR2(4000)
);

CREATE OR REPLACE package PCK_CONTIGUOUS_GROUPBY as

 TYPE refcur_t IS REF CURSOR RETURN assignment%ROWTYPE;

 TYPE outrec_typ IS RECORD ( 
    from_id    NUMBER,
    to_id      NUMBER,
    assigned_to  VARCHAR2(4000));

  TYPE outrecset IS TABLE OF outrec_typ;

 FUNCTION f_cont_groupby(p refcur_t) 
      RETURN outrecset PIPELINED;

end;
/

CREATE OR REPLACE package body pck_contiguous_groupby as

 FUNCTION f_cont_groupby(p refcur_t) RETURN outrecset PIPELINED IS

  out_rec             outrec_typ;
  in_rec              p%ROWTYPE;
  first_id      assignment.a_id%type;
  last_id          assignment.a_id%type;
  last_assigned_to    assignment.assigned_to%type;

  BEGIN

   LOOP
     FETCH p INTO in_rec;
     EXIT WHEN p%NOTFOUND;


       IF last_id IS NULL THEN
       -- First record: don't pipe
         first_id := in_rec.a_id;

       ELSIF last_id = in_rec.a_id - 1 AND last_assigned_to = in_rec.assigned_to THEN
       -- Contiguous block: don't pipe
         NULL;

       ELSE
       -- Block not contiguous: pipe 
      out_rec.from_id := first_id;
      out_rec.to_id := last_id;
      out_rec.assigned_to := last_assigned_to;

      PIPE ROW(out_rec);

      first_id := in_rec.a_id;
       END IF;

     last_id := in_rec.a_id;
     last_assigned_to := in_rec.assigned_to;

   END LOOP;
   CLOSE p;

   -- Pipe remaining row 
   out_rec.from_id := first_id;
   out_rec.to_id := last_id;
   out_rec.assigned_to := last_assigned_to;

   PIPE ROW(out_rec);

   RETURN;
 END;

END pck_contiguous_groupby;
/

and then to try it out, populate the table and run:

SELECT * FROM TABLE(pck_contiguous_groupby.f_cont_groupby (CURSOR (SELECT a_id, assigned_to FROM assignment ORDER BY a_id)));
Andrew from NZSG
Plsql row by row solution may be sometimes easier to read, but almost always is less performant than pure SQL.
Michal Pravda
I'm also very curious why you state "and more performance efficient - use a pipelined function". Care to explain?
Rob van Wijk
It is much more efficient than any other other SQL and I'll explain why. The first and foremost step of any performance design is not to place SQL ahead of PL/SQL but to minimize logical I/O. The above query performs a SINGLE SCAN of the table + an order by and that's it. Any other query would at least have to perform 2 scans. E.g. there probably is some clever way to use an analytic function to calculate the 'contiguous block' and then GROUP BY this new calculated column, but this would imply 2 scans + an additional GROUP BY.
Andrew from NZSG
The overhead of using PL/SQL vs. SQL matters only when both incur roughly the same logical I/O and even then the difference with newer version of Oracle is getting smaller. Also, SQL cannot be easily reused unless placed in views and can get quite awkward to be understood if it gets very bloated.
Andrew from NZSG
If you look at the solutions of Michal Pravda and me, you'll notice that we only do one single scan as well ... And I have a different opinion about the reusability part, but let's not go there :-)
Rob van Wijk
@Rob: it's a single scan of the underlying table but it's still double the logical I/O. Firstly you're doing one scan of the table with an analytic function which then materializes into a temporary table, which then must be scanned again for a group by. Plus, you have additional overheads with extra group bys and analytic functions.Michal's query is doing 3 scans (1 base table + 2 materialized temp tables).
Andrew from NZSG
I did some bencmarking, see my answer. The pipelined function is much slower in Oracle XE.
Theo
And why do you think it materializes intermediate result sets into a temporary table?
Rob van Wijk
+1  A: 

Here is my suggestion, not very well tested but in my head it sounds right, as long as id is unique and a continious sequence. Look at the bottom for the sql query.

SQL> create table ticket (id number, assigned_to varchar2(30));

Table created.

SQL> insert into ticket values (100, 'raju');

1 row created.

SQL> insert into ticket values (101, 'raju');

1 row created.

SQL> insert into ticket values (102, 'raju');

1 row created.

SQL> insert into ticket values (103, 'anil');

1 row created.

SQL> insert into ticket values (104, 'anil');

1 row created.

SQL> insert into ticket values (105, 'sam');

1 row created.

SQL> insert into ticket values (106, 'raju');

1 row created.

SQL> insert into ticket values (107, 'raju');

1 row created.

SQL> insert into ticket values (108, 'anil');

1 row created.

SQL> select a.id from_id
  2  ,lead(a.id -1, 1, a.id) over (order by a.id) to_id
  3  ,a.assigned_to
  4  from (
  5  select
  6  id, assigned_to
  7  ,lag(assigned_to, 1) over (order by id) prev_assigned_to
  8  from ticket
  9  ) a
 10  where a.assigned_to != nvl(a.prev_assigned_to, a.assigned_to||'unique')
 11  order by id
 12  ;

   FROM_ID      TO_ID ASSIGNED_TO
---------- ---------- ------------------------------
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil
A: 

Hi,

this should work. The solution consist of several inline view - each of them computes something. I commented what I intended to do.You must of course read the comments from the innermost out as it is executed.

--get results by grouping by interval_begin
SELECT MIN(id) from_id,
       MAX(id) to_id,
       MAX(assigned_to) assigned_to
  FROM ( --copy ids of a first row of each interval of ids to the all following rows of that interval
         SELECT id,
                 assigned_to,
                 MAX(change_at) over(ORDER BY id) interval_begin
           FROM ( --find each id where a change of an assignee occurs and "mark" it. Dont forget the first row
                 SELECT id,
                         assigned_to,
                         CASE
                           WHEN (lag(assigned_to) over(ORDER BY id) <> assigned_to OR lag(assigned_to)
                                 over(ORDER BY id) IS NULL) THEN
                            id
                         END change_at
                   FROM ticket))
 GROUP BY interval_begin
 ORDER BY from_id;
    ;
Michal Pravda
+4  A: 
SQL> create table ticket (id,assigned_to)
  2  as
  3  select 100, 'raju' from dual union all
  4  select 101, 'raju' from dual union all
  5  select 102, 'raju' from dual union all
  6  select 103, 'anil' from dual union all
  7  select 104, 'anil' from dual union all
  8  select 105, 'sam'  from dual union all
  9  select 106, 'raju' from dual union all
 10  select 107, 'raju' from dual union all
 11  select 108, 'anil' from dual
 12  /

Tabel is aangemaakt.

SQL> select min(id) from_id
  2       , max(id) to_id
  3       , assigned_to
  4    from ( select id
  5                , assigned_to
  6                , id - row_number() over (partition by assigned_to order by id) grp
  7             from ticket
  8         )
  9   group by assigned_to
 10       , grp
 11   order by from_id
 12  /

   FROM_ID      TO_ID ASSIGNED_TO
---------- ---------- -----------
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil

5 rijen zijn geselecteerd.

**UPDATE with the results of a performance comparison with tuinstoel's solution:

On 11.1.0.7:

SQL> exec runstats_pkg.rs_start

PL/SQL procedure successfully completed.

SQL> set termout off
SQL> select min(id) from_id
  2       , max(id) to_id
  3       , assigned_to
  4    from ( select id
  5                , assigned_to
  6                , id - row_number() over (partition by assigned_to order by id) grp
  7             from ticket
  8         )
  9   group by assigned_to
 10       , grp
 11   order by from_id
 12  /

   FROM_ID      TO_ID ASSI
---------- ---------- ----
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil
       109        111 raju
<snip>
    589921     589922 raju
    589923     589923 anil

327680 rows selected.

SQL> set termout on
SQL> exec runstats_pkg.rs_middle

PL/SQL procedure successfully completed.

SQL> set termout off
SQL> select * from table(testpl.pltest)
  2  /

   FROM_ID      TO_ID ASSI
---------- ---------- ----
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil
       109        111 raju
<snip>
    589921     589922 raju
    589923     589923 anil

327680 rows selected.

SQL> set termout on

And the results:

SQL> exec runstats_pkg.rs_stop(100)
Run1 draaide in 547 hsecs
Run2 draaide in 549 hsecs
Run1 draaide in 99.64% van de tijd

Naam                                                      Run1        Run2    Verschil
STAT.recursive cpu usage                                     2         106         104
LATCH.row cache objects                                     91         217         126
STAT.bytes received via SQL*Net from client             37,496      37,256        -240
STAT.recursive calls                                         7       5,914       5,907
STAT.table scan rows gotten                            615,235     589,824     -25,411
STAT.sorts (rows)                                      917,504     589,824    -327,680

Run1 latches totaal versus run2 -- verschil en percentage
Run1      Run2  Verschil     Pct
10,255    10,471       216  97.94%

PL/SQL procedure successfully completed.

Regards, Rob.

Rob van Wijk
Pretty good, it's simple to understand, assuming one knows how analytic functions work.
Andrew from NZSG
Nice and simple. I didn't realized that it is in this case possible to partition by assigned to as well as grp. So I used a bit more complex way to compute "grp" :( +1Messays' is even better in terms of performance (one sort less), but he assumes that id is dense (two consecutive ids diffs by 1) and we don't.
Michal Pravda
+1  A: 

I did some benchmarking with Oracle express edition 10.2.0.1.0

I used this script to fill table ticket with 1179648 rows:

create table ticket (id,assigned_to)
as
select 100, 'raju' from dual union all
select 101, 'raju' from dual union all
select 102, 'raju' from dual union all
select 103, 'anil' from dual union all
select 104, 'anil' from dual union all
select 105, 'sam' from dual union all
select 106, 'raju' from dual union all
select 107, 'raju' from dual union all
select 108, 'anil' from dual
/


begin
  for i in 1..17 loop
    insert into ticket 
    select id + (select count(*) from ticket), assigned_to
    from ticket;
  end loop;
end;
/

commit;

SQL> select count(*) from ticket;

  COUNT(*)                                                                      
----------                                                                      
   1179648

Rob van Wijk's select statement takes 1.6 seconds on average, Mesays's select statement 2.8 seconds on average, Micheal Pravda's select statement 4.2 seconds and Andrew from NZSG's statement 9.6 seconds on average.

So a pipelined function is slower in Oracle XE. Or maybe someone has to improve the pipelined function...?

Theo
+1 Thanks for the test!
Rob van Wijk
+2  A: 

Let's say that Andrew from NZSG inspired me. I made a pipe lined function too.

create or replace package testpl is

 type outrec_type is record
 ( from_id ticket.id%type
 , to_id   ticket.id%type
 , assigned_to ticket.assigned_to%type);

 type outrec_table is table of outrec_type;

 function pltest return outrec_table pipelined;

end;
/

create or replace package body testpl is

  function pltest return outrec_table pipelined
  is
    l_outrec outrec_type;
    l_first_time boolean := true;
  begin

     for r_tick in (select id, assigned_to from ticket order by id) loop

       if (r_tick.assigned_to != l_outrec.assigned_to or l_first_time) then
          if not l_first_time then
            pipe row (l_outrec);
          else
            l_first_time := false;
          end if;
          l_outrec.assigned_to := r_tick.assigned_to;
          l_outrec.from_id := r_tick.id;
       end if;
       l_outrec.to_id := r_tick.id;
    end loop;

    pipe row (l_outrec);

    return;
  end;

end;
/

You can test it with:

select * from table(testpl.pltest);

It is approximately twice as fast as Rob van Wijk's solution on my Windows XP Oracle 11.1.0.6.0 system.

The

for r_tick in (select ....) loop
  ....
end loop;

construction has a very decent performance in Oracle 10 and 11. Most of the time SQL only solutions are faster but I think that here PL/SQL is faster.

tuinstoel
+1 nice clean PL/SQL solution. However, I see same response times in my test. Did you adjust your SQL*Plus arraysize to 100? I'll update my post with the results of that test.
Rob van Wijk