views:

114

answers:

4

I have a table that outputs similar to this (although in thousands):

     EMPNO ENAME      TRANDATE      AMT
---------- ---------- --------- -------
       100 Alison     21-MAR-96   45000
       100 Alison     12-DEC-78   23000
       100 Alison     24-OCT-82   11000
       101 Linda      15-JAN-84   16000
       101 Linda      30-JUL-87   17000
       102 Celia      31-DEC-90   78000
       102 Celia      17-SEP-96   21000
       103 James      21-MAR-96   45000
       103 James      12-DEC-78   23000
       103 James      24-OCT-82   11000
       104 Robert     15-JAN-84   16000
       104 Robert     30-JUL-87   17000

My desired output would be similar to this:

     EMPNO ENAME      TRANDATE      AMT PAGE
---------- ---------- --------- ------- ----
       100 Alison     21-MAR-96   45000    1
       100 Alison     12-DEC-78   23000    1
       100 Alison     24-OCT-82   11000    1
       101 Linda      15-JAN-84   16000    2
       101 Linda      30-JUL-87   17000    2
       102 Celia      31-DEC-90   78000    2
       102 Celia      17-SEP-96   21000    2
       103 James      21-MAR-96   45000    3
       104 Robert     12-DEC-78   23000    4
       104 Robert     24-OCT-82   11000    4
       104 Robert     15-JAN-84   16000    4
       104 Robert     30-JUL-87   17000    4

Basically, it should insert a new field to identify the page it belongs to. The page break is based on the rows. And, as if "kept together" in EMPNO, it adds 1 to PAGE when the rows cannot add the next EMPNO batch. It's for the Excel limit since Excel does not allow more than 65000 rows (or so) in a single Sheet. In the sample's case, it's only 4 rows. The limit number is static.

+1  A: 

The following SQL statement splits the twenty records in my EMP table into five pages of four rows each:

SQL> select empno
  2         , ename
  3         , deptno
  4         , ceil((row_number() over (order by deptno, empno)/4)) as pageno
  5  from emp
  6  /

     EMPNO ENAME          DEPTNO     PAGENO
---------- ---------- ---------- ----------
      7782 BOEHMER            10          1
      7839 SCHNEIDER          10          1
      7934 KISHORE            10          1
      7369 CLARKE             20          1
      7566 ROBERTSON          20          2
      7788 RIGBY              20          2
      7876 KULASH             20          2
      7902 GASPAROTTO         20          2
      7499 VAN WIJK           30          3
      7521 PADFIELD           30          3
      7654 BILLINGTON         30          3
      7698 SPENCER            30          3
      7844 CAVE               30          4
      7900 HALL               30          4
      8083 KESTELYN           30          4
      8084 LIRA               30          4
      8060 VERREYNNE          50          5
      8061 FEUERSTEIN         50          5
      8085 TRICHLER           50          5
      8100 PODER              50          5

20 rows selected.

SQL>
APC
@APC will not work. Important condition - that empno can't be in two groups. In your case empno 101 will be in two separated groups.
Michael Pakhantsov
Thanks! This code works for me! (Update: Ok... maybe not. Good find, Michael.)
Kei
Just not work - wrong results. E.g. where is counted additional empty lines at the end of page for 3rd page and later?
ThinkJet
+1  A: 

It's too tricky or even impossible to do such thing in plain SQL.

But with some limitations problem can be resolved with help of user-defined aggregate functions .

First, create object with ODCIAggregate interface implementation:

create or replace type page_num_agg_type as object
(
  -- Purpose : Pagination with "leave together" option

  -- Attributes             

  -- Current page number
  cur_page_number    number,                                 

  -- Cumulative number of rows per page incremented by blocks
  cur_page_row_count number,

  -- Row-by-row counter for detect page overflow while placing single block
  page_row_counter   number,

  -- Member functions and procedures

  static function ODCIAggregateInitialize(
    sctx in out page_num_agg_type
  )
  return number,

  member function ODCIAggregateIterate(
    self        in out page_num_agg_type,
    value       in     number
  )
  return number,

  member function ODCIAggregateTerminate(
    self        in   page_num_agg_type,
    returnValue out  number,
    flags       in   number
  )
  return number,

  member function ODCIAggregateMerge(
    self in out page_num_agg_type,
    ctx2 in     page_num_agg_type
  )
  return number

);

Create type body:

create or replace type body PAGE_NUM_AGG_TYPE is

  -- Member procedures and functions
  static function ODCIAggregateInitialize(
    sctx in out page_num_agg_type
  )
    return number
  is
  begin
      sctx := page_num_agg_type(1, 0, 0);
      return ODCIConst.Success;
  end;

  member function ODCIAggregateIterate(
    self        in out page_num_agg_type,
    value       in     number
 )
   return number
 is
   -- !!! WARNING: HARDCODED !!!
   RowsPerPage number := 4;
 begin

   self.page_row_counter := self.page_row_counter + 1;

   -- Main operations: determine number of page

   if(value > 0) then 
     -- First row of new block

    if(self.cur_page_row_count + value > RowsPerPage) then
       -- If we reach next page with new block of records - switch to next page.
       self.cur_page_number := self.cur_page_number + 1;
       self.cur_page_row_count := value;
       self.page_row_counter := 1;
    else
       -- Just increment rows and continue to place on current page
       self.cur_page_row_count := self.cur_page_row_count + value;
    end if;

   else                       
     -- Row from previous block

     if(self.page_row_counter > RowsPerPage) then 
       -- Single block of rows exceeds page size - wrap to next page.
       self.cur_page_number := self.cur_page_number + 1;
       self.cur_page_row_count := self.cur_page_row_count - RowsPerPage;
       self.page_row_counter := 1;
     end if;

   end if;

   return ODCIConst.Success;
 end;

 member function ODCIAggregateTerminate(
   self        in page_num_agg_type,
   returnValue out number,
   flags       in number
 )
   return number
 is
 begin
   -- Returns current page number as result
   returnValue := self.cur_page_number;
   return ODCIConst.Success;
 end;

 member function ODCIAggregateMerge(
   self in out page_num_agg_type,
   ctx2 in     page_num_agg_type

 )
   return number
 is
 begin
   -- Can't act in parallel - error on merging attempts
   raise_application_error(-20202,'PAGE_NUM_AGG_TYPE can''t act in parallel mode');
   return ODCIConst.Success;
 end;

end;

Create agrreation function to use with type:

create function page_num_agg (
  input number
) return number aggregate using page_num_agg_type;

Next prepare data and use new function to calculate page numbers:

with data_list as (
  -- Your example data as source
  select 100 as EmpNo, 'Alison' as EmpName, to_date('21-MAR-96','dd-mon-yy') as TranDate, 45000 as AMT from dual union all
  select 100 as EmpNo, 'Alison' as EmpName, to_date('12-DEC-78','dd-mon-yy') as TranDate, 23000 as AMT from dual union all
  select 100 as EmpNo, 'Alison' as EmpName, to_date('24-OCT-82','dd-mon-yy') as TranDate, 11000 as AMT from dual union all
  select 101 as EmpNo, 'Linda'  as EmpName, to_date('15-JAN-84','dd-mon-yy') as TranDate, 16000 as AMT from dual union all
  select 101 as EmpNo, 'Linda'  as EmpName, to_date('30-JUL-87','dd-mon-yy') as TranDate, 17000 as AMT from dual union all
  select 102 as EmpNo, 'Celia'  as EmpName, to_date('31-DEC-90','dd-mon-yy') as TranDate, 78000 as AMT from dual union all
  select 102 as EmpNo, 'Celia'  as EmpName, to_date('17-SEP-96','dd-mon-yy') as TranDate, 21000 as AMT from dual union all
  select 103 as EmpNo, 'James'  as EmpName, to_date('21-MAR-96','dd-mon-yy') as TranDate, 45000 as AMT from dual union all
  select 103 as EmpNo, 'James'  as EmpName, to_date('12-DEC-78','dd-mon-yy') as TranDate, 23000 as AMT from dual union all
  select 103 as EmpNo, 'James'  as EmpName, to_date('24-OCT-82','dd-mon-yy') as TranDate, 11000 as AMT from dual union all
  select 104 as EmpNo, 'Robert' as EmpName, to_date('15-JAN-84','dd-mon-yy') as TranDate, 16000 as AMT from dual union all
  select 104 as EmpNo, 'Robert' as EmpName, to_date('30-JUL-87','dd-mon-yy') as TranDate, 17000 as AMT from dual union all
  select 105 as EmpNo, 'Monica' as EmpName, to_date('30-JUL-88','dd-mon-yy') as TranDate, 31000 as AMT from dual union all
  select 105 as EmpNo, 'Monica' as EmpName, to_date('01-JUL-87','dd-mon-yy') as TranDate, 19000 as AMT from dual union all
  select 105 as EmpNo, 'Monica' as EmpName, to_date('31-JAN-97','dd-mon-yy') as TranDate, 11000 as AMT from dual union all
  select 105 as EmpNo, 'Monica' as EmpName, to_date('17-DEC-93','dd-mon-yy') as TranDate, 33000 as AMT from dual union all
  select 105 as EmpNo, 'Monica' as EmpName, to_date('11-DEC-91','dd-mon-yy') as TranDate, 65000 as AMT from dual union all
  select 105 as EmpNo, 'Monica' as EmpName, to_date('22-OCT-89','dd-mon-yy') as TranDate, 19000 as AMT from dual 
),
ordered_data as (
  select            
    -- Source table fields
    src_data.EmpNo, src_data.EmpName, src_data.TranDate, src_data.AMT,
    -- Calculate row count per one employee
    count(src_data.EmpNo) over(partition by src_data.EmpNo)as emp_row_count,
    -- Calculate rank of row inside employee data sorted in output order
    rank() over(partition by src_data.EmpNo order by src_data.EmpName, src_data.TranDate) as emp_rnk
  from 
    data_list src_data
)  
-- Final step: calculate page number for rows
select 
    -- Source table data
    ordered_data.EmpNo, ordered_data.EmpName, ordered_data.TranDate, ordered_data.AMT, 
    -- Aggregate all data with our new function
    page_num_agg(
      -- pass count of rows to aggregate function only for first employee's row
      decode(ordered_data.emp_rnk, 1, ordered_data.emp_row_count, 0) 
    ) 
      over (order by ordered_data.EmpName, ordered_data.TranDate) as page_number
from    
  ordered_data    
order by 
  ordered_data.EmpName, ordered_data.TranDate

And, finally ...

Disadvantages of this solution:

  1. Hardcoded page row count.
  2. Requires some specific data preparation in query to use aggregate function properly.

Advantages of this solution:

  1. Just works :)

Updated: improved to handle oversized blocks, example modified.

ThinkJet
I haven't really done any user-defined aggregate functions before. I'm allowed to have procedures though.
Kei
A: 

ThinkJet is right that that some of the other answers don't cater for the 'keep together' requirement. However I think this can be done without resorting to a user-defined aggregate.

Sample data

create table test (empno number, ename varchar2(20), trandate date, amt number);
insert into test values (100, 'Alison'   ,  to_date('21-MAR-1996') ,   45000);
insert into test values (100, 'Alison'   ,  to_date('12-DEC-1978') ,   23000);
insert into test values (100, 'Alison'   ,  to_date('24-OCT-1982') ,   11000);
insert into test values (101, 'Linda'    ,  to_date('15-JAN-1984') ,   16000);
insert into test values (101, 'Linda'    ,  to_date('30-JUL-1987') ,   17000);
insert into test values (102, 'Celia'    ,  to_date('31-DEC-1990') ,   78000);
insert into test values (102, 'Celia'    ,  to_date('17-SEP-1996') ,   21000);
insert into test values (103, 'James'    ,  to_date('21-MAR-1996') ,   45000);
insert into test values (103, 'James'    ,  to_date('12-DEC-1978') ,   23000);
insert into test values (103, 'James'    ,  to_date('24-OCT-1982') ,   11000);
insert into test values (104, 'Robert'   ,  to_date('15-JAN-1984') ,   16000);
insert into test values (104, 'Robert'   ,  to_date('30-JUL-1987') ,   17000);

Now, determine the end row of each empno segment (using RANK to find the start and COUNT..PARTITION BY to find the number in the segment).

Then use ceil/4 from APC's solution to group them into their 'pages'. Again, as pointed out by ThinkJet, there is a problem in the specification as it doesn't cater for the situation when there are more records in the empno 'keep together' segment than can fit in a page.

select empno, ename,
       ceil((rank() over (order by empno) +
         count(1) over (partition by empno))/6) as chunk
from test
order by 1;

As pointed out by ThinkJet, this solution isn't bullet proof.

drop table test purge;

create table test (empno number, ename varchar2(20), trandate date, amt number);
declare
    cursor csr_name is
    select rownum emp_id, 
             decode(rownum,1,'Alan',2,'Brian',3,'Clare',4,'David',5,'Edgar',
             6,'Fred',7,'Greg',8,'Harry',9,'Imran',10,'John',
             11,'Kevin',12,'Lewis',13,'Morris',14,'Nigel',15,'Oliver',
             16,'Peter',17,'Quentin',18,'Richard',19,'Simon',20,'Terry',
             21,'Uther',22,'Victor',23,'Wally',24,'Xander',
             25,'Yasmin',26,'Zac') emp_name
    from dual connect by level <= 26;
begin
  for c_name in csr_name loop
    for i in 1..11 loop
       insert into test values 
           (c_name.emp_id, c_name.emp_name, (date '2010-01-01') + i,
             to_char(sysdate,'SS') * 1000);
    end loop;
  end loop;
end;
/

select chunk, count(*)
from
  (select empno, ename,
       ceil((rank() over (order by empno) +
         count(1) over (partition by empno))/25) as chunk
  from test)
group by chunk
order by chunk
;

So with chunk size of 25 and group size of 11, we get the jumps where it fits 33 people in the chunk despite the 25 limit. Large chunk sizes and small groups should make this infrequent, but you'd want to allow some leeway. So maybe set the chunks to 65,000 rather than going all the way to 65,536.

Gary
Thanks so much! This works for my case! And I really thought the query would be longer. I understand about the problem of the 65k+ records with the same EMPNO, but in my case, it would most likely be not happening.
Kei
There are same error as in other answers: random gaps at the end of page not handled in calculation. E.g. suppose you have 3 groups of rows (A,B,C) with 6 rows in each group. Try to place it on pages, 6 row per page: Group A - rank() = 1 - Ok, 1st page; Group B - rank() = 7 - Ok, 2nd page filled from top and 4 rows left blank on 1st page; Group C - rank() = 13 - ERR!, seems to fit to 2nd page but must be placed on 3rd, because REAL rank() value must include empty rows from 1st page: 6+4+6+1=17. No way to predict number of empty rows before real page construction.
ThinkJet
This can happen even for 65k+ rows, even at the end of 2nd page:E.g. 5 free rows on 1st page + 4 free rows on 2nd page + next block have size from 5 to 9 rows ...
ThinkJet
Ah, I get ThinkJet's issue. There is room for 5 rows at the end of page 1, but the next group is six rows. Those six rows get pushed to page 2 correctly, but those six rows aren't correctly counted as being part of page 2. So when it comes to working out how many can fit on page 2, it may get it wrong.
Gary
I think I kinda get the issue. I was just too excited and quick to check on my tests (with a 50k chunk that correctly output a two-page file). But I did notice the decrease on the chunks when I tested the query with a smaller chunk size (in 300's).
Kei
A: 

How about this one: (100 is the row limit per page)

select
    a.*, floor(count(1) over (order by empno, empname)/100)+1 as page
from source_table a
order by page, empno, empname;
rand0m