views:

70

answers:

1

Can someone provide an example of how to use parallel table function in oracle pl/sql. We need to run massive queries for 15 years and combine the result.

SELECT * 
  FROM Table(TableFunction(cursor(SELECT * FROM year_table))) 

...is what we want effectively. The innermost select will give all the years, and the table function will take each year and run massive query and returns a collection. The problem we have is that all years are being fed to one table function itself, we would rather prefer the table function being called in parallel for each of the year. We tried all sort of partitioning by hash and range and it didn't help.

Also, can we drop the keyword PIPELINED from the function declaration? because we are not performing any transformation and just need the aggregate of the resultset.

+1  A: 

There's an excellent write-up here.

There are alternative approaches (eg a 'master' job that cursors through YEAR_TABLE and submits a DBMS_JOB to process each year. Each 'year job' would insert its results into a table.

Once all the spawned jobs are finished, you just pull the results from the table.

PS. I suspect parallel pipelined won't do what you want though. I created a large table with just three rows with a specific value. I then created a parallel pipelined function that just pushed out the SID of the executing process (see below) and the number of rows it processes. I had an SQL that picked out those three rows, and passed that as the the cursor into the function. Mostly the function pushed out two different SIDs (which is what EXPLAIN PLAN told me it picked as the parallelism degree). Sometimes it showed two processes had ed executed, but all three rows were processed by one of those processes.

So it isn't the case that rows will be picked from the cursor and passed to the parallel slaves to be processed, but rather each parallel process will be given a slice of the driving table to deal with. With a small table, it probably wouldn't consider parallel and even if it did, it may just allocate the first 50 rows to the first process etc.

CREATE OR REPLACE FUNCTION test_pp(p_source     IN SYS_REFCURSOR)
   RETURN TAB_CHAR_4000  PIPELINED
   PARALLEL_ENABLE (PARTITION p_source BY ANY)
IS
   v_num NUMBER;
BEGIN
   FETCH p_source INTO v_num;
   WHILE p_source%FOUND LOOP
            PIPE ROW(sys_context('USERENV','SID'));
            FETCH p_source INTO v_num;
   END LOOP;
     PIPE ROW(sys_context('USERENV','SID')||':'||p_source%ROWCOUNT);
   CLOSE p_source;
   RETURN;
END test_pp;
/
Gary