views:

62

answers:

3

Hello,

I'm trying to fix some errors on a big database of stock exchange data. One column (quantity) has the traded volume on each tick, and other column stores the cumulative volume (i.e., the sum of the previous ticks of the day). This second column is wrong in some cases (not a lot, so we can safely assume that no to adjacent ticks are wrong). So theoretically the fix is easy: just search for a tick where the cumulative volume decreases (this suffices), and then pick the cumulative volume from the last tick and sum the quantity of the current tick. The thing is that i've been trying to get to work a query that does this in oracle, but i'm struggling due to my lack of expertise in sql. This is what i've gotten so far:

update
( 
    select m.cumulative_volume, q.cum_volume_ant, q.quantity from 
    market_data_intraday_trades m
    join
    (
          select * from
          (select
            product_key,
            sequence_number,
            lead(product_key) over (order by product_key, sequence_number) as product_key_ant,
            to_char(trade_date_time, 'yyyymmdd') as fecha,
            to_char(lag(trade_date_time) over (order by product_key, sequence_number), 'yyyymmdd') as fecha_ant,
            cumulative_volume,
            lead(cumulative_volume) over (order by product_key, sequence_number) as cum_volume_ant,
            cumulative_volume - lead(cumulative_volume) over (order by product_key, sequence_number) as dif 
          from market_data_intraday_trades)
          where product_key = product_key_ant
          and fecha = fecha_ant
          and dif < 0 
          and rownum < 10
    ) q
    on m.sequence_number = q.sequence_number
)
set m.cumulative_volume = q.cum_volume_ant + q.quantity

The current problem being that i can't seem to be able to use quantities from the inner query in the outside calculations.

Perhaps all of this would be clearer and/or easier with temporal tables or pl/sql or cursors, but due to corporate policies, i have no priviledges to do that, just selects and updates.

I would be very grateful if you could point me in some direction to solve this.

Thanks in advance!

PS. Fecha is date in spanish, just in case :)

+2  A: 

Did you try MERGE statement ? Maybe and depending of your Oracle version it could be a way to investigate, at least it could make your statement simpler.

Nicolas.

N. Gasparotto
+5  A: 

Here is some test data. As you can see the CUMULATIVE_VOLUME for the fourth row is wrong.

SQL> select product_key
  2         , trade_date_time
  3         , quantity
  4         , cumulative_volume
  5         , sum (quantity) over (partition by product_key order by sequence_number) as running_total
  6  from  market_data_intraday_trades
  7  order by sequence_number
  8  /

PROD TRADE_DAT   QUANTITY CUMULATIVE_VOLUME RUNNING_TOTAL
---- --------- ---------- ----------------- -------------
ORCL 23-JUN-10        100               100           100
ORCL 23-JUN-10         50               150           150
ORCL 25-JUN-10        100               250           250
ORCL 26-JUN-10        100               250           350
ORCL 26-JUN-10         50               400           400
ORCL 27-JUN-10         75               475           475

6 rows selected.

SQL>

The easiest solution is just to update all the rows with the calculated running total:

SQL> update market_data_intraday_trades m
  2  set m.cumulative_volume =
  3          ( select inq.running_total
  4            from (
  5                      select sum (quantity) over (partition by product_key
  6                                                  order by sequence_number) as running_total
  7                             , cumulative_volume
  8                             , rowid as row_id
  9                      from  market_data_intraday_trades
 10                  ) inq
 11             where m.rowid = inq.row_id
 12          )
 13  /

6 rows updated.

SQL> select product_key
  2         , trade_date_time
  3         , quantity
  4         , cumulative_volume
  5         , sum (quantity) over (partition by product_key
  6                                order by sequence_number) as running_total
  7         , rowid as row_id
  8  from  market_data_intraday_trades
  9  order by sequence_number
 10  /

PROD TRADE_DAT   QUANTITY CUMULATIVE_VOLUME RUNNING_TOTAL 
---- --------- ---------- ----------------- ------------- 
ORCL 23-JUN-10        100               100           100 
ORCL 23-JUN-10         50               150           150 
ORCL 25-JUN-10        100               250           250 
ORCL 26-JUN-10        100               350           350 
ORCL 26-JUN-10         50               400           400 
ORCL 27-JUN-10         75               475           475 

6 rows selected.

SQL> 

However, if you have a lot of data and you really don't want all those unnecessary updates then use the same query again to restricted the hits:

SQL> update market_data_intraday_trades m
  2  set m.cumulative_volume =
  3          ( select inq.running_total
  4            from (
  5                      select sum (quantity) over (partition by product_key
  6                                                  order by sequence_number) as running_total
  7                             , cumulative_volume
  8                             , rowid as row_id
  9                      from  market_data_intraday_trades
 10                  ) inq
 11             where m.rowid = inq.row_id
 12          )
 13  where m.rowid in
 14      ( select inq.row_id
 15            from (
 16                      select sum (quantity) over (partition by product_key
 17                                                  order by sequence_number) as running_total
 18                             , cumulative_volume
 19                             , rowid as row_id
 20                      from  market_data_intraday_trades
 21                  ) inq
 22             where m.cumulative_volume != running_total
 23          )
 24
SQL> /

1 row updated.

SQL> select product_key
  2         , trade_date_time
  3         , quantity
  4         , cumulative_volume
  5         , sum (quantity) over (partition by product_key
  6                                order by sequence_number) as running_total
  7  from  market_data_intraday_trades
  8  order by sequence_number
  9  /

PROD TRADE_DAT   QUANTITY CUMULATIVE_VOLUME RUNNING_TOTAL
---- --------- ---------- ----------------- -------------
ORCL 23-JUN-10        100               100           100
ORCL 23-JUN-10         50               150           150
ORCL 25-JUN-10        100               250           250
ORCL 26-JUN-10        100               350           350
ORCL 26-JUN-10         50               400           400
ORCL 27-JUN-10         75               475           475

6 rows selected.

SQL> 

I tried Nicolas's suggestion of using MERGE. If you are using 10g or higher, then this would work. You need a recent version of Oracle because 9i didn't support MERGE with an UPDATE but no INSERT (and 8i didn't support MERGE at all).

SQL> merge into market_data_intraday_trades m
  2  using ( select running_total
  3                 , row_id
  4          from
  5              (   select sum (quantity) over (partition by product_key
  6                                              order by sequence_number) as running_total
  7                         , cumulative_volume
  8                         , rowid as row_id
  9                  from  market_data_intraday_trades
 10               )
 11           where cumulative_volume != running_total
 12          ) inq
 13  on ( m.rowid = inq.row_id  )
 14  when matched then
 15      update set m.cumulative_volume = inq.running_total
 16  /

1 row merged.

SQL>

This solution is tidier then the other solution.

APC
+1 You beat me to it. I was just about to write an answer combining your and Nicolas' suggestions. Please note that the MERGE variant is not only tidier, but above all: faster. Second note: in 9i you can add a dummy WHEN NOT MATCHED THEN clause. Since your USING clause only contains a subset of the table in the MERGE INTO clause, it will never go to that branch.
Rob van Wijk
+3  A: 

Just adding a performance comparison to APC's answer:

SQL> update market_data_intraday_trades m
  2  set m.cumulative_volume =
  3          ( select inq.running_total
  4            from (
  5                      select sum (quantity) over (partition by product_key
  6                                                  order by sequence_number) as running_total
  7                             , cumulative_volume
  8                             , rowid as row_id
  9                      from  market_data_intraday_trades
 10                  ) inq
 11             where m.rowid = inq.row_id
 12          )
 13  /

6 rows updated.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  4mgw11769k00r, child number 0
-------------------------------------
update market_data_intraday_trades m set m.cumulative_volume =         ( select inq.running_total
      from (                     select sum (quantity) over (partition by product_key
                                order by sequence_number) as running_total
, cumulative_volume                            , rowid as row_id                     from
market_data_intraday_trades                 ) inq            where m.rowid = inq.row_id         )

Plan hash value: 3204855846

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   1 |  UPDATE              | MARKET_DATA_INTRADAY_TRADES |      1 |        |      0 |00:00:00.01 |      35 |
|   2 |   TABLE ACCESS FULL  | MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       3 |
|*  3 |   VIEW               |                             |      6 |      6 |      6 |00:00:00.01 |      18 |
|   4 |    WINDOW SORT       |                             |      6 |      6 |     36 |00:00:00.01 |      18 |
|   5 |     TABLE ACCESS FULL| MARKET_DATA_INTRADAY_TRADES |      6 |      6 |     36 |00:00:00.01 |      18 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("INQ"."ROW_ID"=:B1)


25 rows selected.

Look at those 36's. That's O(N^2).

SQL> update market_data_intraday_trades m
  2  set m.cumulative_volume =
  3          ( select inq.running_total
  4            from (
  5                      select sum (quantity) over (partition by product_key
  6                                                  order by sequence_number) as running_total
  7                             , cumulative_volume
  8                             , rowid as row_id
  9                      from  market_data_intraday_trades
 10                  ) inq
 11             where m.rowid = inq.row_id
 12          )
 13  where m.rowid in
 14      ( select inq.row_id
 15            from (
 16                      select sum (quantity) over (partition by product_key
 17                                                  order by sequence_number) as running_total
 18                             , cumulative_volume
 19                             , rowid as row_id
 20                      from  market_data_intraday_trades
 21                  ) inq
 22             where m.cumulative_volume != running_total
 23          )
 24
SQL> /

1 row updated.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  8fg3vnav1t742, child number 0
-------------------------------------
update market_data_intraday_trades m set m.cumulative_volume =         ( select inq.running_total
     from (                     select sum (quantity) over (partition by product_key
                              order by sequence_number) as running_total                            ,
cumulative_volume                            , rowid as row_id                     from
market_data_intraday_trades                 ) inq            where m.rowid = inq.row_id         )
where m.rowid in     ( select inq.row_id           from (                     select sum (quantity)
over (partition by product_key                                                 order by
sequence_number) as running_total                            , cumulative_volume
     , rowid as row_id                     from  market_data_intraday_trades                 ) inq
       where m.cumulative_volume != running_total         )

Plan hash value: 1087408236

---------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------
|   1 |  UPDATE               | MARKET_DATA_INTRADAY_TRADES |      1 |        |      0 |00:00:00.01 |      14 |
|*  2 |   HASH JOIN SEMI      |                             |      1 |      5 |      1 |00:00:00.01 |       6 |
|   3 |    TABLE ACCESS FULL  | MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       3 |
|   4 |    VIEW               |                             |      1 |      6 |      6 |00:00:00.01 |       3 |
|   5 |     WINDOW SORT       |                             |      1 |      6 |      6 |00:00:00.01 |       3 |
|   6 |      TABLE ACCESS FULL| MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       3 |
|*  7 |   VIEW                |                             |      1 |      6 |      1 |00:00:00.01 |       4 |
|   8 |    WINDOW SORT        |                             |      1 |      6 |      6 |00:00:00.01 |       4 |
|   9 |     TABLE ACCESS FULL | MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("M".ROWID="INQ"."ROW_ID")
       filter("M"."CUMULATIVE_VOLUME"<>"RUNNING_TOTAL")
   7 - filter("INQ"."ROW_ID"=:B1)


36 rows selected.

That's much better.

SQL> merge into market_data_intraday_trades mdit1
  2  using ( select product_key
  3               , sequence_number
  4               , running_total
  5            from ( select product_key
  6                        , sequence_number
  7                        , cumulative_volume
  8                        , sum(quantity) over (partition by product_key order by sequence_number) as running_total
  9                     from market_data_intraday_trades
 10                 )
 11           where cumulative_volume != running_total
 12        ) mdit2
 13     on (   mdit1.product_key = mdit2.product_key
 14        and mdit1.sequence_number = mdit2.sequence_number
 15        )
 16   when matched then
 17        update set mdit1.cumulative_volume = mdit2.running_total
 18  /

1 row merged.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  cjafdk3jg4gzz, child number 0
-------------------------------------
merge into market_data_intraday_trades mdit1 using ( select product_key              , sequence_number
            , running_total           from ( select product_key                       , sequence_number
                      , cumulative_volume                       , sum(quantity) over (partition by
product_key order by sequence_number) as running_total                    from
market_data_intraday_trades                )          where cumulative_volume != running_total       )
mdit2    on (   mdit1.product_key = mdit2.product_key       and mdit1.sequence_number =
mdit2.sequence_number       )  when matched then       update set mdit1.cumulative_volume =
mdit2.running_total

Plan hash value: 2367693855

----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   1 |  MERGE                 | MARKET_DATA_INTRADAY_TRADES |      1 |        |      1 |00:00:00.01 |       9 |
|   2 |   VIEW                 |                             |      1 |        |      1 |00:00:00.01 |       6 |
|*  3 |    HASH JOIN           |                             |      1 |      6 |      1 |00:00:00.01 |       6 |
|*  4 |     VIEW               |                             |      1 |      6 |      1 |00:00:00.01 |       3 |
|   5 |      WINDOW SORT       |                             |      1 |      6 |      6 |00:00:00.01 |       3 |
|   6 |       TABLE ACCESS FULL| MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       3 |
|   7 |     TABLE ACCESS FULL  | MARKET_DATA_INTRADAY_TRADES |      1 |      6 |      6 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("MDIT1"."PRODUCT_KEY"="PRODUCT_KEY" AND "MDIT1"."SEQUENCE_NUMBER"="SEQUENCE_NUMBER")
   4 - filter("CUMULATIVE_VOLUME"<>"RUNNING_TOTAL")


31 rows selected.

But merge beats them all with one table scan less.

Regards, Rob.

Rob van Wijk
Wow, guys, thanks a lot for the thorough response! I wasn't expecting anything so detailed! Now i'm ashamed of my responses on other questions :P
Alex Ati
+1 Thanks for comparison. For on-off updates performance isn't always a consideration. But when the most performant is also the most straightforward ...
APC