I have a really big table with dozens of columns and lots of rows. Let's call this table FT. Every day I run a script that reads data from the FT table, performs some calculations, and updates a smaller table (table FA) that I use to generate reports.

The query that updates FA is something like:

INSERT INTO FA (A, B, C)
SELECT sum(x), sum(x) * sum(y), sum(x) + sum(z)
  FROM FT
 GROUP BY ...

As I use sum(x) many times, will it be faster if I create a temporary table with sum(x), sum(y) and sum(z) and use it to update my FA table?

+2  A: 

Every DB I know has this kind of thing optimized so the values are calculated only once.

If you're not certain, look at the execution plan and the reads for both the current query and your temp-table version.
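For example, assuming Oracle (as the other answers here do), you can compare the plan and the reads like this; some_col below is just a placeholder for your real grouping columns:

-- show the optimizer's plan for the current query
EXPLAIN PLAN FOR
INSERT INTO FA (A, B, C)
SELECT sum(x), sum(x) * sum(y), sum(x) + sum(z)
  FROM FT
 GROUP BY some_col;   -- some_col stands in for your actual grouping columns

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- in SQL*Plus, this runs a statement and reports its logical/physical reads
SET AUTOTRACE TRACEONLY STATISTICS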

Mladen Prajdic
+2  A: 

As a general rule of thumb, the time taken to retrieve data from disk is the slowest operation a database performs (particularly on a large table).

I would expect relatively straightforward arithmetic operations such as these to be negligible in comparison.

cagcowboy
You say that like someone who's never accidentally written a Cartesian product of two 1M+ row tables sent over a network. There's very little physical I/O in such a query, but scads of logical I/O and even more network time.
Adam Musch
A: 

Benchmark your query against:

insert into fa (a, b, c)
select sum_x, sum_x * sum_y, sum_x + sum_z
  from (select sum(x) as sum_x, sum(y) as sum_y, sum(z) as sum_z
          from my_table
         group by my_grouping_columns)

My strong suspicion is that Oracle has to build the intermediate set -- the sums, as grouped -- first, and then transform that into the final result set regardless.

It will definitely not be easier or faster to force Oracle to materialize the intermediate result set into a global temporary table just for this one statement; you're adding direct-path I/O without a good reason to do so. That said, if the intermediate result set is expensive to build and is used in multiple inserts, it may be worthwhile to materialize it into a temporary table.
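If the set really is reused like that, a sketch of the temporary-table variant might look like the following; the syntax assumes Oracle, and ft_sums / grouping_col are hypothetical names:

create global temporary table ft_sums (
    grouping_col varchar2(30),
    sum_x        number,
    sum_y        number,
    sum_z        number
) on commit preserve rows;

-- built once, at the start of the load
insert into ft_sums
select grouping_col, sum(x), sum(y), sum(z)
  from ft
 group by grouping_col;

-- each subsequent insert reads the small ft_sums set instead of re-scanning ft
insert into fa (a, b, c)
select sum_x, sum_x * sum_y, sum_x + sum_z
  from ft_sums;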

Adam Musch
A: 

Considering that you have tagged this post with data-warehouse and datamart, I can only assume that your FT table is some kind of fact table and that the query looks something like:

select 
    CalendarMonth
  , sum(x) as Tot_1 
  , sum(x) * sum(y) as Tot_2
  , sum(x) + sum(z) as Tot_3
from FT         as f
join dimDate    as d on d.DateKey    = f.DateKey
join dimUser    as u on u.UserKey    = f.UserKey
join dimProduct as p on p.ProductKey = f.ProductKey
where CalendarYear between 2008 and 2010
  and Country = 'United States'
  and ProductCategory = 'Cool Gadget'
  and UserGender = 'Female'
group by CalendarMonth ;

Which is exactly how an aggregation over measures in a fact table should look.

Now, for reporting purposes, it seems that you have an aggregation table (FA) to speed up reports. I can only guess that the warehouse is loaded overnight and that your query prepares the aggregation sometime in the morning, before business hours, so it runs once per day -- or at least is supposed to. If this query takes too long to run, consider adding a few key fields to your aggregation table (FA) -- usually DateKey -- and then updating the FA table periodically.

For example, if you have 10,000 sales per day, then the above query sums ~300,000 rows for each month. If the aggregation table is aggregated per day instead, then it takes a sum over 10,000 rows once per day to update the table, and a sum over only ~30 rows per month for a report.
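A sketch of that daily-grained table (names here are assumed, and the dimension filters are left out for brevity): store only the additive daily sums, and derive the products and sums of sums at report time, since those are not additive across days.

create table FA_Daily (
    DateKey int not null
  , Sum_X   decimal(18,4)
  , Sum_Y   decimal(18,4)
  , Sum_Z   decimal(18,4)
) ;

-- nightly load: aggregate only the newly arrived day of fact rows
insert into FA_Daily (DateKey, Sum_X, Sum_Y, Sum_Z)
select f.DateKey, sum(x), sum(y), sum(z)
from FT as f
where f.DateKey = :yesterday_key   -- bind variable for the day being loaded
group by f.DateKey ;

-- monthly report: rolls up ~30 daily rows instead of ~300,000 fact rows
select 
    CalendarMonth
  , sum(Sum_X)              as Tot_1
  , sum(Sum_X) * sum(Sum_Y) as Tot_2
  , sum(Sum_X) + sum(Sum_Z) as Tot_3
from FA_Daily   as a
join dimDate    as d on d.DateKey = a.DateKey
group by CalendarMonth ;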

To summarize: to speed up fact-aggregation queries, focus on the number of rows being aggregated, not on the aggregate functions. Also, make sure that dimension tables have indexes on the columns mentioned in the WHERE clause of the query.

Admittedly, I may have assumed too much here, so this may or may not be helpful.

Damir Sudarevic