ansaurus

Question

Answer 1

+1 A:

try this (I have no idea how fast it will be - that depends on your indices)

  Select Count(*) TotalPurchases, 
     Sum(Case When c.transaction_id Is Not Null 
          Then 1 Else 0 End) MatchablePurchases
  From unmatched_purchases u
     Join purchases p 
        On p.purchases_id = u.unmatched_purchases_id
     Left Join credit_card_transactions c
        On customer_id = p.customer_id
           And item_id = u.item_id 
           And trans_timestamp - purchase_date < @DelayThreshold
  Where Location_id = @Location

Charles Bretana 2009-11-20 15:15:13

I'm not entirely sure how to ask what I'm think but I'll try: Will this build up the entire results table (IE after all the joins) then do the aggragation, or will will this look at each unmatch_purchase once, change the count and sum values, and move on? (does my question make sense? I'm concerned about how much memory this will take up while running)

David Oneill 2009-11-20 15:31:43

This depends on the query optimizer/processor and depends to some degree on the data, and the server's current statistics about the data. These factors help the optimizer decide on what type of joins to use, and in doing so, whether to create "aggregation buckets" to dump data into as it processes the joins, or to defer this until later, after the intermediate unaggregated result set has been constructed. Look at whatever oracle tool functions as a query plan display tool (Oracle equivilent of SQL Server ShowPlan) to see what it is actually doing.

Charles Bretana 2009-11-20 15:59:51

Ok, I'll check that.

David Oneill 2009-11-20 16:05:14

@David, did some ad'tl research (asked Oracle guy that sits near me) and he said Oracle and/or Toad has a tool called "Display Plan" that does this... You should check that out...

Charles Bretana 2009-11-20 16:10:21

Note for posterity: make sure to add u.location_id = p.location_id and c.location_id = u.location_id to the "on" statements of the joins.

David Oneill 2009-11-20 16:11:16

Answer 2

+1 A:

At least, you'll need more indexes. I propose at least the folloging:

An index on unmatched_purchases.purchases_id, one on purchases.location_id and another index on credit_card_transactions.(location_id, customer_id, item_id, trans_timestamp).

Without those indexes, there is little hope IMO.

ammoQ 2009-11-20 15:16:26

Will that last index help even though I'm looking for a range of values? IE it'll be something like "trans_timestamp between date - fuzz and date + fuzz"

David Oneill 2009-11-20 15:25:57

Yes, exactly; if you have exact values for the other three columns (through the join), the index will help you with the range selection.

ammoQ 2009-11-20 15:39:48

It might. This may sound insulting, and I really don't mean it to be, but doesn't a business that has processed half a billion purchases and 80 million credit card transactions have enough money to employ an Oracle DBA who knows how to index tables appropriately?!

Tony Andrews 2009-11-20 15:43:42

Yes we do, and I am not that person. I am working with our DBA in setting this up, but this query is a one-off query, and there are concerns more important that this one query that dictate what our partitioning and indexing.

David Oneill 2009-11-20 15:59:27

Answer 3

A:

I suggest you to query ALL locations at once. It will cost you 3 full scans (each table once) + sorting. I bet this will be faster then querying locations one by one.

But if you want not to guess, you at least need to examine EXPLAIN PLAN and 10046 trace of your query...

egorius 2009-11-20 15:28:45

I'll check into that, but we have a lot of locations, and location_id is indexed on all three tables. I thought that meant that when doing something "where locaction_id = 123" meant it didn't have to do a truly full scan.

David Oneill 2009-11-20 15:33:26

That's true but it also implies accessing the tables using single block reads, with a very high number of logical reads compared to using full table scans. Reading 1,000 rows from a very large table using an index might involve several thousand logical reads and many of those could easily be physical reads because of the number of blocks across which the rows are distributed (depending on clustering_factor)

David Aldridge 2009-11-20 15:43:06

So reading an entire table through an index is much less efficient than reading it as a full table scan.

David Aldridge 2009-11-20 15:46:07

Ah, I see your confusion. I wasn't clear: I don't need to run this for all locations. I need to run it on 3 of our 70 or so locations.

David Oneill 2009-11-20 16:01:34

David Oneill, in this case my suggestion is not suitable, of course. I thought you need them all.

egorius 2009-11-20 17:07:56

Answer 4

A:

The query ought to be straightforward, but the tricky part is to get it to perform. I'd question why you need to run it once for each location when it would probably be more eficient to run it for every location in a single query.

The join would be a big challenge, but the aggregation ought to be straightforward. I would guess that your best hope performance-wise for the join would be a hash join on the customer and item columns, with a subsequent filter operation on the date range. You might have to fiddle with putting the customer and item join in an inline view and then try to stop the date predicate from being pushed into the inline view.

The hash join would be much more efficient with tables that are being equi-joined both having the same hash partitioning key on all join columns, if that can be arranged.

Whether to use the location index or not ...

Whether the index is worth using or not depends on the clustering factor for the location index, which you can read from the user_indexes table. Can you post the clustering factor along with the number of blocks that the table contains? That will give a measure of the way that values for each location are distributed throughout the table. You could also extract the execution plan for a query such as:

select some_other_column
from   my_table
where  location_id in (value 1, value 2, value 3)

... and see if oracle thinks the index is useful.

David Aldridge 2009-11-20 15:40:44

Ah: I was unclear. I only need to run it for 2 or 3 of my locations, not all of them. *edits question*

David Oneill 2009-11-20 15:45:05

Ah hah ... how many locations do you have in total?

David Aldridge 2009-11-20 15:48:30

ansaurus

tags:

views:

answers:

SQL aggregation question

Whether to use the location index or not ...

related questions