views:

258

answers:

4

I am trying to CROSS JOIN two tables, customers and items, so I can then create a sales by customer by item report. I have 2000 customer and 2000 items.

SELECT customer_name FROM customers; --Takes 100ms

SELECT item_number FROM items; --Takes 50ms

SELECT customer_name, item_number FROM customers CROSS JOIN items; Takes 200000ms

I know this is 4 million rows, but is it possible to get this to run any faster? I want to eventually join this with a sales table like this:

SELECT customer_name, item_number, sales_total FROM customers CROSS JOIN items LEFT JOIN sales ON (customer.customer_name = sales.customer_name, item.item_number=sales.item_number);

The sales table will obviously not have all customers or all items, so the goal here is to have a report that shows all customers and all items along with what was sold and not sold.

I'm using PostgreSQL 8.4

+2  A: 

To answer your question: No, you can't do a cross join faster than that - if you could then that would be how CROSS JOIN would be implemented.

But really you don't want a cross join. You probably want two separate queries, one which lists all customers, and another which lists all items and whether or not they were sold.

Mark Byers
I'll have to try two queries at the application level and see if it's faster at looping through all the customer and item combos. However, I'll be surprised if the application can do it faster than the database can.
A: 

I can't imagine that there will be a third-party solution to this, the PostgreSQL programmers know their system best and will heavily optimise it.

ridecar2
A: 

If you wish to see all items for a given client (even if the cient has no items), i would rather try

SELECT c.customer_name, i.item_number, s.sales_total
FROM customers c LEFT JOIN 
    sales s ON c.customer_name = s.customer_name LEFT OIN
    items i on i.item_number=s.item_number

This should give you a list of all clients, and all items joined by sales.

astander
This only gives you what items each customer ordered, not all items ordered or not for each customer. So I cannot tell what each customer has not purchased.
A: 

This really needs to be multiple reports. I can think of several off the top of my head that will yield more efficient packaging of information:

  1. Report: count of all purchases by customer/item (obvious).
  2. Report: list of all items not purchased, by customer.
  3. Report: Summary of Report #2 (count of items) in order to prioritize which customers to focus on.
  4. Report: list of all customer that have not bought an item by item.
  5. Report: Summary of Report #3 (count of customers) in order to identify both the most popular and unpopular items for further action.
  6. Report: List of all customers who purchased an item in the past, but did not purchase it his reporting period. This report is only relevant when the sales table has a date and the customers are expected to be regular buyers (i.e. disposable widgets). Won't work as well for things like service contracts.

The point here is that one should not insist that the tool process every possible outcome at once and generate more data and anyone could possibly digest manually. One should engage the end-users and consumers of the data as to what their needs are and tailor the output to meet those needs. It will make both sides' lives much easier in the long run.

Matthew Wood