tags:

views:

68

answers:

2

I've got a query that presently looks something like this (I'll simplify it to get to the point):

select shipment.f1, type1detail.f2
from shipment
join type1detail using (shipmentid)
where shipment.otherinfo='foo'
union
select shipment.f1, type2detail.f2
from shipment
join type2detail using (shipmentid)
where shipment.otherinfo='foo'

That is, we have two types of detail that can be on a shipment, and I need to find all the records for both types for all the shipments meeting a given condition. (Imagine that the two detail tables have some fields with matching data. In real life there are several more joins in there; as I say I'm trying to simplify the query to get to the question I'm trying to ask today.)

But the "where" clause requires a full-file scan, so it's pretty inefficient to do this full file scan twice. Logically, the database engine should be able to find the shipment records that meet the condition once, and then find all the records from type1detail and type2detail with shipmentid matching those values.

But how do I say this in SQL?

select shipment.f1, detail.f2
from shipment
join
(select shipmentid, f2
from type1detail
union
selection shipmdentid, f2
from type2detail
) d using (shipmentid)
where shipment.otherinfo='foo'

would required a full file scan of type1detail, union'ed with a full file scan of type2detail, and then join this to shipment, which would be worse than reading shipment twice. (As each of the detail tables are bigger than shipment.)

This seems like a straightforward thing to want to do, but I just can't think of way that I could express it.

I'm using Postgres if you know a solution particular to that engine, but I was hoping for a generic SQL solution.

+1  A: 

I would use a temp table to dump shipmentid's into that match your "otherinfo" where clause. Then use that temp table in your joins (not replacing any existing join, but adding another one) so that the join criteria is only using shipmentids and no where clause is needed (unless you are also filtering on columns not in the shipment table, but the point is you still scanned that table only once).

Dane
Well, I was hoping for a prettier solution, but I think this is, in fact, the right answer. (Sorry if that didn't sound like a rousing endorsement. Reminds me of the time a man I know told a woman that she was "not unattractive". So I said, "Wow Jim, that was a really rousing complement. You must have women falling at your feet when you flatter them like that.")
Jay
A: 

If you run this query more than occasionally, you're probably better off indexing the otherinfo column to avoid the full scans altogether. I don't work much with Postgres, but it looks like there is at least one full text indexing option already.

Randy
That's absolutely true, and would be a decent solution to the immediate problem. (It's not a full text search but a simple field-equals test anyway.) But once I saw the issue I started thinking more generally. While a new index would solve it here, (a) the way our company is structured there's no way I'd get that done for the next release, and (b) it brings up many other hypothetical cases.
Jay