I've got the following SQL statement that needs a major speed-up. The problem is that I need to search on two fields, and each of them runs the same chain of sub-selects. Is there a way to combine the two fields so that the sub-selects run only once?

SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM trcdba.billspaid
WHERE date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010'
AND custno in
 (select custno from cwdba.txpytaxid where taxpayerno in
  (select taxpayerno from cwdba.txpyaccts where accountno in
   (select accountno from rtadba.reasacct where controlno = 1234567)))
OR custno2 in
 (select custno from cwdba.txpytaxid where taxpayerno in
  (select taxpayerno from cwdba.txpyaccts where accountno in
   (select accountno from rtadba.reasacct where controlno = 1234567)))
+12  A: 

I would use joins instead of the embedded sub-queries.

Big Endian
You can find more information on joins here: www.devshed.com/c/a/MySQL/Understanding-SQL-Joins/
Leslie
I've rewritten this query with inner joins, but how do I search on both fields (custno and custno2)? My new query:
SELECT BILLYR, BILLNO, PROPACCT, VINID, TAXPAID, DUEDATE, DATEPIF, PROPDESC
FROM TRCDBA.BILLSPAID
INNER JOIN cwdba.txpytaxid t1 ON custno = t1.custno
INNER JOIN cwdba.txpyaccts t2 ON t1.taxpayerno = t2.taxpayerno
INNER JOIN rtadba.reasacct t3 ON t2.accountno = t3.accountno
WHERE DATE(DATEPIF) > '01/06/2009'
AND DATE(DATEPIF) <= '01/06/2010'
AND t3.controlno = 0950000000472
I decided to use a Union on the query and repeat it with the custno2. Extremely fast now. But if there is a way to include the custno2 without a union, I'd still love to hear it.
@jeffself, using an OR in the ON clause of your join, as demonstrated in my answer, will avoid having to do a UNION. However, UNION may actually be faster, so I'd try both.
Marcus Adams
Most competent database engines will produce exactly the same query plan with a **non** correlated subquery. I know that SQL Server is quite capable of transforming it into an `INNER JOIN` (and not nested loops either - merge or hash join). Unless there's some database-specific handicap at work here, this is a complete non-answer and may actually be harmful, suggesting that join orders significantly influence query plans (they almost never do). The slowness of this query is due to the fact that it's poorly-written, using non-sargable functions and an orphaned `OR` as Larry pointed out.
Aaronaught
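The "orphaned OR" mentioned in the comment above can be illustrated with a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for the poster's database, ISO-formatted text dates, and a simple equality test in place of the nested IN sub-selects; the sample rows are made up. Because AND binds more tightly than OR, the date range in the question only restricts the custno branch unless the two customer tests are parenthesized.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down, hypothetical version of trcdba.billspaid
    CREATE TABLE billspaid (billno INTEGER, custno INTEGER,
                            custno2 INTEGER, datepif TEXT);
    INSERT INTO billspaid VALUES
        (1, 10, 99, '2009-03-15'),  -- custno matches, date in range
        (2, 98, 10, '2005-07-01'),  -- custno2 matches, date OUT of range
        (3, 98, 10, '2009-08-20');  -- custno2 matches, date in range
""")

date_filter = "datepif > '2009-01-06' AND datepif <= '2010-01-06'"

# Same shape as the question: parsed as (dates AND custno) OR custno2
loose_sql = (f"SELECT billno FROM billspaid "
             f"WHERE {date_filter} AND custno = 10 OR custno2 = 10")

# Parentheses make the date range apply to both customer columns
grouped_sql = (f"SELECT billno FROM billspaid "
               f"WHERE {date_filter} AND (custno = 10 OR custno2 = 10)")

loose_rows = sorted(r[0] for r in conn.execute(loose_sql))
grouped_rows = sorted(r[0] for r in conn.execute(grouped_sql))

print("without parentheses:", loose_rows)
print("with parentheses:   ", grouped_rows)
```

In this sketch the unparenthesized form also returns bill 2, whose date falls outside the requested range.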
+5  A: 

Here's the same thing using JOIN instead of subqueries.

SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM billspaid
INNER JOIN txpytaxid
  ON txpytaxid.custno = billspaid.custno OR txpytaxid.custno = billspaid.custno2
INNER JOIN txpyaccts
  ON txpyaccts.taxpayerno = txpytaxid.taxpayerno
INNER JOIN reasacct
  ON reasacct.accountno = txpyaccts.accountno AND reasacct.controlno = 1234567
WHERE date(datepif) > '01/06/2009'
  AND date(datepif) <= '01/06/2010'

However, if the OR in the JOIN is giving you performance problems, you can always try using a union:

(SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM billspaid
INNER JOIN txpytaxid
  ON txpytaxid.custno = billspaid.custno
INNER JOIN txpyaccts
  ON txpyaccts.taxpayerno = txpytaxid.taxpayerno
INNER JOIN reasacct
  ON reasacct.accountno = txpyaccts.accountno AND reasacct.controlno = 1234567
WHERE date(datepif) > '01/06/2009'
  AND date(datepif) <= '01/06/2010')
UNION
(SELECT billyr, billno, propacct, vinid, taxpaid, duedate, datepif, propdesc
FROM billspaid
INNER JOIN txpytaxid
  ON txpytaxid.custno = billspaid.custno2
INNER JOIN txpyaccts
  ON txpyaccts.taxpayerno = txpytaxid.taxpayerno
INNER JOIN reasacct
  ON reasacct.accountno = txpyaccts.accountno AND reasacct.controlno = 1234567
WHERE date(datepif) > '01/06/2009'
  AND date(datepif) <= '01/06/2010')
Marcus Adams
Your OR statement does nothing and you left out custno2, but this is right up the correct path.
NickLarsen
@NickLarsen, thanks. I noticed the problem and fixed it.
Marcus Adams
+1 for the `UNION`. -1 for implying that using an `INNER JOIN` will produce a different execution plan from `IN` with a derived table.
Aaronaught
@Aaronaught, if I was implying anything, it was that the OR was the performance problem. In fact, I came right out and said it.
Marcus Adams
+5  A: 

When you use a function on the column:

date(datepif) > '01/06/2009'
AND date(datepif) <= '01/06/2010'

an index will NOT be used. Try something like this:

datepif > someconversionhere('01/06/2009')
AND datepif <= someconversionhere('01/06/2010')
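A rough way to see the effect is to ask a query planner directly. The sketch below assumes SQLite (via Python's sqlite3 module) as a stand-in for the poster's database, stores datepif as ISO-formatted text so that plain string literals play the role of someconversionhere(...), and uses a hypothetical index name. Other engines report their plans differently, so treat it as an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down, hypothetical version of trcdba.billspaid
    CREATE TABLE billspaid (billno INTEGER, custno INTEGER, datepif TEXT);
    CREATE INDEX idx_billspaid_datepif ON billspaid (datepif);
    INSERT INTO billspaid VALUES
        (1, 10, '2009-03-15'),
        (2, 11, '2008-12-31'),
        (3, 12, '2010-01-06');
""")

def plan(sql):
    """Return SQLite's description of how it intends to read the table."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

# Function wrapped around the column: the index on datepif cannot be searched
wrapped_sql = ("SELECT billno FROM billspaid "
               "WHERE date(datepif) > '2009-01-06' "
               "AND date(datepif) <= '2010-01-06'")

# Bare column compared to constants: the planner can do a range search
bare_sql = ("SELECT billno FROM billspaid "
            "WHERE datepif > '2009-01-06' AND datepif <= '2010-01-06'")

wrapped_plan = plan(wrapped_sql)
bare_plan = plan(bare_sql)
wrapped_rows = sorted(conn.execute(wrapped_sql).fetchall())
bare_rows = sorted(conn.execute(bare_sql).fetchall())

print("wrapped:", wrapped_plan)
print("bare:   ", bare_plan)
```

Both queries return the same bills here; only the access path differs, with the bare-column version reported as an index search and the wrapped version as a scan.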

Use inner joins too. The question gives no information about table sizes or existing indexes, so this is a guess. It should work best if billspaid has many more rows in the date range than there are rows in the joined tables matching r.controlno = 1234567, which I suspect is the case:

SELECT
    COALESCE(b1.billyr,b2.billyr)       AS billyr
    ,COALESCE(b1.billno,b2.billno)      AS billno
    ,COALESCE(b1.propacct,b2.propacct)  AS propacct
    ,COALESCE(b1.vinid,b2.vinid)        AS vinid
    ,COALESCE(b1.taxpaid,b2.taxpaid)    AS taxpaid
    ,COALESCE(b1.duedate,b2.duedate)    AS duedate
    ,COALESCE(b1.datepif,b2.datepif)    AS datepif
    ,COALESCE(b1.propdesc,b2.propdesc)  AS propdesc
    FROM rtadba.reasacct                  r
        INNER JOIN cwdba.txpyaccts        a ON r.accountno=a.accountno
        INNER JOIN cwdba.txpytaxid        t ON a.taxpayerno=t.taxpayerno
        -- b1: bills matched through billspaid.custno
        LEFT OUTER JOIN trcdba.billspaid b1 ON t.custno=b1.custno AND b1.datepif > someconversionhere('01/06/2009') AND b1.datepif <= someconversionhere('01/06/2010')
        -- b2: bills matched through billspaid.custno2
        LEFT OUTER JOIN trcdba.billspaid b2 ON t.custno=b2.custno2 AND b2.datepif > someconversionhere('01/06/2009') AND b2.datepif <= someconversionhere('01/06/2010')
    WHERE r.controlno = 1234567
      -- keep only rows where at least one of the two outer joins found a bill
      AND COALESCE(b1.custno,b2.custno2) IS NOT NULL

Create an index for each of these:

rtadba.reasacct.controlno and cover on accountno
cwdba.txpyaccts.accountno and cover on taxpayerno
cwdba.txpytaxid.taxpayerno and cover on custno
trcdba.billspaid.custno +datepif
trcdba.billspaid.custno2 +datepif
KM
Good point about the function on the column.
Marcus Adams
And if you are having to use a function on date information to convert it to a date, then you need to store the information in a field with the correct datetime datatype. Any time you are having to convert data in order to use it, that is an indicator that your table design is bad.
HLGEM
Very nice answer, +1. Couldn't you move the condition `r.controlno = 1234567` onto the first inner join for just a hair more performance?
NickLarsen
Moving the r.controlno = 1234567 condition onto a join has an impact on MySQL (if there's an index covering it). Postgres (and, I assume, some other databases) would not need it, for example.
Unreason
+1 for indexes and sargability, which are almost definitely the real problems at work here.
Aaronaught
The primary table is trcdba.billspaid, not rtadba.reasacct. That is one of the tables that gets joined. The trcdba.billspaid table has the two fields custno and custno2. The cwdba.txpytaxid table has the field custno that gets compared to both the custno and custno2 fields in trcdba.billspaid.
A: 

Use EXISTS instead of IN (unless the result set of the IN subquery is very small).
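As a minimal sketch of that suggestion, the example below rewrites the question's two nested IN chains as a single correlated EXISTS that checks custno and custno2 together. It assumes SQLite (via Python's sqlite3 module) as a stand-in database, unqualified table names, made-up sample rows, and no date filter. Whether EXISTS is actually faster depends on the engine and the data, so it is worth measuring both forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed-down versions of the four tables in the question
    CREATE TABLE reasacct  (accountno INTEGER, controlno INTEGER);
    CREATE TABLE txpyaccts (accountno INTEGER, taxpayerno INTEGER);
    CREATE TABLE txpytaxid (custno INTEGER, taxpayerno INTEGER);
    CREATE TABLE billspaid (billno INTEGER, custno INTEGER, custno2 INTEGER);

    INSERT INTO reasacct  VALUES (7, 1234567), (8, 7654321);
    INSERT INTO txpyaccts VALUES (7, 500), (8, 600);
    INSERT INTO txpytaxid VALUES (10, 500), (11, 500), (12, 600);
    INSERT INTO billspaid VALUES
        (1, 10, 11), (2, 10, 99), (3, 98, 11), (4, 12, 99), (5, 98, 99);
""")

# The chain of sub-selects from the question, written once and reused
chain = """
    SELECT custno FROM txpytaxid WHERE taxpayerno IN
      (SELECT taxpayerno FROM txpyaccts WHERE accountno IN
        (SELECT accountno FROM reasacct WHERE controlno = 1234567))
"""
in_sql = f"""
    SELECT billno FROM billspaid
    WHERE custno IN ({chain}) OR custno2 IN ({chain})
"""

# One correlated EXISTS that tests both customer columns at once
exists_sql = """
    SELECT b.billno FROM billspaid b
    WHERE EXISTS (
        SELECT 1
        FROM txpytaxid t
        JOIN txpyaccts a ON a.taxpayerno = t.taxpayerno
        JOIN reasacct  r ON r.accountno  = a.accountno
        WHERE r.controlno = 1234567
          AND t.custno IN (b.custno, b.custno2))
"""

in_rows = sorted(r[0] for r in conn.execute(in_sql))
exists_rows = sorted(r[0] for r in conn.execute(exists_sql))
print(in_rows, exists_rows)
```

Because EXISTS only tests for the presence of a match, a bill whose custno and custno2 both qualify is still returned once.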

If you do UNION instead of OR (which should be functionally equivalent), use UNION ALL instead.

Lluis Martinez
I don't think he needs UNION ALL in this case since he's using an OR expression with custno and custno2.
Marcus Adams
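The difference discussed in this exchange can be seen on a small data set. The sketch below assumes SQLite (via Python's sqlite3 module) as a stand-in database, a single-level IN subquery in place of the full chain, and made-up rows. A bill whose custno and custno2 both match appears in both branches, so UNION ALL returns it twice, while UNION and the original OR form return it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed-down tables
    CREATE TABLE txpytaxid (custno INTEGER, taxpayerno INTEGER);
    CREATE TABLE billspaid (billno INTEGER, custno INTEGER, custno2 INTEGER);
    INSERT INTO txpytaxid VALUES (10, 500), (11, 500);
    INSERT INTO billspaid VALUES
        (1, 10, 11),   -- both customer columns match
        (2, 10, 99),   -- only custno matches
        (3, 98, 11),   -- only custno2 matches
        (4, 98, 99);   -- no match
""")

wanted = "SELECT custno FROM txpytaxid WHERE taxpayerno = 500"

def branch(column):
    """One half of the union: bills whose given column is a wanted customer."""
    return f"SELECT billno FROM billspaid WHERE {column} IN ({wanted})"

or_sql = (f"SELECT billno FROM billspaid "
          f"WHERE custno IN ({wanted}) OR custno2 IN ({wanted})")
union_sql = f"{branch('custno')} UNION {branch('custno2')}"
union_all_sql = f"{branch('custno')} UNION ALL {branch('custno2')}"

or_rows = sorted(r[0] for r in conn.execute(or_sql))
union_rows = sorted(r[0] for r in conn.execute(union_sql))
union_all_rows = sorted(r[0] for r in conn.execute(union_all_sql))

print("OR:       ", or_rows)
print("UNION:    ", union_rows)
print("UNION ALL:", union_all_rows)
```

UNION ALL skips the duplicate-elimination step, which is where its speed advantage comes from, so it is only a drop-in replacement when the two branches cannot return the same row.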