views:

59

answers:

3

Sorry, I am pretty much an SQL noob. This has to work in MSFT SQL, Oracle as well as Sybase. In the following snippet I need to change an inner join between IJ and KL on IJ.PO_id = KL.PO_id into a left join also on IJ.PO_id = KL.PO_id. So, I believe I have to re-factor this. Well, implicit joins are not the most readable, at least in my co-worker's eyes. I guess I will agree until I develop my own taste. Sorry, I mangled the table and field names just in case.

/* @IJ_id is an input stored proc patrameter. */
from AB, 
     CD,
     EF,
     GH,
     IJ,
     KL
where
    EF.EF_id = IJ.EF_id and
    IJ.EF_id = AB.EF_id and
    EF.ZY_id = IJ.ZY_id and
    IJ.ZY_id = AB.ZY_id and
    IJ.IJ_id = AB.IJ_id and
    IJ.IJ_id = @IJ_id and
    EF.XW_id = GH.GH_id and
    AB.VU_code = CD.VU_code and
    IJ.TS > 0 and
    IJ.RQ = 0 and
    EF.RQ = 0 and
    AB.RQ = 0 and
    IJ.PO_id = KL.PO_id;

Now, my difficulty is that there is a lot going on in the where clause. Things that do not look like a.b = c.d will remain in the where clause, but not all stuff that does look like a.b = c.d look easy to convert into an explicit join. The difficult part is that ideally the conditions would be between neighbors - AB+CD, CD+EF, EF+GH, GH+IJ, IJ+KL but they are not that organized right now. I could re-order some, but ultimately I do not want to forget my goal: I want the new query to be no slower, and I want the new query to be no less readable. It seems that I might be better off hacking just the part that I need to change, and leave it mostly the same. I am not sure if I can do that.

If you understood my intent, please suggest a better query. if you did not, then please tell me how I can improve the question. Thanks.

+3  A: 

I think it should be something like this:

FROM AB
JOIN CD ON AB.VU_code = CD.VU_code
JOIN IJ ON IJ.EF_id = AB.EF_id AND IJ.ZY_id = AB.ZY_id AND IJ.IJ_id = AB.IJ_id
JOIN EF ON EF.EF_id = IJ.EF_id AND EF.ZY_id = IJ.ZY_id
JOIN GH ON EF.XW_id = GH.GH_id
JOIN KL ON IJ.PO_id = KL.PO_id
WHERE
    IJ.IJ_id = @IJ_id AND
    IJ.TS > 0 AND
    IJ.RQ = 0 AND
    EF.RQ = 0 AND
    AB.RQ = 0

I have tried to arrange the tables such that the following rules hold:

  • Every join condition mentions the new table that it joining on one side.
  • No table is mentioned in a join condition if that table has not been joined yet.
  • Conditions where one of the operands is a constant are left as a WHERE condition.

The last rule is a difficult one - it is not possible to tell from your mangled names whether a condition ought to be part of a join or part of the where clause. Both will give the same result for an INNER JOIN. Whether the condition should be part of the join or part of the where clause depends on the semantics of the relationship between the tables.

You need to consider each condition on a case-by-case basis:

  • Does it define the relationship between the two tables? Put it in the JOIN.
  • Is it a filter on the results? Put it in the WHERE clause.

Some guidelines:

  • A condition that includes a parameter from the user is unlikely to be something that should be moved to a join.
  • Inequalities are not usually found in join conditions.
Mark Byers
@Mark, from other examples I can see that indeed some conditions can be part of a join or part of a where clause. My naive understanding is that I would help the query optimizer do its job if I help it eliminate as many things as possible early on. This query is part of a "serious" stored procedure, which operates on somewhat large tables (from few hundred to few thousand records). In this case, if I have to chose between readability and performance, then I would go with performance. 75% of the users this will run on MSFT SQL, 20% of the time on oracle, and 5% of the time on Sybase if it helps
Hamish Grubijan
@Hamish Grubijan: The choice between WHERE or ON is not a question of performance - the performance ought to be the same. When you change it to LEFT JOIN you will get different results depending on whether you have the constant conditions inside the join or as part of the where clause. I don't think I can give you a rule that always works. You need to figure out which go where on a case-by-case basis.
Mark Byers
@Hamish Grubijan: When using ANSI-92 JOIN syntax, having criteria in the WHERE vs the JOIN can really impact the result returned. Criteria in the join (IE: IJ.RQ = 0) is run before the table is joined.
OMG Ponies
Mark, so as an exercise I started filling in an Excel sheet to help me make sure that the translation was kosher, but that quickly made me invent a notation. My next thought was to write a Python script, which then made me think - someone must have done this before. So, my question is - are you aware of a re-factoring tool for databases. We support Ora, Syb and MSFT SQL, but we are an MSFT shop and get most of their software via bulk subscription. We would look at other vendors as well, as MSFT is not always the best. I would like something like re-sharper for SQL code.
Hamish Grubijan
Actually, I just found http://stackoverflow.com/questions/2192655/is-there-a-tool-for-refactoring-sql-a-bit-like-a-resharper-for-sql
Hamish Grubijan
+1  A: 

It couldn't possibly get any less readable than the example you gave...

from AB a
join CD c on a.VU_Code = c.VU_Code
join EF e on a.EF_id = e.EF_id and e.RQ = 0
join GH g on e.XW_id = g.GH_id
join IJ i on a.IJ_id = i.IJ_id and e.EF_id = i.EF_id
         and a.EF_id = i.EF_id and e.ZY_id = i.ZY_id
         and a.ZY_id = i.ZY_id and i.TS > 0 and i.RQ = 0
LEFT join KL k on i.PO_id = k.PO_id
where 
    i.IJ_id = @IJ_id and 
    a.RQ = 0
Fosco
'It couldn't possibly get any less readable than the example you gave..." You took the words right out of my mouth.
HLGEM
lol you nearly wrote something less readable. Not quite though, good try nethertheless :P
Ash Burlaczenko
+1  A: 

Use:

 FROM AB t1
 JOIN CD t2 ON t2.VU_code = t1.VU_code
 JOIN GH t4 ON t4.gh_id = t3.xw_id
 JOIN IJ t5 ON t5.ZY_id = t1.ZY_id
           AND t5.IJ_id = t1.IJ_id
           AND t5.EF_id = t1.EF_id 
           AND t5.IJ_id = @IJ_id 
           AND t5.TS > 0 
           AND t5.RQ = 0 
 JOIN EF t3 ON t3.ef_id = t5.ef_id
           AND t3.zy_id = t5.zy_id
           AND t3.RQ = 0
 JOIN KL t6 ON t6.po_id = t5.po_id -- Add LEFT before JOIN for LEFT JOIN
WHERE ab.qu = 0

They're aliased in the sequence of the original ANSI-89 syntax, but the order is adjusted due to alias reference - can't reference a table alias before it's been defined.

This is ANSI-92 JOIN syntax - there's no performance benefit, but it does mean that OUTER join syntax is consistent. Just have to add LEFT before the "JOIN KL ..." to turn that into a LEFT JOIN.

OMG Ponies
Btw, one of them should be a left join. Are t1, t2, etc. aliases? if so, I am not sure if I will want to use them in production code. Sorry again, I would rather not reveal real names. Just curious - did you use t1 ... t6 to help you arrive at this?
Hamish Grubijan
@Hamish Grubijan: I was updating as you were commenting to address the LEFT JOIN. See update. Table aliases are your choice - use what you wish.
OMG Ponies