ansaurus

Question

Answer 1

+2 A:

LEFT JOIN
joinedTable jt ON jt.someCol = someCol AND jt.someCol = conditions AND @neededjoin ...
...

OR

LEFT JOIN
(
SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND @neededjoin ...
) jt ON jt.someCol = someCol
...

OR

;WITH jtCTE AS
(SELECT col1, someCol, col2 FROM joinedTable WHERE someCol = conditions AND @neededjoin ...)
SELECT
...
LEFT JOIN
jtCTE ON jtCTE.someCol = someCol
...

To be honest, there is no such construct as a conditional JOIN unless you use literals.

If it's in the SQL statement it's evaluated... so don't have it in the SQL statement by using dynamic SQL or IF ELSE

gbn 2010-09-24 18:44:38

@gbn - when I do that, it still includes all of the columns from the joinedTable, and the execution plan includes a HashMatch (Right Outer Join) which is quite expensive

Brett 2010-09-24 18:48:55

@Brett: see my update please.

gbn 2010-09-24 18:49:45

@gbn - I guess your last point makes the most sense, that if its in the SQL statement it is evaluated, so it will be evaluated even though it isn't being joined, and the query will take longer... hrm.

Brett 2010-09-24 18:58:00

Answer 2

+1 A:

the dynamic sql solution is usually the best for these situations, but if you really need to get away from that a series of if statments in a stroed porc will do the job. It's a pain and you have to write much more code but it will be faster than trying to make joins conditional in the statement itself.

HLGEM 2010-09-24 18:54:02

use d-sql. If you use parameterized queries (in SP or straight from proggy), modern RDBMS including the latest 2 versions of sql server express will cache all the permutations and their execution plans. PS how do you get 9? I only get 7.

FastAl 2010-09-24 19:06:46

@FastAl - Interesting, I didn't know that sql server will cache the permutations and their execution plans, that's pretty cool! So does that mean that I do not want to use `WITH RECOMPILE`?.. yeah, 7 is correct, my bad.

Brett 2010-09-24 19:12:02

Not sure how to do it; I just read a few years ago that it would try to cache any queries it could and that means parameter queries more likely, because the text of the query is the same. Then the permutations would get cached as you used them (SQL Srvr wouldn't know that they were related, but it wouldn't haveto). I would imagine that if you told it to recompile, it would, but I'm not too familiar with all the reasons why you would or wouldn't do that in a production program.

FastAl 2010-09-24 20:29:58

Answer 3

+1 A:

I would go for a simple and straightforward approach like this:

DECLARE @ret TABLE(...) ;

IF <coondition one> BEGIN ;
  INSERT INTO @ret() SELECT ...
END ;

IF <coondition two> BEGIN ;
  INSERT INTO @ret() SELECT ...
END ;

IF <coondition three> BEGIN ;
  INSERT INTO @ret() SELECT ...
END ;

SELECT DISTINCT ... FROM @ret ;

Edit: I am suggesting a table variable, not a temporary table, so that the procedure will not recompile every time it runs. Generally speaking, three simpler inserts have a better chance of getting better execution plans than one big huge monster query combining all three.

However, we can not guess-timate performance. we must benchmark to determine it. Yet simpler code chunks are better for readability and maintainability.

AlexKuznetsov 2010-09-24 19:12:51

@AlexKuznetsov - My current (dynamic) query builds a common table expression, do you think that using a temp table could provide me the performance that I need? P.S. that is a good idea, I hadn't though about doing conditional inserts of the inner joined data...

Brett 2010-09-24 19:32:05

@AlexKuznetsov- I certainly agree about the readability and maintainability, that's one reason that I wanted to move away from the dynamic query that I have, because of the sheer length of the query. I'll benchmark yours, thanks for the help!

Brett 2010-09-24 20:31:12

Answer 4

A:

The dynamic SQL solution is best in most respects; you are trying to run different queries with different numbers of joins without rewriting the query to do different numbers of joins - and that doesn't work very well in terms of performance.

When I was doing this sort of stuff an æon or so ago (say the early 90s), the language I used was I4GL and the queries were built using its CONSTRUCT statement. This was used to generate part of a WHERE clause, so (based on the user input), the filter criteria it generated might look like:

a.column1 BETWEEN 1 AND 50 AND
b.column2 = 'ABCD' AND
c.column3 > 10

In those days, we didn't have the modern JOIN notations; I'm going to have to improvise a bit as we go. Typically there is a core table (or a set of core tables) that are always part of the query; there are also some tables that are optionally part of the query. In the example above, I assume that 'c' is the alias for the main table. The way the code worked would be:

Note that table 'a' was referenced in the query:
- Add 'FullTableName AS a' to the FROM clause
- Add a join condition 'AND a.join1 = c.join1' to the WHERE clause
Note that table 'b' was referenced...
- Add bits to the FROM clause and WHERE clause.
Assemble the SELECT statement from the select-list (usually fixed), the FROM clause and the WHERE clause (occasionally with decorations such as GROUP BY, HAVING or ORDER BY too).

The same basic technique should be applied here - but the details are slightly different.

First of all, you don't have the string to analyze; you know from other circumstances which tables you need to add to your query. So, you still need to design things so that they can be assembled, but...

The SELECT clause with its select-list is probably fixed. It will identify the tables that must be present in the query because values are pulled from those tables.
The FROM clause will probably consist of a series of joins.
- One part will be the core query:
```
FROM CoreTable1 AS C1
JOIN CoreTable2 AS C2
     ON C1.JoinColumn = C2.JoinColumn
JOIN CoreTable3 AS M
     ON M.PrimaryKey = C1.ForeignKey
```
- Other tables can be added as necessary:
```
JOIN AuxilliaryTable1 AS A
     ON M.ForeignKey1 = A.PrimaryKey
```
- Or you can specify a full query:
```
JOIN (SELECT RelevantColumn1, RelevantColumn2
        FROM AuxilliaryTable1
       WHERE Column1 BETWEEN 1 AND 50) AS A
```
- In the first case, you have to remember to add the WHERE criterion to the main WHERE clause, and trust the DBMS Optimizer to move the condition into the JOIN table as shown. A good optimizer will do that automatically; a poor one might not. Use query plans to help you determine how able your DBMS is.
Add the WHERE clause for any inter-table criteria not covered in the joining operations, and any filter criteria based on the core tables. Note that I'm thinking primarily in terms of extra criteria (AND operations) rather than alternative criteria (OR operations), but you can deal with OR too as long as you are careful to parenthesize the expressions sufficiently.
Occasionally, you may have to add a couple of JOIN conditions to connect a table to the core of the query - that is not dreadfully unusual.
Add any GROUP BY, HAVING or ORDER BY clauses (or limits, or any other decorations).

Note that you need a good understanding of the database schema and the join conditions. Basically, this is coding in your programming language the way you have to think about constructing the query. As long as you understand this and your schema, there aren't any insuperable problems.

Good luck...

Jonathan Leffler 2010-09-24 19:40:39

Answer 5

+1 A:

Try this:

LEFT JOIN joinedTable jt
   ON jt.someCol = someCol
   AND jt.someCol = conditions
   AND @neededJoin = 1 -- or whatever indicates join is needed

I think you'll find it is good performance and does what you need.

Update

If this doesn't give the performance I claimed, then perhaps that's because the last time I did this using joins to a table. The value I needed could come from one of 3 tables, based on 2 columns, so I built a 'join-map' table like so:

Col1  Col2 TableCode
  1     2    A
  1     4    A
  1     3    B
  1     5    B
  2     2    C
  2     5    C
  1     11   C

Then,

SELECT
   V.*,
   LookedUpValue =
      CASE M.TableCode
      WHEN 'A' THEN A.Value
      WHEN 'B' THEN B.Value
      WHEN 'C' THEN C.Value
      END
FROM
    ValueMaster V
    INNER JOIN JoinMap M ON V.Col1 = M.oOl1 AND V.Col2 = M.Col2
    LEFT JOIN TableA A ON M.TableCode = 'A'
    LEFT JOIN TableB B ON M.TableCode = 'B'
    LEFT JOIN TableC C ON M.TableCode = 'C'

This gave me a huge performance improvement querying these tables (most of them dozens or hundreds of million-row tables).

This is why I'm asking if you actually get improved performance. Of course it's going to throw a join into the execution plan and assign it some cost, but overall it's going to do a lot less work than some plan that just indiscriminately joins all 3 tables and then Coalesce()s to find the right value.

If you find that compared to dynamic SQL it's only 5% more expensive to do the joins this way, but with the indiscriminate joins is 100% more expensive, it might be worth it to you to do this because of the correctness, clarity, and simplicity over dynamic SQL, all of which are probably more valuable than a small improvement (depending on what you're doing, of course).

Whether the cost scales with the number of rows is also another factor to consider. If even with a huge amount of data you only save 200ms of CPU on a query that isn't run dozens of times a second, it's a no-brainer to use it.

The reason I keep hammering on the fact that I think it's going to perform well is that even with a hash match, it wouldn't have any rows to probe with, or it wouldn't have any rows to create a hash of. The hash operation is going to stop a lot earlier compared to using the WHERE clause OR-style query of your initial post.

Emtucifor 2010-09-24 20:02:33

@Emtucifor - Unfortunately, with this method, the join is still being evaluated, despite the fact that it does not affect the output. This ends up hampering performance significantly.

Brett 2010-09-24 20:29:51

Brett, did you actually try it? It's **markedly** better than the performance of the query you gave. Please, if only to shut me up :) would you post performance results? Don't go by execution plan. Go by CPU and Reads with Query Profiler or `SET STATISTICS IO ON`.

Emtucifor 2010-09-24 20:35:57

@Emtucifor - Yeah, I did try doing that as well, but having that join in there can't compare to the performance that I'm getting if I don't have the join (doing things dynamically)

Brett 2010-09-24 20:37:18

Can't compare or didn't compare? What kind of performance gains are we talking about, here? Is it a marginal difference or huge difference? Numbers, please!!!!

Emtucifor 2010-09-24 20:38:01

@Emtucifor- It ends up having a "Hash Match (Right Outer Join)" in the execution plan, which makes a significant impact, I was looking at the execution plan, and the duration in the profiler... give me a minute on numbers

Brett 2010-09-24 20:41:09

Just because there is a hash match in there doesn't mean it really takes up much time. If you do very much performance tuning you will eventually learn that you can't trust execution plan costs as very reliable. Only CPU and Reads are reliable. Also, make sure you **don't** have the execution plan showing at the same time as doing the statistics because this will artificially inflate the statistics.

Emtucifor 2010-09-24 20:44:51

@Emtucifor - So, as I was saying, it takes a bit longer because of the join. I created a much simpler version of my query, but when I have the LEFT JOIN, the CPU is 750, and without the LEFT JOIN the CPU is 546, so while in this case the time addition is not noticeable, in my full query it is a significant hit, I do appreciate all of your comments and thoughts!\

Brett 2010-09-24 20:49:23

So is that 200 ms of CPU time scalable with the number of rows or is it a fixed cost? If it's a fixed cost because of the join, but doesn't scale with the number of rows, then it would be a really good trade-off to avoid dynamic SQL. Please test with more data.

Emtucifor 2010-09-24 20:53:44

@Emtucifor- I've gotta take off for today, but I'll post my result here when I get it, I need to do some further testing and modify my query so that I rule out a few other factors. Thanks and I'll get back to you :)

Brett 2010-09-24 21:10:11

@Emtucifor - Thanks for your persistence in this matter. As it turns out, after some further testing: my new query not using dynamic SQL is faster than the dynamic query. I guess that the scale of the simple query I wrote to 'test' the idea of always left joining was not scalable to the full query I am running. Again thanks for your persistence, I appreciate it!

Brett 2010-09-27 12:01:00

You're welcome. Which version did you settle on? I'm sure we'd all appreciate a performance stats table of the various queries you tried.

Emtucifor 2010-09-27 16:22:44

Answer 6

A:

Just because no one else mentioned this, here's something that you could use (not dynamic). If the syntax looks weird, it's because I tested it in Oracle.

Basically, you turn your joined tables into sub-selects that have a where clause that returns nothing if your condition does not match. If the condition does match, then the sub-select returns data for that table. The Case statement lets you pick which column is returned in the overall select.

with m as (select 1 Num, 'One' Txt from dual union select 2, 'Two' from dual union select 3, 'Three' from dual),
t1 as (select 1 Num from dual union select 11 from dual),
t2 as (select 2 Num from dual union select 22 from dual),
t3 as (select 3 Num from dual union select 33 from dual)
SELECT m.*
      ,CASE 1
         WHEN 1 THEN
          t1.Num
         WHEN 2 THEN
          t2.Num
         WHEN 3 THEN
          t3.Num
       END SelectedNum
  FROM m
  LEFT JOIN (SELECT * FROM t1 WHERE 1 = 1) t1 ON m.Num = t1.Num
  LEFT JOIN (SELECT * FROM t2 WHERE 1 = 2) t2 ON m.Num = t2.Num
  LEFT JOIN (SELECT * FROM t3 WHERE 1 = 3) t3 ON m.Num = t3.Num

John Fisher 2010-09-24 20:46:32

Actually, John, that is the same essence as what I (originally) posted.

Emtucifor 2010-09-24 20:52:59

And—actually, again—it looks like @gbn also beat me to the punch. Somehow I missed that what I posted was essentially the same as his. The laugh's on me.

Emtucifor 2010-09-24 20:57:51

Emtucifor: Actually, this is different enough from yours that it could significantly change the query execution plan.

John Fisher 2010-09-24 21:19:24

Hmmm. I'm looking more closely. How does `WHERE 1 = 2` do anything useful? I think you need to expand your example to show which rows are injected and which are the simulated 'real table' rows.

Emtucifor 2010-09-24 21:21:15

@Emtucifor: The "WHERE 1 = 2" replaces the unspecified condition from the question being answered. Brett would substitute his real condition there.

John Fisher 2010-09-24 22:39:51

ansaurus

tags:

views:

answers:

I Need some sort of Conditional Join.

related questions