views:

300

answers:

6

Hi,

I got a query with five joins on some rather large tables (largest table is 10 mil. records), and I want to know if rows exists. So far I've done this to check if rows exists:

SELECT TOP 1 tbl.Id
FROM table tbl
INNER JOIN ... ON ... = ... (x5)
WHERE tbl.xxx = ...

Using this query, in a stored procedure takes 22 seconds and I would like it to be close to "instant". Is this even possible? What can I do to speed it up?

I got indexes on the fields that I'm joining on and the fields in the WHERE clause.

Any ideas?

+1  A: 

Doing a filter early on your first select will help if you can do it; as you filter the data in the first instance all the joins will join on reduced data.

Select top 1 tbl.id
From 
(
Select top 1 * from 
table tbl1
Where Key = Key
) tbl1
inner join ...

After that you will likely need to provide more of the query to understand how it works.

u07ch
+3  A: 

switch to EXISTS predicate. In general I have found it to be faster than selecting top 1 etc.

So you could write like this IF EXISTS (SELECT * FROM table tbl INNER JOIN table tbl2 .. do your stuff

no_one
I think using top implies a sequence (even if you don't specify one), so you will still examine the entire set. Exists can exit as soon as it finds a single match.
Alex Peck
EXISTS returns as soon as it finds a matching row as far as I know.
no_one
Using EXISTS takes about the same time
Tommy Jakobsen
This is still a big join with an exists afterwards: it's not that different to TOP 1.
gbn
+3  A: 

Depending on your RDBMS you can check what parts of the query are taking a long time and which indexes are being used (so you can know they're being used properly).

In MSSQL, you can use see a diagram of the execution path of any query you submit.

In Oracle and MySQL you can use the EXPLAIN keyword to get details about how the query is working.

But it might just be that 22 seconds is the best you can do with your query. We can't answer that, only the execution details provided by your RDBMS can. If you tell us which RDBMS you're using we can tell you how to find the information you need to see what the bottleneck is.

Welbog
Sounds interesting. I'm using MSSQL 2005.
Tommy Jakobsen
Open up SQL Server Management Studio, then open up a new query. Enter your query, click on Query in the menu, then "Include Actual Execution Plan". Run your query, and one of the tabs at the bottom will be a diagram of every step executed by the server to give you the result of your query, along with what tables are used and how long every step took. Use it to find which parts take the longest times and see if there are any changes or additional indexes you might be able to add to speed it up.
Welbog
I got a Hash Match Inner Join where the cost is 25 % and a Clustered Index Scan where the cost is 63 %. Both related to a single table. What can I do about that?
Tommy Jakobsen
A clustered index scan means the query is looking at every entry in the index. You should try using a WHERE clause to turn that into an index **seek** instead. As for the hash match, that's the best inner join type you can get, so if it's slow all you can do is try to reduce the number of records being joined.
Welbog
I've just used the DTA as suggested by @gbn and it turns out that my indexes could be much better. I now know why, and I've actually learned alot from this. Also big thanks to you @Welbog for "learning" me to see the execution plan :-) It also helped me alot. My query is now down to ~1 sec.
Tommy Jakobsen
Wish I could choose two answers. You got a vote up.
Tommy Jakobsen
A: 

Maybe you could offload/cache this fact-finding mission. Like if it doesn't need to be done dynamically or at runtime, just cache the result into a much smaller table and then query that. Also, make sure all the tables you're querying to have the appropriate clustered index. Granted you may be using these tables for other types of queries, but for the absolute fastest way to go, you can tune all your clustered indexes for this one query.

Edit: Yes, what other people said. Measure, measure, measure! Your query plan estimate can show you what your bottleneck is.

Mark Canlas
+1  A: 

4 options

  • Try COUNT(*) in place of TOP 1 tbl.id

  • An index per column may not be good enough: you may need to use composite indexes

Are you on SQL Server 2005? If som, you can find missing indexes. Or try the database tuning advisor

  • Also, it's possible that you don't need 5 joins.

Assuming parent-child-grandchild etc, then grandchild rows can't exist without the parent rows (assuming you have foreign keys)

So your query could become

SELECT TOP 1
   tbl.Id --or count(*)
FROM
   grandchildtable tbl
   INNER JOIN
   anothertable ON ... = ...
WHERE
   tbl.xxx = ...
  • Try EXISTS.

For either for 5 tables or for assumed heirarchy

SELECT TOP 1 --or count(*)
   tbl.Id
FROM
   grandchildtable tbl
WHERE
   tbl.xxx = ...
   AND
   EXISTS (SELECT *
       FROM
           anothertable T2
       WHERE
           tbl.key = T2.key /* AND T2 condition*/)
-- or
SELECT TOP 1 --or count(*)
   tbl.Id
FROM
   mytable tbl
WHERE
   tbl.xxx = ...
   AND
   EXISTS (SELECT *
       FROM
           anothertable T2
       WHERE
           tbl.key = T2.key /* AND T2 condition*/)
   AND
   EXISTS (SELECT *
       FROM
           yetanothertable T3
       WHERE
           tbl.key = T3.key /* AND T3 condition*/)
gbn
How do I find missing indexes?
Tommy Jakobsen
Nevermind. Using the DTA :-)
Tommy Jakobsen
gbn
DTA helped me alot. Thanks @gbn!
Tommy Jakobsen
A: 

Use the maximun row table first in every join and if more than one condition use in where then sequence of the where is condition is important use the condition which give you maximum rows.

use filters very carefully for optimizing Query.

KuldipMCA