We have a table-valued function that returns a list of people you may access, and we have a relation between a search and a person called a search result.

What we want to do is select all the people from the search and present them.

The query looks like this

SELECT qm.PersonID, p.FullName 
FROM QueryMembership qm
INNER JOIN dbo.GetPersonAccess(1) ON GetPersonAccess.PersonID = qm.PersonID
INNER JOIN Person p ON p.PersonID = qm.PersonID
WHERE qm.QueryID = 1234

There are only 25 rows with QueryID=1234 but there are almost 5 million rows total in the QueryMembership table. The person table has about 40K people in it.

QueryID is not the PK, but there is an index on it. The query plan tells me 97% of the total cost is spent doing a "Key Lookup" with the seek predicate

QueryMembershipID = Scalar Operator (QueryMembership.QueryMembershipID as QM.QueryMembershipID)

Why is the PK in there when it's not used in the query at all? And why is it taking so long?

There are only 25 people in total. With the index, this should be a scan of all the QueryMembership rows that have QueryID = 1234 and then a JOIN on the 25 people that exist in the table-valued function, which, by the way, only has to be evaluated once and completes in less than 1 second.

+2  A: 

You should define indexes on the tables you query. In particular on columns referenced in the WHERE and ORDER BY clauses.

Use the Database Tuning Advisor to see what SQL Server recommends.
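For example, a nonclustered index keyed on the column filtered in the WHERE clause might look like this (the index name is illustrative; adjust it to your naming convention):

```sql
-- Hypothetical index supporting the WHERE qm.QueryID = 1234 filter.
-- Column list is assumed from the query in the question.
CREATE NONCLUSTERED INDEX IX_QueryMembership_QueryID
    ON QueryMembership (QueryID);
```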

Oded
The DTA does a pretty good job, but it doesn't always get it right.
Mitch Wheat
A: 

You need indexes on your WHERE and ORDER BY clauses. I am not an expert, but I would bet it is doing a table scan for each row. Since your speed issue is resolved by removing the INNER JOIN or the ORDER BY, I bet the issue is specifically with the join. I bet it is doing the table scan on your joined table because of the sort. By putting an index on the columns in your WHERE clause first, you will be able to see if that is in fact the case.

RandomBen
+2  A: 

For specifics, of course you would need to post your query and table design.

But I have to make a couple of points here:

  • You've already jumped to the conclusion that the slowness is a result of the ORDER BY clause. I doubt it. The real test is whether or not removing the ORDER BY speeds up the query, which you haven't done. Dollars to donuts, it won't make a difference.

  • You only get the "log n" in your big-O claim when the optimizer actually chooses to use the index you defined. That may not be happening because your index may not be selective enough. The thing that makes your temp table solution faster than the optimizer's solution is that you know something about the subset of data being returned that the optimizer does not (specifically, that it is a really small subset of data). If your indexes are not selective enough for your query, the optimizer can't always reasonably assume this, and it will choose a plan that avoids what it thinks could be a worst-case scenario of tons of index lookups, followed by tons of seeks and then a big sort. Oftentimes, it chooses to scan and hash instead. So what you did with the temp table is often a way to solve this problem. Often you can narrow down your indexes or create an indexed view on the subset of data you want to work against. It all depends on the specifics of your query.

Dave Markle
I've clearly put it in the question that it's either the INNER JOIN against the table-valued function or the ORDER BY which makes the query come to an abrupt halt. I've tried things; I wouldn't be asking if there was something I hadn't tried, that I knew of. The second point you make is interesting, I'll be back with an execution plan and actual specifics on the data.
John Leidegren
Sorry about that. How are you creating the TVF exactly? Are you doing it inline, or by returning a table variable that you populate? How many rows do you expect the function to return in the normal case? The worst case?
Dave Markle
I must have been drunk. That table has almost 5 million rows. The result takes about 3 minutes to compute and it's a mere 25 rows that get returned. I'm gonna rephrase the question a bit...
John Leidegren
A: 

Have you tried restructuring the query into a CTE to separate the TVF call? So, something like:

With QueryMembershipPerson As
    (
    Select QM.PersonId, P.Fullname
    From QueryMembership As qm
        Join Person As P
            On P.PersonId = QM.PersonId
    Where QM.QueryId = 1234
    )
Select PersonId, Fullname
From QueryMembershipPerson As QMP
    Join dbo.GetPersonAccess(1) As PA
        On PA.PersonId = QMP.PersonId

EDIT: Btw, I'm assuming that there is an index on PersonId in both the QueryMembership and the Person table.

EDIT What about two table expressions like so:

With 
    QueryMembershipPerson As
    (
    Select QM.PersonId, P.Fullname
    From QueryMembership As qm
        Join Person As P
            On P.PersonId = QM.PersonId
    Where QM.QueryId = 1234
    )
    , PersonAccess As
    (
    Select PersonId
    From dbo.GetPersonAccess(1) 
    )
Select PersonId, Fullname
From QueryMembershipPerson As QMP
    Join PersonAccess As PA
        On PA.PersonId = QMP.PersonId

Yet another solution would be a derived table like so:

Select ...
From  (
        Select QM.PersonId, P.Fullname
        From QueryMembership As qm
            Join Person As P
                On P.PersonId = QM.PersonId
        Where QM.QueryId = 1234
        ) As QueryMembershipPerson
    Join dbo.GetPersonAccess(1)  As PA
        On PA.PersonId = QueryMembershipPerson.PersonId

If pushing some of the query into a temp table and then joining on that works, I'd be surprised that you couldn't combine that concept into a CTE or a query with a derived table.

Thomas
Yes, it doesn't make any difference. The temporary table approach, however, does: not doing the ORDER BY in the first query, inserting into a table variable, and then returning that but with a sort. This is immensely faster. I'm not sure about the index on PersonID in QueryMembership, I'll look into that.
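A sketch of that table-variable approach, with column names and types assumed from the question, might look like:

```sql
-- Sketch only: column types and the ORDER BY column are assumptions.
DECLARE @Results TABLE (PersonID INT, FullName NVARCHAR(200));

-- First, materialize the small result set without any sort.
INSERT INTO @Results (PersonID, FullName)
SELECT qm.PersonID, p.FullName
FROM QueryMembership qm
INNER JOIN dbo.GetPersonAccess(1) pa ON pa.PersonID = qm.PersonID
INNER JOIN Person p ON p.PersonID = qm.PersonID
WHERE qm.QueryID = 1234;

-- Then sort only the ~25 intermediate rows instead of the 5M-row join.
SELECT PersonID, FullName
FROM @Results
ORDER BY FullName;
```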
John Leidegren
So, are you saying that using a temp table for the inner query against QueryMembership works fast but using a CTE in the same fashion does not? That's definitely odd. I wonder if two CTEs would do it (I adjusted my post so you can see what I mean).
Thomas
Yeah, the query plan ends up the same. The CTEs are more readable but don't really change how things get executed. It's just an alternate syntax for subqueries, albeit a really nice and useful one.
John Leidegren
It is extremely rare that restructuring a query this way will actually change the plan, unless it involves a non-sargable expression being made sargable (not the case here). This particular problem is 100% due to inadequate indexing.
Aaronaught
Btw, I don't see on what you are trying to Order. Was it removed from the OP?
Thomas
@Aaronaught - If reorganizing the query in subqueries would not change the execution plan, I guess I'm a bit surprised that the overall execution plan would be different when using a temp table (i.e. including the cost of creating and populating). In effect you are forcing the system to execute one part of the query before it operates on another part.
Thomas
A temp table does, of course, change the execution plan, because any subsequent statements are operating on an entirely different set of data, with different indexes, for which the optimizer has different statistics; in some cases the whole table may even be cached in memory. But - CTEs, subqueries, joins, etc. - it's all just compiled down to a set of expressions and operations which the optimizer rearranges as it pleases, unless you use `FORCE ORDER` or write non-sargable expressions (not recommended!).
Aaronaught
Incidentally, I'm not arguing against the use of CTEs, they're very valuable for cleaning up queries; just, in this case, the OP's update makes it pretty clear that this is an indexing issue (key lookups only have one root cause).
Aaronaught
@Aaronaught. Understood. Incidentally, I replicated the setup mentioned in the OP with 5mil QM rows and 40K person rows with the TVF returning about 4K rows. With only a std index on QueryId and an index on PersonId in QM, the key lookup to the Clus Idx only takes 32% of the query instead of 90%. I'm wondering if the OP's table statistics are stale.
Thomas
That's a definite possibility, although you should also keep in mind that the cost of the bookmark look up will depend on how physically far apart the rows are for a given `PersonId`. If they're clustered, it's mostly sequential I/O, hence rather cheap; if they're sparse, it turns into random access and hence very expensive. If other design constraints don't prevent the creation of a covering index, it's best to just eliminate the bookmark lookup altogether.
Aaronaught
+2  A: 

If you want to avoid the "key lookup", use a covering index:

create index ix_QueryMembership_NameHere on QueryMembership (QueryID)
include (PersonID);

Add any other columns that you are going to select to the INCLUDE list.

As for why the PK's "key lookup" is working so slowly, try DBCC FREEPROCCACHE, ALTER INDEX ALL ON QueryMembership REBUILD, or ALTER INDEX ALL ON QueryMembership REORGANIZE.

This may help if your PK's index is disabled, or the cache keeps a wrong plan.

chaowman
So what's the deal here? This sounds interesting. What are the nuts and bolts of this rebuild and reorganize stuff? What do you mean by include arguments?
John Leidegren
This is the right answer. A key lookup is a nested loop - row-by-row retrieval - and is always the result of an index that doesn't fully cover the query. @John Leidegren, don't worry about the rebuild/reorganize yet, the `INCLUDE` is the important part. Whichever index is being searched for the `WHERE` needs to `INCLUDE` **all** columns used in the output - specifically `PersonID` in your original query. Otherwise the index does not cover, and you end up with either key lookups or table scans, depending on what the optimizer decides to do.
Aaronaught
This is making a lot of sense, I'm gonna try this out first chance I get. Do you have any good documentation which uses the same terminology? Is there any difference between (QueryID) INCLUDE (PersonID) and (PersonID) INCLUDE (QueryID)? To put it in math lingo, is it commutative?
John Leidegren
They are not commutative. The columns used as keys (i.e. outside the INCLUDE) are the ones involved in the index search. The INCLUDE columns are simply additional columns stored with the key so that the system does not have to look them up again. Thus, the second version you listed would not actually create any improvement when filtering on the QueryId column.
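To illustrate the difference (index names are made up for the example):

```sql
-- Keyed on QueryID: supports seeks for WHERE QueryID = 1234,
-- and carries PersonID along so no key lookup is needed.
CREATE INDEX IX_QM_QueryID_inc_PersonID
    ON QueryMembership (QueryID) INCLUDE (PersonID);

-- Keyed on PersonID: does NOT help a filter on QueryID,
-- because the optimizer cannot seek on an INCLUDE-only column.
CREATE INDEX IX_QM_PersonID_inc_QueryID
    ON QueryMembership (PersonID) INCLUDE (QueryID);
```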
Thomas
@John L: Very roughly speaking, when you `INCLUDE` a column in an index, it means that the DB engine does not actually have to read the row and thus perform a row lookup (i.e. key/bookmark/RID lookup) in order to get the value. Imagine something like an index of game or movie reviews, and along with the title and page number they also list the rating; it means you don't actually have to flip to the review and read it just to find out the rating. But if you want any other info, you still have to flip to the page. That's what `INCLUDE` does, it gives quick access to limited information.
Aaronaught
That's the stuff, works like a charm. Thanks! Interestingly enough, Management Studio didn't even allow me to set included columns. Does this in any way relate to clustered and non-clustered indexes?
John Leidegren
I've got another question: since this is a one-to-many relation, what happens? There's exactly one value for QueryID but many PersonIDs. Does it store some subset of those person IDs together with the QueryID?
John Leidegren