I have a relation between two tables with 600K rows, and my first question is: is that a lot of data? It doesn't seem like a lot (in terms of rows, not bytes).

I can write a query like this

SELECT EntityID, COUNT(*)
FROM QueryMembership
GROUP BY EntityID

And it completes in no time at all, but when I do this:

SELECT EntityID, COUNT(*)
FROM QueryMembership
WHERE PersonID IN (SELECT PersonID FROM GetAccess(1))
GROUP BY EntityID

The thing takes 3-4 seconds to complete, despite just returning about 183 rows. SELECT * FROM QueryMembership takes about 12-13 seconds.

What I don't understand is how a filter like this can take so long as soon as I introduce this table-valued function. The function itself doesn't take any time at all to return its result, and no matter whether I write it as a CTE or some bizarre subquery the result is the same.
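For example, the CTE version looks roughly like this (the CTE name is just a placeholder):

-- Same filter, expressed via a CTE over the function output
WITH AccessList AS
(
    SELECT PersonID FROM GetAccess(1)
)
SELECT EntityID, COUNT(*)
FROM QueryMembership
WHERE PersonID IN (SELECT PersonID FROM AccessList)
GROUP BY EntityID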

However, if I defer the filter by inserting the result of the first select into a temporary table #temp and then using the GetAccess UDF, the entire thing goes about three times as fast.

I would really like some in-depth technical help on this matter: where I should start looking, and how I can analyze the execution plan to figure out what's going on.

A: 
SELECT EntityID, COUNT(*)
FROM QueryMembership
WHERE PersonID IN (SELECT PersonID FROM GetAccess(1))
GROUP BY EntityID

The embedded subquery is expensive. As you said, using a temporary table is a perfect alternative solution.
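For example (just one way to structure it, not necessarily the exact batch you used): materialize the function output first, then filter against the temp table:

-- Materialize the UDF result so the optimizer has a real table to work with
SELECT PersonID
INTO #Access
FROM GetAccess(1)

SELECT EntityID, COUNT(*)
FROM QueryMembership
WHERE PersonID IN (SELECT PersonID FROM #Access)
GROUP BY EntityID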

Henry Gao
A: 

With the execution plan you need to look out for Table Scans. These are bad, and can normally point to where a lot of your time is being wasted.
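To capture the plan in the first place, turn on "Include Actual Execution Plan" in Management Studio (Ctrl+M) before running the query, or look at the raw I/O and timing numbers, for example:

-- Show logical reads and CPU/elapsed time for the statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT EntityID, COUNT(*)
FROM QueryMembership
WHERE PersonID IN (SELECT PersonID FROM GetAccess(1))
GROUP BY EntityID;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;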

It's also an area where you can target indexes to improve performance. You should attach an image of your execution plan; it will help us point out where you can save time, though I get the impression that the subquery is the main cause.
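For example, if the scan turns out to be on QueryMembership, an index along these lines might help (the name and columns are just a guess based on your query):

-- Hypothetical index: key on the filtered column, grouped column included
CREATE NONCLUSTERED INDEX IX_QueryMembership_PersonID
    ON QueryMembership (PersonID)
    INCLUDE (EntityID);

That would let the PersonID filter become a seek and cover the EntityID grouping without extra lookups.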

kevchadders
+1  A: 

600k rows is not a particularly large amount. However, you are getting to the point where server configuration (disks, non-SQL load, etc) matters, so if your server wasn't carefully put together you should look at that now rather than later.

Analyzing execution plans is one of those things that you tend to pick up over time. The book "Inside SQL Server" is (was?) pretty nice for learning how things work internally, which helps guide you a bit as you're optimizing.

I would personally try rewriting the above query as a join; IN often doesn't perform as well as you might hope. Something like:

SELECT
  EntityID,
  COUNT(*)
FROM
  QueryMembership q
  JOIN GetAccess(1) a ON a.PersonID = q.PersonID
GROUP BY
  EntityID
Donnie
Unfortunately, and surprisingly, the JOIN is just as slow as everything else. The only thing that is faster is the temporary table approach, and I don't know why.
John Leidegren
+2  A: 

There's an excellent series of posts on execution plans and how to read and interpret them (plus a totally free e-book on the topic) on the Simple-Talk site.

Check them out - well worth the time!

marc_s
A: 

I suspect that the reasons for your slowdown may be similar to those in this question:

http://stackoverflow.com/questions/1673235/how-to-structure-an-index-for-group-by-in-sql-server/1783690#1783690

An execution plan will answer the question of why the second query is slower; however, I suspect it will be because SQL Server can use indexes to evaluate aggregate functions (such as COUNT and MAX) with relatively inexpensive operations on some index.

If you combine a filter and a group, however, SQL Server can no longer use this trick and is forced to evaluate the value of COUNT or MAX based on the filtered result set, leading to expensive lookups.

600k rows is a fairly reasonable / small table size; however, it's big enough that things like table scans or RID lookups against large portions of the table will start becoming expensive.

I'd be interested to see the execution plan to understand what's going on.

Kragen