views:

117

answers:

4

i have this table of comments (little over 1 milion rows) that gets around 10.000 inserts and around 100.000 queries against it every day, and minor deletions and updates. the query that gets the comments causes performance issues that sometimes it locks up entire database and i am getting a lot of timeouts. please help me adjust my indexes and anything else so that it performs better. below i included the info about it, if you need more please ask. i rebuild all indexes daily and run a sql server 2008 web edition on a 2008 server.

thank you :)

structure:

id (int, identity)
profile_id (int)
owner_id (int)
added_date (datetime)
comments varchar(4000)
logical_delete (datetime)

indexes:

id (PK, clustered)
profile_id (70% fill)
owner_id (70% fill)
added_date (70% fill)
profile_id + logical_delete (70%)

query:

    select 
        c.id, c.owner_id, c.comments, c.is_public, c.added_date, 
        u.first_name, u.last_name, c.profile_id
    from [profile_comment] c with(nolock) 
    inner join [user] u with(nolock) on u.id = c.owner_id 
    where c.profile_id = @profile_id and c.logical_delete is null
    order by c.added_date desc 

execution plan:

  |--Nested Loops(Inner Join, OUTER REFERENCES:([c].[owner_id], [Expr1005]) WITH ORDERED PREFETCH)
       |--Sort(ORDER BY:([c].[added_date] DESC)) **[5%]**
       |    |--Nested Loops(Inner Join, OUTER REFERENCES:([c].[id], [Expr1004]) WITH UNORDERED PREFETCH) **[0%]** 
       |         |--Index Seek(OBJECT:([DB].[dbo].[profile_comment].[IX_profile_comment_combined1] AS [c]), SEEK:([c].[profile_id]=(1) AND [c].[logical_delete]=NULL) ORDERED FORWARD) **[1%]**
       |         |--Clustered Index Seek(OBJECT:([JakLeci].[dbo].[profile_comment].[PK__profile_comment__primary] AS [c]), SEEK:([c].[id]=[JakLeci].[dbo].[profile_comment].[id] as [c].[id]) LOOKUP ORDERED FORWARD) **[47%]**
       |--Clustered Index Seek(OBJECT:([DB].[dbo].[user].[PK__user__id] AS [u]), SEEK:([u].[id]=[DB].[dbo].[profile_comment].[owner_id] as [c].[owner_id]) ORDERED FORWARD)  **[47%]**
A: 

Why don't you run your query in management studio and have it show you the actual execution plan, to try and see where the problem is?

Without that, the only thing that I could suggest is NOT making your primary, identity key the clustered index, unless you regularly query on that. You could try adding an index on profile_id and added_date (or something else to make it unique) and make that the clustered index, since you are querying by profile_id and you are probably rarely querying on an identity field.

I'd need to actually have access to the database to see what is actually going on (doing what I recommended at the top) to figure out where the true culprit lies. It could be on the join, but I highly doubt it, since I'm guessing the id column in the user table is the primary, clustered index as well.

Charles Boyung
be careful though with making your clustered index much larger - it will bloat your non-clustered indices! It's best to keep the Clustered Index on a single and narrow column, e.g. a INT
marc_s
execution plan included along with the expense %s
@marc_s but based on what we know about this table, and what can be inferred based on what it is used for - most of the other indicies are probably not even needed. Your clustered index should be set to the most commonly used field or fields for querying, which will almost never be a single identity field.
Charles Boyung
A: 

you'll get the best performance with a clustered index over the search columns:

(profile_id, is_deleted, owner_id, added_date desc)

the drawback is that looking up a row by just by id will need to use a slower non-clustered index because you can only have one clustered index.

if that's not an option, the next best thing is to create a multi-column covering index including:

(profile_id, is_deleted, owner_id, added_date desc, id, comments, is_public)

because comments is rather large, you can include comments as a non-key column using the INCLUDE syntax:

create index idx on c (profile_id, is_deleted, owner_id, added_date desc, id,
  is_public) include (comments)

comments won't be indexed, but you'll be able to retrieve it quickly when the other keys are matched.

more details here:

http://www.sql-server-performance.com/tips/covering_indexes_p1.aspx

and of course u.id should be the primary key in users.

best to make is_deleted non-null (tinyint defaulted to 0).

with NOLOCK / READUNCOMMITTED, the select will not create any locks or block.

jspcal
making your clustering key this wide will massively bloat your non-clustered indices! Remember: the column(s) that make up the clustered key are included in each and every entry of each and every non-clustered index! If you have a few, plus several hundred thousand rows, that can add you very quickly to a massive waste of space!
marc_s
@jspcal: I'm well aware what a covering and a clustered index is - but you mention (just below the code snippet) "it would be even faster if this were your clustered index" - that statement is very dangerous, in my opinion, for the reasons given. Also, since you need to include a potentially 6000-byte "comments" field, I would dare to question whether creating a covering index for this (and thus basically duplicating the whole data set) is really going to be faster in the end....
marc_s
clustered index doesn't need to include the 6000-byte comments field, only the search fields
jspcal
yes, but even so: instead of 4 bytes for the ID, you get 3 INT (12 bytes), 2 dates (16 bytes) and a boolean (1 byte) - a total of 29 bytes instead of 4. Multiply that by over a million rows, and 4 or 5 non-clustered indices - and you're looking at a MASSIVE waste of space - not just on disk, but also in SQL Server RAM. Not a good thing, trust me....
marc_s
it's not an issue in this scenario: wider clust. idx (17 byte key) would require only around 81 MB vs. 19 MB in a table that might already take up several GB's (1 million rows, 5 nc indexes) - a negligible index size in exchange for much higher query speed. tho i agree w/you, it could be an issue if you're using enormous 900-bytes keys or have hundreds of indexes for instance
jspcal
A: 

A clustered index on (profile_id, added_date DESC) should do the trick. That'll give you a quick lookup by profile_id, already sorted by added_date. The only remaining operations would be filtering on logical_delete and a loop join on user (which should be clustered on user_id).

Depending on the number of rows returned though, you could still be reading quite a bit off disk. Your comments column is pretty wide. You may want to consider limiting the number of rows returned by added_date (or a TOP), or caching the results.

I can't imagine this is causing high CPU usage, and you're using NOLOCK so you shouldn't be blocking other queries. If this really is the cause of your timeouts, then it must be I/O. You may want to check out memory usage and the disk subsystem to be sure you're getting decent performance. Check logical reads and CPU time before and after to determine if you're helping.

You could also probably drop some of your indexes to speed inserts. I'm not sure that 70% fill is doing much other than wasting space either, but I might be wrong about that.

Mark Brackett
i did what you suggested Mark and it seems to be working just fine. i am not sure how good of a suggestion Remus presented. what do you think?
+1  A: 
Nested Loops(Inner Join, OUTER REFERENCES:([c].[owner_id], [Expr1005]) WITH ORDERED PREFETCH)
       |--Sort(ORDER BY:([c].[added_date] DESC)) **[5%]**
       |    |--Nested Loops(Inner Join, OUTER REFERENCES:([c].[id], [Expr1004]) WITH UNORDERED PREFETCH) **[0%]** 
       |         |--Index Seek(OBJECT:([DB].[dbo].[profile_comment].[IX_profile_comment_combined1] AS [c]), SEEK:([c].[profile_id]=(1) AND [c].[logical_delete]=NULL) ORDERED FORWARD) **[1%]**
       |         |--Clustered Index Seek(OBJECT:([JakLeci].[dbo].[profile_comment].[PK__profile_comment__primary] AS [c]), SEEK:([c].[id]=[JakLeci].[dbo].[profile_comment].[id] as [c].[id]) LOOKUP ORDERED FORWARD) **[47%]**
       |--Clustered Index Seek(OBJECT:([DB].[dbo].[user].[PK__user__id] AS [u]), SEEK:([u].[id]=[DB].[dbo].[profile_comment].[owner_id] as [c].[owner_id]) ORDERED FORWARD)  **[47%]**

Here is how I read this plan: the query starts with a seek for profile_id = @profile_id and logical_deleted is null on the IX_Profile_comment_combined, it then does a nested join loop on the clustered index, it sorts the result by added_date and then it does a nested loop on the user.

One thing that you could quickly eliminate is the SORT, by changing the definition of the IX_profile_combined to be:

CREATE INDEX IX_profile_combined
 ON profile_comment(logical_deleted, profile_id, added_date)

Because logical_deleted is a very low selectivity column, it should be the leftmost key in the index. Because the output for a specific @profile_id has to be order by added_date then added_date has to be right of profile_id in the key order.

Other than the sort, the query plan looks good to me. But I'm rather curious how come a query that supposedly is the biggest hog in the system consumes only 1% on getting all the candidate row, then whoops to 93% time on two clustered index lookups and only 5% on the sort. This doesn't add up as symptoms of a problem query. Is @profile_id a very low selectivity key? Is the plan you posted collected from a non-representative run, one that performed well?

Remus Rusanu