Hello,

I have a table with 20+ million records.

Structure is like:

EventId UNIQUEIDENTIFIER
SourceUserId UNIQUEIDENTIFIER
DestinationUserId UNIQUEIDENTIFIER
CreatedAt DATETIME
TypeId INT
MetaId INT

The table receives 100k+ new records each day.

I have indexes on each column except MetaId, as it is not used in WHERE clauses.

The problem arises when I want to pick up, e.g., the latest 100 records for a given SourceUserId.

The query sometimes takes up to 4 minutes to execute, which is not acceptable.

Eg.

SELECT TOP 100 * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
AND
(
    TypeId IN (2, 3, 4)
    OR
    (TypeId = 60 AND SrcMemberId != DstMemberId)
)
ORDER BY CreatedAt DESC

I can't do partitioning etc. as I am using the Standard edition of SQL Server, and Enterprise is too expensive.

I also think the table is too small to be this slow.

I think the problem is with the ORDER BY clause, as the database must go through a much bigger set of data.

Any ideas on how to make it quicker?

Perhaps a relational database is not a good fit for this kind of data.

The data is always picked up ordered by CreatedAt DESC.

Thank you for reading.

PabloX

A: 

I would make sure CreatedAt is indexed properly

Jim B
It would be nice to benchmark the query with and without the `Order By CreatedAt` clause!
p.campbell
An index solely on CreatedAt wouldn't likely help this type of query as well as a composite index leading on the equality filter SourceUserId. An index solely on CreatedAt would simply let the optimizer scan backwards down the B-tree, performing a residual filter check on SourceUserId until it found 100 rows with the specified value; if SourceUserId is very selective, that would likely mean a fairly long scan. With a composite index on SourceUserId then CreatedAt, the optimizer can skip the scan, seek to the given SourceUserId, and simply pull out the pre-ordered final 100.
chadhoc
+12  A: 

You'll likely want to create a composite index for this type of query. When the query runs slowly, the optimizer is most likely choosing to scan down an index on the CreatedAt column and perform a residual filter on the SourceUserId value, when in reality what you want is to jump directly to all records for a given SourceUserId, already in the right order. To achieve this, create a composite index primarily on SourceUserId (for the equality check) and secondarily on CreatedAt (to preserve the order within a given SourceUserId value). You may want to try adding TypeId as well, depending on the selectivity of that column.

So, the two that will most likely give the best repeatable performance (try them out and compare) are:

  1. Index on (SourceUserId, CreatedAt)
  2. Index on (SourceUserId, TypeId, CreatedAt)
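A minimal sketch of the two candidate indexes in concrete T-SQL, assuming the table is named Events (the index names are illustrative, not from the question):

```sql
-- Option 1: equality seek on SourceUserId, rows pre-ordered by CreatedAt
CREATE NONCLUSTERED INDEX IX_Events_SourceUserId_CreatedAt
    ON dbo.Events (SourceUserId, CreatedAt);

-- Option 2: also narrows on TypeId before the date ordering
CREATE NONCLUSTERED INDEX IX_Events_SourceUserId_TypeId_CreatedAt
    ON dbo.Events (SourceUserId, TypeId, CreatedAt);
```

With either index, the ORDER BY CreatedAt DESC can be satisfied by a backward range scan within the SourceUserId partition of the B-tree rather than a sort.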

As always, there are many other considerations in determining how/what/where to index. As Remus discusses in a separate answer, one big consideration is covering the query vs. keeping lookups; additionally, you'll need to consider write volumes, possible fragmentation impact (if any), singleton lookups vs. large sequential scans, etc.

chadhoc
+1: and `covering index` in SQL Server terminology.
OMG Ponies
Yep, as with so many SQL questions, the answer is not some magic SELECT trick but good old-fashioned index design.
Remus Rusanu
Clustering this index and having a striped disk would help as well.
Mark Canlas
Thank you very much, that sped things up significantly :) I should learn more about indexes :)
pablox
A: 

You could split the query in two with a UNION to avoid the OR (which can prevent your index from being used), something like:

SELECT TOP 100 * FROM (
    -- each branch takes its own TOP 100 (with ORDER BY), so the outer TOP 100 is correct
    SELECT * FROM (
        SELECT TOP 100 * FROM Events WITH (NOLOCK)
        WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
        AND TypeId IN (2, 3, 4)
        ORDER BY CreatedAt DESC
    ) AS a
    UNION ALL
    SELECT * FROM (
        SELECT TOP 100 * FROM Events WITH (NOLOCK)
        WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
        AND TypeId = 60 AND SrcMemberId != DstMemberId
        ORDER BY CreatedAt DESC
    ) AS b
) AS x
ORDER BY CreatedAt DESC

Also, check that the uniqueidentifier indexes are not CLUSTERED.
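A quick way to verify which index (if any) is clustered is to query the system catalog. A sketch, assuming the table is named dbo.Events:

```sql
-- Lists each index on Events with its type (CLUSTERED / NONCLUSTERED / HEAP)
SELECT i.name, i.type_desc, i.is_primary_key
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.Events');
```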

vladhorby
+1  A: 

I would recommend getting the data into two separate table variables:

INSERT INTO @Table1
SELECT * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
AND TypeId IN (2, 3, 4)

INSERT INTO @Table2
SELECT * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
AND TypeId = 60 AND SrcMemberId != DstMemberId

Then apply a UNION across the two selects, with ORDER BY and TOP. Limit the data from the get-go.
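A fuller sketch of this approach. The table-variable declarations are assumptions (a table variable needs an explicit column list), and SrcMemberId/DstMemberId from the posted query are assumed to correspond to the SourceUserId/DestinationUserId columns in the stated structure:

```sql
DECLARE @Table1 TABLE (
    EventId UNIQUEIDENTIFIER, SourceUserId UNIQUEIDENTIFIER,
    DestinationUserId UNIQUEIDENTIFIER, CreatedAt DATETIME,
    TypeId INT, MetaId INT);
DECLARE @Table2 TABLE (
    EventId UNIQUEIDENTIFIER, SourceUserId UNIQUEIDENTIFIER,
    DestinationUserId UNIQUEIDENTIFIER, CreatedAt DATETIME,
    TypeId INT, MetaId INT);

-- Limit each branch up front: only the newest 100 per branch are kept
INSERT INTO @Table1
SELECT TOP 100 * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
  AND TypeId IN (2, 3, 4)
ORDER BY CreatedAt DESC;

INSERT INTO @Table2
SELECT TOP 100 * FROM Events WITH (NOLOCK)
WHERE SourceUserId = '15b534b17-5a5a-415a-9fc0-7565199c3461'
  AND TypeId = 60 AND SourceUserId != DestinationUserId
ORDER BY CreatedAt DESC;

-- Combine the two small sets and take the overall newest 100
SELECT TOP 100 * FROM (
    SELECT * FROM @Table1
    UNION ALL
    SELECT * FROM @Table2
) AS u
ORDER BY CreatedAt DESC;
```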

astander
+1  A: 

I suggest using a UNION:

SELECT TOP 100 x.*
  FROM (SELECT a.*
          FROM EVENTS a
         WHERE a.typeid IN (2, 3, 4)
        UNION ALL
        SELECT b.*
          FROM EVENTS b
         WHERE b.typeid = 60 
           AND b.srcmemberid != b.dstmemberid) x
 WHERE x.sourceuserid = '15b534b17-5a5a-415a-9fc0-7565199c3461'
 ORDER BY x.createdat DESC
OMG Ponies
+4  A: 

I have indexes on each column except MetaId

Non-covering indexes will likely hit the 'tipping point' and the query will revert to a table scan. Just adding an index on every column used in a WHERE clause does not equate to good index design. Taking your query as an example, a good 100% covering index would be:

INDEX ON (SourceUserId, CreatedAt) INCLUDE (TypeId, SrcMemberId, DstMemberId)

The following index is also useful, although it is still going to cause lookups:

INDEX ON (SourceUserId, CreatedAt) INCLUDE (TypeId)

And finally, an index without any included columns may help, but it is just as likely to be ignored (it depends on the column statistics and cardinality estimates):

INDEX ON (SourceUserId, CreatedAt)

But a separate index on SourceUserId and one on CreatedAt is basically useless for your query.
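The covering index above, written out as concrete T-SQL (the index name is illustrative, and the column names follow the query as posted):

```sql
-- Key columns drive the seek and the ordering; INCLUDE columns
-- let the index answer the query without key lookups
CREATE NONCLUSTERED INDEX IX_Events_SourceUserId_CreatedAt_Covering
    ON dbo.Events (SourceUserId, CreatedAt)
    INCLUDE (TypeId, SrcMemberId, DstMemberId);
```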

See Index Design Basics.

Remus Rusanu
Thank you! That helped me :)
pablox
A: 

If 100K records are added each day, you should check your index fragmentation, and rebuild or reorganize accordingly. More info: SQLAuthority
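A sketch of checking and fixing fragmentation, assuming the table is named dbo.Events (the 30% rebuild threshold is a common rule of thumb, not a hard rule):

```sql
-- Average fragmentation per index on the Events table
SELECT i.name, s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.Events'), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id;

-- Reorganize for light fragmentation; rebuild for heavy (e.g. > 30%)
ALTER INDEX ALL ON dbo.Events REORGANIZE;
-- ALTER INDEX ALL ON dbo.Events REBUILD;
```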

Leon
Yep, I am doing this as well.
pablox
+4  A: 

The fact that the table has indexes built on GUID values indicates a series of possible problems that can affect performance:

  • High index fragmentation: since new GUIDs are generated randomly, the index cannot organize them in a sequential order and the nodes are spread unevenly.
  • High number of page splits: the size of a GUID (16 bytes) causes many page splits in the index, since there's a greater chance that a new value won't fit in the remaining space available in a page.
  • Slow value comparison: comparing two GUIDs is a relatively slow operation because all 16 bytes must be compared.
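One common mitigation for the fragmentation and page-split problems (a sketch, not part of the answer above) is to have the server generate sequential GUIDs so new rows append to the end of the index rather than landing on random pages:

```sql
-- NEWSEQUENTIALID() produces monotonically increasing GUIDs,
-- reducing page splits and fragmentation compared to NEWID()
CREATE TABLE dbo.EventsSeq (
    EventId UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_EventsSeq_EventId DEFAULT NEWSEQUENTIALID(),
    CreatedAt DATETIME NOT NULL,
    CONSTRAINT PK_EventsSeq PRIMARY KEY CLUSTERED (EventId)
);
```

Note that NEWSEQUENTIALID() can only be used as a column default, and the generated values are predictable, so it is unsuitable where GUIDs must be unguessable.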

Here are a couple of resources on how to investigate and resolve these problems:

Enrico Campidoglio
A: 

We've realised a minor gain by moving to a BIGINT IDENTITY key for our event table; by using that as a clustered primary key, we can cheat and use that for date ordering.
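A sketch of that design (table and column names are illustrative): because the IDENTITY value increases monotonically with insert order, ordering by the clustered key approximates ordering by CreatedAt without touching the date column:

```sql
CREATE TABLE dbo.Events2 (
    EventSeq     BIGINT IDENTITY(1, 1) NOT NULL,  -- clustered surrogate key
    EventId      UNIQUEIDENTIFIER NOT NULL,
    SourceUserId UNIQUEIDENTIFIER NOT NULL,
    CreatedAt    DATETIME NOT NULL,
    CONSTRAINT PK_Events2 PRIMARY KEY CLUSTERED (EventSeq)
);

-- Ordering by the clustered key stands in for ORDER BY CreatedAt DESC
DECLARE @SourceUserId UNIQUEIDENTIFIER = NEWID();  -- placeholder parameter
SELECT TOP 100 * FROM dbo.Events2
WHERE SourceUserId = @SourceUserId
ORDER BY EventSeq DESC;
```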

Keith Williams