views:

523

answers:

5

In many places it's recommended that clustered indexes are better utilized when used to select range of rows using BETWEEN statement. When I select joining by foreign key field in such a way that this clustered index is used, I guess, that clusterization should help too because range of rows is being selected even though they all have same clustered key value and BETWEEN is not used.

Considering that I care only about that one select with join and nothing else, am I wrong with my guess ?

A: 

The performance gains usually come if you are selecting data sequentially within the cluster. Also, it depends entirely on the size of the table (data) and the conditions in your between statement.

northpole
Does selection by foreign key column work sequentially ?
alpav
+2  A: 

An index on the FK column will help the JOIN because the index itself is ordered: clustered just means that the data on disk (leaf) is ordered rather then the B-tree.

If you change it to a covering index, then clustered vs non-clustered is irrelevant. What's important is to have a useful index.

gbn
+1 thanks for some well needed pragmatism! :-)
marc_s
@gbn: thank you for your most precise answer, is my understanding correct that you disagree with marc_s and mjv, you say irrelevant, they say relevant ?
alpav
I answered about Kk indexes: marc_s answered about clustered indexes in general (and commented on my answer too positively). mjv seems to say the same as me.
gbn
+1  A: 

It depends on the database implementation.

For SQL Server, a clustered index is a data structure where the data is stored as pages and there are B-Trees and are stored as a separate data structure. The reason you get fast performance, is that you can get to the start of the chain quickly and ranges are an easy linked list to follow.

Non-Clustered indexes is a data structure that contains pointers to the actual records and as such different concerns.

Refer to the documentation regarding Clustered Index Structures.

An index will not help in relation to a Foreign Key relationship, but it will help due to the concept of "covered" index. If your WHERE clause contains a constraint based upon the index. it will be able to generate the returned data set faster. That is where the performance comes from.

Digicoder
So my question was, will "start of the chain quickly and ranges are an easy linked list" be utilized in select with join by FK field ?
alpav
Not by the join, but by the WHERE clause. If both the FK and the covered index and the WHERE constraint are the same column, then yes.
Digicoder
So if I use FK column only inside join, but not inside WHERE then clustered vs non-clustered is irrelevant ?
alpav
Mostly...because if you put a constraint in the ON X.c1 = Y.c2 AND (X.c1 >= 30 AND X.c1 <= 40) the the optimizer may use the index.MJV's post below correctly sums up that its a case by case situation. The best choice is to use the profiler and observer. SQL Server also uses a query optimizer that may choose not to do things you think it should.
Digicoder
What if I just use ON X.c1 = Y.c2 ?
alpav
+3  A: 

Discussing this type of issue in the absolute isn't very useful.

It is always a case-by-case situation !

Essentially, access by way of a clustered index saves one indirection, period.

Assuming the key used in the JOIN, is that of the clustered index, in a single read [whether from an index seek or from a scan or partial scan, doesn't matter], you get the whole row (record).

One problem with clustered indexes, is that you only get one per table. Therefore you need to use it wisely. Indeed in some cases, it is even wiser not to use any clustered index at all because of INSERT overhead and fragmentation (depending on the key and the order of new keys etc.)

Sometimes one gets the equivalent benefits of a clustered index, with a covering index, i.e. a index with the desired key(s) sequence, followed by the column values we are interested in. Just like a clustered index, a covering index doesn't require the indirection to the underlying table. Indeed the covering index may be slightly more efficient than the clustered index, because it is smaller.
However, and also, just like clustered indexes, and aside from the storage overhead, there is a performance cost associated with any extra index, during INSERT (and DELETE or UPDATE) queries.

And, yes, as indicated in other answers, the "foreign-key-ness" of the key used for the clustered index, has absolutely no bearing on the the performance of the index. FKs are constraints aimed at easing the maintenance of the integrity of the database but the underlying fields (columns) are otherwise just like any other field in the table.

To make wise decisions about index structure, one needs

  • to understands the way the various index types (and the heap) work
    (and, BTW, this varies somewhat between SQL implementations)
  • to have a good image of the statistical profile of the database(s) at hand:
    which are the big tables, which are the relations, what's the average/maximum cardinality of relation, what's the typical growth rate of the database etc.
  • to have good insight regarding the way the database(s) is (are) going to be be used/queried

Then and only then, can one can make educated guesses about the interest [or lack thereof] to have a given clustered index.

mjv
Behavior changes. I've had one query that was optimized and worked one way in SQL Server 2000 and then it failed and worked differently in SQL Server 2005. Absolutes are not the case as it can be vendor specific.
Digicoder
@alpav: Assuming that your query effectively accesses at least one field in the JOINed table (as is likely the case), the short answer is "YES!, the fact that the clustered index is used in the query you mentions improves performance [as compared with a plain non-clustered index on just the PK]". The long answer is "There are many other factors; the query could possibly get yet bigger performance gains from other sources; this shouldn't be an endorsement of this or even clustered indexes in general;..." So, yes, your case was specific enough ...
mjv
...to warrant a _technically_ precise answer but this probably doesn't address the big picture. Also: this wasn't [completely] a complaint against your question, but a remark indicating that exposing a sufficient context in a brief question isn't easy to do, hence the semi-relevance of Q/A about SQL performance on this forum.
mjv
@mjv: yes, of course you did - very clearly and perfectly. Sorry, mea cupla - I didn't catch that part....
marc_s
@marc_s: NP, no mea culpa needed ;-) I often do a terrible job at explaining things (see my avatar photo). I also started deleting comment on this particular question to keep things neat (since all is covered in answer).
mjv
@mjv: love your avatar picture :-)
marc_s
+2  A: 

I would ask something else: would it be wise to put my clustered index on a foreign key column just to speed a single JOIN up? It probably helps, but..... at a price!

A clustered index makes a table faster, for every operation. YES! It does. See Kim Tripp's excellent The Clustered Index Debate continues for background info. She also mentions her main criteria for a clustered index:

  • narrow
  • static (never changes)
  • unique
  • if ever possible: ever increasing

INT IDENTITY fulfills this perfectly - GUID's do not. See GUID's as Primary Key for extensive background info.

Why narrow? Because the clustering key is added to each and every index page of each and every non-clustered index on the same table (in order to be able to actually look up the data row, if needed). You don't want to have VARCHAR(200) in your clustering key....

Why unique?? See above - the clustering key is the item and mechanism that SQL Server uses to uniquely find a data row. It has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a 4-byte uniqueifier to your keys. Be careful of that!

So those are my criteria - put your clustering key on a narrow, stable, unique, hopefully ever-increasing column. If your foreign key column matches those - perfect!

However, I would not under any circumstances put my clustering key on a wide or even compound foreign key. Remember: the value(s) of the clustering key are being added to each and every non-clustered index entry on that table! If you have 10 non-clustered indices, 100'000 rows in your table - that's one million entries. It makes a huge difference whether that's a 4-byte integer, or a 200-byte VARCHAR - HUGE. And not just on disk - in server memory as well. Think very very carefully about what to make your clustered index!

SQL Server might need to add a uniquifier - making things even worse. If the values will ever change, SQL Server would have to do a lot of bookkeeping and updating all over the place.

So in short:

  • putting an index on your foreign keys is definitely a great idea - do it all the time!
  • I would be very very careful about making that a clustered index. First of all, you only get one clustered index, so which FK relationship are you going to pick? And don't put the clustering key on a wide and constantly changing column
marc_s
@marc_s: Is my understanding correct that you disagree with gbn, you say clusterization is relevant for joins and gbn says it's not ?
alpav
@alpav: I don't think gbn is saying that :-) But he points out very correctly: the most important point is having an index on your foreign key columns. Whether it's clustered or not is secondary.
marc_s
@marc_s: I know that and most of what is mentioned in all other answers, but I still didn't get clear answer on what I don't know - will clusterization improve speed of joins without BETWEEN, >=,<=. I was expecting yes/no answer, not education about clustered indexes and foreign keys, otherwise it would be duplicate of many other questions.
alpav
@alpav: I don't think anyone can say for sure, just based on SQL Server theory. There's just too many factors in play that we don't know about. The only real reliable way to find that out is : measure, change, measure again, compare. Sorry - SQL Server esp. performance tuning is hardly ever a "yes or no" scenario.....
marc_s
@alpav: if you want my gut feeling for "yes or no" - but that's all it is, no facts to back it up - I would have to say "no, making such an index a clustered index won't give you any benefits". But again: just a plain gut feeling, no data to back it up.
marc_s