ansaurus

Question

When does a database table get large enough that an index is beneficial?

Answer 1

A:

I believe as soon as you start doing joins on those int fields your table is big enough. If the table is small enough that it wouldn't benefit from an index then the overhead wouldn't be significant enough that you would want to opt out.

When I think about the overhead due to an index I usually consider how often the table index will be changing--through inserts, deletes and updates to indexed columns.

nvuono 2009-06-22 17:21:14

Indexes add overhead on SELECT statements too, not just INSERTs and UPDATEs.

Robert Harvey 2009-06-22 17:26:26

Answer 2

+1 A:

The index will nearly always increase the performance of the query, at the cost of extra memory and performance cost for insert/deletion (since it needs to maintain the index at that point). Profiling will be the only definite way to tell whether or not the index, in your particular case, is beneficial.

In general, you're trading memory for speed when you create an index (other than the additional cost of insertion). If you're doing many queries (selects or updates) relative to the number of inserted/deleted rows, indexes will pretty much always increase your performance.

Reed Copsey 2009-06-22 17:21:15

If the case is relatively standard (as in the example of two ints in a table), is the tip-over point relatively level, or will it vary as a function of the number of additional columns in the outer tables, and other unknown factors?

Robert Harvey 2009-06-22 17:25:36

The optimizer knows where the tipping point is, so you don't need to worry about it. If the loading penalty is significant because you're loading rows fast enough to notice, you'll reach that point in no time anyway.

le dorfier 2009-06-22 17:30:17

The indexing will speed up your queries, nearly across the board. The number of columns will make little difference, since it's a matter of locating the appropriate rows to return (which is somewhat separate from the number of columns). If you have very little data, the index may not help much, but it also is nearly free - I personally always index columns which I will be using frequently for any location queries (including updating rows [as long as you don't change the indexed column), since that will be much faster with an index, even in relatively small cases.

Reed Copsey 2009-06-22 17:32:46

Answer 3

+1 A:

it depends on the selectivity of your data, if your data is not selective enough then the index might not even be used since the cost would be too expensive. If you have only 2 values in the table and these values are evenly distributed then you will get a scan not a seek

I still believe every table should have a Primary Key, if you have that then you also have an index already

SQLMenace 2009-06-22 17:23:16

It is true that in my hypothetical (a many-to-many join) the outer tables would already have indexes.

Robert Harvey 2009-06-22 17:36:05

A two-column junction table? It would be highly unusual to have low selectivity on primary keys from two other tables.

le dorfier 2009-06-22 17:57:39

@Robert: what about your linking table? It should have a primary key as well - if nothing else, a composite primary key on the two foreign key columns. How does your current schema protect you from duplicate records?

GalacticCowboy 2009-06-22 17:57:47

I always put a primary key in my tables. In the scenario I have in mind, there should be no duplicate records. The join is created through a UI and some checkboxes, so if there are duplicate records, there is something wrong with the UI. It may sound like heresy, but I dislike composite keys, and until you mentioned the duplicate records scenario, I had never found a use for them. Something to think about.

Robert Harvey 2009-06-22 18:21:33

You could also do it with a composite unique or something, rather than the PK.

GalacticCowboy 2009-06-22 19:51:25

Robert, I'm not a fan of composite indexes but a two-column joining table is an exception. Usually these are two int columns that are already primary keys elsewhere. You need a way to ensure uniqueness of data in the table, so you need a unique index in any event on these two columns. Adding another primary key on a table like this is really wasted overhead, it is better to just make the composite key in this particular case.

HLGEM 2009-06-22 20:14:36

Answer 4

+1 A:

The penalty for insertion will be negligible until long after the benefit of the indexes will appear. The optimizer is smart enough to ignore the indexes anyway until that point clicks in. So just index the table from the start.

le dorfier 2009-06-22 17:23:19

Are you saying that SQL Server will create an execution plan using the indexes only if it determines that the index will provide a benefit?

Robert Harvey 2009-06-22 17:30:39

Right. There have been several questions in SO about why indexes aren't being used on small tables, and the answer has been "you don't have enough data yet."

le dorfier 2009-06-22 17:32:32

So my question morphs to, "At what number of records does the SQL Server Optimizer kick in the index?" And the answer is, "Don't Care?"

Robert Harvey 2009-06-22 17:45:38

I'm assuming the case you're giving, which is a junction table with the primary keys from two other tables. And in all but the most pathological cases (e.g. only one value in one of the columns for all rows), yes. One index on both columns (the most selective first) as your Primary Key, and another index on the other column only (if you are specificying both values, it'll use the first.) Or omit the second index if you're sure you'll only ever query on the first column, which is relatively rare.

le dorfier 2009-06-22 17:56:26

Answer 5

+1 A:

Another thing to think about is the concept of coding performance-- sometimes having an index can streamline the mental overhead of thinking about how to manage the relationship between different pieces of data. sometimes it can complicate it...

larson4 2009-06-22 17:24:09

One thing I didn't mention is that I use Linq to SQL, which seems to take field names and indexes as cues for what gets included in the model; specifically, foreign key joins get special preference.

Robert Harvey 2009-06-22 18:00:37

Answer 6

+9 A:

For the queries involving small portions of the table rows, indexes are always beneficial, be there 100 rows or 1,000,000.

See this entry in my blog for examples with plans and performance details:

Indexing tiny tables

The queries like this:

SELECT  *
FROM    table1 t1
JOIN    table2 t2
ON      t2.col = t1.col

will most probably use HASH JOIN. A hash table for the smaller table will be built, and the rows from the larger table will be used to probe the hash table.

To do this, no index is needed.

However, this query:

SELECT  *
FROM    table1 t1
JOIN    table2 t2
ON      t2.col = t1.col
WHERE   t1.othercol = @value

will use NESTED LOOPS: the rows from the outer table (table1) will be searched using an index on table1.othercol, and the rows from the inner table (table2) will be searched using an index on table2.col.

If you don't have an index on col1, a HASH JOIN will be used which requires scanning all rows from both tables and some more resources to built a hash table.

Indexes are also useful for the queries like this:

SELECT  t2.col
FROM    table1 t1
JOIN    table2 t2
ON      t2.col = t1.col

, in which case the engine doesn't need to read table2 itself at all: eveything you need for this query can be found in the index, which can be much smaller than the table itself and more efficient to read.

And, of course, if you need your data sorted and have indexes on both table1.col and table2.col, then the following query:

SELECT  *
FROM    table1 t1
JOIN    table2 t2
ON      t2.col = t1.col
ORDER BY
        t2.col

will probably use MERGE JOIN method, which is super fast if both input rowset are sorted, and its output is also sorted, which means that ORDER BY comes free.

Note that even if you don't have an index, an optimizer may choose to Eager Spool your small table, which means building a temporary index for the duration of the query and dropped the index after the query finishes.

If the query is small, it will be very fast, but again, an index won't hurt (for SELECT queries I mean). If the optimizer won't need it, it just will not be used.

Note, though, that creating an index may affect DML performance, but it's other story.

Quassnoi 2009-06-22 17:24:13

Actually, the database doesn's sort keys within a single page. So until it gets beyond that point there is no benefit. And probably for several pages beyond that.

le dorfier 2009-06-22 17:27:10

@Robert: they also have benefit when you use only the indexed columns in the query or when you need your data to be sorted. And no, they're not always a benefit on WHERE clause, only on very selective ones.

Quassnoi 2009-06-22 17:41:55

Thanks for the insight.

Robert Harvey 2009-06-22 18:01:08

Quassnoi, I saw your blog post. Just so you know, the final decision on indexing our database (based on additional information at this post:stackoverflow.com/questions/1033796/…) was to index all foreign keys EXCEPT those that participate in joins to lookup tables containing LESS THAN 10 RECORDS.

Robert Harvey 2009-06-25 04:02:21

We feel that, at a cardinality of 10, the query optimizer will more likely than not choose a table scan over the index, so the overhead of the additional indexes is unwarranted.

Robert Harvey 2009-06-25 04:03:50

@Robert: are you talking about indexing the referencing table (the one on which the FOREIGN KEY id defined) or a referenced table (the one mentioned in the REFERENCES clause)? You just cannot reference a table with a FOREIGN KEY unless it has a UNIQUE key defined. And an index on the referencing table is completely other story.

Quassnoi 2009-06-25 10:06:39

Answer 7

+1 A:

Regardless of size, there is always a performance benefit to using an index when doing a lookup.

Regarding overhead, the question becomes: what overhead do you mean, and how do you relate it to the value of a lookup? The two are separate values, after all.

There are two forms of overhead for an index: space (which is usually negligible, depending on how the index is structured), and re-index on insert (the server must recalculate an index after every insert).

As I mentioned, the space issue probably isn't that big a deal. But re-indexing is. Fortunately, you need to be doing lots of near-continuous inserting before that form of overhead becomes a problem.

So bottom line: You're almost always better off having an index. Start from that position and wait until re-indexing becomes a bottleneck. Then you can look into alternatives.

Randolpho 2009-06-22 17:25:29

Incorrect. Create a table with only one row, add an index, and see for yourself.

AlexKuznetsov 2009-06-22 17:39:41

Ok, by "regardless of size" I mean "for tables with a rowcount greater than 3". Better?

Randolpho 2009-06-22 17:47:21

Is the tipover point really three records? That doesn't seem likely.

Robert Harvey 2009-06-22 17:51:11

still not quite right. The following link is useful:link: "The Tipping Point Query Answers" http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Tipping-Point-Query-Answers.aspx

AlexKuznetsov 2009-06-22 18:56:30

@AlexKuznetsov: I don't see how the link applies. The article talks about the point at which a clustered index will be used in favor of an alternative (non-covering) index. A clustered index *is still an index* and therefore provides performance benefit; my reply stands. The original question clearly implies that the table is a heap table without either a clustered or FK index. Finally, keep in mind that the article even suggests that any index that can cover the query will not "tip over" to the clustered index.

Randolpho 2009-06-22 19:22:20

Randolpho, the main point of that article is to do benchmarks rather than to gues, and to think in terms of page reads. As long as all your data fits on one page, there is no need for a lookup - lookups need more than one page read.

AlexKuznetsov 2009-06-22 20:48:16

Answer 8

+1 A:

A very useful link: "The Tipping Point Query Answers" http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Tipping-Point-Query-Answers.aspx

AlexKuznetsov 2009-06-22 18:36:32

Thanks for that. It tells me that there is no fixed tipping point, that many people don't have good insight into this, and that it's a good thing we have query optimizers.

Robert Harvey 2009-06-22 18:50:37

Answer 9

+1 A:

The best thing is to let the server itself figure it out. You create index in the columns where it makes sense(Im sure there's entire chapters if not books on how to do this the best way), and let the SQL server figure out when/how to use the index.

In many cases, when optimizing, you'd need to read the docs of your particular DBMS to learn more how it uses indexes, and relate that to the queries the application you're optimizing uses. Then you can fine tune the index usage.

nos 2009-06-22 18:38:22

ansaurus

tags:

views:

answers:

When does a database table get large enough that an index is beneficial?

related questions