views:

287

answers:

3

I'm new to database indexing, if I have 2 columns in a table that are good choices for indexing like for example,

[Posts](    
   [PostID] [int] IDENTITY(1,1) NOT NULL,
   [UserName] [nvarchar](64) NOT NULL,
   [ApplicationType] [smallint] NOT NULL,
   ...
)

in this case PostID would be the PRIMARY KEY CLUSTERED index, then I want to do more indexing since it's a large table and I want to do on UserName and ApplicationType, now should I index each individually (one on UserName, one on ApplicationType) or index them as a whole (one index on UserName, ApplicationType together)? Is there a limit to the number of indexes I can have before making it bad practice? What generally is the rule of thumb on this?

Thanks,

Ray.

+2  A: 

The answer to this question really depends on how you are going to be searching on the table. If your searches will nearly always include both columns, then creating an index on both columns is appropriate. If you will be frequently searching on each field on its own, then creating separate indexes for each is appropriate. In the end, you could have all 3 indexes (one composite, 2 single column) - depending on how you're searching with the columns. Think of it like a phonebook - if you're always searching with last name & first name, you'll find what you're looking for. But if you wanted to search the phone book for everyone with the first name of Scott, you'd want a new index that wasn't (LName, FName). If you wanted to find everyone with a given last name, you are still able to do that with the multi-column index of (LName, FName).

Each database has its own limits on the number of indexes per table, number of columns per index, etc. They are generally high enough where if you're looking at 3 indexes here, you won't have to worry about them. Also, keep in mind that the more indexes you have, the more it costs to maintain them (inserts, updates, deletes, etc).

Scott Ivey
+1  A: 

IIRC, the rule of thumb is that an index can is used only for lookups that use all columns from some point and to the left. For example, an index on columns (a,b,c,d) could be used if you query on (a), (a,b), (a,b,c) or (a,b,c,d) but not on (a,c) for example.

This is a result of the way indexes are built; the left most column is indexed, then for each value of that column an index is created for the next column and so forth.


Edit: as BQ points out, the DBMS can scan over the full 'a' portion of the index and do a lookup into the 'b' part (I didn't known that was actual done). However this is not as fast as an index that can use the rule as described above (OTOH it can be faster than a full table scan).

Personally, I don't think that should be intentionally leveraged. If perf is enough of a concern for a given query that you are considering what indexes are needed, you might as well give it the right ones.

BCS
This is NOT true for most current (and recent) versions of most DBMSes. Often times if you're looking for something in that column 'c' it's faster to do a scan over the index rather than the full table. As always, analyze your query plan.
BQ
cool. nice to learn new stuff.
BCS
+7  A: 

Keep in mind the telephone-book rule for compound indexes: the phone book is effectively indexed by last-name, first-name. It's a compound index.

If you search for people named "Smith, John" then it's helpful that the first-name is part of the index. Once you find the entries with last-name "Smith" then you can find "John" quickly.

But if you need to search for everyone named "John," then the phone book's indexing is not helpful -- you have to search the whole book anyway.

So compound indexes are great if you're searching on the first column named in the index, and optionally the second, etc. But if your search skips the leftmost columns in the index, it's useless for that search.

Bill Karwin